Showing posts with label Windows Azure. Show all posts

Thursday, October 25, 2012

HOW TO INSTALL AND USE MICROSOFT HDINSIGHT (HADOOP ON WINDOWS)

HDInsight is Microsoft’s 100% Apache-compatible Hadoop distribution, supported by Microsoft. HDInsight, available both on Windows Server and as a Windows Azure service, empowers organizations with new insights on previously untouched unstructured data, while connecting to the most widely used Business Intelligence (BI) tools on the planet. In this post we'll jump directly into the hands-on part. But if you want more on HDInsight, you can visit my other post here.

NOTE : OS used - Windows 7

So let's get started.

First of all, go to the Microsoft Big Data page and click on the Download HDInsight Server link (shown in the blue ellipse). You will see something like this :



Once you click the link, it will take you to the Download Center. Now go to the Instructions heading and click on Microsoft Web Platform Installer.



This will automatically download and install everything that is required.

Once the installation is over, open the Microsoft Web Platform Installer and go to the top right corner of its UI, where you will find a search box. Type Hadoop in there. This will show you the Microsoft HDInsight for Windows Server Community Technology Preview entry. Select it and click on Install. And you are done.

NOTE : It may take some time to install all the necessary components, depending on your connection speed.

Once the HDInsight installation completes successfully, you will find the Hadoop Command Line icon on your desktop. You will also find a brand new directory named Hadoop inside your C drive. This indicates that everything went OK and you are good to go.

TESTING TIME

It's time now to test HDInsight.

Step 1. Go to the C:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin directory :
c:\>cd Hadoop\hadoop-1.1.0-SNAPSHOT\bin

Step 2. Now, start the daemons using start_daemons.cmd :
c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin>start_daemons.cmd

It will show you something like this on your terminal :


This means that your Hadoop processes have been started successfully and you are all set.

Let us use a few of the Hadoop commands to get ourselves familiar with Hadoop.

1. List all the directories, sub-directories and files present in HDFS. We do it using fs -ls :



2. Create a new directory inside HDFS. We use fs -mkdir to do that :
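
As a rough sketch of the two commands above: typed at the Hadoop Command Line they would look something like hadoop fs -ls / and hadoop fs -mkdir /user/demo. The small Python wrapper below only illustrates how those command lines are put together and how they could be launched from a script. The path /user/demo and the function names build_fs_command and run_fs_command are hypothetical, chosen just for this example, and the wrapper assumes the hadoop launcher is on your PATH.

```python
import shutil
import subprocess


def build_fs_command(action, path, hadoop="hadoop"):
    """Return the argument list for `hadoop fs -<action> <path>`.

    Only the two actions used in this post are accepted.
    """
    if action not in ("ls", "mkdir"):
        raise ValueError(f"unsupported action: {action}")
    return [hadoop, "fs", f"-{action}", path]


def run_fs_command(action, path):
    """Run the command through the hadoop launcher and return its output.

    Assumption: the launcher (hadoop.cmd on Windows) is on PATH;
    shutil.which resolves the full path, including the extension.
    """
    launcher = shutil.which("hadoop")
    if launcher is None:
        raise FileNotFoundError("hadoop launcher not found on PATH")
    result = subprocess.run(
        build_fs_command(action, path, launcher),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Show the command lines without needing a running cluster.
    print(" ".join(build_fs_command("ls", "/")))
    # /user/demo is a hypothetical directory name.
    print(" ".join(build_fs_command("mkdir", "/user/demo")))
```

With the daemons running, run_fs_command("ls", "/") should return the same listing you would see at the Hadoop Command Line.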



You should be familiar with the Hadoop shell by now. But I would suggest going to the official Hadoop page and trying more commands in order to get a good grip. HAPPY HADOOPING..!!


HDInsight, Hadoop For Windows

Now, here is a treat for those who want to go the Hadoop way but don't love Linux that much. After a year of beta testing, Microsoft is rolling out the first preview editions of its Apache Hadoop integration for Windows Server and Azure, in a marriage of open source and commercial code. And they call it HDInsight. HDInsight is available both on Windows Server (or Windows 7) and as a Windows Azure service. HDInsight will empower organizations with new insights on previously untouched unstructured data, while connecting to the most widely used Business Intelligence (BI) tools.

Microsoft collaborated with Hortonworks to make this happen. Last year Microsoft announced that it would integrate Hadoop into its forthcoming SQL Server 2012 release and Azure platforms, and committed to full compatibility with the Apache code base. The first previews have been shown off at the Hadoop World show in New York and are open for download. HDInsight delivers Apache Hadoop compatibility for the enterprise and simplifies deployment of Hadoop-based solutions. In addition, delivering these capabilities on the Windows Server and Azure platforms enables customers to use the familiar tools of Excel, PowerPivot for Excel and Power View to easily extract actionable insights from the data.

Microsoft also announced that it is going to expand its partnership with Hortonworks, to give customers access to an enterprise-ready distribution of Hadoop with the newly released solutions. Having said that, I hope this Microsoft+Hortonworks relationship keeps growing so that we keep getting great things like HDInsight.

You can find more about HDInsight here. And if you are planning to give HDInsight a shot, you can visit my post on this, which shows how to install and start using Hadoop on Windows using HDInsight.

How to work with Avro data using Apache Spark (Spark SQL API)

We all know how cool Spark is when it comes to fast, general-purpose cluster computing. Apart from the core APIs Spark also provides a rich ...