My First Hadoop Installation with Cloudera CDH4

It may be bold of me to write about Big Data this early in my learning, but I believe my experience over the last few days will be helpful for other newcomers to the Big Data world.

Big Data is a hot topic now and there are tons of resources available on the Internet, but it is very easy to get confused early on: where do you start?

And I am sure that for the millions of Windows geeks like me, it will be a bit confusing to start learning this new topic. Microsoft has just started working on it with Hortonworks, but they are far behind the other Big Data solution providers in the market (especially when it comes to solutions for the Windows environment).

Where to start

1. First, you should have some conceptual understanding of Big Data technologies like Hadoop and MapReduce. For this, Hadoop in Practice and Hadoop: The Definitive Guide are must-read books.
2. Have a 64-bit PC or server. In my case I used Windows 2008 R2 Hyper-V and installed the 64-bit version of Ubuntu as a virtual machine.
3. My recommendation is to install Linux with a GUI in your test environment.
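
If you are not sure whether the Linux you installed is really 64-bit, a quick check from the terminal can confirm it. The snippet below is only a minimal sketch: the function name is_64bit is my own invention for illustration, and it assumes an x86 machine where uname -m reports x86_64 (or amd64 on some systems).

```shell
#!/bin/sh
# Hypothetical helper (not part of any Cloudera tool): succeeds when the
# given machine string names a 64-bit x86 platform.
is_64bit() {
    case "$1" in
        x86_64|amd64) return 0 ;;
        *) return 1 ;;
    esac
}

# `uname -m` prints the machine hardware name of the running system.
if is_64bit "$(uname -m)"; then
    echo "64-bit system detected"
else
    echo "This does not look like a 64-bit x86 system"
fi
```

A 32-bit installation typically reports something like i686 here, in which case the 64-bit version of the distribution needs to be installed before going further.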

Installing Hadoop

Hadoop is an open source project from the Apache Software Foundation, and there are several Hadoop distributions available on the market. You can either install Hadoop manually by downloading it from Apache (which is a very complex process to configure) or use one of the automated Hadoop distributions.

Which Hadoop distribution to use?
Most of us who have worked with Windows for a long time and avoided Linux, like me, will be interested in the Hortonworks Hadoop distribution, which has a confusing and still vague association with Microsoft. Though Microsoft claims to be working on Big Data with SQL Server 2012, I have some doubt whether they have a clear plan yet.

On the other hand, the Cloudera distribution is much better defined, and Cloudera has made remarkable contributions to Hadoop development.
That is why my suggestion is to go with Cloudera.

Steps to follow
1. After installing Linux, go to Cloudera.com and download the Cloudera Manager free edition (Install CDH4 Automatically via Cloudera Manager) along with the installation documentation. There is a fully configured VM available, but my suggestion is to install using the bin file so that you at least get some idea of what it is really doing.
2. This distribution is only compatible with 64-bit Linux, so a 64-bit Linux installation is a must. You should also have a stable internet connection for the installation.
3. After downloading cloudera-manager-installer.bin, open a terminal window and enter the command below. You should read the Cloudera Manager Free Edition Installation Guide; here I have only summarized the important points.

sudo ./cloudera-manager-installer.bin

4. After the preliminary steps complete, the installer will show a prompt. Click Close, then open your browser and go to http://localhost:7180

5. Enter the user name and password (both are admin)

6. Select the host name for the CDH cluster installation and click Continue.

7. The system will start the installation.

8. After successful installation click on Continue

9. The system will show you which services it will install.

10. For a single-node cluster, choose the All Services option.

11. Review the configuration summary and click Continue.

12. The system will start configuring the services

13. While installing Hive, the system will prompt you for database details. In my case I used the embedded database. Click the Test Connection button; otherwise the Continue button will not be enabled.

14. After installation the system will show you the resulting status page.

15. In my case, the Hive installation failed for some unknown reason, and this was the first time Google could not tell me the reason behind a failure. 😉
What you can do in this situation is:
a) Delete the hive1 service and its dependent services. You can do this by clicking the Action button on the right side of the window.
b) Click on the Action button associated with Cluster1-CDH4 and click on Add a Service

c) Select the service you want to install and click Continue. The system will download the service installer from the internet and install the service automatically.

Well, it seems my CDH4 cluster is in good health now.
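
Looking back at steps 3 and 4, two small things can trip up a newcomer: a freshly downloaded .bin file is usually not marked executable, and the admin console on port 7180 can take a little while to come up after the installer finishes. Here is a rough sketch of how both could be handled from the terminal. The function names prepare_installer and wait_for_console are made up for this post, the retry count and the 10 second pause are arbitrary choices, and the second function assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical helper: make the downloaded installer executable
# if it is not already. The argument is the path to the installer file.
prepare_installer() {
    installer="$1"
    if [ ! -f "$installer" ]; then
        echo "Installer not found: $installer" >&2
        return 1
    fi
    [ -x "$installer" ] || chmod u+x "$installer"
}

# Hypothetical helper: poll the admin console until it answers or the
# attempts run out. Assumes curl is available; the default URL matches
# the address used in step 4, the timings are illustrative only.
wait_for_console() {
    url="${1:-http://localhost:7180}"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # curl exits with 0 as soon as the server accepts the request.
        if curl -s -o /dev/null "$url"; then
            echo "Admin console is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 10
    done
    echo "Admin console did not respond at $url" >&2
    return 1
}
```

With these defined, you could call prepare_installer ./cloudera-manager-installer.bin before running the installer with sudo as in step 3, and then call wait_for_console before opening the browser.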

Online resources you can read

1. Big Data University
2. Coursera for online free courses
3. Apache Hadoop
4. Learn how to learn Hadoop

This is my learning experience with Hadoop so far; I have yet to learn many more things. I will surely share my progress in the coming days. And by the way, as I mentioned at the beginning, the recommendations and comments here are just my personal thoughts, and any advice or correction will be very much appreciated.
