BIG DATA Opportunity for Bangladeshi IT Service Providers

We missed the BPO wave. We are still missing the cloud wave. Can we ride the Big Data wave, the next big opportunity in the IT industry?
According to research, the Big Data market is expected to reach $46.34 billion by 2018.

Big players such as India are already in the market, using their existing outsourcing channels to take a share of this industry from their allies. Infosys has created a Big Data platform called BigDataEdge; similarly, Wipro has developed a Big Data framework called the ‘3B framework’. HCL, Happiest Minds and TATA are all building up their skills to ride this wave.

What about Bangladesh?

We still do not have a single training center in Bangladesh that can deliver IT courses in a professional manner. Not a single university in the country offers any Big Data courses. While the opportunity is massive, the shortage of skilled people is a very big challenge.

On the other hand, we have a strong pool of young, educated computer professionals who can learn this topic. And the telcos, which really do deal with huge amounts of data, should start sending their IT personnel for training on it.

Huge opportunity for Skilled Labor export

According to McKinsey, the U.S. alone faces a shortage of 140,000 to 190,000 analysts and 1.5 million managers who can analyze Big Data. To seize this opportunity, the country's local universities should start offering Big Data courses. They can also partner with universities elsewhere in the world that already offer such courses, to get expert help.

Local IT training providers such as AMS and New Horizon should offer Big Data training to attract IT professionals who are already experienced and can move into this new profession after professional training. Training providers should look for joint ventures with EMC, IBM, Cloudera and Hortonworks to make their Big Data courses available in Bangladesh.

Huge opportunity for IT Service Industry

Local IT service companies such as Brac IT Service Ltd., Computer Services Ltd. and many more are still struggling in the IT service industry. These companies have huge financial backing from their big brothers, such as the Brac NGO or the PC sales division of Computer Services, yet it is surprising to see them still struggling to build a sustainable portfolio in the IT service business.

I would request you all to explore the outside world and strike joint business deals with market leaders in the Big Data industry such as Teradata, IBM, EMC, Cloudera and Hortonworks. We have neither the time nor the justification to reinvent the wheel; it is time to work shoulder to shoulder with the big brothers, to learn from them and to win a share of the business. Wipro, the IT giant, did the same in 1988 by creating a joint venture with General Electric of the United States.

Hoping for a shiny Bangladesh!!

My First Hadoop Installation with Cloudera CDH4

It may be a bold step for me to talk about Big Data at this early stage of my learning, but I believe my experience of the last few days will be helpful for newcomers to the Big Data world.

Big Data is a hot topic now and there are tons of resources available on the Internet, but it is very easy to get confused in the early stage of learning, beginning with the question of where to start.

And I am sure that for the millions of Windows geeks like me, it will be a bit confusing to start learning this new topic. Microsoft has just started working on Big Data with Hortonworks, but it is far behind the other Big Data solution providers in the market (especially for solutions provided in a Windows environment).

Where to start

1. First, you should have some conceptual idea of Big Data technologies such as Hadoop and MapReduce; for this, Hadoop in Practice and Hadoop: The Definitive Guide are must-read books.
2. Have a 64-bit PC or server. In my case I used Windows 2008 R2 Hyper-V and installed the 64-bit version of Ubuntu as a virtual machine.
3. My recommendation is to install Linux in GUI mode in your test environment.
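
Since the rest of this walkthrough depends on a 64-bit Linux guest, it can help to confirm that from a terminal inside the VM before going further. The sketch below is one possible way to do it. The function name check_arch is made up for illustration, and it assumes the machine string for a 64-bit Intel/AMD system is x86_64 (or amd64 on some systems).

```shell
# check_arch is a hypothetical helper name, not part of any Cloudera tool.
# It prints "ok" for a 64-bit x86 machine string and "unsupported" otherwise.
# With no argument it inspects the current machine via `uname -m`.
check_arch() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64) echo "ok" ;;
        *)            echo "unsupported" ;;
    esac
}
```

Running check_arch with no argument inside the Ubuntu VM should print ok on a 64-bit install. Passing a string, for example check_arch i686, lets you see how a 32-bit machine would be reported.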

Installing Hadoop

Hadoop is an open source project from the Apache Software Foundation, and several Hadoop distributions are available on the market. You can either install Hadoop manually by downloading it from Apache (which is a very complex process to configure) or use one of the automated Hadoop distributions available in the market.

Which Hadoop distribution to use?
Most of us who, like me, have dealt with Windows for a long time and avoided Linux will be interested in looking into the Hortonworks Hadoop distribution, which has a confusing and still unclear tie with Microsoft. Though Microsoft claims to be working on Big Data with SQL Server 2012, I have some doubt whether it has a clear plan yet.

On the other hand, the Cloudera distribution is much better defined, and Cloudera has made remarkable contributions to Hadoop development.
That is why my suggestion is to go with Cloudera.

Steps to follow
1. After installing Linux, go to Cloudera.com and download the free edition of Cloudera Manager (Install CDH4 Automatically via Cloudera Manager) along with the installation documentation. A fully configured VM is also available, but my suggestion is to install using the bin file so that you at least have some idea of what it is really doing.
2. This distribution is only compatible with 64-bit Linux, so a 64-bit Linux version is a must. You should also have a stable internet connection for the installation.
3. After downloading cloudera-manager-installer.bin, open a terminal window and enter the command below. You should read the Cloudera Manager Free Edition Installation Guide; here I have only summarized the important points.

sudo ./cloudera-manager-installer.bin
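
A freshly downloaded .bin file is usually not marked as executable, in which case the command above fails with a "Permission denied" style error. The sketch below shows one way to handle that. The helper name prepare_installer is hypothetical, and the default path assumes the installer sits in the current directory.

```shell
# prepare_installer is a hypothetical helper, not something Cloudera ships.
# It checks that the installer file exists and marks it executable for the
# current user. The path defaults to the file name used in step 3.
prepare_installer() {
    installer="${1:-./cloudera-manager-installer.bin}"
    if [ ! -f "$installer" ]; then
        echo "installer not found: $installer" >&2
        return 1
    fi
    chmod u+x "$installer"
}

# Typical use from the download directory. It is left commented out here
# because the second line starts the real installation:
# prepare_installer ./cloudera-manager-installer.bin &&
#     sudo ./cloudera-manager-installer.bin
```

If the file is missing, the helper prints a message and returns a non-zero status, so the installer is never started by mistake.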

4. After the preliminary step completes, the system will show a prompt. Click Close, then open your browser and go to http://localhost:7180

5. Enter the user name and password (both are admin)

6. Select the host name for the CDH cluster installation and click Continue

7. The system will start installing as below

8. After successful installation click on Continue

9. The System will show you which services it will install

10. For a single-node cluster, choose the All Services option

11. Review the configuration summary and click Continue

12. The system will start configuring the services

13. While installing Hive, the system will prompt you for database details. In my case I used the embedded database. Please click the Test Connection button; otherwise the Continue button will not be enabled.

14. After successful installation the system will show you the page as below

15. As you can see in the image above, my Hive installation failed for some unknown reason. And this is the first time Google failed to suggest to me the reason behind a failure. 😉
What you can do in this situation is:
a) Delete hive1 and its dependent services. You can do this by clicking the Action button on the right side of the window
b) Click the Action button associated with Cluster1-CDH4 and click Add a Service

c) Select the service you want to install and click Continue. The system will download the service installer from the internet and install the service automatically.

Well, it seems my CDH4 cluster is in good health now.
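
If you want a quick check from the terminal that the Cloudera Manager console from step 4 is still answering on port 7180, something like the sketch below could be used. The function names cm_url and cm_is_up are hypothetical, and the check assumes curl is installed. Note that it only tells you the web console responds; the health of the individual services still has to be read from the console itself.

```shell
# cm_url and cm_is_up are hypothetical helper names used for illustration.

# Build the admin console URL. Host and port default to the values
# used in step 4 (localhost and 7180).
cm_url() {
    echo "http://${1:-localhost}:${2:-7180}"
}

# Return success when the console sends back any HTTP response within
# 5 seconds. Assumes curl is installed on the machine.
cm_is_up() {
    curl --silent --output /dev/null --max-time 5 "$(cm_url "$@")"
}
```

For example, cm_is_up && echo "console is reachable" checks the local machine, while cm_is_up myhost 7180 would check a console on another host (myhost is a placeholder).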

Online resources you can read

1. Big Data University
2. Coursera for online free courses
3. Apache Hadoop
4. Learn how to learn Hadoop

This is my learning experience with Hadoop so far, and I have yet to learn many more things. I will surely share my learning progress in the coming days. And by the way, as I mentioned at the beginning of my writing, the recommendations and comments I have expressed here are just my personal thoughts, and any advice or correction will be very much appreciated.