Hive is one of the most popular components of the Hadoop ecosystem; a Hadoop system seems almost bare without it. It provides a good jump start with Hadoop, especially for those with previous SQL experience. However, as you gain experience with Hadoop, you’ll come to realize that it isn’t always the best tool for your Hadoop jobs. But that’s a story for another post. It remains a great way to get started with any kind of job in Hadoop. On to the instructions!
By now I’ve shown you how to install a single-node Hadoop cluster. That setup configures the cluster with HDFS and YARN, but you may have noticed that submitting a MapReduce job doesn’t show anything in the YARN ResourceManager. If you are trying to understand how MapReduce interacts with YARN, this doesn’t help you. It also breaks the principle we’ve been following: setting up a cluster that works like a regular cluster that just happens to be on one node.
This post shows the steps needed to enable MapReduce support in YARN on your cluster.
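As a rough sketch, assuming the standard Hadoop 2.2.0 property names (these are not taken from the full post): MapReduce jobs use the local job runner unless `mapreduce.framework.name` is set to `yarn` in `mapred-site.xml`, which would explain why nothing shows up in the ResourceManager. Each NodeManager also needs the `mapreduce_shuffle` auxiliary service declared in `yarn-site.xml`. The hypothetical helper `hadoop_config` below only prints the XML for these properties; merging it into the files under `etc/hadoop` and restarting YARN is up to you.

```python
import xml.etree.ElementTree as ET


def hadoop_config(properties):
    """Build a Hadoop-style <configuration> document from a dict of name -> value.

    Hypothetical helper for illustration; Hadoop itself does not ship it.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# mapred-site.xml: submit MapReduce jobs to YARN instead of the local runner
# (the default value of mapreduce.framework.name is "local").
MAPRED_SITE = {"mapreduce.framework.name": "yarn"}

# yarn-site.xml: run the MapReduce shuffle service inside each NodeManager.
YARN_SITE = {
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.nodemanager.aux-services.mapreduce_shuffle.class":
        "org.apache.hadoop.mapred.ShuffleHandler",
}

if __name__ == "__main__":
    print(hadoop_config(MAPRED_SITE))
    print(hadoop_config(YARN_SITE))
```

Generating the XML from a dict simply avoids typos in the verbose `<property>` boilerplate; editing the files by hand works just as well.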
Here’s how to set up your first Hadoop cluster. These instructions walk you through getting started with Hadoop using:
- A Linux server running openSUSE (12.3 was used here) in text mode
- Apache Hadoop 2.2.0
This will get you started with a single-node cluster in pseudo-distributed mode. The benefit of this approach is that it closely resembles how a fully distributed Hadoop cluster works, except that everything runs on a single server.