Be a Dooper!

Tag Archives: cloud

Accessing Google Storage from Spark

Posted on January 21, 2018 by dmontroy

It’s been a while since my last post at beadooper. I have had my head in the clouds…really. Cloud computing has transformed the big data landscape. Originally the cloud was AWS, then Azure came along, and now there’s not only OpenStack, but also Google, Rackspace, IBM, and the list goes on.

I’ve already covered Azure and OpenStack in previous posts. In this post I’ll cover how to get to Google Storage from Spark. The pattern is similar to Azure and OpenStack in that you need a “connector” library and some configuration settings. In this case, the connector doesn’t come with Hadoop, so you need to get it separately.

And now…Google Storage…

Posted in Cloud, Spark | Tags: cloud, google, spark

Accessing Swift storage from Spark

Posted on May 11, 2016 by dmontroy

The rise of cloud technologies has resulted in a drive toward the separation of compute resources from storage resources. Two popular storage options are Swift (the OpenStack object storage platform) and Azure Blob Storage. These technologies allow an organization to treat storage and compute as two different items and plan its budget accordingly, since compute resources tend to be far more expensive than storage resources.

This is the second of two posts discussing Spark and cloud storage.  This post will discuss using Spark to access OpenStack Swift storage.

Posted in Cloud, Spark | Tags: cloud, openstack, spark

Accessing Azure Blob Storage from Spark

Posted on April 14, 2016 by dmontroy

The rise of cloud technologies has resulted in a drive toward the separation of compute resources from storage resources. Two popular storage options are Swift (the OpenStack object storage platform) and Azure Blob Storage. These technologies allow an organization to treat storage and compute as two different items and plan its budget accordingly, since compute resources tend to be far more expensive than storage resources.

This is the first of two posts discussing Spark and cloud storage. This post will discuss using Spark to access Azure Blob Storage, and the second will focus on OpenStack Swift storage.

Posted in Cloud, Spark | Tags: azure, cloud, spark

How to Connect Hadoop to an Azure storage account

Posted on June 21, 2015 by dmontroy

As of the 2.7.0 release, Hadoop includes the ability to connect to a Windows Azure storage account. There are plenty of advantages to this, including the ability to tap into the resources of Azure for storage. To be fair to other cloud providers, there is also support for Amazon S3 and OpenStack Swift, but this post specifically discusses Azure.

Posted in Cloud, Hadoop | Tags: azure, cloud
