The rise of cloud technologies has driven a move toward separating compute resources from storage resources. Two popular storage options are Swift (the OpenStack object storage platform) and Azure Blob Storage. Keeping data in a storage service like these, apart from the compute cluster, allows an organization to treat storage and compute as two separate items and plan its budget accordingly, since compute resources tend to be far more expensive than storage resources.
This is the first of two posts discussing Spark and cloud storage. This post covers using Spark to access Azure Blob Storage, and the second will focus on OpenStack Swift storage.