Since its first deployment at Yahoo in 2006, HDFS has established itself as the defacto scalable, reliable and robust file system for Big Data. It has addressed several fundamental problems of distributed storage at unparalleled scales and with enterprise grade robustness.
As more and more enterprises adopt Apache Hadoop, it is becoming a unified central storage aka Data Lake for all kinds of enterprise data. Many of these storage use cases are for file storage for classic big data applications, where HDFS is the perfect fit. However, there are other use cases that need to extend Hadoop’s storage into new dimensions. An object store or a key-value store, for example, is such a use case that shares fundamental requirements of reliability, consistency and availability with HDFS, but needs different semantics, API and scalability. It is natural for Hadoop’s storage to evolve into a multi-faceted storage system that addresses such new storage use cases as well.
In this blog, we introduce a new initiative called Ozone, an object store, which extends HDFS beyond a file system, toward a more complete enterprise storage layer. Below we first cover a few common object store use cases, and then describe our approach.
An object store requires simpler semantics and API than a file system, but requires a much higher scale for number of objects. Object stores are particularly useful for users who need to store a large number of relatively smaller data objects. Haystack and Windows Azure Storage are examples of some work in this area.
The requirement for an object store emerges in many use cases, e.g., the backend storage for all the photos uploaded on Facebook, or for all the mail attachments in gmail would require a store that can scale up to 100s of billions of objects. Many enterprises have a large user base and need to store different objects of varying sizes for each user.
Object stores are popular in cloud environments to provide a persistent storage to services running on virtual machines with ephemeral local storage. Amazon’s S3 and Azure Storage are examples of popular object stores in public clouds. This model of public cloud usage is being replicated in the private clouds as well.
The IT CIO sees the Hadoop cluster’s storage and compute resources as a valuable infrastructure for running both Hadoop and non-Hadoop applications and services. This emerging trend of PaaS-on-Hadoop, propelled by YARN, opens up the Hadoop infrastructure to new use cases. Object store is a natural fit for the storage component of PaaS model.
Three distinctive properties of storage use cases make them apt for an object store, rather than the file system. First, they don’t need a file system’s arbitrarily nested directory structure. Second, they don’t need rich metadata per object like ACLs or access times. And third, the individual data objects vary greatly in size.
The project will be driven by the following guiding objectives:
HDFS architecture separates metadata management from data storage substratum as two independent layers. The file data is stored in the storage layer which consists of a cluster of thousands of storage servers called datanodes, while the metadata is stored in the file metadata layer which is a smaller set of federated servers called namenodes. This separation allows immense scale up in HDFS throughput where data reads and writes go directly from the storage disks to the application without involving the metadata servers.
HDFS’s distributed storage layer that stores file blocks is reliable, fault tolerant and highly scalable for both IO bandwidth and capacity. This layer uses replication for reliability. Originally, this storage layer supported only one (metadata) subsystem on the top, a single file system namespace. A few years ago, we generalized this powerful layer to allow many subsystems on top by a notion of a block pool. Each subsystem utilizing the storage layer can use one or more block pools for storing its data blocks. By introducing this abstraction to HDFS in Hadoop-2 to support federated namespaces, where each namespace uses a dedicated block pool, the block storage layer can serve several block pools and thus support a large number of namespaces.
Ozone takes HDFS block storage layer one-step further by supporting non-file system data as well.
Ozone stores objects identified by keys. The keys and objects are organized into independent collections called buckets. Each bucket has a user provided name. The keys are unique within a bucket.
Ozone’s metadata maps a bucket-name plus key to a block storing the object. Similar to the file system’s namespace metadata, the Ozone’s metadata system is built on top of the block storage layer, however with one distinction that Ozone’s metadata will be dynamically distributed to support large number of buckets. This is illustrated in the picture below:
The block structure in HDFS will be enhanced to allow storing hundreds of keys and objects. Such a structure will require an indexed look up within a block. A data block belongs exclusively to a bucket, which simplifies the authorization semantics.
This work will be done in Apache open source, and can be followed in HDFS-7240
We envision that HDFS will naturally evolve into a complete big-data centric enterprise storage system. We are thrilled to embark on this new project, which will add new capabilities to HDFS and Hadoop. Hortonworks remains committed to open source and to the mission and revolution of Apache Hadoop.