S3 and HDFS04 May 2020 •
Cluster storage systems have, over the past decade, moved their gold standards from directory-oriented file-systems such as HDFS to object-stores such as AWS S3. The two storage models have been dissected & compared over & again from multiple perspectives 1 2. Again, based on your use-case, you might be more interested in a certain cross-section of differences between S3 & HDFS than other differences. I am not trying here to repeat the analyses.
I wrote this short, bullet-style compilation as a quick refresher for myself on ways S3 differs from HDFS; it is focused on the APIs & interactions Hadoop-like data-processing systems (such as Hadoop, Spark, or Flink ) might have with storage systems.
The S3 consistency model promises read-after-write consistency 3. The relaxed constraints of this model include:
- File delete and update operations may not immediately propagate. Old copies of the file may exist for an indeterminate time period.
- Directory operations: delete() and rename() are implemented by recursive file-by-file operations. They take time at least proportional to the number of files, during which time partial updates may be visible. If the operations are interrupted, the filesystem is left in an intermediate state.
Also, as an object store, there is no directory structure in S3. Hadoop-S3 clients mimic directory structure by:
- Creating a stub entry after a mkdirs call, deleting it when a file is added anywhere underneath
- When listing a directory, searching for all objects whose path starts with the directory path, and returning them as the listing.
- When renaming a directory, taking such a listing and asking S3 to copying the individual objects to new objects with the destination filenames.
- When deleting a directory, taking such a listing and deleting the entries in batches.
- When renaming or deleting directories, taking such a listing and working on the individual files.
The above discussion might give an impression that HDFS is POSIX-compliant. Neither HDFS or S3 are POSIX-compliant.
For HDFS, append-only semantics are the best known exception, but there are many other. For example, it also seems to lack support for extended attributes, and does not honour POSIX durability semantics (for instance, it buffers writes at the client when it should not).
The consequences of the above-listed differences include:
- Directory listing can be slow. 4
- The time to rename a directory is proportional to the number of files underneath it (directly or indirectly) and the size of the files. 5
- Directory renames are not atomic: they can fail partway through, and callers cannot safely rely on atomic renames as part of a commit algorithm.
- Directory deletion is not atomic and can fail partway through.
Use listFiles(path, recursive) for high performance recursive listings, whenever possible. ↩
The copy is executed inside the S3 storage, so the time is independent of the bandwidth from client to S3. ↩