The Nimble File Format

Nimble (formerly known as "Alpha") is a new columnar file format for large
datasets, created by Meta. Nimble is meant to replace file formats such as
Apache Parquet and ORC.
Watch this talk to learn more
about Nimble's internals.
Nimble has the following design principles:
- Wide: Nimble is better suited for workloads that are wide in nature, such
as tables with thousands of columns (or streams), which are common in
feature engineering workloads and machine learning training tables.
- Extensible: Since the state of the art in data encoding evolves faster
than the file layout itself, Nimble decouples stream encoding from the
underlying physical layout. Nimble allows library users to extend encodings
and apply them recursively (cascading).
- Parallel: Nimble is meant to fully leverage highly parallel hardware by
providing SIMD- and GPU-friendly encodings. Although this is not
implemented yet, we intend to expose metadata to allow developers to better
plan decoding trees and schedule kernels without requiring the data streams