OpenData – Open-Source and Object Store Native Databases

OpenData is an open-source database system designed natively for object stores such as S3, aiming for high performance without local caching. It supports SQL and ACID transactions, and it integrates with analytics tools such as Hive, Presto, and Spark.

Background

- OpenData is a new category of databases designed to run natively on **object stores** (such as Amazon S3, Cloudflare R2, or MinIO) instead of traditional block storage or local disks. Running on object storage lets these databases scale to petabytes cheaply and read and write data directly as objects. The projects are also fully open source.
- They aim to replace or complement services like **Snowflake** (a proprietary cloud data warehouse) and **Databricks** (a unified analytics platform) by offering similar performance without vendor lock-in or per-compute pricing.
- The "OpenData" stack includes multiple databases (e.g., analytics query engines, time-series databases, document stores), all using object storage as their single source of truth. This is similar to the concept of a **data lakehouse**, but with native object-store integration rather than reliance on a separate storage layer such as HDFS or a separate table format such as Apache Iceberg.
- Key projects include **ParadeDB** (an Elasticsearch alternative for search, built on Postgres), **Hydra** (columnar Postgres), **Lakekeeper** (a catalog service), and **GooseFS** (a caching layer for object stores).
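To make "object storage as the single source of truth" concrete, here is a minimal Python sketch. It is not taken from any of the projects above: `InMemoryObjectStore` is a hypothetical stand-in for an S3-style bucket, and `TinyTable` is a hypothetical table that writes immutable segment objects plus a manifest object listing them. The point it illustrates is that a reader holding no local state can reconstruct the table from the store alone.

```python
import json


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-style bucket: whole-object put/get by key."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def exists(self, key: str) -> bool:
        return key in self._objects


class TinyTable:
    """Hypothetical table that keeps no local state; everything lives in the store."""

    def __init__(self, store: InMemoryObjectStore, name: str):
        self.store = store
        self.name = name
        self.manifest_key = f"{name}/manifest.json"

    def _load_manifest(self) -> list:
        # The manifest object lists the keys of all committed segments.
        if not self.store.exists(self.manifest_key):
            return []
        return json.loads(self.store.get(self.manifest_key))

    def append(self, rows: list) -> None:
        # Write the rows as a new immutable segment, then publish it
        # by rewriting the manifest to include the new segment key.
        segments = self._load_manifest()
        segment_key = f"{self.name}/segment-{len(segments):05d}.json"
        self.store.put(segment_key, json.dumps(rows).encode())
        segments.append(segment_key)
        self.store.put(self.manifest_key, json.dumps(segments).encode())

    def scan(self) -> list:
        # Rebuild the full table from the store, segment by segment.
        rows = []
        for key in self._load_manifest():
            rows.extend(json.loads(self.store.get(key)))
        return rows


store = InMemoryObjectStore()
writer = TinyTable(store, "events")
writer.append([{"id": 1}, {"id": 2}])
writer.append([{"id": 3}])

# A separate instance with no shared in-process state sees the same data.
reader = TinyTable(store, "events")
print(reader.scan())
```

The sketch ignores concurrency: two writers appending at once could overwrite each other's manifest. A real system would need some form of atomic commit for the manifest update, plus a more efficient file format than JSON.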