Benchmarks and Obscurantism: A "red" line that should not be crossed
ClickHouse criticizes Databricks for using non-reproducible benchmarks in its "Redshift 8x faster" claim, arguing that withholding the code, data, and configurations needed to reproduce the results misleads the industry and erodes trust.
Background
- ClickHouse is an open-source, column-oriented database management system (DBMS) designed for real-time analytics on large datasets. It competes with proprietary systems like Databricks' SQL engine.
- Databricks is a major data and AI company, founded by the creators of Apache Spark, that offers a unified analytics platform. It recently published a benchmark called "Reyden" claiming strong performance for its query engine.
- ClickHouse accuses Databricks of publishing a benchmark that is not reproducible, does not follow standard benchmarking practices, and is designed to mislead rather than inform, a practice sometimes called "benchmarketing" (benchmarking as marketing).
- The "red line" in the title is a play on words: it refers both to Databricks' corporate color (its logo is red) and to a "red line" as a boundary that should not be crossed in ethical benchmarking.
- This matters because benchmarks heavily influence which database technologies enterprises adopt; if the benchmarks are opaque or biased, buyers can make costly mistakes.