List of Open-Source Tools for Data Engineering

Amit Singh Rathore
1 min read · Feb 10, 2024

Top-ranked OSS in DE

Data Integration

Apache NiFi
Airbyte
Meltano
Apache InLong
Apache SeaTunnel

Storage

HDFS
Apache Ozone
Ceph
MinIO

Data Lake Platform

Apache Hudi
Apache Iceberg
Delta Lake
Apache Paimon

Note: Unified Data Lake — OneTable
Note: Lakehouse — Dremio

Event Processing

Kafka
Redpanda
Pulsar

Data Processing & Computation

Apache Spark
Apache Flink
Vaex
Ray
Dask
Polars

Database

OLTP

SQL: RDBMS (MySQL, Postgres), In-Memory (Apache Ignite)
NoSQL: KV (Aerospike), Document (MongoDB), Graph (Neo4j), Multi-model (ArangoDB)

HTAP

NewSQL: StoneDB, TiDB

OLAP

Offline: Columnar (Databend), Time Series (TimescaleDB)
Real-time: Real-time OLAP (Druid, Pinot, ClickHouse, StarRocks), Search Engine, Streaming Database (Materialize, RisingWave)

Other notables: Doris, Kylin

Vector Databases

Chroma
Milvus
Weaviate
FAISS
Qdrant

Visualization

Superset
Rath
Redash
Metabase

Data Infrastructure

Kubernetes
Ambari

Workflow Management & DataOps

Airflow
Dagster
Kestra
Temporal
Mage
Windmill
DolphinScheduler

Monitoring

Prometheus + Mimir & Grafana + Loki
EFK (Elasticsearch, Fluentd, Kibana)

Metadata Management

DataHub
Amundsen
Marquez

Amit Singh Rathore

Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML