List of Open Source Tools for Data Engineering
Top-ranked OSS in DE
Data Integration
Apache NiFi
Airbyte
Meltano
Apache InLong
Apache SeaTunnel
Storage
HDFS
Apache Ozone
Ceph
MinIO
Data Lake Platform
Apache Hudi
Apache Iceberg
Delta Lake
Paimon
Note: Unified Data Lake — OneTable
Note: Lakehouse — Dremio
Event Processing
Kafka
Redpanda
Pulsar
Data Processing & Computation
Apache Spark
Apache Flink
Vaex
Ray
Dask
Polars
Database
OLTP
SQL — RDBMS (MySQL, Postgres), In-Memory (Apache Ignite)
NoSQL — KV (Aerospike), Document (MongoDB), Graph (Neo4j), Multi-model (ArangoDB)
HTAP
NewSQL — StoneDB, TiDB
OLAP
Offline — Columnar (Databend), Time Series (TimescaleDB)
Realtime — Realtime OLAP (Druid, Pinot, ClickHouse, StarRocks), Search Engine, Streaming Database (Materialize, RisingWave)
Other notables: Doris, Kylin
Vector Databases
Chroma
Milvus
Weaviate
FAISS
Qdrant
Visualization
Superset
Rath
Redash
Metabase
Data Infrastructure
Kubernetes
Ambari
Workflow Management & DataOps
Airflow
Dagster
Kestra
Temporal
Mage
Windmill
DolphinScheduler
Monitoring
Prometheus + Mimir & Grafana + Loki
EFK (Elasticsearch, Fluentd, Kibana)
Metadata Management
DataHub
Amundsen
Marquez