#ducklake

Data science
from InfoQ
10 hours ago

DuckLake 1.0: Data Lake Format with SQL Catalog Metadata

DuckLake 1.0 introduces a data lake format that stores its metadata in a SQL database, improving performance and simplifying operations compared with formats that keep metadata in files.
from The Register
2 weeks ago

DuckDB uses RDBMS to tackle lakehouse 'small changes' issue

You make a small change to your table, adding a single row, and it hurts data lake performance: because of the way these formats work, a new file containing that one row has to be written, followed by a bunch of metadata. This is very inefficient, because formats like Parquet really don't want to store a single row; they want to store a million rows.
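To make the metadata half of that cost concrete, here is a minimal Python sketch under stated assumptions. The figure of three metadata files per commit, the `snapshot` and `data_file` tables, and the helpers `file_based_files_written`, `make_catalog` and `commit_append` are all hypothetical and do not reflect DuckLake's actual catalog schema; the standard-library `sqlite3` module simply stands in for "a SQL database".

```python
import sqlite3

# Hypothetical: number of metadata files a file-based table format writes
# per commit. Real formats differ; this value is purely illustrative.
METADATA_FILES_PER_COMMIT = 3


def file_based_files_written(num_single_row_inserts: int) -> int:
    """Files written when each single-row insert is its own commit:
    one tiny data file plus that commit's metadata files."""
    return num_single_row_inserts * (1 + METADATA_FILES_PER_COMMIT)


def make_catalog() -> sqlite3.Connection:
    """Create a hypothetical SQL metadata catalog (not DuckLake's schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snapshot (
            snapshot_id INTEGER PRIMARY KEY
        );
        CREATE TABLE data_file (
            file_id     INTEGER PRIMARY KEY,
            snapshot_id INTEGER NOT NULL REFERENCES snapshot(snapshot_id),
            path        TEXT NOT NULL,
            row_count   INTEGER NOT NULL
        );
        """
    )
    return conn


def commit_append(conn: sqlite3.Connection, path: str, row_count: int) -> int:
    """Record a new snapshot and its data file in one transaction.
    The metadata change is two row inserts, not new metadata files."""
    with conn:  # commits on success, rolls back if either insert fails
        cur = conn.execute("INSERT INTO snapshot DEFAULT VALUES")
        snapshot_id = cur.lastrowid
        conn.execute(
            "INSERT INTO data_file (snapshot_id, path, row_count) VALUES (?, ?, ?)",
            (snapshot_id, path, row_count),
        )
    return snapshot_id


# Ten single-row inserts under each model.
catalog = make_catalog()
for i in range(10):
    commit_append(catalog, f"data/part-{i}.parquet", 1)

print(file_based_files_written(10))  # 40
print(catalog.execute("SELECT COUNT(*), SUM(row_count) FROM data_file").fetchone())  # (10, 10)
```

Note that this toy still produces ten one-row data files in both cases; moving metadata into a database only removes the per-commit metadata files, not the small-file problem that the quote attributes to Parquet.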