adastra22 6 days ago

What is a data lake?

szarnyasg 6 days ago

The YouTube video “Apache Iceberg: What It Is and Why Everyone’s Talking About It” by Tim Berglund explains data lakes really well in the opening minutes: https://www.youtube.com/watch?v=TsmhRZElPvM

adastra22 6 days ago

Thanks, but I don’t have the time to watch YouTube.

dsp_person 5 days ago

He explains:

~40 years ago the data warehouse was invented: an overnight ETL process would collect data from smaller DBs into a central DB (the data warehouse).

~15 years ago, the data lake (i.e. Hadoop) emerged to address scaling and other things. Same idea, but ELT instead of ETL: less focus on schema up front; collect the raw data into S3 and transform it later.
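
A rough sketch of the ETL vs. ELT difference, not anything from the video. Everything here is made up for illustration: the records, the field names (`customer`, `amount`, `coupon`), and the function names. An in-memory SQLite table stands in for the warehouse and a plain Python list stands in for objects in S3.

```python
import json
import sqlite3

# Hypothetical raw records from some source system (illustrative only).
RAW_EVENTS = [
    '{"customer": "alice", "amount": "12.50"}',
    '{"customer": "bob", "amount": "3"}',
    '{"customer": "alice", "amount": "7.25", "coupon": "SPRING"}',
]


def etl(raw_lines):
    """ETL: transform to a fixed schema first, then load into the warehouse."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    rows = []
    for line in raw_lines:
        rec = json.loads(line)
        # Fields outside the schema (e.g. "coupon") are dropped at this point.
        rows.append((rec["customer"], float(rec["amount"])))
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db


def elt_load(raw_lines):
    """ELT, step 1: store the raw records untouched (the 'lake')."""
    return list(raw_lines)


def elt_transform(lake):
    """ELT, step 2: parse and apply a schema only when the data is read."""
    totals = {}
    for line in lake:
        rec = json.loads(line)
        name = rec["customer"]
        totals[name] = totals.get(name, 0.0) + float(rec["amount"])
    return totals


warehouse = etl(RAW_EVENTS)
warehouse_totals = dict(
    warehouse.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)

lake = elt_load(RAW_EVENTS)
lake_totals = elt_transform(lake)
```

Both paths give the same per-customer totals, but only the lake still has the `coupon` field, because nothing was thrown away at load time.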

adastra22 5 days ago

Thank you!

simlevesque 6 days ago

It's your DB, but on S3.

CyberDildonics 5 days ago

A network file system with a database, which can be done with a PC and SQLite, but you need a new term to sell the new thing, so now it's 'data lake' and 'data warehouse' and 'blob storage'. How are the 'blobs' stored? Probably with blocks of a constant size linked together to form the 'full blob'. If this sounds like the file systems invented 50 years ago, you just don't understand: the difference is that this is very expensive.