What is a data lake?
The YouTube video “Apache Iceberg: What It Is and Why Everyone’s Talking About It” by Tim Berglund explains data lakes really well in the opening minutes: https://www.youtube.com/watch?v=TsmhRZElPvM
Thanks but I don’t have the time to watch YouTube.
He explains:
~40 years ago the data warehouse was invented: an overnight ETL (extract, transform, load) process would collect data from smaller databases into a central database (the data warehouse).
~15 years ago the data lake (e.g. Hadoop) emerged to address scaling and other things. Same idea, but ELT instead of ETL: less focus on schema up front, collect the raw data into S3 and transform it later.
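To make the ETL vs ELT difference concrete, here is a minimal Python sketch (mine, not from the video). Everything in it is made up for illustration: the sample records, the `sales` table, and the function names. In-memory SQLite stands in for the warehouse and a plain list stands in for S3.

```python
import json
import sqlite3

# Hypothetical raw records as they might arrive from smaller source databases.
RAW = [
    '{"user": "ann", "amount": "12.50"}',
    '{"user": "bob", "amount": "7"}',
    '{"user": "ann", "amount": "3.25", "coupon": "X1"}',
]


def etl(raw_lines):
    """ETL: force records into a fixed schema first, then load them."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (user TEXT, amount REAL)")
    rows = []
    for line in raw_lines:
        rec = json.loads(line)
        # Fields outside the schema (e.g. "coupon") are dropped here.
        rows.append((rec["user"], float(rec["amount"])))
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db


def elt_load(raw_lines):
    """ELT, step 1: store the records untouched (a list stands in for S3)."""
    return list(raw_lines)


def elt_total_per_user(lake):
    """ELT, step 2: parse and transform only when a question is asked."""
    totals = {}
    for line in lake:
        rec = json.loads(line)
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + float(rec["amount"])
    return totals


warehouse = etl(RAW)
warehouse_totals = dict(
    warehouse.execute("SELECT user, SUM(amount) FROM sales GROUP BY user")
)

lake = elt_load(RAW)
lake_totals = elt_total_per_user(lake)
```

Both paths give the same per-user totals, but only the lake still has the `coupon` field, because nothing was thrown away at load time. The cost is that every reader has to do the parsing and type conversion themselves.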
So it's a network file system with a database, which can be done with a PC and SQLite, but you need a new term to sell the new thing, so now it's 'data lake' and 'data warehouse' and 'blob storage'. How are the 'blobs' stored? Probably as blocks of a constant size linked together to form the 'full blob'. If this sounds like the file systems invented 50 years ago, you just don't understand: the difference is that this is very expensive.