What is a Data Lake?

A data lake (“DL”) is a storage repository that can hold large amounts of data. It stores every type of data in its native format, with no fixed limits on size or number of files.

A data lake can hold structured data, such as rows and columns from a relational database. It can also hold semi-structured data, for example CSV, logs, XML and JSON. Finally, it can store unstructured data, for example emails, documents, PDFs and binary data like images, audio and video. Current DL solutions include Azure Data Lake, Amazon S3 cloud storage and the Hadoop Distributed File System (HDFS) from Apache Hadoop.
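
As a rough illustration of storing data in its native format, the sketch below writes three kinds of payload into a folder unchanged. It is a minimal sketch under stated assumptions: a temporary local directory stands in for the lake, and the `land_raw` helper, the `raw/<source>/` layout and the sample payloads are all hypothetical, not taken from any particular product.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Write the payload unchanged under <lake_root>/raw/<source>/<filename>."""
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # no parsing or conversion: the native format is kept
    return target


# Assumption: a temporary local directory stands in for the lake.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Rows and columns exported from a relational database, as CSV.
orders_csv = b"order_id,amount\n1,19.99\n2,5.00\n"
# A JSON event with a nested field.
event_json = json.dumps(
    {"user": "u1", "action": "click", "meta": {"page": "/home"}}
).encode("utf-8")
# Binary content (here only the 8-byte PNG signature, as a placeholder image).
image_bytes = b"\x89PNG\r\n\x1a\n"

paths = [
    land_raw(lake, "sales_db", "orders.csv", orders_csv),
    land_raw(lake, "web", "event.json", event_json),
    land_raw(lake, "uploads", "logo.png", image_bytes),
]

for path in paths:
    print(path.relative_to(lake), path.stat().st_size, "bytes")
```

Every file is written byte for byte as it arrived, so nothing about its structure has to be decided at load time.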

What are the Benefits of Using a Lake?

Data richness. A lake can store data from many sources and of many types, for example text, audio, images and video.
Data democratization. A lake makes data available to the whole organization.
Storage in native format. A lake doesn’t need the data to be modeled when it is loaded. Instead, the data is modeled when it is explored for analytics. Consequently, lakes offer the flexibility to ask new business questions and to gain insight.
Scalability. Lakes offer scalability at a modest price when compared to a traditional data warehouse.
Advanced analytics. A lake makes large amounts of data available to deep learning algorithms. As a result, it can help with real-time decisions.
Complementary to an existing data warehouse. Warehouses and lakes can work together, resulting in an integrated data strategy.
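
The "storage in native format" benefit above is often described as schema-on-read. The following is a minimal sketch of that idea, assuming raw JSON records with inconsistent fields; the `read_with_schema` helper, the field names and the sample data are illustrative only.

```python
import json

# Raw records as they might sit in the lake: no schema was enforced at load
# time, so fields and value types vary between records (illustrative data).
raw_lines = [
    '{"device": "t-1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00"}',
    '{"device": "t-2", "temp_c": 19, "firmware": "1.2"}',
    '{"device": "t-1", "humidity": 40}',
]


def read_with_schema(lines, schema):
    """Apply `schema` (field name -> converter) while reading each record."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            # Missing fields become None; fields outside the schema are ignored.
            row[field] = convert(value) if value is not None else None
        yield row


# The schema is chosen for this particular question, at analysis time.
temperature_schema = {"device": str, "temp_c": float}
rows = list(read_with_schema(raw_lines, temperature_schema))

readings = [row["temp_c"] for row in rows if row["temp_c"] is not None]
average = sum(readings) / len(readings)
print(rows)
print(average)
```

A different question, such as one about humidity, could reuse the same raw records with a different schema, which is where the flexibility comes from.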

How do Warehouses Compare to Lakes?

Depending on its requirements, an organization may need a data warehouse, a data lake or both. They serve different needs.

Characteristics	Traditional Data Warehouse	Modern Data Lake
Type of Data	Relational data from transactional systems, databases, and business applications.	Non-relational and relational data from many sources. For example, IoT devices, web sites, mobile apps, social media, and others.
Schema	Designed prior to the warehouse implementation (schema-on-write).	Written at the time of analysis (schema-on-read).
Price Performance	Medium-speed query results using high-cost storage.	Query results are getting faster while using low-cost storage.
Data Quality	Highly curated data that serves as the one version of the truth.	Any data that may or may not be curated.
Users	Business analysts.	Data scientists, data developers and business analysts.
Analytics	Batch reporting, BI and visualizations.	Machine learning, predictive analytics, data discovery and profiling.
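
The Schema and Data Quality rows can be made concrete with a small sketch. It assumes an in-memory SQLite table as a stand-in for a warehouse and a plain Python list as a stand-in for raw lake storage; the `sales` table, its constraints and the incoming records are hypothetical.

```python
import json
import sqlite3

# Warehouse side: the table and its constraints are designed before any load.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER PRIMARY KEY, "
    "amount REAL NOT NULL CHECK (amount >= 0))"
)

incoming = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},                     # missing amount
    {"order_id": 3, "amount": 5.0, "coupon": "SPRING"},  # extra field
]

lake_raw = []   # stand-in for raw lake storage
rejected = []   # order ids the warehouse refused to load

for record in incoming:
    # Lake side: every record is kept exactly as it arrived.
    lake_raw.append(json.dumps(record))
    try:
        # Warehouse side: only the designed columns are loaded, and the
        # constraints are checked at write time.
        conn.execute(
            "INSERT INTO sales (order_id, amount) VALUES (?, ?)",
            (record["order_id"], record["amount"]),
        )
    except sqlite3.IntegrityError:
        rejected.append(record["order_id"])

warehouse_rows = conn.execute(
    "SELECT order_id, amount FROM sales ORDER BY order_id"
).fetchall()
print(warehouse_rows)
print(rejected)
print(len(lake_raw))
```

The warehouse ends up with only the rows that fit its design, while the lake keeps all three records, including the incomplete one and the extra `coupon` field, for later curation.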

If you would like to know more, read about how JTA The Data Scientists does its work or have a look at some other FAQs. You might also like to read Wikipedia’s article on data lakes.

You could also explore our case studies or whitepapers.
