
What is a Data Lake?

A data lake (“DL”) is a storage repository that holds large amounts of data. It keeps every type of data in its native format, with no fixed limits on size or number of files.

A data lake can hold structured data, such as rows and columns from a relational database. It can also hold semi-structured data, for example CSV, logs, XML and JSON. Finally, it can store unstructured data, for example emails, documents and PDFs, and binary data such as images, audio and video. Current DL solutions include Azure Data Lake, Amazon S3 cloud storage and the Hadoop Distributed File System (HDFS) from Apache Hadoop.
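
To make "native format" concrete, here is a minimal sketch in Python. It assumes a temporary local directory as a stand-in for the lake's storage; the `land_raw` helper, the `raw/<source>/` folder layout and the sample payloads are all hypothetical and are not taken from any particular product.

```python
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Write payload byte-for-byte under <lake_root>/raw/<source>/ (hypothetical layout)."""
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # no parsing and no conversion: the native format is kept
    return target


# Assumption: a temporary local directory stands in for the lake's storage.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Structured, semi-structured and unstructured payloads (made-up sample data).
samples = {
    ("crm", "customers.csv"): b"id,name\n1,Alice\n2,Bob\n",
    ("web", "events.json"): b'{"user": 1, "action": "login"}\n',
    ("media", "logo.png"): b"\x89PNG\r\n\x1a\n",  # only the PNG signature bytes
}

landed = {key: land_raw(lake, key[0], key[1], data) for key, data in samples.items()}

for (source, name), path in landed.items():
    print(source, name, path.stat().st_size, "bytes")
```

The helper never inspects the payload, so a CSV export, a JSON log line and an image are all handled the same way. Any structure is worked out later, when the data is read.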

What are the Benefits of Using a Lake?

  • Data richness. A lake can store data from many sources and of many types, for example text, audio, images and video.
  • Data democratization. A lake makes data available to the whole organization.
  • Storage in native format. A lake doesn’t need the data to be modeled when it is loaded. Instead, the data is modeled when it is explored for analytics. Consequently, lakes offer the flexibility to ask new business questions and gain insight.
  • Scalability. Lakes scale at a modest price compared with a traditional data warehouse.
  • Advanced analytics. A lake makes large amounts of data available to deep learning algorithms, which helps with real-time decisions.
  • Complementary to an existing data warehouse. Warehouses and lakes can work together as part of an integrated data strategy.
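
The "storage in native format" benefit is often called schema-on-read. The sketch below illustrates the idea in Python under stated assumptions: the raw events, their field names and the `read_with_schema` helper are invented for illustration. The same untouched raw data is read twice, each time with a schema chosen for the question being asked.

```python
import json

# Raw events as they might have been landed: one JSON object per line.
# Made-up sample data; the records do not share a fixed set of fields.
RAW_EVENTS = """\
{"user": "1", "action": "login", "ts": "2021-05-01T09:00:00"}
{"user": "2", "action": "purchase", "amount": "19.90", "ts": "2021-05-01T09:05:00"}
{"user": "1", "action": "purchase", "amount": "5.10", "coupon": "SPRING"}
"""


def read_with_schema(raw_text, schema):
    """Project each raw record onto `schema` (field name -> converter) at read time."""
    rows = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Missing fields become None; fields outside the schema are skipped for
        # this read only and remain available in the raw data.
        rows.append({
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        })
    return rows


# Question 1: how much did each user spend?
spend_schema = {"user": int, "amount": float}
spend = {}
for row in read_with_schema(RAW_EVENTS, spend_schema):
    if row["amount"] is not None:
        spend[row["user"]] = spend.get(row["user"], 0.0) + row["amount"]

# Question 2: which actions occur? Same raw data, different schema.
action_schema = {"action": str}
actions = sorted({row["action"] for row in read_with_schema(RAW_EVENTS, action_schema)})

print(spend)
print(actions)
```

Because nothing was discarded at load time, a later question about the `coupon` field could be answered by passing a third schema, without reloading anything.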

How do Warehouses Compare to Lakes?

Depending on its requirements, an organization may need a data warehouse, a data lake, or both, because they serve different needs.

| Characteristic | Traditional Data Warehouse | Modern Data Lake |
| --- | --- | --- |
| Type of data | Relational data from transactional systems, databases and business applications. | Non-relational and relational data from many sources, for example IoT devices, websites, mobile apps and social media. |
| Schema | Designed before the warehouse is implemented (schema-on-write). | Defined at the time of analysis (schema-on-read). |
| Price/performance | Medium-speed query results using high-cost storage. | Faster query results using low-cost storage. |
| Data quality | Highly curated data that serves as the single version of the truth. | Any data, which may or may not be curated. |
| Users | Business analysts. | Data scientists, data developers and business analysts. |
| Analytics | Batch reporting, BI and visualizations. | Machine learning, predictive analytics, data discovery and profiling. |
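
The warehouse side of the "Schema" row can be sketched in the same spirit. The example below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for a warehouse, which is an assumption made for illustration; the `sales` table and the incoming records are invented. The schema exists before any data arrives, so each record has to fit it at load time.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a data warehouse.
conn = sqlite3.connect(":memory:")

# The schema is designed first, before any data is loaded.
conn.execute("CREATE TABLE sales (user_id INTEGER NOT NULL, amount REAL NOT NULL)")

# Made-up incoming records; not all of them fit the schema.
incoming = [
    {"user_id": 1, "amount": 5.10},
    {"user_id": 2, "amount": 19.90, "coupon": "SPRING"},  # extra field
    {"user_id": 3},                                        # missing amount
]

loaded, rejected = 0, []
for record in incoming:
    try:
        # Only the columns in the schema are written; "coupon" is dropped here.
        conn.execute(
            "INSERT INTO sales (user_id, amount) VALUES (?, ?)",
            (record.get("user_id"), record.get("amount")),
        )
        loaded += 1
    except sqlite3.IntegrityError:
        # The NOT NULL constraint rejects the record with no amount.
        rejected.append(record)

cursor = conn.execute("SELECT * FROM sales")
columns = [column[0] for column in cursor.description]
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

print(loaded, "loaded;", len(rejected), "rejected")
print(columns, total)
```

The table ends up clean and consistent, which is the curated quality the comparison describes, but the `coupon` value and the incomplete record are gone unless the raw input is kept somewhere else, such as a lake.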

If you would like to know more, read about how JTA: The Data Scientists does its work, or have a look at some other FAQs. You might also like to read Wikipedia’s article on data lakes.

You could also explore our case studies or whitepapers.
