What is ETL?

We load data into a store using ETL.  ETL is a series of steps that collect data and transform it according to business rules.  These are the three steps:

  1. Extraction. Data is taken from the source systems and imported into a staging area. Each data source has its own set of characteristics that need to be managed.
  2. Transformation. Cleaning and other procedures are applied to the data to obtain accurate, complete and unambiguous data.
  3. Loading. The data is written from the staging area into the target databases or warehouses.
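
The three steps above can be sketched in code.  The following Python sketch is illustrative only: the file layout, column names and cleaning rule are invented, and sqlite3 stands in for the target store.

```python
import csv
import sqlite3

def extract(path):
    # Extraction: read raw rows from a CSV source into a staging list.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transformation: apply business rules to the staged data, e.g.
    # normalize names and drop rows with a missing amount.
    cleaned = []
    for row in rows:
        if not row.get("amount"):
            continue
        cleaned.append({"name": row["name"].strip().title(),
                        "amount": float(row["amount"])})
    return cleaned

def load(rows, db_path):
    # Loading: write the cleaned rows into the target database.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (:name, :amount)", rows)
    con.commit()
    con.close()
```

A pipeline is then simply `load(transform(extract(source)), target)`.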

Why use ETL?

ETL is the most effective approach to provide fast access to information. It allows organizations to analyze data that resides in multiple locations in a variety of formats. It increases efficiency and drives better business decisions.

What is the Best Tool for ETL?

There are several tools available. However, at JTA we believe that using R programming instead of classic ETL tools provides significantly better data manipulation and is more efficient.

You might be interested to read the Wikipedia article on ETL.

If you would like to know some more then read about How JTA The Data Scientists does its work or have a look at some other FAQs.

You could also explore our case studies or whitepapers.

What is a Data Repository?

A data repository is a collection of databases that manage and store varying data sets for analysis, sharing and reporting.

There are many different ways to store data that could all be described as a data repository.

What is data visualization?

Data visualization is the graphical and pictorial representation of information and data.  Using tables, graphs and maps, data visualization tools provide a comprehensive method to understand trends, correlations or patterns in data.

Why is visualization important?

The proliferation of data has made data difficult to manage and to benefit from.  Data visualization is essential to portray this massive amount of information and to make data-driven decisions.  Of course, data is only as good as your ability to understand and communicate it, which is why choosing the right visualization is essential.

What are the main benefits of using visualization in data science?

There are several benefits to using visualization, including:

  • Improved Insights. Data visualization allows us to spot patterns and correlations in data. Identifying these relationships helps organizations focus on the areas most likely to influence their most important goals.
  • Better and Faster Decision Making. By using graphical representations of information, businesses can draw conclusions from large amounts of data. And since it is significantly faster to analyze information in graphical format, businesses can address problems in a timelier manner.
  • Pinpoint Emerging Trends. By discovering trends, visualization can give businesses an edge over the competition and ultimately affect the bottom line. It offers an easier path to identify the outliers that affect product quality or customer churn, and to address issues before they become bigger problems.
  • Meaningful Storytelling. Using visual elements with an engaging narrative will get the message across to your audience.

What is a Data Warehouse?

A data warehouse is a repository of information.  It can hold logs, internal data or external data.  The records represent events or facts of a current or past period.

The Benefits of Using a Warehouse

Although it takes considerable time to design and implement a Data Warehouse, there are several benefits:

  • Enhanced Business Intelligence. Bringing data from multiple sources into a single database enables a central view across the organization.
  • Time Efficiency. Since users can quickly access data from several sources, they can rapidly make informed decisions. Besides that, executives can query the data themselves with little or no support from IT, saving further time and money.
  • Enhanced Data Quality and Consistency. Providing consistent descriptions and standards, and even fixing incoherent or missing data, will improve data quality and consistency.
  • Historical Intelligence. A data warehouse stores large amounts of historical data, allowing users to analyze trends and make future predictions.
  • High Return on Investment. Organizations that have implemented data warehouses and complementary BI systems can generate more revenue and save more money than organizations that haven’t invested.
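
The historical intelligence point can be made concrete with a small sketch (the schema and figures are invented, and sqlite3 stands in for the warehouse):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A minimal fact table of yearly revenue (values are hypothetical).
con.execute("CREATE TABLE revenue (year INTEGER, amount REAL)")
con.executemany("INSERT INTO revenue VALUES (?, ?)",
                [(2022, 100.0), (2023, 110.0), (2024, 121.0)])

# Historical intelligence: year-on-year growth computed from stored history.
growth = con.execute("""
    SELECT a.year, a.amount / b.amount - 1 AS yoy
    FROM revenue a JOIN revenue b ON b.year = a.year - 1
    ORDER BY a.year
""").fetchall()
# growth holds (year, fractional growth) pairs, here roughly 10% per year.
```

Because the warehouse keeps every past period, trend queries like this need no access to the operational systems.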

What is OLAP?

OLAP, or online analytical processing, is a technology for organizing data.  Its aim is to support understanding and to create reports.

It is simply a way of making a new copy of the business data and storing it in a cube.  The cube stores the data differently.  It is stored in a way that is optimized for reporting.  Creating a new copy of the data means that reporting work won’t impact transactional systems.

Online Analytical Processing is the technology behind many Business Intelligence applications.  It allows users to analyze data in multiple dimensions, and provide the insight and understanding they need for better decision making.

OLAP technology is one part of a larger ecosystem.  Data comes from a warehouse into the OLAP system. Subsequently, data flows from the OLAP system to mining and visualization tools.
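
As a rough illustration of the cube idea (the facts and dimensions below are invented), measures can be pre-aggregated over every combination of dimensions so that reporting queries become simple lookups:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales).
facts = [
    ("North", "Widget", 2023, 100),
    ("North", "Gadget", 2023, 150),
    ("South", "Widget", 2024, 200),
]

DIMENSIONS = ("region", "product", "year")

def build_cube(rows):
    # Aggregate the sales measure for every subset of the dimensions,
    # so any slice of the cube is a single dictionary lookup.
    cube = defaultdict(float)
    for region, product, year, sales in rows:
        values = {"region": region, "product": product, "year": year}
        for r in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, r):
                key = tuple((d, values[d]) for d in dims)
                cube[key] += sales
    return dict(cube)

cube = build_cube(facts)
# Total sales are cube[()]; sales for the North region are
# cube[(("region", "North"),)] — with no transactional system touched.
```

Real OLAP engines are far more sophisticated, but the principle is the same: a separate, query-optimized copy of the business data.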

You can read the Wikipedia article on online analytical processing.

Can we use databases for data science?

Data science uses databases, but there are other, more modern options.  Examples include data lakes and data warehouses.  It can be confusing to know which to use.

The main difference between a warehouse, a lake and a database is easy to explain.  A relational database stores and organizes structured data from a single source, for example a transactional system.  By comparison, data warehouses hold structured data from multiple sources.  Data lakes differ from both in that they store unstructured, semi-structured and structured data.

Additionally, databases are strictly controlled.  They have to be like this to guarantee that they don’t make mistakes in processing transactions.  For example, a database must always be able to reverse a transaction and, in the event of a power failure, recover perfectly.  These are great features but they add complexity to the system.  When we experiment with data we don’t want this complexity as it can slow down the work.  Lakes are much less controlled.

Relational databases are easy to build. However, they don’t support unstructured data or the vast amounts of data being generated today.  Hence the emergence of the data warehouse and data lake options.

We still need databases for data science, however. For example, in JTA we use databases to store Master Data and to help us with data cleaning.  We also store the nicely structured output in a database before we generate reports.
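
As a small illustration of that last point (the tables and values here are hypothetical), a database makes it straightforward to validate incoming records against Master Data during cleaning:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Master Data: the approved list of product codes.
con.execute("CREATE TABLE master_products (code TEXT PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO master_products VALUES (?, ?)",
                [("P1", "Widget"), ("P2", "Gadget")])

# Incoming records, some of which reference unknown codes.
con.execute("CREATE TABLE incoming (code TEXT, qty INTEGER)")
con.executemany("INSERT INTO incoming VALUES (?, ?)",
                [("P1", 5), ("P9", 3), ("P2", 7)])

# Data cleaning: keep only records whose code exists in the Master Data.
clean = con.execute("""
    SELECT i.code, i.qty FROM incoming i
    JOIN master_products m ON m.code = i.code
    ORDER BY i.code
""").fetchall()
# clean -> [("P1", 5), ("P2", 7)]
```

The join discards the record with the unknown code P9, which is exactly the kind of strict control a lake does not impose.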

What is a Data Lake?

A Data Lake (“DL”) is a storage repository that can hold large amounts of data. It stores every type of data in its native format, with no fixed limits on size or number of files.

A data lake can hold structured data such as rows and columns from relational databases. It can also hold semi-structured data, for example CSV, logs, XML and JSON.  Finally, it can store unstructured data, for example emails, documents, PDFs and binary data such as images, audio and video.  Current DL solutions include Azure Data Lake, Amazon S3 and the Apache Hadoop distributed file system.

Benefits of Using a Lake

  • Data Richness.  The ability to store many sources and types, for example text, audio, images and video.
  • Data Democratization.  A lake makes data available to the whole organization.
  • Storage in Native Format.  A lake needs no modeling when data is loaded.  Instead, the data is molded when being explored for analytics.  Consequently, lakes offer the flexibility to ask business questions and to gain insight.
  • Scalability.  Lakes offer scalability at a modest price when compared to a traditional data warehouse.
  • Advanced Analytics.  A lake links large amounts of data to deep learning algorithms. As a result it helps with real-time decisions.
  • Complementary to an Existing Data Warehouse.  Warehouses and lakes can work together, resulting in an integrated data strategy.
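
The storage-in-native-format point can be sketched as follows.  The files and fields are invented; the point is only that structure is applied at read time rather than at load time:

```python
import csv
import io
import json

# Files land in the lake in their native formats (contents invented here).
lake = {
    "orders.json": '{"customer": "Acme", "total": 120}',
    "orders.csv": "customer,total\nGlobex,80\n",
}

def read_orders(name, payload):
    # Schema-on-read: structure is applied only when the file is
    # explored for analytics, not when it is loaded into the lake.
    if name.endswith(".json"):
        rec = json.loads(payload)
        return [(rec["customer"], float(rec["total"]))]
    if name.endswith(".csv"):
        return [(r["customer"], float(r["total"]))
                for r in csv.DictReader(io.StringIO(payload))]
    raise ValueError(f"no reader for {name}")

rows = [r for name, data in lake.items() for r in read_orders(name, data)]
# rows -> [("Acme", 120.0), ("Globex", 80.0)]
```

Loading required no upfront modeling; a different analysis could apply a different schema to the very same files.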

How do Warehouses Compare to Lakes?

Depending on the requirements, an organization may require a data warehouse or a data lake or both.  They serve different needs.

Characteristics of a traditional data warehouse versus a modern data lake:

  • Type of Data.  Warehouse: relational data from transactional systems, databases and business applications.  Lake: non-relational and relational data from many sources, for example IoT devices, web sites, mobile apps and social media.
  • Schema.  Warehouse: designed prior to the warehouse implementation (schema-on-write).  Lake: written at the time of analysis (schema-on-read).
  • Price/Performance.  Warehouse: medium-speed query results using high-cost storage.  Lake: faster query results using low-cost storage.
  • Data Quality.  Warehouse: highly curated data that serves as the one version of the truth.  Lake: any data, which may or may not be curated.
  • Users.  Warehouse: business analysts.  Lake: data scientists, data developers and business analysts.
  • Analytics.  Warehouse: batch reporting, BI and visualizations.  Lake: machine learning, predictive analytics, data discovery and profiling.

You might also like to read Wikipedia’s article on Data lakes.

What are the benefits of big data?

The benefits of big data can differ by industry. There are, however, common benefits from using big data, for example lower cost, reduced time and greater competitive advantage.  Other possible benefits include:

  • Recognize opportunities
  • Reduce customer churn
  • More business insights
  • Better planning and forecasting
  • Identify the root causes of cost

Unfortunately, there are also challenges with big data:

Integrating Data Sources for Big Data

Big data comes from many different places: applications, social media, email, employee-created documents and others. It is very difficult to combine all that data effectively.  Unfortunately, most machine analysis algorithms expect homogeneous data to work properly.

Data Inconsistency

Big Data usually has information from many sources. Furthermore, the sources may be of varying reliability. Much of that data is unstructured, meaning that it doesn’t come from a database. Documents, photos, audio, videos and other unstructured data can be difficult to analyze.

Data Storage

As data grows in volume we need real-time techniques to decide what should be stored.  It is often not economically viable to store all the raw data. Companies must be good at curating their data.

Staffing for Big Data

Many organizations are still new to big data. The skill set is not the same as that for business intelligence and data warehousing, where most organizations have already developed their skills.

Privacy and Data Ownership

Managing privacy effectively is both a technical and a sociological problem.  Also, the value of the data owned by an organization becomes important. Organizations are concerned with how to leverage this data, while keeping their data advantage.  Questions such as how to sell data without losing control are becoming important.

Big Data is mentioned a lot. What exactly is it?

Big Data is more than just a large volume of data. It is a technology that allows you to capture, store, process, analyze and discern value. For example, Big Data allows one to acquire new knowledge at high speed.

The main characteristics inherent in Big Data are volume, variety and velocity. We call these three characteristics the three Vs:

  • Volume refers to the quantity of generated and stored data
  • Variety refers to the type and nature of the data, and
  • Velocity refers to the high speed at which the data is processed

However, some researchers claim that the three Vs are too simplistic a view of the concept.  Possible new Vs are:

  • Veracity, which refers to the quality and trustworthiness of the data, and
  • Value, which refers to the economic value of the data

All industries have applications for big data.

Do you really make data speak?

Do we make data speak? Well, not literally of course!  All data of a certain size has a story to tell.  It will have trends, sudden movements, be able to show cause and effect interactions and perhaps explain human behaviour.  It is not always easy to discover the stories that the data can tell, but we have over 20 years’ experience in doing exactly that.  That is why we say, with confidence, that we make data speak.

Can you facilitate or host offsite events?

When planning a complex Data Science project, it often helps to spend a few days preparing a detailed plan. One of the best ways to do this is by facilitated offsite events. We have facilitated offsite events in many parts of the world.  Should your team wish to come and visit us in Porto for an offsite event, we can offer conference facilities and accommodation in our beautiful winery and manor house.

How large is a typical JTA team?

The choice of JTA team depends on the problem at hand; however, our average project has a team of four to six people.  There is always the involvement of one of our partners, and then we will have a core with a Data Engineering Lead, a Data Analysis Lead and other specialist resources.  The JTA team members will engage and disengage accordingly, and we will call in additional help as needed.

Do you offer an onsite service?

We are happy to provide an onsite service by sending teams to work at our clients’ premises, or we can work remotely from our offices. When clients have sensitive data and do not wish to risk data being misused or stolen, we will often suggest working onsite.  It is also common for JTA teams to travel to visit our clients anywhere in the world.  When we provide the majority of our service from our offices, we will often have status meetings and planning sessions onsite.

R Versus Python: Which is better for data analysis?

The choice of R versus Python is largely academic.  At JTA we prefer to use R although both languages are perfectly acceptable.  There are a few differences between the two which we can summarize here:

  • R has a much more extensive library of statistical packages and specialized techniques.
  • You can find R packages for a wide variety of disciplines, from Finance to Medicine to Meteorology.
  • Python is a general-purpose programming language, which can be used to write websites and applications whereas R is a Data Science tool.
  • R builds in data analysis functionality by default, whereas Python relies on packages.
  • Python currently has more packages for deep learning although this is changing.
  • R is better for data visualization with plotting being more customizable.
  • R is being integrated into mainstream products such as SQL Server and Power BI.

We also recommend using Microsoft R Open because of its multithreading features.

How Do I Become a Data Scientist?

Becoming a data scientist is a journey.  One of the best ways to make it is to join a reputable data science provider like JTA.  Data scientists need to solve problems in a logical and analytical way.  Mathematical ability is important, and you will need to understand some algebra, statistics and probability.  If you can handle calculus, then so much the better.

The main languages used for Data Science are Python and ‘R’.  You will need to learn at least one of these languages.

Next, you will need to understand how data are stored and manipulated; pay careful attention to big data concepts and techniques.

What is Data Science?

In 2012 the Harvard Business Review called it “The sexiest job of the 21st century”.  Some claim that it is nothing more than a sexed-up term for statistics and so a lot of confusion reigns.  We believe that Data Science is a merger of many traditional disciplines, bringing together statistics, processes, algorithms and machine learning.  This means that it can have different interpretations but at its heart data science is the extraction of knowledge from data.

You can read Wikipedia’s article on Data Science at https://en.wikipedia.org/wiki/Data_science

What is the future of Data Science?

In the future of data science we will discover causality without needing to understand the “why” or “how”.  As data volumes increase, we discover patterns that may prompt us to investigate why.  Data science finds patterns so that humans can solve problems we didn’t know we had.  This has an immense impact on our lifestyles.  We will start to truly understand the impact on our lives, diets and behaviour.
