Introduction to Data Warehousing: Definition, Concept, and Techniques

by | Jun 30, 2018 | Data Science

8 Min Read. |

We live in an age when technology is fast outpacing our thinking. We now think of newer tools and technologies to take care of our future needs. The data industry has come a long way since the earlier days of Data Warehousing. Today, data comes to us in various forms, and from multiple sources, unlike earlier days. The sources are not often disclosed, and the data needs to be sifted for meaningful information. The data engineer has taken the place of ETL developers, and DevOps has made its way into the data strategy. Data engineers work on platforms like Spark Architecture and Python. Algorithms have already forayed into Business Intelligence and decision making. Now, we can also extract data from multiple sources, before finding a pattern out of it. But before delving further, one should know what Data Warehousing is. This discussion is about the introduction to Data Warehousing and how it influences our lives.

Data Streaming

After a formal Introduction to Data Warehousing, I aim to offer an in-depth discussion of data warehousing concepts, including:

  • Data Warehousing Definition
  • Data Warehousing types and architecture
  • Introduction to Data mining and Data Warehousing (differences and inter-relation)
  • Introduction to Data Warehousing and Business Intelligence
  • Data Warehouse Appliances
  • The future of Data Warehouse

Data Warehousing Definition

Data Warehousing may be defined as a collection of corporate information and data derived from operational systems and external data sources. A data warehouse is designed with the purpose of inducing business decisions by allowing data consolidation, analysis, and reporting at different aggregate levels. Data is populated into the DW by extraction, transformation, and loading.

Register For a Free Webinar

Date: 13th Feb, 2021 (Saturday)
Time: 10:30 AM - 11:30 AM (IST/GMT +5:30)
  • This field is for validation purposes and should be left unchanged.

Data Warehousing incorporates data stores and conceptual, logical, and physical models to support business goals and end-user information needs. Creating a DW requires mapping data between sources and targets, then capturing the details of the transformation in a metadata repository. The data warehouse provides a single, comprehensive source of current and historical information.

Data mining vs Data Warehousing

Quite often people confuse between Data mining and Data Warehousing. Well, the two concepts are similar, they are not the same. The primary difference between data warehousing and data mining is that Data Warehousing is the process of compiling and organizing data into one common database, whereas data mining refers the process of extracting meaningful data from that database. The two concepts are interrelated; data mining begins only after data warehousing has taken place.

Data warehousing techniques and tools include DW appliances, platforms, architectures, data stores, and spreadmarts; database architectures, structures, scalability, security, and services; and DW as a service.

Introduction to Data Warehousing

How It Works

Data Warehousing combines information collected from multiple sources into one comprehensive database. To cite an example from the business world, I might say that data warehouse incorporates customer information from a company’s point-of-sale systems (the cash registers), its website, its mailing lists, and its comment cards. Data Warehousing may also consider confidential information about employee details, salary information, etc.

Companies use this information to analyze their customers. Data warehousing also related to data mining which means looking for meaningful data patterns in the huge data volumes and devise newer strategies for higher sales and profits.

Why It Matters

Companies with a dedicated Data Warehousing team think way ahead of others in product development, marketing, pricing strategy, production time, historical analysis, and forecasting and customer satisfaction. Though a slightly pricey option, it pays in the long run. However, data warehouses can also be very expensive to design and implement, and sometimes their construction makes them unwieldy.

Introduction to Data Warehousing Types

The three main types of Data Warehouses are:

  • Enterprise Data Warehouse
  • Operational Data Store
  • Data Mart

Enterprise Data Warehouse: Enterprise Data Warehouse is a centralized warehouse, which provides decision support service across the enterprise. It offers a unified approach to organizing and representing data. It also provides the ability to classify data according to the subject and give access according to those divisions.

Operational Data Store: Operational Data Store, also called ODS, is data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. In ODS, Data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the Employees.

Data Mart: A Data Mart is a subset of the data warehouse. It specially designed for specific segments like sales, finance, sales, or finance. In an independent data mart, data can collect directly from sources.

Data Warehouses and data marts are mostly built on dimensional data modeling where fact tables relate to dimension tables. This is useful for users to access data since a database can be visualized as a cube of several dimensions. A data warehouse allows a user to splice the cube along each of its dimensions.

For more insights, you may download discussions on introduction to Data Warehousing and data mining pdf online.

Register For a Free Webinar

Date: 13th Feb, 2021 (Saturday)
Time: 10:30 AM - 11:30 AM (IST/GMT +5:30)
  • This field is for validation purposes and should be left unchanged.

Data Warehousing Architecture

In general, Data Warehouse architecture is based on a Relational database management system server that functions as the central repository for informational data. In the data warehouse architecture, operational data and processing are separate from data warehouse processing. This central information repository is surrounded by several key components designed to make the entire environment functional, manageable, and accessible by both the operational systems that source data into the warehouse and by the end-user query and analysis tools.

Usually, a Data Warehouse adopts a three-tier architecture. The three-tier architecture of a data warehouse is discussed below.

  • Bottom Tier: The bottom tier of the architecture represents the data warehouse database server, also known as the relational database system. Back-end tools and utilities are made use of to feed data into the bottom tier. These back-end tools and utilities perform the Extract, Clean, Load, and refresh functions.
  • Middle Tier: The middle tier of a data warehouse lies the OLAP Server which is an extended relational database management system. The ROLAP maps the operations on multidimensional data to standard relational OLAP (MOLAP) model, which directly implements the multidimensional data and operations.
  • Top-Tier: This tier represents the front-end client layer. This layer holds the query tools and reporting tools, analysis tools and data mining tools.

The following diagram depicts the three-tier architecture of data warehouse:


data warehouse

Data Warehouse Appliances

Data Warehouse Appliances are a set of hardware and/or software tools for storing data. Businesses use data warehouse appliances to build a comprehensive and centralized data warehouse, which is a functional destination for all kinds of business data.

Data warehouse appliances and corporate data warehouses serve a number of common purposes related to competitive modern business. Storing huge volumes of customer data in data warehouses has a number of business benefits:

  • Cross-account indexing
  • Quick-recall customer histories
  • Better functional interactive voice response technology
  • More customized direct mailing or digital communications

Data Warehouse appliances provide building blocks for more capable business data warehouse systems.

Data management appliances offload data-intensive operations from a host computer. The offloaded workload may involve operational, specialized analytics, or archival processing. Examples of vendors providing data management appliances include ParAccel and Dataupia.

Introduction to Data Warehousing and Business Intelligence

A Data Warehouse may be described as a consolidation of data from multiple sources that is designed to support strategic and tactical decision making for organizations.  The primary purpose of DW is to provide a coherent picture of the business at a point in time.

Business Intelligence (BI), on the other hand, describes a set of tools and methods that transform raw data into meaningful patterns for actionable insights and improving business processes. This usually involves data preparation, data analytics, and data visualization.

However, BI tools greatly vary in capabilities, and while full-stack solutions are aimed to provide all three of these, many tools labeled as BI offers only analytics and visualization. BI tools require a data warehouse to work with unstructured data, as the tools have very limited data preparation capabilities. However certain full-stack Business Intelligence Analytics & Dashboard Software, such as Sisense, can provide end users with an end to end solution that does not require additional investment in data warehousing.

Business Intelligence

Business Intelligence is an umbrella term that is used interchangeably with Data Analytics or to describe a process which includes data preparation, analytics, and visualization. Data warehousing describes tools that take care of joining disparate data sources, cleaning the data and preparing it for analysis.

Register For a Free Webinar

Date: 13th Feb, 2021 (Saturday)
Time: 10:30 AM - 11:30 AM (IST/GMT +5:30)
  • This field is for validation purposes and should be left unchanged.

Know more about Business Intelligence tools. Read my earlier post on top Business Intelligence tools.

Data Analytics is often used for processing data, whether from a single or multiple sources, using statistical and mathematical tools in order to generate insights. Both data warehousing and data analytics can be seen as parts or stages of Business Intelligence, although BI and DA are often used interchangeably.

A career in data warehousing becomes more promising when you have a degree in Data Analytics. Your knowledge of both the worlds (of data analytics, which is related to business intelligence) and data warehousing (related to data management) sets you apart. You may apply for roles like data analyst, business analyst or technical program manager in top-notch companies.

Enroll for a Data Analytics course today, and find yourself in your dream company within a year or two. You may sign up or a basic or an advanced degree course in Data Analytics. Digital Vidya offers advanced courses in Data Science. Industry-relevant curriculums, pragmatic market-ready approach, hands-on Capstone Project are some of the best reasons for choosing Digital Vidya. In addition, one can also go for Data Scientist Course.

At the end of the day, I must say that organizations should adapt to the changing technology and demands of their customers. Companies need to focus more on being more agile, having a cloud adoption strategy and partner with an industry ETL expert that knows innovative data processes, as well as you, know your business objectives.

Register for FREE Orientation Class on Data Science & Analytics for Career Growth

Date: 13th Feb, 2021 (Saturday)
Time: 10:30 AM - 11:30 AM (IST/GMT +5:30)

  • This field is for validation purposes and should be left unchanged.

You May Also Like…


Submit a Comment

Your email address will not be published. Required fields are marked *