An introduction to data warehousing

Before actually getting started, there is an important concept that needs to be covered and that is data warehouse modeling. The data warehouse is a concept that has been around since the 1970s, it is a central database built for reporting that integrates data from disparate sources to a common location and a common schema. It also removes the direct connection between the source systems and the historical data. This means that if you change your source system, you will retain historical records and can decommission the old system.

The schema in a data warehouse in many cases is built in Third normal form (3NF) to ensure that data is only stored once to minimize the storage cost and make it easier to maintain.

Tip

For more information about 3NF, refer to http://en.wikipedia.org/wiki/3NF.

This is a strategy commonly referenced as an Inmon data warehouse coming from the father of data warehousing, Bill Inmon. Also have a look at http://inmoninstitute.com/about/index.

There is a drawback with 3NF, that is, it is not a model built for querying; it requires many joins in the queries to write the simplest report. So a simpler model of the data is often necessary. A common model built for querying is the dimensional model defined by Ralph Kimball and is available at http://www.kimballgroup.com/. The dimensional model defines those things that you want to measure and should be stored in a fact table. Around the fact table, you will have multiple dimension tables containing the things that you would like to slice the facts by.

In the preceding figure, you have the fact table defined containing each order row, you will have the measures such as order quantity and sales amount, as well as the keys referencing Product, Seller, Time, and Customer on each row of the table. The dimension tables contain attributes such as year, month, and day in the Time table, Customer name, address, and customer number in the Customer table. This design is often referenced to as a star schema.

Analysis Services is built with multidimensional modeling in mind and works best with data warehouses or data marts that use this technique; however, as you will later see, there is a possibility to work with all kinds of schemas in the database through the use of data source views in Analysis Services.

During the course of this book we will work with the AdventureWorks2012DW database. This is a data warehouse built for the fictitious company called Adventure Works Cycles. They have a data warehouse built using dimensional modeling with several fact tables containing the things that they want to measure in their business.

Tip

For a background on Adventure Works and their business, refer to the following description:

http://technet.microsoft.com/en-us/library/ms124825(v=SQL.100).aspx

An in-depth description of their data warehouse can be found at http://technet.microsoft.com/en-us/library/ms124623(v=sql.100).aspx.