Understanding the data source view

The data source view is a very important concept in Analysis Services. You can think of it as an abstraction layer between the relational database and the multidimensional cube. You have the ability to define objects that do not exist in the database. The data source view provides a logical view of the database, so to Analysis Services, the objects appear to exist in the database. This includes new tables, columns, and even referential integrity or relationships between objects.

Tip

In many cases you do not define referential integrity down in the database layer when you work with data warehouses. The reason for this is that loading of the data is much easier without them. If you have defined foreign keys and primary keys, you need to ensure that you always load your objects in the correct order.

If your data warehouse does not contain referential integrity, it has to be created in the data source view.

You can also add calculations to existing tables. They could be simple concatenations of existing columns or contain more complex logic. Sometimes there is an advantage to add the calculations to the data source view as you will actually store the information in the multidimensional store. However, you also need to bear in mind that the calculations will be executed as part of the process step by the underlying data source, so it may cause an additional burden on the source system if they are very complex.

When you create your data source view you can either bind to tables or views that exist in the data warehouse itself; or you can create a named query. Think of a named query as a view down in the database but it only resides in Analysis Services. The reason why you would not just create a view could be that you are working against a database where you are not allowed to change or add any objects. Also remember, Analysis Services work against not just a SQL Server, it could also be that you are working against a source that does not have the concepts of views.

As you understand, this gives you a lot of flexibility and allows you to create a multidimensional model on top of a database schema that may be in the 3NF form.

A data source view can also be used to combine data from different data sources. This means that you can fetch data from several databases.

This brings up the question, do you really need a data warehouse in the first place? As described earlier in the chapter, one of the beneficial things with a data warehouse is to keep a history of data that will survive changes in the source systems. If you connect Analysis Services directly to the operational source systems, you will not be able to keep the history if one of the source systems is changed. In addition to this, you will have the issues of reporting load on the OLTP system, you will work against a schema that is non-optimal for BI use, and so on.

All Analysis Services objects use the data source view as their data source, this means that everything that other objects need access to need to be added to the data source view.

Creating a new data source view

To create a data source view against the AdventureWorksDW2012 connection, you need to follow these steps:

  1. Right-click on the Data Source Views folder in the Solution Explorer window.
  2. Choose to create a new data source view and click on the Next button.
  3. Specify that you want to use the data source that you created in the earlier step as the source and click on Next.
  4. Select to add the following tables to your data source view:
    • FactInternetSales
    • DimSalesTerritory
    • DimProduct
    • DimDate
    • DimCustomer
    • DimPromotion
    • DimProductSubcategory
      Tip

      An easy way to do this is to add the fact table first and then click on the button called Add Related Tables. If the database contains references between the tables, they will be added automatically.

  5. Click on the Next button and name the data source view, in this case you can leave the default name and click on Finish.

Now you should have a data source view that looks like the following screenshot:

As you can see, the data source view contains a simple star schema with one fact table called FactInternetSales that contains the different measures related to the internet sales. You have five dimension tables that are related to the fact table; the relationships are fetched from the database. As you can see, there are three different lines between the FactInternetSales table and the DimDate table. The reason for this is that you have three different dates that identify the record in the fact table; one when the order happened, one when it was due, and one when it was shipped.

There are also two tables related to the Product dimension, DimProduct and DimProductSubcategory. This means that this is actually not a true star schema but instead a snowflake schema.