SSAS 2008 Unleashed - Chapter 8. Advanced Modeling

2012-8-16 15:10| 发布者: demo| 查看: 1890| 评论: 0

摘要: Chapter 8. Advanced Modeling In This Chapter Parent-Child RelationshipsAttribute DiscretizationIndirect DimensionsMeasure ExpressionsLinked Measure Groups In the previous chapters, we covered basic a ...

Indirect Dimensions

All the measure group dimensions we’ve discussed so far have been direct. Not only are they direct, they are also regular. Their granularity attributes directly define how data is loaded into the fact. (See the section “Granularity of Fact” in Chapter 7, “Measures and Multidimensional Analysis.”)

However, Analysis Services supports two types of measure group dimensions:

Direct, which are directly related to the fact
Indirect, which are bound to the fact through other dimensions or measure groups

Indirect dimensions do not define how data is loaded into the fact. Whether you include an indirect dimension in the fact affects only the logical representation of data in the cube. Adding or removing an indirect dimension doesn’t require reloading data in the fact. Because you don’t have to reload the data into the fact, you can change the number of dimensions you use to analyze the data in the measure group without reloading that data. We have three types of indirect dimensions:

Referenced dimensions
Many-to-many dimensions
Data-mining dimensions (not discussed in this book)

Referenced Dimensions

A dimension is a referenced one if its granularity attribute is in reality an attribute of another dimension. A common example might involve the Customer dimension in a measure group in which you need to analyze customer data according to the Geography dimension, which contains data about where customers live. Your measure group doesn’t contain any geographic data, but the Customer dimension contains the ZIP Code of each customer. You can use the ZIP Code attribute to bind the customer data to the Geography dimension, as long as the Geography dimension contains a ZIP Code attribute as shown in Figure 8.4. In this case, the Geography dimension is a referenced dimension of the Sales measure group by the Customer dimension.

Figure 8.4. The `Geography` dimension is a referenced dimension to the `Sales` measure group by the `Customer` dimension.

The most important difference in the definition of a referenced dimension and a regular (direct) dimension is the addition of two properties:

IntermediateCubeDimensionID defines the cube dimension that the referenced dimension can use to bind to the fact data (in our example, the Customer dimension).
IntermediateGranularityAttributeID defines an IntermediateCubeDimension attribute you can use as a source of data for the granularity attribute (ZIP Code) in the intermediate dimension.

These two properties, which distinguish a referenced dimension, make it possible to bind the dimension to the fact even though the granularity attribute is not bound to the fact.

In addition to indirect dimensions that are referenced, there are also direct referenced dimensions. A referenced dimension is one that has a value of Regular for its Materialization property. In the model, therefore, the data from the intermediate granularity attribute is loaded into the fact, thus extending the fact data. Analysis Services materializes the granularity attribute of the referenced dimension during measure group processing. In our example, the data for the ZIP Code attribute is added to the fact, making the Geography dimension bound to the fact just like any regular dimension (see Figure 8.5).

Figure 8.5. The `Geography` dimension is a materialized referenced dimension to the `Sales` measure group.

[View full size image]

When the value of the Materialization property is Indirect, data for the ZIP Code attribute isn’t loaded into the fact. Every time we need data about the geographical location of sales, the system searches the Customer dimension and maps the data in the Geography dimension to the value of the ZIP Code attribute of the Customer dimension. Now information about sales can be enriched by the information about country and province where the sale occurred.

In Listing 8.2 , we define a referenced dimension from the HR cube in our FoodMart 2008 database. The HR cube contains data about employees and their salaries. This cube has a Salary measure group that contains information about the salaries of employees by time. However, we also want to find data about salaries paid to employees according to the store where they work.

To serve both these needs, the system uses the Store dimension as a referenced dimension. The Employees dimension has data for each employee describing the store each employee works in. Therefore, the Employees dimension serves as the intermediate dimension through which information about the salaries of the employees is bound to information about the stores in which they work. We can now see how much employee salaries cost the company per store.

Note

We excluded from our example the parts of the definition that aren’t related to the point we’re making in this section. For example, we omitted the definitions of the dimensions.

Listing 8.2. Defining a Referenced Dimension

<Cube>
     <ID>HR</ID>
     <Name>HR</Name>
    <Dimensions />
    <MeasureGroups>
          <MeasureGroup>
          <ID>Salary By Day</ID>
          <Name>Salary</Name>
          <Measures>
               <Measure>
               <ID>Salary Paid - Salary By Day</ID>
               <Name>Salary Paid</Name>
               <Source>
                    <DataType>Double</DataType>
                    <Source xsi:type="ColumnBinding">
                    <TableID>salary_by_day</TableID>
                    <ColumnID>salary_paid</ColumnID>
                    </Source>
                </Source>
               </Measure>
          </Measures>
          <StorageMode>Molap</StorageMode>
          <ProcessingMode>Regular</ProcessingMode>
          <Dimensions>
               <Dimension xsi:type="RegularMeasureGroupDimension">
               <CubeDimensionID>Employees</CubeDimensionID>
               <Attributes>
                    <Attribute>
                    <AttributeID>Employee</AttributeID>
                    <KeyColumns>
                <KeyColumn>
                    <NullProcessing>UnknownMember</NullProcessing>
                         <DataType>Integer</DataType>
                         <Source xsi:type="ColumnBinding">
                              <TableID>salary_by_day</TableID
                              <ColumnID>employee_id</ColumnID>
                         </Source>
                         </KeyColumn>
                    </KeyColumns>
                    <Type>Granularity</Type>
                    </Attribute>
               </Attributes>
          </Dimension>
          <Dimension xsi:type="ReferenceMeasureGroupDimension">
               <CubeDimensionID>Store</CubeDimensionID>
               <Attributes>
                    <Attribute>
                    <AttributeID>Store</AttributeID>
                    <Type>Granularity</Type>
                    </Attribute>
               </Attributes>
     <IntermediateCubeDimensionID>Employees</IntermediateCubeDimensionID>
<IntermediateGranularityAttributeID>StoreId</IntermediateGranularityAttributeID>
               </Dimension>
               </Dimensions>
          </MeasureGroup>
     </MeasureGroups>
</Cube>

It’s possible to get to a point where you have a chain of referenced dimensions, when a dimension that is bound to the fact through another dimension is itself a referenced dimension, and so on and on. However, you need to be careful about creating loops of referenced dimensions.

Note

The creation of chains of indirect referenced dimensions is not supported by Analysis Services.

Indirect referenced dimensions don’t define how data is loaded into the fact, making it possible to change a dimension, or change whether the dimension is included in the measure group, without reloading the data. Members of the dimension are resolved at the time of the query, rather than upon loading the data. Even if you completely change the data of the dimension or change the structure of the dimension, the system won’t have to reload the data. The system also doesn’t index the measure group data by the indirect referenced dimension. Therefore, the performance of queries working with indirect referenced dimensions is worse than the performance of direct referenced or materialized dimensions. On the other hand, if the dimension data changes, it doesn’t require a re-indexing of the fact.

Many-to-Many Dimensions

The ability to define many-to-many dimensions significantly enriches modeling capabilities of Analysis Services. Dimension of this type can be bound to fact through the other fact (thus enabling you to overcome a known limitation of multidimensional modeling that prevents assignment of the same data to multiple members of one dimension). Many-to-many dimensions in Analysis Services let you solve multiple business problems (for example, managing a single account with multiple co-owners, making a demographic analysis of sales data where a single customer can belong to multiple demographic groups, revenue allocation between multiple producers, and many others. Let’s take a look at an example of a many-to-many dimension based on the FoodMart 2008 database. In this example, we want to analyze the sales of products to customers based on the warehouse availability of these products. In other words, we want to know how fast products currently stored in a certain warehouse are sold (and therefore might need to be requested from the supplier).

In Figure 8.6 , we look at the sales of products to customers in our store over a period of time. We define a simple measure group Sales with the following dimensions:

The Product dimension— Products sold in the store
The Customer dimension— Customers who bought a product
The Time dimension— Date of the purchase

Figure 8.6. Relationship between `Warehouse` and `Sales` facts by the `Product` dimension.

The measures available for analysis describe the count of the units of the product bought by customers and the amount paid for the products. This model makes it possible to produce an analysis of the products bought by our customers and a count of those products, and we can analyze all of that based on the date of that purchase. Now, let’s assume that we have another fact, Warehouse, that contains information about products in the warehouses from which the store gets its products. Therefore, the Warehouse fact will also have three dimensions:

The Product dimension— Products in the warehouses
The Warehouse dimension— Warehouse that contains the products
The Time dimension— Date the products arrived at the warehouse

The measure of this fact contains the count of a specific product that was delivered to the warehouse at a specific time.

Because our task is to analyze sales of products by the Warehouse dimension, we need to include this dimension in the Sales measure group. We can do so by using a many-to-many dimension. All we have to do is define the cube dimension (Warehouse) that we want to include in the measure group (Sales) as a many-to-many dimension, and then to specify the Warehouse measure group through which the Warehouse dimension joins with Sales data. We call such a measure group an intermediate fact. The intermediate fact, Warehouse, is specified by the MeasureGroupID property of the dimension definition (see Listing 8.3).

Listing 8.3. DDL for the `Warehouse` Dimension Included in the `Sales` Fact as a Many-to-Many Dimension

<Dimension xsi:type="ManyToManyMeasureGroupDimension">
     <CubeDimensionID>Warehouse</CubeDimensionID>
     <MeasureGroupID>Warehouse</MeasureGroupID>
</Dimension>

Defining the many-to-many dimension is easy; finding out what happens in the system then is not so easy. Let’s explore what happens. First, Analysis Services determines a common dimension for both the Sales and Warehouse facts. Through this dimension, Analysis Services will join two facts together. Analysis Services determines common dimensions by looking for measure group dimensions that have the same value for the CubeDimensionID property, and that aren’t many-to-many dimensions. In the scenario we’re working with, we have two common dimensions: Product and Time. The Product dimension defines this joint between the Sales fact and the Warehouse fact. However, we don’t want to use the Time dimension as a common dimension because the time of the purchase and the time of the product’s arrival at the warehouse are not related. Therefore, we have to use a role-playing dimension to define two different cube dimensions: Time of Sale and Time of Arrival. Now we have two different dimensions, one to use in each of the facts.

As it turns out, the Product dimension is common to both facts. The Time and Customer dimensions are regular dimensions of the Sales measure group. The Warehouse dimension is a regular dimension of the Warehouse measure group and a many-to-many dimension for the Sales measure group. The dimension that’s included as a many-to-many dimension should be present in the intermediate fact (the one the relationship is defined through). The common dimensions have to be either regular or referenced dimensions. Many-to-many dimensions can create complex chains, with facts being related to one fact through many other facts. Analysis Services determines whether there are loops in chains, and lets you know about it by generating an error message.

When you define relationships between measure groups that are defined by a many-to-many dimension, you can end up with common dimensions that have different granularity for different measure groups. If this happens, Analysis Services uses the deepest common granularity for the relationships between the measure groups.

Now let’s take a look how Analysis Services generates a response to the query that involves many-to-many dimension. For example, we introduce a question to determine the value of the sales of products delivered by a warehouse. If we ask for the total sales supplied by just one warehouse, Analysis Services iterates the sales by product, selects only records with the products the particular warehouse supplied, and sums the sales of those products.

If you were to ask for the sum of sales for each warehouse, Analysis Services would go through the same process for each warehouse. If you were to then add the sums of the warehouse sales to get the total sales of all warehouses, you would get a strange result. The sum of sales by warehouses will be larger than sales by the stores. Results differ because data from some records is included in the sum multiple times. For example, multiple warehouses can supply the same product, and sales of that product will be applied to sales of all the warehouses. Therefore, the summing (aggregating) of a measure using a many-to-many dimension doesn’t follow the rules of aggregation—that is, it’s not aggregatable, and not additive.

It’s not always easy to interpret the results of a query that uses many-to-many dimensions; in many cases, you can get quite unexpected results. If you look at the result of sales in our FoodMart example by the Warehouse dimension, the results don’t make sense. That’s because we built only one Time dimension in to the sample. You actually need two Time dimensions for products in relation to warehouses: the time the product arrives at the warehouse, and the time it was sold in a store. If you have only one Time dimension, that dimension is common and is applied to only the sales of products on the day those products arrived at the warehouse. That sales figure would be much less than expected; the data is unusable.

Using many-to-many dimensions for queries requires more system resources, especially when you have many common dimensions and their granularity is at a low level. Resolution of a query under these conditions requires that Analysis Services get all the records of the measure group at a granularity level common to all the common dimensions—and doing so can really eat up your system resources. For more information about query resolution involving many-to-many dimensions, see Chapter 30, “Architecture of Query Execution—Retrieving Data from Storage.”

The performance of queries that use many-to-many dimensions is worse than the performance of queries that use regular dimensions. Even if you have not explicitly used a many-to-many dimension in your query, Analysis Services uses the default member from this dimension. (For more information about the default members, see Chapter 5 , “Dimensions in the Conceptual Model,” and Chapter 10 , “MDX Concepts.”) There is a way to query a measure group that has a many-to-many dimension without performance degradation: by defining the direct slice for your many-to-many dimension. We explain the concept of the direct slice in the next section, “Measure Expressions.”

When it comes to loading data into a measure group and indexing it, many-to-many dimensions work pretty much the same way that indirect referenced dimensions work: The data in the measure group isn’t loaded by the dimension, and the measure group isn’t indexed by the dimension. Indeed, a many-to-many dimension is an indirect dimension. When an indirect dimension is changed, or added to, or removed from a measure group, the measure group doesn’t require reloading and re-indexing. There is no need to reload and re-index your measure group if there is a change in the intermediate measure group that binds the dimension to your measure group.

1 234 5 / 5 页下一页

鲜花

握手

雷人

路过

鸡蛋

本文导航

收藏分享

账号		自动登录	找回密码
密码			注册

SSAS 2008 Unleashed - Chapter 8. Advanced Modeling

Indirect Dimensions

Referenced Dimensions

Figure 8.4. The `Geography` dimension is a referenced dimension to the `Sales` measure group by the `Customer` dimension.

Figure 8.5. The `Geography` dimension is a materialized referenced dimension to the `Sales` measure group.

Listing 8.2. Defining a Referenced Dimension

Many-to-Many Dimensions

Figure 8.6. Relationship between `Warehouse` and `Sales` facts by the `Product` dimension.

Listing 8.3. DDL for the `Warehouse` Dimension Included in the `Sales` Fact as a Many-to-Many Dimension

本文导航

相关阅读

相关分类

SSAS 2008 Unleashed - Chapter 8. Advanced Modeling

Indirect Dimensions

Referenced Dimensions

Figure 8.4. The Geography dimension is a referenced dimension to the Sales measure group by the Customer dimension.

Figure 8.5. The Geography dimension is a materialized referenced dimension to the Sales measure group.

Listing 8.2. Defining a Referenced Dimension

Many-to-Many Dimensions

Figure 8.6. Relationship between Warehouse and Sales facts by the Product dimension.

Listing 8.3. DDL for the Warehouse Dimension Included in the Sales Fact as a Many-to-Many Dimension

本文导航

相关阅读

相关分类

Figure 8.4. The `Geography` dimension is a referenced dimension to the `Sales` measure group by the `Customer` dimension.

Figure 8.5. The `Geography` dimension is a materialized referenced dimension to the `Sales` measure group.

Figure 8.6. Relationship between `Warehouse` and `Sales` facts by the `Product` dimension.

Listing 8.3. DDL for the `Warehouse` Dimension Included in the `Sales` Fact as a Many-to-Many Dimension