设为首页收藏本站

 找回密码
 注册

QQ登录

只需一步,快速开始

BCM 门户 IT世界 资料备录 查看内容

SSAS 2008 Unleashed - Chaper 7 Measures and Multidimensional Analysis

2012-8-16 15:05| 发布者: demo| 查看: 825| 评论: 0

摘要: Chapter 7. Measures and Multidimensional Analysis In This Chapter Measures in a Multidimensional CubeMeasure GroupsMeasure Group Dimensions Measures are the values that you are going to analyze; they ...

Chapter 7. Measures and Multidimensional Analysis

In This Chapter

Measures are the values that you are going to analyze; they are united in the measure groups by their relationships to dimensions in the cube. Measures are ultimately the most important semantic part of the cube because they are based on the fact data (sometimes called facts) that you’re going to analyze. They usually contain numeric metrics that describe the results of business activity. In our Warehouse and Sales cube, we have created Store Sales, Unit Sold, and Store Sales measures that enable the analysis of sales in different stores.

Measures define which data is available for analysis, and in what form and by what rules that data can be transformed. The measure group defines how the data is bound to the multidimensional space of the cube.

All the measures in a cube constitute the Measures dimension. The Measures dimension is the simplest dimension in the cube. It has only one attribute—Measure—whose members are measures. The Measure attribute is not aggregatable and doesn’t have the ALL member. The Measure dimension has just one hierarchy. It exists in every cube and doesn’t require definition other than the definitions of the measures.

Measures in a Multidimensional Cube

Measures define the following:

  • How and what you will analyze in your model

  • The resources that will be used for your analysis

  • The precision of analysis that will be available

  • The speed with which the data will be loaded into the model and retrieved from it

In Data Definition Language (DDL), a measure is a minor object of the model, defined by the major object, the measure group. (We discuss measure groups later in this chapter.) When you edit a measure, the specifications for the whole measure group are sent to Analysis Services.

Making sure that you have the right set of measures in the model and the correct definition for each one is very important for the performance of the system. Table 7.1 describes the measure properties available in the conceptual model.

Table 7.1. Measure Properties
PropertyDescription
NameThe name of the measure. The name of a measure should be unique not only in the measure group, but also in the cube. As you would for any other name in the system, avoid using very long strings because they make the system less convenient and slow it down.
IDThe unique identifier used to reference the measure. The ID defines the key of the measure and, as opposed to the Name, can’t be changed when you edit the model.
DescriptionA description of the measure.
TranslationsDefines the translations for the name of the measure and the corresponding member in the Measure dimension to languages different from the language of the cube.
VisibleDetermines whether the measure is visible to the user of the application model. Turn off the Visible property when the measure is used only for internal computations.
SourceThe data specification for the measure (see Chapter 5, “Dimensions in the Conceptual Model”). The Source property defines all the information needed for the correct loading of measure data into the model.
AggregateFunctionDetermines how data is calculated for the multidimensional space from the data values of the fact space. The system supports four types of additive aggregations (Sum, Count, Min, and Max), nonadditive DistinctCount and None functions, and six types of semi-additive measures (FirstChild, LastChild, FirstNonEmpty, LastNonEmpty, AverageOfChildren, and ByAccount). (You can find information about semi-additive aggregations in Chapter 13, “Dimension-Based MDX Calculations.”)
DataTypeFor the most part, the Source property defines the data specification for the measure. However, for the Count and DistinctCount aggregation types, you need more to your definition. When you count something in a measure, you need to know the data type for the count. This data type is likely to differ from the type of data used for the items counted. For example, if you’re calculating the distinct count of a certain string, the count of strings would have an integer data type.
MeasureExpressionDefines the formula used to calculate the value of the measure, if this value is based on other measures. This formula can reference data from the same measure group or from a different measure group. We cover the MeasureExpression property in Chapter 8, “Advanced Modeling.”
DisplayFolderDetermines in which folder the measure should be included when it appears in the user application.
FormatStringDefines the format for displaying the measure data to the end user.
BackColorDefines the background color used in displaying the data.
ForeColorDefines the foreground color used in displaying the data.
FontNameDefines the font used in displaying the data.
FontSizeDefines the size of the font used in displaying the data.
FontFlagsDefines characteristics of the fonts used to display the data.

In your applications, you can also apply rules for the data format. For example, you could specify that values of sales over 200 should appear in bold. The client application retrieves the data formatting information through cell properties. For more information about how to extract cell properties, see Chapter 11, “Advanced MDX.” For more information about how to define custom formatting, see Chapter 12, “Cube-Based MDX Calculations,” and Chapter 13, “Dimension-Based MDX Calculations.”

Listing 7.1 shows the definition of a measure in XML. The default value for the AggregateFunction property is SUM. (It’s the one that is most commonly used.) If you use a different aggregation than SUM, you must specify it; you don’t need to specify SUM.

Note

Analysis Services has default values for all the measure properties.


Listing 7.1. A Definition of the Store Sales Measure

<Measure>
     <ID>Store Sales</ID>
     <Name>Store Sales</Name>
     <Source>
          <DataType>Double</DataType>
          <Source xsi:type="ColumnBinding">
               <TableID>dbo_sales_fact_1997</TableID>
               <ColumnID>store_sales</ColumnID>
          </Source>
     </Source>
</Measure>

The AggregateFunction and DataType measure properties define the behavior of the measure. These two properties are related to each other, so it’s a good idea to keep one property in mind when you define the other. Let’s look at each aggregation and the data type for it.

SUM

For the SUM aggregation, the value of the measure is calculated by adding together the data on the various levels. For example, the sales in our sample model are defined by summing; to determine the sum of sales for this year, we sum the sales made during each quarter of the year.

Any integer data type works for SUM, but it’s important to pay attention to the size of the data for this type so that you don’t get overflow from the summing of large numbers. When you use float numbers, you don’t usually get overflow, but performance can suffer and you can lose precision to the point that it affects the results of your analysis. For our Store Sales measure, the data type is Currency. The data maintains greater precision, and it won’t overflow; the Currency data type can hold huge numbers, enough to hold the sum of sales in most models.

Avoid using Date/Time types with SUM because the results can be unexpected and hard to understand. You can’t use the String data type with SUM aggregation.

MAX and MIN

The MAX and MIN aggregations calculate the minimum and maximum values, respectively, for the measure. For example, if you create the measure Min Units Sold with aggregation function MIN, to calculate its value for the month of December, Analysis Services will find the lowest Min Units Sold value of each day. These two types of aggregation can take the same data types that SUM does. What’s different is that these two never lose precision and never overflow. Other data types, such as Date/Time, work better with MAX and MIN than with SUM. You can’t add dates and times, but you can easily determine the minimum and maximum (that is, earliest and latest). When it’s calculating the maximum or minimum of the values of a measure, the system ignores NULL values for the MAX and MIN aggregations.

You can’t use the String data type with MAX and MIN aggregation.

COUNT

The COUNT aggregation calculates the number of records in which the measure value is not NULL. When it comes to the data type of the records you’re counting, it doesn’t really matter what that data type is. This aggregation function is merely counting the records (so long as they’re not NULL). Therefore, 100 records of one data type are equivalent to 100 records of any other data type. When you define the Source property of a measure, it’s best not to mess with the data type of the values; just leave it the same as the original data type in the relational table; it won’t affect the aggregation result.

You can use the String data type for measure values with COUNT aggregation, unlike with SUM, MIN, and MAX aggregations. It won’t cost any more resources because the system stores the number of records with no consideration for their values. When it comes to the data type of the count itself, the integer data types are preferred; choose one that minimizes the chance of overflow.

Our sample cube has a count of sales: Sales Count measure. It doesn’t matter that the value of the measure has a Currency data type as defined in the relational database.

DISTINCT COUNT

The DISTINCT COUNT aggregation calculates the number of records with unique values within the measure. For example, you can define DISTINCT_CUSTOMER measure that calculates the number of unique customers your store has had. Consider the fact table shown in Figure 7.1. It has three columns Customer_ID, Product_ID, and Time_ID. The Product_ID and Time_ID columns are used to build Product and Time dimensions. The Customer_ID column is used as a source of Distinct_Customer measure. To calculate value of this measure, Analysis Services scans over the Customer_ID column and counts the unique customer IDs. In our example, there are two unique customers: 1 and 2.

Figure 7.1. Fact table with column used to calculate a DISTINCT_COUNT measure.


One of the most important characteristics of the DISTINCT_COUNT measure is that you cannot calculate the value on the top level by summing values on the level below; the DISTINCT_COUNT measure is not aggregatable. In our example, our store has two unique customers, and therefore the value of the measure corresponding to the member full year of 1997 is 2. However, the value of the DISTINCT_CUSTOMER measure for January 1997 is also 2, and for February 1997 is 1 (see Figure 7.2).

Figure 7.2. DISTINCT_CUSTOMER measure by Time dimension.
 DISTINCT_CUSTOMER
19972
Jan 972
Feb 971

The value of a measure is stored in the format defined by the data specification in the Source property. The String data type is acceptable for this aggregation function, along with others we’ve discussed. The rules for comparing string data to determine whether a value is unique within the measure are defined by Collation in the data specification.

With DISTINCT_COUNT aggregation, your cost per record is greater than for other aggregation types because the system has to store the actual value for each record that has a unique value, including string values (which take 2 bytes per character and 6 bytes per string). This cost is multiplied by the number of records because the system has to keep the records for each unique value; that is, the system has not only to keep more in the record, but it also has to keep more records.

If the data specification dictates that NULL values be counted in the measure, the system reports all the records with NULL values as a distinct value. It takes a great deal of resources to store data for a DISTINCT COUNT aggregation, and many resources to access the data.

This can affect the performance of the system, as compared to other aggregation types such as COUNT. In a measure group, you can have only one measure with the aggregation function DISTINCT COUNT. If you have a measure with a DISTINCT COUNT aggregation in a measure group, we recommend that you avoid having a measure with any other aggregation function in the same measure group.

Measure Groups

Measure groups—in other words, fact data—define the fact space of the data in the cube. In many respects, they define the behavior of the physical data model. Facts lie on the border between the physical and conceptual models and define the following:

  • What data will be loaded into the system

  • How the data will be loaded

  • How the data is bound to the conceptual model of the multidimensional cube

Note

Don’t worry if you’re wondering when we decided that measure groups are the same as facts. We introduced you to the term measure group first, but it is essentially the same as a fact. Different uses, however, require one term or another. DDL, for example, uses measure group.


Measures of the same granularity are united in one measure group (fact). That granularity defines the dimensionality of the fact space and the position of the fact in the attribute tree of each dimension.

Granularity is a characteristic of a measure group that defines its size, complexity, and binding to the cube. In our sample cube Warehouse and Sales, we have two facts: One contains data about sales, the other one about the warehouses. Each of these facts has a set of dimensions, which define how the measures of a fact are bound to the cube. The Warehouse measure group has Product, Time, Store, and Warehouse dimensions. The Sales measure group has Product, Time, Store, Promotion, and Customer dimensions, as shown in Figure 7.3. Each measure group can aggregate data starting from a different level. For example, you can choose to keep daily sales data, but weekly or monthly warehouse information. We go into greater detail about granularity in a later section, “Measure Group Dimensions.” Having different measure groups within the same cube allows you to analyze data related to each measure group independently and data shared between the measure groups together in the same query. For example, you can request information about units of a product sold in a store and units ordered from warehouse for this store.

Figure 7.3. Cube with two measure groups


In DDL, a measure group (fact) is a major object and, therefore, is defined and edited independently of the other objects in the system. As any other major object, a measure group has ID, Name, Description, Annotations, and Translations properties. A measure group doesn’t have its own language; it takes on the language property of the cube to which it belongs. In addition, a measure group contains time and date stamps that indicate when it was created and loaded with data.

Of the many properties of a measure group, some belong to the physical model: those that define what data is loaded into the cube and how it is stored there. We discuss those properties in Chapter 20, “Physical Data Model.” Here, we go into detail about the properties that define the conceptual data model. Table 7.2 describes those properties.

Table 7.2. Measure Group Properties
PropertyDescription
TypeThe type of measure group, such as Exchange Rate or Inventory. Analysis Services doesn’t use the Type property internally, but the property can be passed to the client application for better visualization of the model.
MeasuresA list of the measures that make up the fact. The fact must contain at least one measure.
DimensionsThe collection of dimensions that define the granularity and the binding of the fact to the multidimensional space of the cube. Each dimension can have a different relationship to the fact. We cover those relationships in more detail in Chapter 8.
IgnoreUnrelatedDimensionsDefines the behavior of the model when data is retrieved from the measure group by dimensions that are not used in the fact. In this case, there are two possible behaviors: Such dimensions are ignored and data is retrieved using other dimensions; or such dimensions are used, and the data is considered missing for such request.
EstimatedRowsOn the boundary between the physical and conceptual models. The number of rows the creator of the model estimates that the fact will hold. Data is loaded into the fact by portions, called partitions. The number of rows that can be loaded into the fact is unlimited except by the physical limitations of the hardware and the volume of data accumulated by the user. However, if you know the number of the rows that exist in the fact, you can help the system make better decisions when it chooses internal data structures for data storage and algorithms for their processing.

Listing 7.2 is the definition of the measure group Sales for our Warehouse and Sales cube.

Listing 7.2. Data Definition Language for a Measure Group
<MeasureGroup>
     <ID>Sales</ID>
     <Name>Sales</Name>
     <Measures>
          <Measure>
               <ID>Store Sales</ID>
               <Name>Store Sales</Name>
               <Source>
                    <DataType>Double</DataType>
                    <Source xsi:type="ColumnBinding">
                         <TableID>dbo_sales_fact_1997</TableID>
                         <ColumnID>store_sales</ColumnID>
                    </Source>
               </Source>
          </Measure>
          <Measure>
               <ID>Sales Fact 1997 Count</ID>
               <Name>Sales Count</Name>
               <AggregateFunction>Count</AggregateFunction>
               <Source>
                    <DataType>Integer</DataType>
                    <DataSize>4</DataSize>
                    <Source xsi:type="RowBinding">
                         <TableID>dbo_sales_fact_1997</TableID>
                    </Source>
               </Source>
          </Measure>
     </Measures>
     <Dimensions />
     <Partitions />
</MeasureGroup>

This example is not a full example. Some properties are missing from the DDL, such as the definitions of dimensions (to be explained later) and partitions (part of the physical model). In addition, certain properties, such as AggregateFunction, have default values assigned when they are missing from the DDL.

Measure Group Dimensions

A measure group dimension is a list of the cube dimensions that belong to the fact (measure group). The data space (dimensionality) for a fact is a subset of data space of the cube, defined by the measure group dimensions (that are a subset of the cube dimensions). If the list of dimensions in the cube is the same as the list of those in the measure group, you could say that the measure group has the same dimensionality as the cube. In that case, all the dimensions in the cube define how the data is loaded into the measure group. However, most cubes have multiple measure groups, and each measure group has its own list of dimensions.

In our example, the Warehouse measure group contains only four cube dimensions: Product, Time, Store, and Warehouse. All the other cube dimensions are unrelated to the Warehouse measure group (or to the measures of the measure group). Unrelated dimensions don’t define how the data is loaded into the fact. When users request data using an unrelated dimension in a query, Analysis Services based on the IgnoreUnrelatedDimensions measure group property either returns a NULL value or uses other measure group dimensions to resolve the request. For example, suppose a user requests the value of Units Ordered measure by specifying coordinates on the Time, Product, and Customer dimensions. Whereas Time and Product dimensions are related to the Warehouse measure group, the Customer dimension is not related.

Figure 7.4. Using the IgnoreUnrelatedDimensions dimensions property.


A measure group dimension doesn’t have its own name or ID. A measure group dimension is referenced by the ID of the cube dimension. The Annotations property is the only property that a measure group dimension has.

Granularity of a Fact

Each dimension of a measure group defines the rules for loading the fact data for that dimension into that measure group. By default, the system loads data according to the key attribute of the dimension. In our sample, the day defines the precision of the data for our Sales measure group. In this case, we would say that the Date attribute of the Time dimension is the one that defines the granularity (or the depth) of data for the fact.

The attributes that define granularity in each dimension of the measure group, taken together, make up a list called the measure group granularity. In our sample, the measure group granularity of the Warehouse measure group is a list that contains the Product, Date, Store, and Warehouse attributes. The Units Ordered measure of the Warehouse measure group will be loaded by only those four attributes. Because the Date attribute is the granularity of the Warehouse measure group, you can’t drill down to the hour when a product will arrive at a warehouse or at a store from a warehouse. You can drill down only to the day.

If the granularity of the measure group dimension is the key attribute, you don’t have to specify anything else. If not, you have to define a list of attributes for the measure group dimension and to specify which of them is a granularity attribute; otherwise, you’ll get an error.

To define a measure group dimension attribute, you specify the ID of that attribute in the Database dimension, its Type property, and its Annotations property. To mark an attribute as a granularity attribute, set its Type property to Granularity. It is theoretically possible to have several granularity attributes for each dimension, but the current version of the system allows only one granularity attribute for each dimension.

When you change a granularity attribute, the volume of data loaded into the measure group changes. In our sample, if we change the granularity attribute of the Time dimension from the Date attribute to the Month attribute, the fact data will be summarized by month. The volume will be 30 times smaller than it would be when using the Date attribute as the granularity attribute. After you’ve specified granularity by month, you can’t go back and analyze the data by day.

Now that the Month attribute is the granularity attribute, the Date is unrelated to the fact, in the same way that a cube dimension that isn’t included in a measure group is unrelated. Similar to an unrelated dimension, data won’t exist in the measure group for unrelated attributes. If a user requests data by an unrelated dimension or unrelated attribute, the IgnoreUnrelatedDimensions property of the measure group defines the behavior. Listing 7.3 shows an XML definition of the dimensions of our Warehouse measure group.

Listing 7.3. Dimensions of the Warehouse Measure Group
<Dimensions>
     <Dimension xsi:type="RegularMeasureGroupDimension">
          <CubeDimensionID>Customer</CubeDimensionID>
          <Attributes>
               <Attribute>
                    <AttributeID>Customer</AttributeID>
               <KeyColumns />
                    <Type>Granularity</Type>
               </Attribute>
          </Attributes>
     </Dimension>
     <Dimension xsi:type="RegularMeasureGroupDimension">
          <CubeDimensionID>Product</CubeDimensionID>
          <Attributes>
               <Attribute>
                    <AttributeID>Product</AttributeID>
                    <KeyColumns />
                    <Type>Granularity</Type>
               </Attribute>
          </Attributes>
     </Dimension>
     <Dimension xsi:type="RegularMeasureGroupDimension">
          <CubeDimensionID>Store</CubeDimensionID>
          <Attributes>
              <Attribute>
                    <AttributeID>Store</AttributeID>
                    <KeyColumns />
                    <Type>Granularity</Type>
               </Attribute>
          </Attributes>
     </Dimension>
     <Dimension xsi:type="RegularMeasureGroupDimension">
          <CubeDimensionID>Time</CubeDimensionID>
          <Attributes>
               <Attribute>
                    <AttributeID>Time By Day</AttributeID>
                    <KeyColumns />
                    <Type>Granularity</Type>
               </Attribute>
          </Attributes>
     </Dimension>
</Dimensions>

Measure Group Dimension Attributes and Cube Dimension Hierarchies

Take a look at Figure 7.5. Suppose that you change the granularity attribute for the Customer dimension in our sample from the Customer attribute to the City attribute. You’ll lose sales data not only for the Customer attribute, but also for the Gender attribute.

Figure 7.5. The key attribute Customer is used as the granularity attribute and in the user-defined hierarchy Geography.


When the granularity of a measure group changes, it’s not easy to figure out which attributes are related and which are unrelated to the measure group. To make a decision, Analysis Services uses dimension hierarchies defined for a cube. By default, when the granularity attribute is the key attribute, any cube dimension hierarchy can be used to analyze measure group data. When an attribute other than the key attribute defines the granularity of a measure group for a dimension, only the hierarchies that pass through this attribute are available for analysis. For a precise definition of which hierarchies and attributes are available for analyzing measures of a measure group, we introduce a new term: native hierarchy. A native hierarchy is a foundation for every dimension hierarchy. It includes all the attributes that are included in the dimension hierarchy and any attributes that are part of the uninterrupted path, through attribute relationships, from the top attribute of a hierarchy to the key attribute. If there is no uninterrupted path, the hierarchy is an unnatural hierarchy. When it encounters an unnatural hierarchy, Analysis Services divides it into two or more native hierarchies. Analysis Services builds one native hierarchy for each user-defined hierarchy when it processes a database dimension. It uses native hierarchies to make a decision whether an attribute is related to a measure group.

In Figure 7.5, you can see an example of the native hierarchy for the hierarchy Geography: CountryCityCustomer. The CountryCityCustomer path doesn’t include the State/Province attribute, but the native hierarchy includes the State/Province attribute to provide an uninterrupted path from the top attribute (Country) to the key attribute (Customer). Existence of the State/Province attribute in the native hierarchy makes the State/Province attribute related to the measure group. Therefore, even if you change the granularity attribute for the Customer dimension to the State/Province, users of the multidimensional model will be able to access the measure group data using the Geography hierarchy (Country and State/Province levels).

You can also have alternative paths from one attribute to the key attribute. Figure 7.6 shows an example of the hierarchy Time: YearQuarterDate, based on an attribute tree that has an alternative path from the Date attribute to the Year attribute.

Figure 7.6. An alternative path from the Year attribute to the Date attribute.


The presence of alternative paths can cause ambiguity when Analysis Services builds a native hierarchy. Analysis Services chooses one of the two paths to build the natural hierarchy and can choose the undesired path. If your user-defined hierarchy is between the Year and Date attributes, Analysis Services can’t distinguish between the two alternative paths. To build a native hierarchy, Analysis Services chooses the shortest uninterrupted path to the key attribute. So, to query data in the measure group, you can use only attributes from that path; other attributes will be treated as unrelated. For example, if Analysis Services chooses YearWeekDate attributes for native hierarchy, you won’t be able to use this hierarchy if the granularity attribute of the Time dimension is Month or Quarter attributes. This means that if you use Year, Week, or Date attributes in a query, they will be treated as unrelated attributes.

It’s a best practice to avoid alternative paths in the attribute tree altogether. If you can’t avoid alternative paths, at least avoid ambiguity by defining your hierarchy with attributes that fully define the path for the native hierarchy.

When you change the granularity attribute from one attribute to another, the hierarchies and attributes available for analysis can also change. Only the native hierarchies that pass through the new granularity attribute are available for analysis. On those hierarchies, only the attributes that are above the granularity attribute are related to the measures of the measure group.

Figure 7.7 shows fact data for sales in different countries. When you choose State/Province as the granularity attribute of the Customer dimension, the only hierarchy available for analysis is the Geography hierarchy from the previous example (shown in Figure 7.7; CountryCityCustomer), and the only attribute that is related to the fact is Country, because it’s the only attribute from this hierarchy that is above granularity attribute State/Province. For the Time dimension, the only hierarchy that is accessible in our fact is the Time hierarchy (YearQuarterDate), because we didn’t create the hierarchy passing through the Week attribute.

Figure 7.7. Province and Date as granularity attributes.


In Figure 7.7, the State/Province attribute is the granularity attribute, but not the key attribute, for the Customer dimension. Therefore, members of the State/Province attribute are leaf members of the Customer measure group dimension. It is into these leaf members that data is loaded. If the Time dimension is tied to the fact by the granularity attribute Date, even if the Time dimension has the Minute attribute, the data is loaded to the fact by days and not minutes. The members of the Date attribute are leaf members of the Time dimension in this case.



鲜花

握手

雷人

路过

鸡蛋

相关阅读

手机版|小黑屋|BC Morning Website ( Best Deal Inc. 001 )  

GMT-8, 2025-12-13 19:26 , Processed in 0.013196 second(s), 16 queries .

Supported by Best Deal Online X3.5

© 2001-2025 Discuz! Team.

返回顶部