Chapter 7. Measures and Multidimensional Analysis
Measures are the values that you are going to analyze; they are united in the measure groups by their relationships to dimensions in the cube. Measures are ultimately the most important semantic part of the cube because they are based on the fact data (sometimes called facts) that you’re going to analyze. They usually contain numeric metrics that describe the results of business activity. In our Warehouse and Sales cube, we have created Store Sales, Unit Sold, and Store Sales measures that enable the analysis of sales in different stores. Measures define which data is available for analysis, and in what form and by what rules that data can be transformed. The measure group defines how the data is bound to the multidimensional space of the cube. All the measures in a cube constitute the Measures dimension. The Measures dimension is the simplest dimension in the cube. It has only one attribute—Measure—whose members are measures. The Measure attribute is not aggregatable and doesn’t have the ALL member. The Measure dimension has just one hierarchy. It exists in every cube and doesn’t require definition other than the definitions of the measures. Measures in a Multidimensional CubeMeasures define the following: In Data Definition Language (DDL), a measure is a minor object of the model, defined by the major object, the measure group. (We discuss measure groups later in this chapter.) When you edit a measure, the specifications for the whole measure group are sent to Analysis Services. Making sure that you have the right set of measures in the model and the correct definition for each one is very important for the performance of the system. Table 7.1 describes the measure properties available in the conceptual model.
In your applications, you can also apply rules for the data format. For example, you could specify that values of sales over 200 should appear in bold. The client application retrieves the data formatting information through cell properties. For more information about how to extract cell properties, see Chapter 11, “Advanced MDX.” For more information about how to define custom formatting, see Chapter 12, “Cube-Based MDX Calculations,” and Chapter 13, “Dimension-Based MDX Calculations.” Listing 7.1 shows the definition of a measure in XML. The default value for the AggregateFunction property is SUM. (It’s the one that is most commonly used.) If you use a different aggregation than SUM, you must specify it; you don’t need to specify SUM. Listing 7.1. A Definition of the Store Sales Measure
The AggregateFunction and DataType measure properties define the behavior of the measure. These two properties are related to each other, so it’s a good idea to keep one property in mind when you define the other. Let’s look at each aggregation and the data type for it. SUMFor the SUM aggregation, the value of the measure is calculated by adding together the data on the various levels. For example, the sales in our sample model are defined by summing; to determine the sum of sales for this year, we sum the sales made during each quarter of the year. Any integer data type works for SUM, but it’s important to pay attention to the size of the data for this type so that you don’t get overflow from the summing of large numbers. When you use float numbers, you don’t usually get overflow, but performance can suffer and you can lose precision to the point that it affects the results of your analysis. For our Store Sales measure, the data type is Currency. The data maintains greater precision, and it won’t overflow; the Currency data type can hold huge numbers, enough to hold the sum of sales in most models. Avoid using Date/Time types with SUM because the results can be unexpected and hard to understand. You can’t use the String data type with SUM aggregation. MAX and MINThe MAX and MIN aggregations calculate the minimum and maximum values, respectively, for the measure. For example, if you create the measure Min Units Sold with aggregation function MIN, to calculate its value for the month of December, Analysis Services will find the lowest Min Units Sold value of each day. These two types of aggregation can take the same data types that SUM does. What’s different is that these two never lose precision and never overflow. Other data types, such as Date/Time, work better with MAX and MIN than with SUM. You can’t add dates and times, but you can easily determine the minimum and maximum (that is, earliest and latest). When it’s calculating the maximum or minimum of the values of a measure, the system ignores NULL values for the MAX and MIN aggregations. You can’t use the String data type with MAX and MIN aggregation. COUNTThe COUNT aggregation calculates the number of records in which the measure value is not NULL. When it comes to the data type of the records you’re counting, it doesn’t really matter what that data type is. This aggregation function is merely counting the records (so long as they’re not NULL). Therefore, 100 records of one data type are equivalent to 100 records of any other data type. When you define the Source property of a measure, it’s best not to mess with the data type of the values; just leave it the same as the original data type in the relational table; it won’t affect the aggregation result. You can use the String data type for measure values with COUNT aggregation, unlike with SUM, MIN, and MAX aggregations. It won’t cost any more resources because the system stores the number of records with no consideration for their values. When it comes to the data type of the count itself, the integer data types are preferred; choose one that minimizes the chance of overflow. Our sample cube has a count of sales: Sales Count measure. It doesn’t matter that the value of the measure has a Currency data type as defined in the relational database. DISTINCT COUNTThe DISTINCT COUNT aggregation calculates the number of records with unique values within the measure. For example, you can define DISTINCT_CUSTOMER measure that calculates the number of unique customers your store has had. Consider the fact table shown in Figure 7.1. It has three columns Customer_ID, Product_ID, and Time_ID. The Product_ID and Time_ID columns are used to build Product and Time dimensions. The Customer_ID column is used as a source of Distinct_Customer measure. To calculate value of this measure, Analysis Services scans over the Customer_ID column and counts the unique customer IDs. In our example, there are two unique customers: 1 and 2. Figure 7.1. Fact table with column used to calculate a DISTINCT_COUNT measure.
One of the most important characteristics of the DISTINCT_COUNT measure is that you cannot calculate the value on the top level by summing values on the level below; the DISTINCT_COUNT measure is not aggregatable. In our example, our store has two unique customers, and therefore the value of the measure corresponding to the member full year of 1997 is 2. However, the value of the DISTINCT_CUSTOMER measure for January 1997 is also 2, and for February 1997 is 1 (see Figure 7.2).
The value of a measure is stored in the format defined by the data specification in the Source property. The String data type is acceptable for this aggregation function, along with others we’ve discussed. The rules for comparing string data to determine whether a value is unique within the measure are defined by Collation in the data specification. With DISTINCT_COUNT aggregation, your cost per record is greater than for other aggregation types because the system has to store the actual value for each record that has a unique value, including string values (which take 2 bytes per character and 6 bytes per string). This cost is multiplied by the number of records because the system has to keep the records for each unique value; that is, the system has not only to keep more in the record, but it also has to keep more records. If the data specification dictates that NULL values be counted in the measure, the system reports all the records with NULL values as a distinct value. It takes a great deal of resources to store data for a DISTINCT COUNT aggregation, and many resources to access the data. This can affect the performance of the system, as compared to other aggregation types such as COUNT. In a measure group, you can have only one measure with the aggregation function DISTINCT COUNT. If you have a measure with a DISTINCT COUNT aggregation in a measure group, we recommend that you avoid having a measure with any other aggregation function in the same measure group. Measure GroupsMeasure groups—in other words, fact data—define the fact space of the data in the cube. In many respects, they define the behavior of the physical data model. Facts lie on the border between the physical and conceptual models and define the following: Note Don’t worry if you’re wondering when we decided that measure groups are the same as facts. We introduced you to the term measure group first, but it is essentially the same as a fact. Different uses, however, require one term or another. DDL, for example, uses measure group. Measures of the same granularity are united in one measure group (fact). That granularity defines the dimensionality of the fact space and the position of the fact in the attribute tree of each dimension. Granularity is a characteristic of a measure group that defines its size, complexity, and binding to the cube. In our sample cube Warehouse and Sales, we have two facts: One contains data about sales, the other one about the warehouses. Each of these facts has a set of dimensions, which define how the measures of a fact are bound to the cube. The Warehouse measure group has Product, Time, Store, and Warehouse dimensions. The Sales measure group has Product, Time, Store, Promotion, and Customer dimensions, as shown in Figure 7.3. Each measure group can aggregate data starting from a different level. For example, you can choose to keep daily sales data, but weekly or monthly warehouse information. We go into greater detail about granularity in a later section, “Measure Group Dimensions.” Having different measure groups within the same cube allows you to analyze data related to each measure group independently and data shared between the measure groups together in the same query. For example, you can request information about units of a product sold in a store and units ordered from warehouse for this store. Figure 7.3. Cube with two measure groups
In DDL, a measure group (fact) is a major object and, therefore, is defined and edited independently of the other objects in the system. As any other major object, a measure group has ID, Name, Description, Annotations, and Translations properties. A measure group doesn’t have its own language; it takes on the language property of the cube to which it belongs. In addition, a measure group contains time and date stamps that indicate when it was created and loaded with data. Of the many properties of a measure group, some belong to the physical model: those that define what data is loaded into the cube and how it is stored there. We discuss those properties in Chapter 20, “Physical Data Model.” Here, we go into detail about the properties that define the conceptual data model. Table 7.2 describes those properties.
Listing 7.2 is the definition of the measure group Sales for our Warehouse and Sales cube. Listing 7.2. Data Definition Language for a Measure Group
This example is not a full example. Some properties are missing from the DDL, such as the definitions of dimensions (to be explained later) and partitions (part of the physical model). In addition, certain properties, such as AggregateFunction, have default values assigned when they are missing from the DDL. Measure Group DimensionsA measure group dimension is a list of the cube dimensions that belong to the fact (measure group). The data space (dimensionality) for a fact is a subset of data space of the cube, defined by the measure group dimensions (that are a subset of the cube dimensions). If the list of dimensions in the cube is the same as the list of those in the measure group, you could say that the measure group has the same dimensionality as the cube. In that case, all the dimensions in the cube define how the data is loaded into the measure group. However, most cubes have multiple measure groups, and each measure group has its own list of dimensions. In our example, the Warehouse measure group contains only four cube dimensions: Product, Time, Store, and Warehouse. All the other cube dimensions are unrelated to the Warehouse measure group (or to the measures of the measure group). Unrelated dimensions don’t define how the data is loaded into the fact. When users request data using an unrelated dimension in a query, Analysis Services based on the IgnoreUnrelatedDimensions measure group property either returns a NULL value or uses other measure group dimensions to resolve the request. For example, suppose a user requests the value of Units Ordered measure by specifying coordinates on the Time, Product, and Customer dimensions. Whereas Time and Product dimensions are related to the Warehouse measure group, the Customer dimension is not related. Figure 7.4. Using the IgnoreUnrelatedDimensions dimensions property.
A measure group dimension doesn’t have its own name or ID. A measure group dimension is referenced by the ID of the cube dimension. The Annotations property is the only property that a measure group dimension has. Granularity of a FactEach dimension of a measure group defines the rules for loading the fact data for that dimension into that measure group. By default, the system loads data according to the key attribute of the dimension. In our sample, the day defines the precision of the data for our Sales measure group. In this case, we would say that the Date attribute of the Time dimension is the one that defines the granularity (or the depth) of data for the fact. The attributes that define granularity in each dimension of the measure group, taken together, make up a list called the measure group granularity. In our sample, the measure group granularity of the Warehouse measure group is a list that contains the Product, Date, Store, and Warehouse attributes. The Units Ordered measure of the Warehouse measure group will be loaded by only those four attributes. Because the Date attribute is the granularity of the Warehouse measure group, you can’t drill down to the hour when a product will arrive at a warehouse or at a store from a warehouse. You can drill down only to the day. If the granularity of the measure group dimension is the key attribute, you don’t have to specify anything else. If not, you have to define a list of attributes for the measure group dimension and to specify which of them is a granularity attribute; otherwise, you’ll get an error. To define a measure group dimension attribute, you specify the ID of that attribute in the Database dimension, its Type property, and its Annotations property. To mark an attribute as a granularity attribute, set its Type property to Granularity. It is theoretically possible to have several granularity attributes for each dimension, but the current version of the system allows only one granularity attribute for each dimension. When you change a granularity attribute, the volume of data loaded into the measure group changes. In our sample, if we change the granularity attribute of the Time dimension from the Date attribute to the Month attribute, the fact data will be summarized by month. The volume will be 30 times smaller than it would be when using the Date attribute as the granularity attribute. After you’ve specified granularity by month, you can’t go back and analyze the data by day. Now that the Month attribute is the granularity attribute, the Date is unrelated to the fact, in the same way that a cube dimension that isn’t included in a measure group is unrelated. Similar to an unrelated dimension, data won’t exist in the measure group for unrelated attributes. If a user requests data by an unrelated dimension or unrelated attribute, the IgnoreUnrelatedDimensions property of the measure group defines the behavior. Listing 7.3 shows an XML definition of the dimensions of our Warehouse measure group. Listing 7.3. Dimensions of the Warehouse Measure Group
Measure Group Dimension Attributes and Cube Dimension HierarchiesTake a look at Figure 7.5. Suppose that you change the granularity attribute for the Customer dimension in our sample from the Customer attribute to the City attribute. You’ll lose sales data not only for the Customer attribute, but also for the Gender attribute. Figure 7.5. The key attribute Customer is used as the granularity attribute and in the user-defined hierarchy Geography.
When the granularity of a measure group changes, it’s not easy to figure out which attributes are related and which are unrelated to the measure group. To make a decision, Analysis Services uses dimension hierarchies defined for a cube. By default, when the granularity attribute is the key attribute, any cube dimension hierarchy can be used to analyze measure group data. When an attribute other than the key attribute defines the granularity of a measure group for a dimension, only the hierarchies that pass through this attribute are available for analysis. For a precise definition of which hierarchies and attributes are available for analyzing measures of a measure group, we introduce a new term: native hierarchy. A native hierarchy is a foundation for every dimension hierarchy. It includes all the attributes that are included in the dimension hierarchy and any attributes that are part of the uninterrupted path, through attribute relationships, from the top attribute of a hierarchy to the key attribute. If there is no uninterrupted path, the hierarchy is an unnatural hierarchy. When it encounters an unnatural hierarchy, Analysis Services divides it into two or more native hierarchies. Analysis Services builds one native hierarchy for each user-defined hierarchy when it processes a database dimension. It uses native hierarchies to make a decision whether an attribute is related to a measure group. In Figure 7.5, you can see an example of the native hierarchy for the hierarchy Geography: Country–City–Customer. The Country–City–Customer path doesn’t include the State/Province attribute, but the native hierarchy includes the State/Province attribute to provide an uninterrupted path from the top attribute (Country) to the key attribute (Customer). Existence of the State/Province attribute in the native hierarchy makes the State/Province attribute related to the measure group. Therefore, even if you change the granularity attribute for the Customer dimension to the State/Province, users of the multidimensional model will be able to access the measure group data using the Geography hierarchy (Country and State/Province levels). You can also have alternative paths from one attribute to the key attribute. Figure 7.6 shows an example of the hierarchy Time: Year–Quarter–Date, based on an attribute tree that has an alternative path from the Date attribute to the Year attribute. Figure 7.6. An alternative path from the Year attribute to the Date attribute.
The presence of alternative paths can cause ambiguity when Analysis Services builds a native hierarchy. Analysis Services chooses one of the two paths to build the natural hierarchy and can choose the undesired path. If your user-defined hierarchy is between the Year and Date attributes, Analysis Services can’t distinguish between the two alternative paths. To build a native hierarchy, Analysis Services chooses the shortest uninterrupted path to the key attribute. So, to query data in the measure group, you can use only attributes from that path; other attributes will be treated as unrelated. For example, if Analysis Services chooses Year–Week–Date attributes for native hierarchy, you won’t be able to use this hierarchy if the granularity attribute of the Time dimension is Month or Quarter attributes. This means that if you use Year, Week, or Date attributes in a query, they will be treated as unrelated attributes. It’s a best practice to avoid alternative paths in the attribute tree altogether. If you can’t avoid alternative paths, at least avoid ambiguity by defining your hierarchy with attributes that fully define the path for the native hierarchy. When you change the granularity attribute from one attribute to another, the hierarchies and attributes available for analysis can also change. Only the native hierarchies that pass through the new granularity attribute are available for analysis. On those hierarchies, only the attributes that are above the granularity attribute are related to the measures of the measure group. Figure 7.7 shows fact data for sales in different countries. When you choose State/Province as the granularity attribute of the Customer dimension, the only hierarchy available for analysis is the Geography hierarchy from the previous example (shown in Figure 7.7; Country–City–Customer), and the only attribute that is related to the fact is Country, because it’s the only attribute from this hierarchy that is above granularity attribute State/Province. For the Time dimension, the only hierarchy that is accessible in our fact is the Time hierarchy (Year–Quarter–Date), because we didn’t create the hierarchy passing through the Week attribute. Figure 7.7. Province and Date as granularity attributes.
In Figure 7.7, the State/Province attribute is the granularity attribute, but not the key attribute, for the Customer dimension. Therefore, members of the State/Province attribute are leaf members of the Customer measure group dimension. It is into these leaf members that data is loaded. If the Time dimension is tied to the fact by the granularity attribute Date, even if the Time dimension has the Minute attribute, the data is loaded to the fact by days and not minutes. The members of the Date attribute are leaf members of the Time dimension in this case. |
手机版|小黑屋|BC Morning Website ( Best Deal Inc. 001 )
GMT-8, 2025-12-13 19:26 , Processed in 0.013196 second(s), 16 queries .
Supported by Best Deal Online X3.5
© 2001-2025 Discuz! Team.