SSAS 2008 Unleashed - Chapter 8. Advanced Modeling

2012-8-16 15:10 | Posted by: demo

Summary: Chapter 8. Advanced Modeling. In This Chapter: Parent-Child Relationships, Attribute Discretization, Indirect Dimensions, Measure Expressions, Linked Measure Groups. In the previous chapters, we covered basic a ...

Attribute Discretization

When we speak of the values of an attribute, we can mean one of two types. Discrete values are those that have clear boundaries between them. For example, the Gender attribute is typically considered to have only discrete values (that is, male or female). For the majority of attributes, the possible values are naturally discrete.

Contiguous values are those for which no clear boundaries exist, where the values flow along a continuous line. For example, a worker’s wage typically falls somewhere in a range of possible values, and the more employees you have, the more distinct values appear in that range. Some sets of contiguous values contain a very large or even infinite number of possibilities, and it can be difficult to work efficiently with such a wide range of values.

You can use discretization to make it easier to work with large numbers of possible values. Discretization is the process of creating a limited number of groups of attribute values that are clearly separated by boundaries. You use discretization to group contiguous values into sets of discrete values.

Analysis Services supports several variations of attribute discretization, based on algorithms of varying complexity. They all do basically the same thing: group contiguous values into discrete ranges. The methods differ only in how they group the values. We also support an additional, user-defined discretization method that is not available in the Dimension Editor; to use it, you work directly in Data Definition Language (DDL). With this method, you define the groupings for the attribute values yourself.

To have Analysis Services apply discretization to a contiguous attribute, you set two properties:

DiscretizationMethod defines the specific method for discretization.
DiscretizationBucketCount defines the number of groups the values will be placed in.
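
As a minimal sketch, the two properties might appear together in an attribute definition as follows. The attribute ID and name are borrowed from the customer example in this chapter, the bucket count of 5 is an arbitrary illustrative value, and the key and name bindings are omitted; the placement of DiscretizationBucketCount directly after DiscretizationMethod is an assumption.

```xml
<Attribute>
    <ID>Customer Id</ID>
    <Name>Customer Id Group</Name>
    <!-- KeyColumns and NameColumn bindings omitted for brevity -->
    <!-- Let Analysis Services choose the grouping technique -->
    <DiscretizationMethod>Automatic</DiscretizationMethod>
    <!-- Illustrative value: place the members in 5 groups -->
    <DiscretizationBucketCount>5</DiscretizationBucketCount>
</Attribute>
```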

Analysis Services supports the discretization methods shown in Table 8.1.

Table 8.1. Methods of Discretization
Method	Description
`Cluster`	Uses the K-Means algorithm to find ranges in the input values
`EqualArea`	Specifies that when the distribution of continuous values is plotted as a curve, the areas under the curve covered by each range are equal
`Threshold`	Specifies that when the distribution of continuous values is plotted as a curve, ranges are created based on the inflection points (where the gradient changes direction) in the distribution curve
`Automatic`	Chooses the best grouping technique among the `EqualArea`, `Cluster`, and `Threshold` methods
`UserDefined`	Specifies that the user can define a custom grouping of the members

Listing 8.1 shows an example of DDL that defines an attribute whose members, the customer IDs, are grouped by a discretization algorithm (here, the `Automatic` method).

Listing 8.1. Defining a Discretization Method


<Attribute>
    <ID>Customer Id</ID>
    <Name>Customer Id Group</Name>
    <KeyColumns>
        <KeyColumn>
            <DataType>Integer</DataType>
            <Source xsi:type="ColumnBinding">
                <TableID>dbo_customer</TableID>
                <ColumnID>customer_id</ColumnID>
            </Source>
        </KeyColumn>
    </KeyColumns>
    <NameColumn>
        <DataType>WChar</DataType>
        <Source xsi:type="ColumnBinding">
            <TableID>dbo_customer</TableID>
            <ColumnID>customer_id</ColumnID>
        </Source>
    </NameColumn>
    <DiscretizationMethod>Automatic</DiscretizationMethod>
</Attribute>

For a user-defined method, you define the boundaries for every group that you specify. In this process, you have to define the binding of the attribute to the data source. (For more information about attribute bindings, see Chapter 18, “DSVs and Object Bindings.”)

When you create attribute groups, it’s helpful to give them names that are intuitive for users. Analysis Services has templates that can help generate names for the groups.
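
A hedged sketch of such a naming template follows. It assumes, per the product documentation, that the template is carried in the Format element of the attribute's NameColumn and that `%{First bucket member}` and `%{Last bucket member}` are among the supported variables. The binding reuses the dbo_customer table from the customer example, and the position of Format ahead of Source is an assumption about element order.

```xml
<NameColumn>
    <DataType>WChar</DataType>
    <!-- Assumed: the naming template lives in the Format element.
         Each group is named after its first and last member. -->
    <Format>%{First bucket member} - %{Last bucket member}</Format>
    <Source xsi:type="ColumnBinding">
        <TableID>dbo_customer</TableID>
        <ColumnID>customer_id</ColumnID>
    </Source>
</NameColumn>
```

With a template of this shape, a group spanning customer IDs 1 through 100 would presumably be labeled with those two boundary members rather than with an opaque internal name.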

You can also use attribute discretization for attributes that are already discrete but that have a large number of values, such as the Customer Id attribute in the Customer dimension. A discretization method can collect these attribute members into groups that are easier to use for analysis.
