Part III: Using MDX to Analyze DataIn This Part
Chapter 10. MDX Concepts
In previous chapters, we have discussed how multidimensional databases are designed, but not how they are used. In the next few chapters, we discuss how the data stored in Analysis Services can be retrieved and the different ways this data can be calculated and analyzed. Among modern database systems, the most popular way of accessing data is to query it. A query defines the data that the user wants to see, but doesn’t define the steps and algorithms that have to be performed to retrieve the data. Structured Query Language (SQL) is one of the most popular query languages used to retrieve relational data. However, it was not designed to work with the rich data model supported by multidimensional databases. To access data stored in Online Analytical Processing (OLAP) systems, Microsoft invented Multidimensional Expressions (MDX). Currently, MDX is an industry standard, and a number of leading OLAP servers support it. It is also widely used in numerous client applications that enable the user to view and analyze multidimensional data. You can use MDX not only to query data but also to define server-side calculations, advanced security settings, actions, key performance indicators (KPIs), and so on. So, even if you are not planning to write a client application that will generate MDX statements, you might find MDX useful to define security and program Analysis Services. If you want to practice writing MDX statements, you can use SQL Server Management Studio (SSMS). The samples that we provide in this chapter were created in SSMS. The SELECT StatementThe syntax of MDX was designed with SQL as a prototype. But new concepts and semantics were introduced to make it more intuitive to query multidimensional data. Similar to SQL, MDX is a text query language. As with SQL, the most important statement is the statement for retrieving data: the SELECT statement. The SELECT statement poses a question about the data and returns the answer. That answer is a new multidimensional space. Like SQL, SELECT in MDX has the three main clauses SELECT, FROM, and WHERE. (To be completely accurate, the SELECT statement in MDX has more than three clauses, but we talk about the others in Chapter 12, “Cube-Based MDX Calculations.”)
Here is a simple MDX SELECT statement:
Code View:
Scroll
/
Show All SELECT <definition of the resulting multidimensional space> FROM <source subspace> WHERE <limiting the results to subspace> The SELECT ClauseThe result of a relational query is a two-dimensional table. The result of an MDX query is a multidimensional subcube; it can contain many dimensions. To differentiate the dimensions in the original cube from dimensions in the subcube that results from the query, we call a dimension in the multidimensional result an axis. When you create a multidimensional query, you should list the axes that will be populated with the results. Theoretically, no limitations apply as to the number of axes that you can request with an MDX query. In the real world, however, the number of axes is limited by the number of dimensions in the multidimensional model; by the physical limitations of the computer; and most important, by the capabilities of the user interface to display the results in a format that is understandable to humans. For example, SSMS supports only two axes. The authors of this book have never seen a query containing more than three axes, but that doesn’t mean that you can’t invent a new user interface that can use sound or smell to represent the data on the fourth and fifth axes. You use an ON clause to list the axes in a SELECT statement. The axes are separated from each other by a comma (,). The syntax is as follows: SELECT <content of the axis> ON axis(0), <content of the axis> ON axis(1), <content of the axis> ON axis(2) ... <content of the axis> ON axis(n), from <cube_name> There are various ways to name an axis. The most generic one is to put the axis number in parentheses following the word axis: ON Axis(0) or ON Axis(1). To simplify the typing of your MDX statement, you can omit Axis and the parentheses and just write the number corresponding to the axis: ON 0 or ON 1. The most commonly used axes have names and can be referenced by their names. Axis number 0 is called columns, axis number 1 is rows, and axis number 2 is pages. This is the most common way to specify an axis; frequently the query will look like Listing 10.1. Listing 10.1. Using the Names of Axes
Defining Coordinates in Multidimensional SpaceNow that we know how to define an axis in an MDX query, we can review the information that should be supplied as an axis definition. In SQL, you use a SELECT clause to define the column layout of the resulting table. In MDX, you use a SELECT clause to define the axis layout of the resulting multidimensional space. Each axis is defined by the coordinates of the multidimensional space, which are the members that we are projecting along the axis. Internally in Analysis Services, members are created on the dimension attribute; in MDX, you navigate to a member only through a navigation path—hierarchy. In Chapter 5, “Dimensions in the Conceptual Model,” we explained two kinds of hierarchy: the user-defined hierarchy and the attribute hierarchy. You can use either a user-defined hierarchy or an attribute hierarchy (as long as it’s set in the model as enabled) to define the coordinates of a multidimensional space. However, each member that can be accessed through the user hierarchy can also be accessed through the attribute hierarchy. Internally, the system uses members of the attribute hierarchy to define the space. If the user has specified a member using a user-defined hierarchy, the system projects that member onto the attribute hierarchy. (For more information about the rules for this process, see Chapter 11.) Each point in a multidimensional space is defined by a collection of coordinates—a tuple—where each coordinate corresponds to a dimension member. You define a tuple by enclosing a comma-delimited list of members in parentheses. For example, a simple tuple that contains two members would be ([1997], [USA]). The simplest (but not the best) way to reference a member in MDX is to enclose its name in square brackets. (We talk about better ways to reference a member later in this chapter.) Each member of the tuple belongs to a different hierarchy; you can’t create a tuple that has more than one member coming from the same hierarchy. Each tuple has two properties: dimensionality—a list of the dimension hierarchies that this tuple represents—and the actual members contained in the tuple. For example, dimensionality of the tuple ([1997],[USA]) is Time.Year and Store.Countries, and the member values are 1997 and USA. Tuples that have the same dimensionality can be united in a set. As the name implies, an MDX set is a set in the mathematical sense—a collection of objects. Therefore, all the laws defined by set algebra apply to MDX sets. The simplest way to define a set is to explicitly list its tuples between braces. For example, a set containing two tuples can be represented in the following way: {([1997],[USA]), ([1998],[USA])}To return to our discussion about specifying axes: You can define a multidimensional space that you need to retrieve from the cube by projecting a set on an axis. Note In Analysis Services 2005 and Analysis Services 2008, sets that reference the same dimension can be projected on different axes, but they have to reference different hierarchies. Let’s take a look at an example of a simple request. In this example, we use a very simple cube, shown in Figure 10.1. Figure 10.1. A simple three-dimensional cube contains dimensions on three axes.
This cube has three dimensions (Stores, Products, and Time) and a measure dimension with three measures (Store Sales, Store Cost, and Units Sold). We placed the Time dimension along the x-axis, the Store dimension along the y-axis, and the Products dimension along the z-axis. Next, we analyze the products in the Food and Drink product families, sold over some period of time; let’s say from the beginning of 1997 to the end of 1998 in all the counties where FoodMart Enterprises has stores. We project the Time dimension and Products dimension on the Columns axis and measures on the Rows axis. As a result, we get a two-dimensional space, shown in Figure 10.2. Figure 10.2. The projection of dimension members on two axes produces a two-dimensional space.
Listing 10.2 demonstrates how you write such a projection in MDX. Listing 10.2. Dimension Members Are Projected on Two Axes
If we execute this query in the MDX query editor of SSMS, we get the results arranged in a grid, shown in Figure 10.3. Figure 10.3. The results of our query appear in this grid.
Default Members and the WHERE ClauseEach cell in a multidimensional space is defined by all the attributes (or attribute hierarchies) in the cube. Some of the attributes are specified in a SELECT clause and define the shape of the resulting multidimensional space. But what happens when the cube has more attributes than the ones that we projected on the axes in Figure 10.1? How can those missing coordinates be defined? Look back at Figures 10.1 and 10.2. Figure 10.1 represents the multidimensional space where data is stored, and Figure 10.2 represents the multidimensional space that is the result of the query. Now let’s pick up one of the resulting cells (for example, Store Sales, Drink, 1997) and see how the system assigned coordinates that can be used to retrieve the data from the cube. You can see in Figure 10.4 that the cell coordinates are based on the attributes of all the dimensions in the cube: Store, Product, Time, and Measures. The coordinates on three of those dimensions were defined by the SELECT statement and are shaded in the figure. The fourth dimension, Stores, was left undefined by the query. Figure 10.4. The coordinates of the cell are based on the attributes of the dimensions in the cube.
When the cube has more attributes than the number of attributes that are projected on the axes, the system creates an axis, called the slicer axis, which contains all the other attributes. The slicer axis is made up of one member from each of those attributes; those members are called default members. In the design phase, you can choose a specific member as the default member for the attribute, by setting the DefaultMember property of either the Dimension attribute or the Perspective attribute. For example, you might decide that the default member of the Currency attribute should be the U.S. dollar. If you don’t specify a default member in the model, however, the system chooses default members automatically. Usually, if the attribute is aggregatable (the attribute hierarchy has an ALL level), the member ALL is used as the default member. This arrangement makes sense because a user typically wants to see aggregated values unless she didn’t explicitly specify a coordinate in her query. For example, if user explicitly did not specified Stores dimension in the query, Analysis Services uses member ALL, and the result value will be total costs in all the stores. If attribute cannot be aggregated, Analysis Services chooses any member from this attribute—usually it’s the first member of an attribute—but the rules are a little bit more complicated, because the system tries to choose a member other than Unknown, hidden, or secure. The notion of the default member is an important one. Analysis Services uses default members every time the MDX query hasn’t specified a coordinate from particular attribute. When you write your query, it’s important to know which member is the default member. To find out which member is the default member, you can use either Schema Rowset or the MDX function Hierarchy.DefaultMember. (Even though we’re looking for a member of an attribute, MDX works through the hierarchy.) Here’s an example of the MDX function: SELECT [Store].[Stores].DefaultMember ON COLUMNS FROM [Warehouse and Sales] Figure 10.5 illustrates the result of this query. Figure 10.5. The default member of the Stores hierarchy of the Store dimension appears as a result of the query.
You can specify in the query the exact member (a slice) or a set of members (a subcube) you want to slice the data by. To limit the results to a particular slice or subcube, you can specify the slice or subcube in a WHERE clause. In the preceding example, instead of using the default member of the Store dimension, we could have specified a Country attribute or any other attribute of the Store dimension we want to see the data for, as we do in Listing 10.3. Listing 10.3. Using a WHERE Clause to Slice the Result by Country
Figure 10.6 shows the result of the query in the preceding listing. Figure 10.6. The query with the WHERE clause results in this data.
Note Analysis Services 2000 supports only a single tuple in a WHERE clause. But Analysis Services 2005 and Analysis Services 2008 support a set of tuples in a WHERE clause. (See Chapter 11 for details about sets in the WHERE clause.) It’s not mandatory that you have a WHERE clause in your query. The WHERE clauses in SQL and MDX are conceptually different. A SQL WHERE clause is used to restrict the rows that are returned as a result of the query; it’s there to condition the result. On the other hand, an MDX WHERE clause is used to define a slice of the cube. It’s mainly designed to clarify the coordinates for the dimension attributes that weren’t specified in the SELECT clause. Now that we’re talking about the differences between the WHERE clause in MDX and SQL, we need to mention one similarity introduced in Analysis Services 2005: The WHERE clause in MDX can also serve to restrict the tuples that are returned by the query. But we are getting a bit ahead of ourselves. We talk about this capability of the MDX query in the section “Existing and Nonexisting Tuples, Auto-Exist” in Chapter 11. Query Execution ContextNow that we’ve covered how you create an MDX query, we move on to what happens on the server when Analysis Services receives the query. Analysis Services first iterates over the members of the sets along each axis. Note Those members are returned to the client application, and the client application usually displays them as labels on a grid, or maybe labels along the axes of a chart. Analysis Services then calculates the cell value for the intersection of the coordinates from each axis. The coordinate in which context the value is calculated is called the current coordinate. In a simple case where there is only one tuple in the WHERE clause, Analysis Services creates the current coordinate using one member from each attribute of each of the dimensions in the cube. If there is a set in the WHERE clause, the current coordinate is a more complex data structure: a subcube. (For a discussion of such complex cases, see the section “Sets in a Where Clause” in Chapter 11.) The current coordinate is built from the members of the attributes that were used in the WHERE clause and from members of the attributes corresponding to the current iteration over each axis. For attributes that have been referenced neither on the axes nor in the WHERE clause, Analysis Services uses the default members. In Listing 10.5, we look under the hood as Analysis Services calculates a single cell of a simple query. Listing 10.5. Calculating a Cell in a Simple Query
As we go along, we explain what Analysis Services does along the way as it populates the current coordinate.
In many cases, it’s useful to reference the current coordinate in an MDX expression. For this, MDX provides the <hierarchy>.CurrentMember function. This function returns a projection of the current coordinate onto a particular hierarchy. Note The CurrentMember function returns a member. So, in the case where your WHERE clause contains more than one member from the attribute that the hierarchy corresponds to, the CurrentMember function returns an error. Set Algebra and Basic Set OperationsIn the earlier discussion about the multidimensional coordinate system and ways of defining the coordinates of multidimensional space, we stated—although not in these exact words—that you can define a multidimensional space by projecting a set on an axis. In this section, we discuss set algebra in greater detail and how it is used in MDX. Three set algebra operations and two basic set operations enable you to construct new MDX sets from existing ones: UnionUnion combines two or more sets of the same dimensionality into one set, as shown in Figure 10.7. The resulting set contains all the tuples from each of the sets. If a tuple exists in both of the original sets, it is added to the new (Union) set just once—the duplicate tuple is not added. This operation is equivalent to the addition operator. Figure 10.7. These two sets have been united using the Union operation.
The code that produces this result is as follows: SELECT Union({[Renton],[Redmond]},{[Edmonds]}) ON COLUMNS
FROM [Warehouse and Sales]Which returns the following set: {Renton, Redmond, Edmonds}Because Union is the equivalent of an addition operation, you can also use a + operator to create a union of sets:
Code View:
Scroll
/
Show All SELECT {[Renton],[Redmond]}+{[Edmonds]} ON COLUMNS FROM [Warehouse and Sales]
MDX supports one more syntax for the Union operation using curly braces, but that operation is not exactly equivalent to the Union function or to the + operator. When two sets that are united by curly braces have duplicated tuples, the resulting set retains the duplicates. For example SELECT {{[Renton],[Redmond],[Edmonds]},{[Edmonds]}} ON COLUMNS
FROM [Warehouse and Sales]Returns the following set: {Renton, Redmond, Edmonds, Edmonds}IntersectIntersect constructs a new set by determining which tuples the two sets have in common, as shown in Figure 10.8. Figure 10.8. The intersection of two sets constitutes a new set with the tuples that the two original sets have in common.
For example, the following code
Code View:
Scroll
/
Show All SELECT INTERSECT({[Burnaby], [Redmond], [Renton]}, {[Redmond], [Renton],[Everett]}) ON COLUMNS FROM [Warehouse and Sales]
Returns the following set: {Redmond, Renton}ExceptExcept finds the differences between two sets, as shown in Figure 10.9. This operation constructs a new set that contains elements that are members of one set, but not members of the other. This operation is equivalent to the subtraction operator (-). Figure 10.9. After an Except operation, a new set that contains elements that are members of one set, but not members of the other, is constructed.
Code View:
Scroll
/
Show All SELECT Except({[Renton],[Redmond],[Burnaby]},{[Burnaby]}) on COLUMNS FROM [Warehouse and Sales]
{Renton, Redmond}Because Except is equivalent to the subtraction operator (-), MDX provides an alternative syntax for this operation. The following query, an example of that alternative syntax, returns the same results as the previous one:
Code View:
Scroll
/
Show All SELECT {[Renton],[Redmond],[Burnaby]}-{[Burnaby]} ON COLUMNS FROM [Warehouse and Sales]
CrossJoinCrossJoin is one of the most often used operations of MDX. It generates a set that contains all the possible combinations of two (or more) sets, as shown in Figure 10.10. This function is typically used to project members from different hierarchies on the same axis. CrossJoin is equivalent to the multiplication operation. For example
Code View:
Scroll
/
Show All SELECT CROSSJOIN({[1997],[1998]},{[USA],[CANADA], [MEXICO]}) ON COLUMNS FROM [Warehouse and Sales]
{([1997],[USA]),([1997],[Canada]),([1997],[Mexico]),
([1998],[USA]),([1998],[Canada]),([1998],[Mexico])}And the following
Code View:
Scroll
/
Show All SELECT {[1997],[1998]}*{[USA],[CANADA],[MEXICO]} ON COLUMNS FROM [Warehouse and Sales]
Results in the same set. Note In Analysis Services 2000, there was one small difference between these two syntaxes: The CrossJoin function was limited to working with two sets, whereas the * operator functioned like a CrossJoin with as many sets as your system permits. This limitation was lifted in Analysis Services 2005; the CrossJoin function now can take any number of sets. ExtractExtract creates a set that contains only tuples of a specific hierarchy from the set as specified by the hierarchy expression in the second argument of the Extract function. This operation is the opposite of CrossJoin. For example
Code View:
Scroll
/
Show All SELECT Extract(CROSSJOIN({[1997],[1998]},{[USA],[CANADA], [MEXICO]}), [Time].[Time]) ON COLUMNS FROM [Warehouse and Sales]
Results in the following set: {[1997],[1998]}MDX FunctionsYou can create new sets by performing certain set algebra operations on existing sets, but you must have some original sets to start with. If you had to enumerate tuples to create a set, you would be forced into a process that would be highly inconvenient, not to mention not optimal, and, more than that, not scalable. To enable you to avoid such pain, MDX provides a rich set of functions so that you can create set objects that you can use to operate with sets. In addition to functions for working with sets, MDX provides functions that operate with other multidimensional objects such as dimensions, hierarchies, levels, members, tuples, and scalars. (A scalar is a constant value, such as a number or a string.) You can use MDX functions to construct MDX fragments, also known as MDX expressions. In this chapter, we explain how to use MDX functions and take a look at a few of the most basic and important functions in MDX. If you look at things from a syntactical point of view, you can divide MDX functions into two groups: methods and properties. There’s no important difference between those groups, except their syntax. Methods have the following syntax: <function_name>([<parameter>[,<parameter>...]]) For example CROSSJOIN({[1997],[1998]},{[USA],[CANADA], [MEXICO]})Properties have the following syntax: <object_name>.<property_name>[(<parameter>[,<parameter>...]]) For example [Time].[Time].DefaultMember Both kinds of MDX functions return MDX values of one of the following types: Dimension, Hierarchy, Level, Member, Tuple, Set, and Scalar. These values can be passed as parameters to other MDX functions. For example, the CrossJoin function produces a set that we pass as a parameter to the Extract function: EXTRACT(CROSSJOIN({[1997],[1998]},{[USA],[CANADA],[MEXICO]}),
[Time].[Time])Now that we’ve covered how MDX functions can be written and used, let’s take a look at the most commonly used functions and lay a foundation for understanding more complex ones. Functions for Navigating HierarchiesMultidimensional data stored in Analysis Services is often traversed using navigation paths that are defined by hierarchies. Members in the hierarchies are usually displayed as a tree. In Figure 10.11, you can see the user-defined hierarchy Stores of the Store dimension. Figure 10.11. The Stores hierarchy appears as a tree.
In this hierarchy, the member ALL is the parent of the members on the next level of the hierarchy: Canada, Mexico, and USA. The states CA, OR, and WA are children of USA, and so on. The states CA, OR, and WA are also descendants of the member ALL, and ALL is an ancestor of the members that represent the states. What we just said in English can be expressed in MDX using hierarchy navigation functions:
Code View:
Scroll
/
Show All <member>.Children, <member>.Parent, <level>.Members, <hierarchy>.members, Descendants and Ancestors Let’s use our tree of members of the Stores hierarchy to see how you can use the .Children function. In this example, we call this function on the member USA: SELECT [USA].Children ON COLUMNS FROM [Warehouse and Sales] It returns a set that contains all the children of the member USA—that is, the states—as shown in Figure 10.12. Figure 10.12. The function .Children produces a set of members that are children of the current member.
The Descendants function is a little bit more complex, but it is more flexible and easier to use. We can call it to get the children of the children of members all the way down to the bottom of the hierarchy (leaves). For example, if we need to analyze the sales of the stores located in different U.S. cities, we would write the following query: SELECT DESCENDANTS([USA],[Store City]) ON COLUMNS FROM [Warehouse and Sales] This query returns the set shown in Figure 10.13. Figure 10.13. The function Descendants returns a set of members that are descendants of the member on a particular level of the hierarchy.To view the leaf members that are descendants of the member USA, we would write the following query:
Code View:
Scroll
/
Show All SELECT DESCENDANTS([Store].[Stores].[Store Country].[USA], , LEAVES) on COLUMNS FROM [Warehouse and Sales] That query returns the set of leaf members shown in Figure 10.14. Figure 10.14. Using LEAVES keyword in descendants function.MDX supports many more functions that fall under the category of hierarchy navigation functions, such as .FirstChild and .LastChild, functions for operating on siblings, and so on. We don’t discuss all MDX functions here (if we did, this book would be far too heavy to read), but you can find the syntax of those functions in Books Online. The Function for Filtering SetsTo solve a business problem, you often need to extract from the set tuples that meet certain criteria. For this, you can use the MDX function Filter. Filter takes two parameters: a set and an MDX expression that evaluates to a Boolean. A Filter function evaluates the Boolean expression against each tuple in the set and produces a set that contains tuples from the original set for which the Boolean expression evaluated to true. For example, if we want to see the stores where the sales in 1998 dropped compared to 1997, we write the following expression:
Code View:
Scroll
/
Show All Filter( [Store].[Stores].[Store].members, ( [Unit Sales], [1998]) < ( [Unit Sales], [1997])). Or, we can put this expression into a SELECT statement and execute it in the SSMS. Figure 10.15 shows the results. Figure 10.15. The Filter function returns the set of stores that have been less profitable in 1998 than in 1997.
SELECT
Filter([Store].[Stores].[Store].members,
([Unit Sales],[1998]) < ([Unit Sales],[1997])) ON COLUMNS,
{[1997],[1998]} ON ROWS FROM [Warehouse and Sales] WHERE [Unit Sales]In MDX, it is not only the cell values that are calculated in the query context; all MDX expressions are calculated in the query context, too. To execute a Filter function, we have to evaluate a filter expression: ([Unit Sales],[1998]) < ([Unit Sales],[1997]). This expression contains only attributes from the Measure and Time dimensions. All the other attributes are obtained by the steps described in the section “Query Execution Context.” Analysis Services first applies the default members of all the attributes to the current coordinate, and then it overwrites the attributes referenced in the WHERE clause. Then Analysis Services overwrites the attributes from the expression and, finally, it overwrites the attributes in the filtering set. Let’s use another example to show which attributes are used during different stages of Filter function execution. Let’s assume that we need to analyze the sales by store in the year 1997, and filter out the stores that sold more than 1,000 items, as in Listing 10.6. Listing 10.6. A Filter Expression Affects the Current Coordinate
In this query, we filter the stores by the measure Unit Sales. Our FoodMart stores were first created in the United States; so, in 1997, all sales occurred | |||||||||||||||||||||||||||||||||||||||||||
手机版|小黑屋|BC Morning Website ( Best Deal Inc. 001 )
GMT-8, 2025-12-13 17:57 , Processed in 0.016741 second(s), 16 queries .
Supported by Best Deal Online X3.5
© 2001-2025 Discuz! Team.