MDX for Everyone (2)

demo · 发表于 2011-5-3 10:10:25

本帖最后由 demo 于 2011-5-4 02:22 编辑

NULL members and EMPTY cells

Let’s recall the example from the previous chapter.

WITH MEMBER Measures.[Sales Growth] AS '(Sales) – (Sales, Time.PrevMember)'
SELECT
{ [Sales], [Sales Growth] } ON COLUMNS,
Month.MEMBERS ON ROWS
FROM Sales

We understand now, that this query will calculate for everymonth it’s growth in sales compared to the previous month. I.e. for Septemberwe will have difference in sales between September and August. However, one maywonder what happens with the first month? There is no previous month for it inour cube (that’s why we call it first month). So what’s gonna happen with MDX?Is it gonna return an error? Well, even though it looks like reasonable thingfrom the mathematical point of view to return an error in such a case, it wouldbe complete disaster from pragmatic, business point of view. Let’s try tounderstand, what system analyst expects to see for Sales Growth in first month.Well, in many the cases he just doesn’t really care. He want to see how thebusiness is doing over the time, it is not important what to do for the firstone. For sure, he doesn’t want the query to have an error because of the firstmonth. One of the most common things is to assume that Sales Growth is equal toSales in first month, because Sales were non-existent (i.e. zero) before thefirst month. MDX designers had experience in real world OLAP implementations,and they were aware of this problem of non-existing members. Therefore,rather then return an error, MDX has notion of NULL members whichrepresents non-existing members. And here is very important definition: Bynon-existing members we mean here multidimensional coordinates have that are outof scope of multidimensional subspace defined by the cube. I.e. invalid membername, such as [Dummy] in level of months doesn’t make it NULL member. If you’lluse such a name in MDX query, it will fail with error. However if by navigatingin multidimensional space you suddenly run out of boundaries of the cube (likesaying [January].PrevMember), you will get NULL member. The semantic of NULLmembers is following:

1. Whenthe cell contains as one of its coordinates NULL member, the numeric value ofthe cell will be zero. Therefore (Sales, [January].PrevMember) = 0

The reasoning behind this is very simple: If the cell hasnon-existing coordinates, this cell doesn’t exist in this cube. In the businesslanguage it means that there were no sale at that moment of time, at this placeof this product etc. And if there were no sale, than we sold zero units,received zero dollars etc. Of course, if MDX had support for VARIANTs, it wouldbe logical to define the value of NULL member as VARIANT’s NULL value. I heardrumors, that the next version of MDX will support variants.

2. Anymember function applied to NULL returns NULL.

This means, that NULLs don’t remember their history. I.e.one might expect, that [January].PrevMember.NextMember would return [January]back. No. Once NULL – forever NULL. If you left the cube’s subspace and enteredthe outer space, you are lost there. There is no way back to the cube. For myopinion, this is not perfect behavior, but given the difficulty in implementing“history” of  NULL members (think about [January].PrevMember.PrevMember.Parent.FirstChild.NextMember.NextMember),and little business value associated with implementing such a history, wedecided about the described above behavior.

3. When NULL member is included in set, it is just ignored. Therefore

{[September], [January].PrevMember} = {[September]}

This rule seems to be obvious, but it has strong motivationbehind it. Jumping a little bit ahead, suppose you were to calculate movingaverage of last three months. The straight-forward approach is to define

([Time] + [Time].PrevMember + [Time].PrevMember.PrevMember) / 3

(remember, that CurrentMember is default property of dimension !).

The formula looks OK, but think what will happen for Januaryand February (first two months). Suppose, that sales in January were 90 and inFebruary 120. Then, the moving average for January will be (90+0+0)/3 = 30instead of 90/1 = 90, and for February it will be (120+90+0)/3 = 70 instead of(120+90)/2 = 105. To solve this problem, MDX has statistical function called AVG, which calculates the average of the members in the set. Then you can apply

AVG( {[Time], [Time].PrevMember, [Time].PrevMember.PrevMember},  Sales)

Since sets automatically eliminate NULL members, AVG will dothe right thing, because it asks set about count of elements in it.

Of course, PrevMember function is not the only one which canresult in NULL member. Asking child (first or last) from member on the lowestlevel, asking Parent from member on highest level, doing NextMember on lastmember in the level – all those are example of NULL members.

The notion of NULL members is tightly connected to thenotion of EMPTY cells. As a matter of fact, we already talked aboutEMPTY cells, when we talked about cell which contains NULL member at least inone of its coordinates. This cell is out of the scope of the cube, therefore itis empty, and numeric value calculated for this cell will be zero. However,EMPTY cells may occur not only when out of boundaries of the cube. One of themost broad problems in OLAP is data sparsity problem. I.e. real lifecubes tend to populate only tiny fraction ( <0.01% ) of their theoreticalvolume. More the dimensionality of the cube, more the sparsity of data. Theproblem gets worse when dimensions are not orthogonal - for example there isstrong correlation between “Customer Location” and “Store Location” dimensions,since people tend to buy where they live. But even when dimensions areorthogonal, sparsity still exist – for example “Customer” dimension, “Product”dimension and “Time” dimension, since there are not too many customers who buysomething every day (this is not true even for my wife J). And even if they do, then they don’t buy every possibleproduct ! As a result, there are a lot of cells in the cube, which don’tcontain any data, because this data just doesn’t exist. Naturally, we’ll callsuch cells EMPTY cells. The numeric value for such cells will be zero for thereasons we explained above. However, sometimes it is important to distinguish betweenthe situation where the data exist and it’s value equal to zero, and situationwhere data doesn’t exist resulting in EMPTY cell whose value again will beevaluated to zero. To accomplish that, MDX supports predicate IsEmpty, whichcan be applied to cell, and returns boolean value of TRUE or FALSE. We’ll seehow useful boolean expression can be in later chapters.

Algebra of sets

So far we talked about axes and in this chapter we are gonna talk about sets. As we will see, sets can be used in many differentfashions in MDX. For now we just assume that set is just a synonym word foraxis. As we all remember from high school’s class “Set Theory”, sets have allbunch of associated operations on them such as union, intersection, difference, cartesian composition and decomposition etc. MDX offers all these classical setfunctions. The input and/or output sets might be sets of members or sets of tuples.Main restriction on set is that all its elements have to be of the samestructure. By structure we mean the following: If set is set of members, allmembers have to come from the same dimension (even though they can be fromdifferent levels). Therefore the following set { [1997], [August] } is valid set, but { [August], [Sales] } is not valid set. If the set is set of tuples, then the dimensionality shouldbe the same and the corresponding members of the tuples have to be from thesame dimension. I.e. {([1997], [USA]), ([August], [WA]) } is valid set, but {([August], [WA]), ([1997], [USA])} is not valid, because order of dimensions in the tuple is reversed.One, who didn’t read the previous chapters may wonder why this restriction ?However, we remember, that primary use of sets is to axes in the query, and itdoesn’t make any sense at all to mix dimensions on the axis. Most common OLAPanalysis is to put set from one dimension against set from another dimension.Even when two dimensions on the same axis are required, it is as we saw fornesting purposes, and tuples are used to achieve that. There is no sense to puton axis tuples with different dimensionality or with incompatible structures.

Well, let’s start looking into set functions defining theset algebra. The first two functions are pretty self-explaining:

UNION( set1, set2 )
INTERSECT( set1, set2 )

While INTERSECT is relatively rarely used function, UNION
is usually heavily used. It is common practice to combine more than twosets together, sometimes, one wants to union sets with members etc. In thesesituations UNION
becomes clumsy touse, so there is another, nire easy way to unite sets and elements. As a materof fact, we already know that way: the same way we enumerated members and tuples into set, we can also have union, i.e. the following is valid MDX:

{ set1, set2, …, setn }

More, than that. One can freely mix sets and elements in enumeration, i.e.

{ member1, set1, set2, member2,…, memberk, setn }

We’ll see soon, that still function UNION offers something, that setenumeration doesn’t offer, but in most cases enumeration is more easy andintuitive (even though, people who used to program in the languages with strongtyping, might feel uncomfortable with free mixing of scalars and arrays. But itis so convinient, so it was worth to define it this way). Let’s consider thefollowing example: we want to see on the axis results for Europe, USA, allstates in USA, all cities in state WA and for Asia (typical drilldownscenario). We can define the set as

{
[Europe],
[USA],
[USA].Children,
[WA].Children,
[Asia]
}

The equivalent with UNION
will look like

UNION(
  { [Europe], [USA] },
  UNION(
[USA.Children],
UNION(
   [WA].Children,
   { [Asia] }
)
  )
)

Some MDX implementations may also define operator + (plus)as a way to do union of sets, i.e. to allow

set1 + set2 + … + setn

This might be nice algebraic extension, but it is not partof current MDX standard. Another important set function besides union is difference.When union is often viewed as addition of two sets, difference

Duplicate members in set

If you have strong mathematical background, you should be disappointed by the name of this chapter. Indeed, mathematical definition of set doesn’t allow the set to have duplicate elements. The mathematical term for sets with duplicate elements is collection. However, the word collection is overused in programming, therefore MDX calls them sets anyway. Well, back to the subject.

Sorting and filtering
Hierarchies
Sparsity and NON EMPTY axes
Properties of calculated members
Statistical functions
Time related functions
Query defined sets
Execution control functions
Generate as a loop
DrillDown and DrillUp family of functions
Miscellaneous functions
User defined functions
Session sets and session calculated members
Plato Extensions to MDX
MDX as DDL
Appendixes

http://www.mosha.com/msolap/articles/MDXForEveryone.htm

juseliao · 发表于 2011-5-23 21:54:16

记得从前刚学会骑自行车时，因为还不熟练，所以并不会转弯、刹车之类的技术！一天，在家闲着无聊，便骑着车出去溜达溜达！和父亲一起，当我们到达一个大坡时（下坡）我不会刹车，而父亲又把我护在里侧，只见这时，我前方出现了一个男人面对一棵大树正撅着屁股给自行车打气。我的自行车正对他的屁股~~~~于是，在我紧急的思考中，自行车已经快速的到达了他的屁股前，因为不会刹车，所以~~~男人被我的车篮狠狠的往前一撞，差点没撞到树上去

如何丰胸最有效，如何有效丰胸，怎么样才能丰胸，怎么样丰胸

juseliao · 发表于 2011-5-24 14:01:38

如此好帖不顶怎么行呢？哈哈

三仙矛草,三仙矛草,三仙矛草

账号		自动登录	找回密码
密码			注册

MDX for Everyone (2)

浏览过的版块