Hi everyone – QlikView and Qlik Sense developers, beginners and experts, those who usually ask questions on Qlik Community and those who usually answer them. Today I’d like to talk about one of the most mysterious issues of all (at least, this is how it looks like on the Qlik Community forums) – using the AGGR() function in combination with Synthetic Dimensions, generated by functions such as ValueList() and ValueLoop().
Summing up the questions and the answers, here is what the community knows about the issue:
Let’s try to demystify the issue, explain why it doesn’t work and suggest some alternative solutions.
I can almost hear you saying: “Duh, everybody knows that!” But wait, let’s explain why. In order to do that, let’s demonstrate the issue on a simplified example. Let’s assume that we are comparing Sales to Forecast by Customer. For this purpose, let’s generate a sample database of 1000 customers with random Sales numbers and random Forecast numbers, but calculate the numbers in such a way that approximately 80% of customers exceeded their forecasts:
Sales: LOAD *, round(IF(Rand()>0.8, Sales*(1+rand()), Sales*(1-rand()))) as Forecast ; LOAD RowNo() as CustomerID, round(rand()*1000) as Sales autogenerate 1000 ;
Now, let’s pretend that you need to show a bar chart, comparing the number of customers that exceeded forecast, versus those that are below forecast, and those that are approximately at forecast.
The most common approach in this case is creating a Synthetic Dimension with the three desired “buckets” and then using the Synthetic Dimension in the Measure (Expression) to determine what calculation to use.
Those of you who have read my book QlikView Your Business, will easily understand the following calculations. For the rest of you, the explanations will follow.
First, we’ll create a variable vColumns and set it to hold the list of the three values: ‘Below Forecast’, ‘At Forecast’, ‘Above Forecast’.
Next, we’ll create a Bar Chart (you can do it in QlikView or in Qlik Sense; my example is done in Qlik Sense) and create a Calculated Dimension, using the following formula:
=ValueList($(vColumns))
Then, the Measure (Expression) should look like the following (we made an assumption here that Sales within 5% above or below Forecast are considered “at forecast”):
Pick(Match(ValueList($(vColumns)), $(vColumns)) ,
count( aggr( if(sum(Sales) <= sum(Forecast)*0.95, 1) , CustomerID) ) , count( aggr( if(sum(Sales) > sum(Forecast)*0.95 and sum(Sales) <= sum(Forecast)*1.05, 1) , CustomerID) ) , count( aggr( if(sum(Sales) > sum(Forecast)*1.05, 1) , CustomerID) ) )
So, let’s explain what’s going on here. We use the variable vColumns to store the values of the Synthetic Dimensions because the same list of values needs to be used numerous times (three times so far, and even more if you account for the conditional color expression). The Match() function compares the current value of the calculated dimension to the list of the same three values and returns a number – 1, 2, or 3. The Pick() function applies the same number to the list of three expressions, thus determining what needs to be calculated. This way of picking the relevant calculation is essentially the same as the familiar set of nested IF() conditions, but it looks a bit cleaner.
So, once everything is in place, we are ready to see our 3 bars, but… The chart only shows one bar. The other two bars are either missing or showing zeros. It looks like only the first expression is being calculated, and the rest never get evaluated at all.
… and this is when most developers turn to Qlik Community to get some answers to the simple question – why wouldn’t it work? We can test the AGGR() calculation outside of the chart, and it works. We can test any other formula that doesn’t contain AGGR() within our chart, and it works. But the combination of the two wouldn’t work. Why???
The reason can be explained in two ways – 1) because of the so called “grain mismatch” (in my book, I call it “The Third Rule of AGGR()” – assuming that you know the first two), or 2) because of the DISTINCT nature of AGGR().
“The Third Rule of AGGR()” claims that the set of the AGGR() dimensions needs to be at least as granular, or more granular than the Chart dimensions. In other words, in a Chart by Product, the AGGR() should at least have Product as a dimension, or have a dimension that is more granular than Product. In our example, the chart has a Calculated Dimension that is not one of the dimensions of the AGGR() function, therefore the dimensions of AGGR() are NOT more granular than the chart dimensions, and that is causing the AGGR() to “misbehave”.
The second explanation is perhaps more specific and more palatable. Being DISTINCT by default, the AGGR() function produces a single aggregated value for each distinct combination of its dimensions – in our case, one number per CustomerID. The chart with the Synthetic Dimension technically requires three values – one for each column.
The fact that each one of the three AGGR() functions in our formula will only be used for only one of the columns, is essentially ignored by the engine. The latter is causing most of the confusion, because based on common sense logic, the expression should work – for each one of the columns, the selected AGGR() function produces a single result. Apparently it’s not how the function works. The expression is being evaluated as a whole in the context of the Synthetic Dimension, and the result is just that – only one bar is calculated.
Now that we understand the underlying reasons for the single column in our chart, we can easily explain why NODISTINCT should solve the problem (at least in most cases). By definition, the AGGR(NODISTINCT …) is calculating the aggregation at each detailed row of the underlying data, and it is capable of returning multiple results for each combination of its dimensional values.
Depending on the nature of your AGGR() calculation, adding the NODISTINCT prefix may or may not produce the desired results. In case it does, it will also solve the problem with the required multiple values. If your calculation remains valid with the NODISTINCT prefix, then the AGGR() can supply enough values for any number of ValueList() columns, and the problem is solved!
Now, what can you do if NODISTINCT is not a viable choice? What else can solve the problem? Quite a few possibilities, actually:
In QlikView, many Synthetic Dimensions can be replaced by a set of multiple Expressions. In our example, a Bar Chart with no Dimensions and three individual Expressions could do the job just fine. The same won’t work on Qlik Sense because every Chart in Qlik Sense is required to have a Dimension (I’m not judging…)
Every Advanced Search condition in Set Analysis contains an implied form of AGGR(). For example, the following condition:
CustomerID={“=<any logical condition about Customers>”}
will get evaluated for each Customer and as a result only the relevant customers will get aggregated. With this understanding in mind, our count of customers that have Sales below, at, or above Forecast, doesn’t necessarily require the use of AGGR. Instead, a simple aggregation with Advanced Search conditions would suffice:
Pick(Match(ValueList($(vColumns)), $(vColumns)) ,
count( distinct {<CustomerID={"=sum(Sales)<=sum(Forecast)*0.95"}>} CustomerID)
, count(distinct {<CustomerID={"=sum(Sales)>sum(Forecast)*0.95 and sum(Sales)<=sum(Forecast)*1.05"}>} CustomerID)
, count( distinct {<CustomerID={"=sum(Sales)>sum(Forecast)*1.05"}>} CustomerID) )
This expressions is slightly simpler than the previous, and it solves the problem by eliminating the need in AGGR().
Another viable approach is to eliminate the need in a synthetic Value List by calculating the same values using the same function AGGR(), this time in a Dimension instead of the Measure. In other words, instead of declaring the three buckets using ValueList() and then calculating what customers belong in each bucket, we could simply calculate the appropriate bucket per Customer and use that as a dimension.
Those of you who attended the Masters Summit for Qlik, probably know what comes next. For the rest of you, the explanations will follow. Our new Calculated Dimension will look like this:
aggr( if(sum(Sales) <= sum(Forecast)*0.95, dual('Below Budget', -1), if(sum(Sales) <= sum(Forecast)*1.05, dual('At Budget', 0), dual('Above Budget', 1))) , CustomerID)
This calculated dimension assigns one of the three “buckets” to each CustomerID, thus associating the bucket value with the CustomerID. We use the Dual() function to ensure the specific sorting order for the three generated values.
Now, the Measure (Expression) becomes extremely simple:
count(distinct CustomerID)
This version of the solution appears to be the simplest of all. Perhaps the most challenging part of this solution is how to color-code the three bars in a meaningful way (red-gray-green). We will leave this problem for you to figure out as your homework.
Hopefully, we managed to demystify the issue with AGGR() and Synthetic Dimensions, explain why it doesn’t work, and offer three possible alternatives – using NODISTINCT, or eliminating AGGR() with Advanced Search, or replacing a Synthetic Dimension with an Aggregated Dimension.
If you like the depth of this conversation, please consider attending one of the upcoming sessions of the Master Summit for Qlik – this is the level of advanced training that you will receive there, tightly packed into three training days.
If you haven’t seen my recently published book QlikView Your Business, I highly recommend to check it out. The book explains in detail how to use the most advanced techniques in QlikView and Qlik Sense – Set Analysis, Advanced Aggregation, and a lot more.
Enjoy, and see you around the Community!