12013Sep

Q-Tip #6 – Those Tricky $-Sign Expansions

This blog post was inspired by an excellent lecture on Advanced Scripting that Rob Wunderlich is delivering at the Masters Summit for QlikView – the only deep-down technical event for advanced QlikView developers. Please check out our upcoming events in Europe.

… The longer I work with QlikView, the more fascinated I get with its almost endless ability to add more and more flexibility to every aspect of our development. Many QlikView features contribute to the overall degree of flexibility, but perhaps the most flexible of all are so-called $-sign expansions.

Let’s cover the basics, to level the field for those of us who are new to QlikView. $-sign expansion is a syntax form that allows a user to evaluate a variable, or an expression, and to replace the name or the expression with its result. The following is the list of some possible examples. While it’s not a complete list, it represents enough for the purpose of this analysis:

$-sign expansions using a variable:

When $-sign expansion contains a variable, it will return the value of the variable at the time of the call. Changing the value of the variable can help make your application more flexible:

         // In this example, the value is passed to the Database as a hard-coded value:
         let var1 = 1000;
         SQL SELECT * FROM Sales Where SalesAmount >= $(var1);  
     
         // In this example, the chart expression is calculated dynamically, based on the value of the variable
         SET var2 = ‘sum(Sales)’;
         Chart Expression:                $(var2)     
 

$-sign expansions using formulas:

When $-sign expansion contains a formula, the formula will get evaluated at the time of the call:

          // In this example, the aggregation function is built based on the formula prior to being evaluated:
sum($(=’Expression’&vExprNumber))

// In this example, the formula is used as a part of a Set Analysis expression:
sum( {<Date={“>=$(=Date(Today()-7))”}>} Sales)

$-sign expansion with parameters:

$-sign expansion can contain a formula with one or more parameters that are specified as $1, $2, $3:

// In this example, the formula can be modified dynamically based on the parameter:
SET vExpr = ‘sum(Expression’ & $1 & ‘)’;
Chart Expression: $(=$(vExpr (1)))    – is equivalent to “sum(Expression1)”
Chart Expression: $(=$(vExpr (2)))    – is equivalent to “sum(Expression2)”
 

Let’s Compare!

OK, now we leveled the field, more or less, and we all understand how we can make things more flexible using $-sign expansions. Now, let’s find out if one way is better than the other.

The question I had in my mind when I started was this: if I have an expression that I’d like to modify dynamically based on user input, what syntax should I use, out of many available options? I’m certainly adding more calculations when I’m replacing a “static” formula such as sum(Sales) with a dynamic calculation that needs to be evaluated every time before it can be calculated. Is the overhead significant? Is one form of $-sign expansion better than the other?

For starters, we need a “straw-man” data model that can provide enough material for the analysis. I picked the “lazy” route and I used the QlikView hidden “test script” that you can invoke by pressing Ctrl-Q-Q. Open a new document, press Ctrl-Q-Q and watch the magic happen in front of you.  QlikView will generate a test script that creates a simple yet versatile data model, with enough Dimensions and Expressions to play with. For our purpose, we need a relatively large dataset, in order to see performance implications, and for that I increase the number of iterations from 1,000 to 10,000,000. If your hardware can’t handle such a large file, you can stop at any number that you are comfortable with.

If you’d like to follow my story in QlikView, go ahead and Reload your script, and examine the test Data model. The script should generate approx. 20 millions of rows, with approximately 10 millions of distinct TransID. You will see 3 different fields that can be used for calculations – Expression1, Expression2 and Expression3.

Now, let’s formulate our goal. We’d like to let our users pick what field is going to be aggregated – Expresion1, Expression2 or Expression3. We can do it in a variety of methods, and our goal is to find out what syntax is the best in terms of optimizing performance and minimizing the overhead.

Our base line is the “static” expression sum(Expression1). We’ll compare its performance with the following options:

1. Using variables with the static text. The variables can get their values assigned as a result of an Action, triggered by the user:

vExpr1:                    ‘Expression1’;
Chart Expression:  sum($(vExpr1))

2. Using variables with the text that’s calculated:

vExprNumber        1;
vExpr2:                   =‘Expression’ & vExpr1;
Chart Expression: sum($(vExpr2))
 

3. Using variables with the text that’s calculated when called:

vExprNumber        1;
vExpr3:                   = ‘Expression’ & vExpr1;
Chart Expression: sum($(=$(vExpr3)))
 

4. Using $-sign expansions with parameters:

vExpr4                     ‘Expression’ & $1;
Chart Expression:  sum($(=$(vExpr4(1))))
 

5. The whole expression, including the aggregation function, is presented as a variable:

vExpr5:                    ‘sum(Expression1)’;
Chart Expression:  $(vExpr5)
 

6. The whole expression is calculated as a formula with a parameter:

vExpr6:                    ‘Expression’ & $1;
Chart Expression:  $(=$(vExpr6(1)))
 

7. Using a formula calculated within a $-sign expansion:

Chart Expression:   sum($(=’Expression’&vExprNumber))

Surely there are other ways to calculate a dynamic expression, but those 7 options just about cover all the angles. The biggest question is – how are they going to perform on a large data set? Will any of them need to be evaluated for each detailed row?

So, for the sake of this experiment, we created a Straight Table with TransID as a dimension (10 mln distinct values will surely make this chart heavy enough) and a simple expression sum(Expression1). In our environment, the chart took approximately 10 sec. to render.

Then, we created 7 more charts, each one with a different form of calculating the same expression. We placed each chart on a separate sheet, to segregate their calculation. Now, the fun part begins… Since QlikView caches results for identical calculations, most of the time the charts show no calculation time, which means that a similar calculation was already performed earlier. In order to test performance of each function, we had to save the document on every one of the individual sheets, then close QlikView and reopen the document on that same sheet, to force the specific chart to calculate.

Here are the results of the tests:

  1. All 8 charts showed about the same calculation time, approximately 10 sec. The syntax of the dynamically calculated expressions didn’t make any difference in terms of performance.
  2. Surprisingly, all 7 calculated expressions were treated by QlikView as the same expression – once either one of the 7 charts was calculated, the other 6 were rendered using cached results.
  3. Even more surprisingly, the “static” chart wasn’t considered the same – it had to be rendered separately, without using cashed results.

Conclusion:

Despite the visible differences in various syntax forms of $-sign expansions, the end results appear to be identical in terms of performance. None of the dynamic expressions reviewed in our analysis, caused any visible performance overhead. More so, QlikView considered all the expressions, calculated in 7 different ways, as the same expression, thus allowing them to share the same cached results. So, pick your favorite syntax and use it for your dynamic calculations.

Bonus Material: How Variables Can Spice Up Your Set Analysis

This blog post was nearly ready when a question on QlikCommunity triggered a thought that I felt compelled to share…

Let’s assume that our Set Analysis conditions need to change based on the nature of current selections. In the specific question that was asked, the requirement was to show the sum of values by Month, while only selecting the last week (date?) of the Month if no week selection was made, or to show the latest of the selected weeks.

This kind of requirement is typical for situations when the data is stored in a number of snapshots (such as weekly inventory balances, or weekly snapshots of orders), and it’s impossible to sum-up multiple snapshots. Only one, typically the latest, snapshot needs to be available for the aggregation.

So, let’s examine this requirement in parts. When no Week selections are made, then the latest Week for each month needs to be selected, and Month is one of the chart dimensions. Since we all know that Set Analysis can’t be sensitive to the Dimension Values (because it’s calculated only once per chart), there is no direct way of constructing a Set Analysis condition that allows selecting the last available week for each month. For this purpose, we usually recommend to create an extra Flag field in the data, signifying the last Week for each Month. Here is a simple example of such a script:

Calendar:
load
Date,
MonthStart(Date) as Month,
Week(Date) as Week
;
load
date(Today() – 730 + RecNo()) as Date
autogenerate 730
;

left join (Calendar)
load
Month,
Max(Week) as Week,
1 as LastWeekFlag
resident
Calendar
group by
Month
;

 With the Flag field like this, the Set Analysis condition becomes very simple:

LastFieldFlag = {1}

This would’ve been the end of the story, if not for the second part of the requirement. If the user made any selections in the field Week, then the latest of the available (selected) weeks should be used instead. This condition would be rather different:

Week = {$(=max(Week))}

Now, how do we put the choice of those two Set Analysis conditions into a single Expression? This is where our friend Variable comes to the rescue. We will create a variable that calculates the correct filter depending on the current selections made in the field Week. If any selections were made, then the second filter will be used, otherwise, the first filter will replace it:

We will create a new variable, called vWeekFilter, and we will assign the following value to the variable:

‘=if(GetSelectedCount(Week) > 0, ‘Week = {$(=max(Week))}’, ‘LastWeekFlag={1}’)‘

For simplicity, we can just assign the value directly in the UI (not in the script). One day, we will discuss the correct syntax of assigning those expressions in the load script.

Now, with this variable being responsible for selecting the relevant filter, our Set Analysis Expression becomes very simple:

Sum( {<$(vWeekFilter)>} Value)

Now, this versatility made even me step back and wonder in awe. I knew we can configure Set Analysis with variables, and I knew that we can calculate conditional expressions. But the ability to replace one filter with another, based on User selections – that’s not something I’d take for granted. This is truly awesome!

There is a lot more to be said about QlikView Variables and all the versatile things that can be done with these wonderful creatures. Please make sure to check out our agenda at the upcoming Masters Summit for QlikView, which is coming to Europe this fall.

Leave a Reply

Your email address will not be published. Required fields are marked *