Ever since the introduction of Derived Fields in Qlik Sense, I was curious – what’s so special about them?
- Are they different from any other fields that are stored in your data set?
- Are they stored or calculated on the fly?
- Do they work equally well in Chart dimensions, in Filters, and in Set Analysis expressions?
- Is there any performance penalty for using Derived fields? Any benefits?
In this post, I will try to lift the curtain, at least partially, and reveal the mystery – for myself, and hopefully, for you.
Leveling the Field
Just to ensure that we are on the same page, let’s describe what Derived Fields are. Two new script commands were added in Qlik Sense, with the goal of creating a special kind of fields (derived fields) – DECLARE FIELD DEFINITION and DERIVE FIELDS FROM FIELDS.
Theoretically, any fields can be created as Derived fields, but the feature was primarily built for Calendar fields. Now, instead of scripting one or more Calendar tables, we can simply “declare” the fields and then “derive” one or more instances of each field from one or more date fields. The easiest way to get introduced to Derived fields is to load any data table that includes a Date field, using the Data Manager. At the bottom of the “Auto-generated section”, you will see the automatically generated Calendar:
[autoCalendar]: DECLARE FIELD DEFINITION Tagged ('$date') FIELDS Dual(Year($1), YearStart($1)) AS [Year] Tagged ('$axis', '$year'), Dual('Q'&Num(Ceil(Num(Month($1))/3)),Num(Ceil(NUM(Month($1))/3),00)) AS [Quarter] Tagged ('$quarter', '$cyclic'), Dual(Year($1)&'-Q'&Num(Ceil(Num(Month($1))/3)),QuarterStart($1)) AS [YearQuarter] Tagged ('$yearquarter', '$qualified'), Dual('Q'&Num(Ceil(Num(Month($1))/3)),QuarterStart($1)) AS [_YearQuarter] Tagged ('$yearquarter', '$hidden', '$simplified'), Month($1) AS [Month] Tagged ('$month', '$cyclic'), Dual(Year($1)&'-'&Month($1), monthstart($1)) AS [YearMonth] Tagged ('$axis', '$yearmonth', '$qualified'), Dual(Month($1), monthstart($1)) AS [_YearMonth] Tagged ('$axis', '$yearmonth', '$simplified', '$hidden'), Dual('W'&Num(Week($1),00), Num(Week($1),00)) AS [Week] Tagged ('$weeknumber', '$cyclic'), Date(Floor($1)) AS [Date] Tagged ('$axis', '$date', '$qualified'), Date(Floor($1), 'D') AS [_Date] Tagged ('$axis', '$date', '$hidden', '$simplified') ; DERIVE FIELDS FROM FIELDS [Date] USING [autoCalendar] ;
Notice that all fields in this declaration are defined as Dual values, even the Year, which might cause some confusion down the road, for example in Set Analysis. The particular formatting of the fields, as well as the new strange-looking tags, were added for the use in so called “Time-aware charts” which are described below.
Setting Up the Testing Lab
Next, let’s define our testing app. In order to test performance impact, I generated a long Fact table (120 million rows of simplified Sales data) with two dimensions – Customers and Products. For the purpose of comparing, one App uses the “conventional” Master Calendar with all the fields generated and stored in the data set, while the other App uses the Derived calendar. Here comes the first visible difference – Derived Fields are not displayed in the Data model viewer:
Let’s compare the other commonalities and the differences.
Are derived fields stored or calculated?
It’s not too obvious from the Help description whether Derived fields are stored or calculated. At first, I was under the impression that Derived fields are not “real fields” and that they might be calculated. My comparison made me believe that Derived fields are stored just like any other fields. You may ask what led me to this conclusion. The file sizes of the two apps. After loading the two apps with data, I found that the two apps maintain exactly the same file size:
This was surprising to me because the two calendars had slightly different ways of calculating their fields – and yet, the size is precisely the same, which is remarkable! I even checked the exact byte count for both – it is the same.
So, based on this discovery, I concluded that Derived fields must be stored in the data set, just like any other fields, they just aren’t visible in the Data model viewer. With that conclusion in mind, it is fair to expect that Derived fields perform equally well in Chart Dimensions, Filters, and in Set Analysis conditions.
Does everything work as expected?
Well, every new feature has to hide a bug or two… In my testing (I used Qlik Sense Desktop 3.1 SR3), adding a Derived field as a Chart dimension, a Filter pane field, or even opening a Selection tool with Derived fields, caused a very long cycle of the “spinning circle” with no visible results. At first, I didn’t have enough patience to wait for the spinning to end. I assumed that the spinning was infinite and I canceled the action every time. After the canceling, Qlik Sense behaved in a strange way, so I was forced to close Qlik Sense Desktop and to start all over again.
Once, I let the circle spin while typing this article, and to my huge surprise, the chart got rendered after about 4 minutes. On other occasions, the same objects behaved a lot better, but I suspect that I was getting cached results.
After trying to add Derived fields to various lists – as a Dimension, as a Field in the Filter Pane, or in the Selection tool, I could see that the Derived fields require a very long time to be added to any of the lists. Below is the typical picture from the Selection tool:
In comparison, conventional Calendar fields showed stellar performance – all charts were calculated instantly (even with 120 million rows).
This performance difference makes me wonder – is this a bug that should be fixed, or is it an inherent trait of Derived fields? Out of curiosity, I reduced my Facts table down to 1 million rows. The chart was calculated instantly. I raised the size of the Facts table up to 10 million rows, and the chart took about a minute to calculate. It looks like performance might be an issue there. I also tested the same behavior on an earlier SR of Qlik Sense Desktop 3.1, and I got consistent results.
So, maybe Derived fields ARE calculated and not stored, after all?
How about them Time-aware charts?
The main reason for creating Derived fields in the first place was the so-called Time Aware Chart – the chart that should be “smart” enough to show you appropriate labels depending on the amount of data and the available screen space. After playing with both the “time-aware” and “not time-aware” charts, I have to admit that the differences, while they exist, are quite marginal. For example, compare the two charts below:
Both charts use Date as a Dimension – one Date is a conventional stored field (on the right) and the other Date.Date is a derived field (on the left). Both charts show a lot of “noise” from daily fluctuations of Sales. My hope for the Time-Aware chart was perhaps too lofty – I was hoping to see the chart pick the appropriate level of “resolution” based on the available amount of data. In this case, a Monthly chart could be more appropriate.
Practically, the only difference between the two charts is in the labels – the Time-Aware chart shows Years and Months, while the conventional chart on the right shows Dates with appropriate intervals. When both charts are resized down to a lower width, both charts show only Year labels (believe it or not, even the conventional chart is smart enough for that!)
Having played with all these features, what can we conclude? Personally, I think that Derived Fields and Time-Aware Charts are certainly good beginnings, steps in the right direction. At the moment, however, I find them a bit… well, raw. The scalability doesn’t seem to be there, and the benefits of time-aware charts don’t seem to be significant enough to put up with the quirky field names and slow performance. For the moment, I’ll stick to my conventional Master Calendar, but I will keep Derived fields and Time-Aware charts on my radar screen, expecting to see better performance and smarter time-awareness in the future.
If you enjoyed the depth of analysis in this post, please check out my book QlikView Your Business and consider if the Masters Summit for Qlik agenda is right for you. At the summit, we teach the most advanced and the most mysterious topics of Qlik application development – Data Modeling, Scripting, Advanced Aggregation, Set Analysis, and Visualizations. This year, we are substantially renewing our content and bringing in a lot of new topics, especially for Qlik Sense. Check out our agenda and come see us this April in Munich, Germany!