Ever since the introduction of Derived Fields in Qlik Sense, I was curious – what’s so special about them?
In this post, I will try to lift the curtain, at least partially, and reveal the mystery – for myself, and hopefully, for you.
Just to ensure that we are on the same page, let’s describe what Derived Fields are. Two new script commands were added in Qlik Sense, with the goal of creating a special kind of fields (derived fields) – DECLARE FIELD DEFINITION and DERIVE FIELDS FROM FIELDS.
Theoretically, any fields can be created as Derived fields, but the feature was primarily built for Calendar fields. Now, instead of scripting one or more Calendar tables, we can simply “declare” the fields and then “derive” one or more instances of each field from one or more date fields. The easiest way to get introduced to Derived fields is to load any data table that includes a Date field, using the Data Manager. At the bottom of the “Auto-generated section”, you will see the automatically generated Calendar:
[autoCalendar]: DECLARE FIELD DEFINITION Tagged ('$date') FIELDS Dual(Year($1), YearStart($1)) AS [Year] Tagged ('$axis', '$year'), Dual('Q'&Num(Ceil(Num(Month($1))/3)),Num(Ceil(NUM(Month($1))/3),00)) AS [Quarter] Tagged ('$quarter', '$cyclic'), Dual(Year($1)&'-Q'&Num(Ceil(Num(Month($1))/3)),QuarterStart($1)) AS [YearQuarter] Tagged ('$yearquarter', '$qualified'), Dual('Q'&Num(Ceil(Num(Month($1))/3)),QuarterStart($1)) AS [_YearQuarter] Tagged ('$yearquarter', '$hidden', '$simplified'), Month($1) AS [Month] Tagged ('$month', '$cyclic'), Dual(Year($1)&'-'&Month($1), monthstart($1)) AS [YearMonth] Tagged ('$axis', '$yearmonth', '$qualified'), Dual(Month($1), monthstart($1)) AS [_YearMonth] Tagged ('$axis', '$yearmonth', '$simplified', '$hidden'), Dual('W'&Num(Week($1),00), Num(Week($1),00)) AS [Week] Tagged ('$weeknumber', '$cyclic'), Date(Floor($1)) AS [Date] Tagged ('$axis', '$date', '$qualified'), Date(Floor($1), 'D') AS [_Date] Tagged ('$axis', '$date', '$hidden', '$simplified') ; DERIVE FIELDS FROM FIELDS [Date] USING [autoCalendar] ;
Notice that all fields in this declaration are defined as Dual values, even the Year, which might cause some confusion down the road, for example in Set Analysis. The particular formatting of the fields, as well as the new strange-looking tags, were added for the use in so called “Time-aware charts” which are described below.
Next, let’s define our testing app. In order to test performance impact, I generated a long Fact table (120 million rows of simplified Sales data) with two dimensions – Customers and Products. For the purpose of comparing, one App uses the “conventional” Master Calendar with all the fields generated and stored in the data set, while the other App uses the Derived calendar. Here comes the first visible difference – Derived Fields are not displayed in the Data model viewer:
Let’s compare the other commonalities and the differences.
It’s not too obvious from the Help description whether Derived fields are stored or calculated. At first, I was under the impression that Derived fields are not “real fields” and that they might be calculated. My comparison made me believe that Derived fields are stored just like any other fields. You may ask what led me to this conclusion. The file sizes of the two apps. After loading the two apps with data, I found that the two apps maintain exactly the same file size:
This was surprising to me because the two calendars had slightly different ways of calculating their fields – and yet, the size is precisely the same, which is remarkable! I even checked the exact byte count for both – it is the same.
So, based on this discovery, I concluded that Derived fields must be stored in the data set, just like any other fields, they just aren’t visible in the Data model viewer. With that conclusion in mind, it is fair to expect that Derived fields perform equally well in Chart Dimensions, Filters, and in Set Analysis conditions.
Well, every new feature has to hide a bug or two… In my testing (I used Qlik Sense Desktop 3.1 SR3), adding a Derived field as a Chart dimension, a Filter pane field, or even opening a Selection tool with Derived fields, caused a very long cycle of the “spinning circle” with no visible results. At first, I didn’t have enough patience to wait for the spinning to end. I assumed that the spinning was infinite and I canceled the action every time. After the canceling, Qlik Sense behaved in a strange way, so I was forced to close Qlik Sense Desktop and to start all over again.
Once, I let the circle spin while typing this article, and to my huge surprise, the chart got rendered after about 4 minutes. On other occasions, the same objects behaved a lot better, but I suspect that I was getting cached results.
After trying to add Derived fields to various lists – as a Dimension, as a Field in the Filter Pane, or in the Selection tool, I could see that the Derived fields require a very long time to be added to any of the lists. Below is the typical picture from the Selection tool:
In comparison, conventional Calendar fields showed stellar performance – all charts were calculated instantly (even with 120 million rows).
This performance difference makes me wonder – is this a bug that should be fixed, or is it an inherent trait of Derived fields? Out of curiosity, I reduced my Facts table down to 1 million rows. The chart was calculated instantly. I raised the size of the Facts table up to 10 million rows, and the chart took about a minute to calculate. It looks like performance might be an issue there. I also tested the same behavior on an earlier SR of Qlik Sense Desktop 3.1, and I got consistent results.
So, maybe Derived fields ARE calculated and not stored, after all?
The main reason for creating Derived fields in the first place was the so-called Time Aware Chart – the chart that should be “smart” enough to show you appropriate labels depending on the amount of data and the available screen space. After playing with both the “time-aware” and “not time-aware” charts, I have to admit that the differences, while they exist, are quite marginal. For example, compare the two charts below:
Both charts use Date as a Dimension – one Date is a conventional stored field (on the right) and the other Date.Date is a derived field (on the left). Both charts show a lot of “noise” from daily fluctuations of Sales. My hope for the Time-Aware chart was perhaps too lofty – I was hoping to see the chart pick the appropriate level of “resolution” based on the available amount of data. In this case, a Monthly chart could be more appropriate.
Practically, the only difference between the two charts is in the labels – the Time-Aware chart shows Years and Months, while the conventional chart on the right shows Dates with appropriate intervals. When both charts are resized down to a lower width, both charts show only Year labels (believe it or not, even the conventional chart is smart enough for that!)
Having played with all these features, what can we conclude? Personally, I think that Derived Fields and Time-Aware Charts are certainly good beginnings, steps in the right direction. At the moment, however, I find them a bit… well, raw. The scalability doesn’t seem to be there, and the benefits of time-aware charts don’t seem to be significant enough to put up with the quirky field names and slow performance. For the moment, I’ll stick to my conventional Master Calendar, but I will keep Derived fields and Time-Aware charts on my radar screen, expecting to see better performance and smarter time-awareness in the future.
If you enjoyed the depth of analysis in this post, please check out my book QlikView Your Business and consider if the Masters Summit for Qlik agenda is right for you. At the summit, we teach the most advanced and the most mysterious topics of Qlik application development – Data Modeling, Scripting, Advanced Aggregation, Set Analysis, and Visualizations. This year, we are substantially renewing our content and bringing in a lot of new topics, especially for Qlik Sense. Check out our agenda and come see us this April in Munich, Germany!
5 Comments
Hi Oleg,
great post! I need to do some more investigations myself here. In large data scenarios I tend to have no calculated dimensions at all to prevent single threaded calculations which takes forever sometimes..
However, regarding time-aware charts, this can be solved with normal master calendars too when adding the same tags. So, IMHO no real need for derived fields, or did I miss something?
– Ralf
Hi Ralf,
thank you for your comments. I’ve been playing with time-aware charts, and some of the functionality cannot be achieved with “conventional” master calendars, even when adding flags.
This become even more important in version 3.2, with the introduction of additional functionality that’s only available for derived calendars – the automatically generated “Calendar Measures”.
Will need to re-test this functionality again in ver. 3.2.
thank you!
Oleg Troyansky
Hi Oleg and Ralf,
I also did some deep-dive tests with this functionality and wanted to share my experience with you related to the article above.
Regarding “So, maybe Derived fields ARE calculated and not stored, after all?”
I had contact with Qlik support in Sweden and the guy from support mentioned that those derived calendar fields are NOT stored, just calculated.
Opening an app with 160million records took 8 mins.
To tackle the issue I created a table with a single field field connected to the facts table. I used FieldValues-Function, see http://qlikviewcookbook.com/2015/05/better-calendar-scripts/) which is very quick (in comparison with Distinct).
Then I applied the derived calendar to this field and performance was good as expected, because for only some thousand records the calendar had to be derived.
Regarding “I was hoping to see the chart pick the appropriate level of “resolution” based on the available amount of data”
You can specify which resolution you want to have in the time-aware-chart. Just drag e.g. the derived “YearMonth” field to the line-chart and you´ll get rid to the (daily) “noise” in the chart and it looks perfect.
A very nice thing which my customers like is that you can use your scroll-wheel on the mouse to zoom-in and -out in the time axis.
I think this can´t be provided with the normal mastercalendar but only with derived calendar.
If you have some new findings it would be great if you could share them here.
Best,
Thomas
PS: Now I´m currently trying to make the time-aware-charts learning fiscal years … not sure if this will be possible
Hi Oleg and Ralf,
I did some research on this topic and shared my findings at
https://community.qlik.com/docs/DOC-18299
Curious about your feedback.
Best,
Thomas
This is really informative article on derived fields. Appreciate the clarity with which these concepts are brought to light.