It’s been a while since I had a chance to write a blog article… I can’t complain though, because preparing and delivering the Masters Summit for Qlik in Munich was a lot of fun. I also took a few days off and jumped to Prague, which was awesome! Interesting discovery (at least to myself) – in Prague, beer is cheaper than water and half the price of a double-espresso. Isn’t that amazing? I was so impressed that I convinced the rest of the Masters Summit team to have our next European event in Prague – hello again, cheap bear and hearty Bavarian cuisine! But that, of course, after we sample fine lobsters at the Masters Summit for Qlik in Boston this October.
Anyway, between the Masters Summit events, I work at my day job as a QlikView (Qlik Sense) consultant, and that job has kept me entertained for the last 15 years. Once in a while I get amused by something and that’s when I decide to write another blog article. So, here goes…
Once upon a time, there were two dummy companies, Company 1 and Company 2. Even though the two dummies were twin sisters, they disliked each other so much that they kept all of their stuff separately – separate closets, separate kitchens, and separate data files. And, since they were twins, their data files also looked identical. And the two dummy sister companies had a Father, and the Father demanded to see their sales on a single dashboard. For that purpose, they hired a Qlik Sense consultant to bring their data together and show it on one screen.
The consultant was quite an expert, he even attended the Masters Summit, and he knew not only how to spell Qlik, but he could even tell the difference between Link Tables and Concatenated Facts. He even earned the status of a Wizard on Qlik Community!
So, the Wizard looked at the twin sisters and their twin data files – “Sales 1.xslx” and “Sales 2.xlsx”, as well as “Customer 1.xslx” and “Customer 2.xslx”, and he decided to load them together using a simple wildcard load:
Sales: LOAD Company, Customer, Product, Sales FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Sales*.xlsx] (ooxml, embedded labels, table is Sheet1); Customers: LOAD Company, Customer, "Customer Name" FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Customers*.xlsx] (ooxml, embedded labels, table is Sheet1);
Nice and simple! After loading the data, the Wizard created a Sales sheet with all the data in tables, and he showed his work to the dummy sisters:
The two dummy sisters were so happy – Company 1 saw her customers A, B, C, D, … I and Company 2 saw her customers AA, BB, CC, DD, … II and that filled their hearts with joy, as it happens often when people see their data for the first time. This could have been the happy end, but then there wouldn’t be much of a story to tell on such a respectable blog site…
At this happy moment, Company 1, who was known as a stickler, asked the Wizard: “Why don’t you show me the data model? I want my data to look all pretty and nice, you know?”. So, they opened the Data model viewer, and naturally they found an ugly Synthetic key right in the middle of it:
So, naturally Company 1 couldn’t pass up an opportunity to throw a fit: “No stinking Synthetic keys in my data models!”, while Company 2 smiled and said “I don’t know, I kinda like it the way it is…”. The Wizard couldn’t stand to see the sisters argue, so he suggested a truly wise solution – “Look, I’ll just join the two tables, and the ugly Synthetic key will go away. Nothing to worry about, it will be fixed in a moment.” And he added a simple LEFT JOIN in front of the second load. Piece of cake!
Sales: LOAD Company, Customer, Product, Sales FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Sales*.xlsx] (ooxml, embedded labels, table is Sheet1); //Customers: LEFT JOIN (Sales) LOAD Company, Customer, "Customer Name" FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Customers*.xlsx] (ooxml, embedded labels, table is Sheet1);
After seeing the new data model, Company 1 was so happy – the ugly Synthetic key was gone, and everything looked so pretty:
But then, they opened the Sales sheet and this time Company 2 broke out into tears – all her Customer names were gone, and all the sales were associated with an ugly NULL value:
“Well, that’s a minor detail”, dismissively claimed Company 1. She was happy because her data was still there. But Company 2 insisted that she needs to see her Customers, or else…
The Wizard felt completely befuddled. “I loaded Sales from the two tables with the wildcard load” – he was mumbling to himself. “Then I loaded Customers from the two tables with the wildcard load. And then, I joined the customers to the Sales. What could possibly go wrong?”
And then, he realized what truly happened there. The wildcard load is not a single load. Instead, it’s a series of individual loads that happen one after another in a sequence. And, if a JOIN prefix is attached to such a load, the joining happens individually for every file. So, the whole process, step by step, unfolded like this:
This was quite a “gotcha” moment for the Wizard, but since he learned from the best (Masters Summit for Qlik, remember?), he quickly came up with the working solution – we will first load the multiple files using the wildcard load, and only then join the resulting table to Sales:
Sales: LOAD Company, Customer, Product, Sales FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Sales*.xlsx] (ooxml, embedded labels, table is Sheet1); Customers: LOAD Company, Customer, "Customer Name" FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Customers*.xlsx] (ooxml, embedded labels, table is Sheet1);
LEFT JOIN (Sales) LOAD * Resident Customers ;
drop table Customers;
And this was the happy end, boys and girls. The dummy sisters were pleased, their father paid the Wizard handsomely, and they all lived happily ever after.
That’s all for now, folks. Concatenate carefully, keep your Synthetic keys well groomed, and for goodness sake, check your calendars on October 23-25 this year – I hear there will be a lot of fun and Qlikking happening in Boston on those days. Visit the Masters Summit for Qlik website for details.
5 Comments
“Oh father, that is a splendid bedtime story, indeed! Please, oh please read us another!”
Hi Oleg –
I really enjoyed reading this post. I haven’t gone through all your blogs but do you have a preference between Wildcard load vs. a for next loop when loading multiple files? I might have read HIC saying that he prefers For next loop over wildcard load, but was curious to know your thoughts.
Best,
Sunny
Hi Sunny,
thank you for your question! Frankly, I don’t have a clear preference between the wild card load and the For Next loop, mostly because I haven’t used neither one of them in a truly large data environment. For small datasets, I believe both techniques would perform equally well. Wild card load has a clear benefit of being simpler and easier to script. We can ask Henric about his preference of using loops instead.
cheers,
Oleg Troyansky
skazo4nik par excellence
thanks , was having this issue and resolved the same way but I didnt really understand why it was happening until I read your post