92017May

Q-Tip #17 – The Curious Case of a Wildcard Load and a Join

It’s been a while since I had a chance to write a blog article… I can’t complain though, because preparing and delivering the Masters Summit for Qlik in Munich was a lot of fun. I also took a few days off and jumped to Prague, which was awesome! Interesting discovery (at least to myself) – in Prague, beer is cheaper than water and half the price of a double-espresso. Isn’t that amazing? I was so impressed that I convinced the rest of the Masters Summit team to have our next European event in Prague – hello again, cheap bear and hearty Bavarian cuisine! But that, of course, after we sample fine lobsters at the Masters Summit for Qlik in Boston this October.

Anyway, between the Masters Summit events, I work at my day job as a QlikView (Qlik Sense) consultant, and that job has kept me entertained for the last 15 years. Once in a while I get amused by something and that’s when I decide to write another blog article. So, here goes…

Once upon a time, there were two dummy companies, Company 1 and Company 2. Even though the two dummies were twin sisters, they disliked each other so much that they kept all of their stuff separately – separate closets, separate kitchens, and separate data files. And, since they were twins, their data files also looked identical. And the two dummy sister companies had a Father, and the Father demanded to see their sales on a single dashboard. For that purpose, they hired a Qlik Sense consultant to bring their data together and show it on one screen.

The consultant was quite an expert, he even attended the Masters Summit, and he knew not only how to spell Qlik, but he could even tell the difference between Link Tables and Concatenated Facts. He even earned the status of a Wizard on Qlik Community!

So, the Wizard looked at the twin sisters and their twin data files – “Sales 1.xslx” and “Sales 2.xlsx”, as well as “Customer 1.xslx” and “Customer 2.xslx”, and he decided to load them together using a simple wildcard load:

Sales:
LOAD
     Company,
     Customer,
     Product,
     Sales

FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Sales*.xlsx]
(ooxml, embedded labels, table is Sheet1);

Customers:
LOAD
     Company,
     Customer,
     "Customer Name"

FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Customers*.xlsx]
(ooxml, embedded labels, table is Sheet1);

Nice and simple! After loading the data, the Wizard created a Sales sheet with all the data in tables, and he showed his work to the dummy sisters:

Sales sheet


The two dummy sisters were so happy – Company 1 saw her customers A, B, C, D, … I and Company 2 saw her customers AA, BB, CC, DD, … II and that filled their hearts with joy, as it happens often when people see their data for the first time. This could have been the happy end, but then there wouldn’t be much of a story to tell on such a respectable blog site…

At this happy moment, Company 1, who was known as a stickler, asked the Wizard: “Why don’t you show me the data model? I want my data to look all pretty and nice, you know?”. So, they opened the Data model viewer, and naturally they found an ugly Synthetic key right in the middle of it:

The Data Model

So, naturally Company 1 couldn’t pass up an opportunity to throw a fit: “No stinking Synthetic keys in my data models!”, while Company 2 smiled and said “I don’t know, I kinda like it the way it is…”. The Wizard couldn’t stand to see the sisters argue, so he suggested a truly wise solution – “Look, I’ll just join the two tables, and the ugly Synthetic key will go away. Nothing to worry about, it will be fixed in a moment.” And he added a simple LEFT JOIN in front of the second load. Piece of cake!

Sales:
LOAD
     Company,
     Customer,
     Product,
     Sales

FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Sales*.xlsx]
(ooxml, embedded labels, table is Sheet1);

//Customers:
LEFT JOIN (Sales)
LOAD
     Company,
     Customer,
     "Customer Name"

FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Customers*.xlsx]
(ooxml, embedded labels, table is Sheet1);

After seeing the new data model, Company 1 was so happy – the ugly Synthetic key was gone, and everything looked so pretty:

Improved Data Model

But then, they opened the Sales sheet and this time Company 2 broke out into tears – all her Customer names were gone, and all the sales were associated with an ugly NULL value:

Sales sheet after the change

“Well, that’s a minor detail”, dismissively claimed Company 1. She was happy because her data was still there. But Company 2 insisted that she needs to see her Customers, or else…

The Wizard felt completely befuddled.  “I loaded Sales from the two tables with the wildcard load” – he was mumbling to himself. “Then I loaded Customers from the two tables with the wildcard load. And then, I joined the customers to the Sales. What could possibly go wrong?”

And then, he realized what truly happened there. The wildcard load is not a single load. Instead, it’s a series of individual loads that happen one after another in a sequence. And, if a JOIN prefix is attached to such a load, the joining happens individually for every file. So, the whole process, step by step, unfolded like this:

  1. Sales were loaded from the spreadsheet “Sales 1”.
  2. Sales were loaded from the spreadsheet “Sales 2”. Since the two tables have the same structure, they were automatically concatenated. So far so good.
  3. Customers were loaded from “Customers 1” and joined into Sales. The new field “Company Name” was added to the Sales table.
  4. Customers were loaded from “Customers 2”, and the attempt was made to join into Sales. This time, however, the field “Company Name” exists in both tables, so the JOIN was made for three fields – Company, Customer, and Customer Name. Since the Customer Names didn’t match, no LEFT JOIN condition was satisfied and no Customers were loaded from the second file.

This was quite a “gotcha” moment for the Wizard, but since he learned from the best (Masters Summit for Qlik, remember?), he quickly came up with the working solution – we will first load the multiple files using the wildcard load, and only then join the resulting table to Sales:

Sales:
LOAD
     Company,
     Customer,
     Product,
     Sales

FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Sales*.xlsx]
(ooxml, embedded labels, table is Sheet1);

Customers:
LOAD
     Company,
     Customer,
     "Customer Name"

FROM [lib://Q-Tips/17 - JOIN and Wildcard load\Customers*.xlsx]
(ooxml, embedded labels, table is Sheet1);
LEFT JOIN (Sales)
LOAD * Resident Customers
;

drop table Customers;


And this was the happy end, boys and girls. The dummy sisters were pleased, their father paid the Wizard handsomely, and they all lived happily ever after.

That’s all for now, folks. Concatenate carefully, keep your Synthetic keys well groomed, and for goodness sake, check your calendars on October 23-25 this year – I hear there will be a lot of fun and Qlikking happening in Boston on those days. Visit the Masters Summit for Qlik website for details.



Leave a Reply

Your email address will not be published. Required fields are marked *