Saturday, August 23, 2008

Field names and compressing data

One of the cool things about QlikView is how much data it can store in a report file. It can load many millions of rows into a report file that might be only one or two MB in size. The data is stored so densely that I seldom zip a report file before sharing it, since zipping saves very little. If you are loading very large tables into a report, be aware that part of the QV strategy for compressing data depends on not storing duplicate values in a field. If the report loads sales order data that includes customer name, for example, any customer who has ordered many times will appear in many rows. QV stores each customer name only once and then keeps track of where that name is used again as subsequent rows are loaded.
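
To make the idea concrete, here is a minimal Python sketch of storing each distinct value once and keeping a small index per row. It only illustrates the general technique (often called dictionary encoding). It is not QlikView's actual internal format, and the function names and sample customers are made up for the example.

```python
def encode_column(values):
    """Store each distinct value once, plus one small integer per row."""
    symbols = []     # distinct values, in first-seen order
    positions = {}   # value -> its index in symbols
    indices = []     # one entry per row, pointing into symbols
    for value in values:
        if value not in positions:
            positions[value] = len(symbols)
            symbols.append(value)
        indices.append(positions[value])
    return symbols, indices


def decode_column(symbols, indices):
    """Rebuild the original column from the distinct values and row indices."""
    return [symbols[i] for i in indices]


# Hypothetical sales order rows: the same customers order repeatedly.
customer_names = ["Acme Corp", "Bolt Ltd", "Acme Corp", "Acme Corp", "Bolt Ltd"]
symbols, indices = encode_column(customer_names)
restored = decode_column(symbols, indices)
```

Five rows end up as two stored names and five small integers, and the more often a value repeats, the bigger the saving.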

This mechanism means that if you load two different database tables that contain customer name into two different QV tables, and the field name is the same in both QV tables, then each customer name value is stored only once. But if you give the field a different name in each QV table, the program cannot know that it is the same data. It must treat each field separately, and many customer name values will be stored twice. This is really only a consideration when working with enormous tables, where the sheer size of the tables affects the memory utilization of the report or the size of the report file.
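
The effect of the field name can be sketched the same way. The toy Python class below keeps one list of distinct values per field name, which is an assumption made for illustration and not a description of QV internals. The class name, field names, and sample data are all hypothetical.

```python
class FieldStore:
    """Toy model: one list of distinct values per field name."""

    def __init__(self):
        self.symbols = {}     # field name -> list of distinct values
        self._positions = {}  # field name -> {value: index in that list}

    def load(self, field_name, values):
        """Load one table's column and return its per-row indices."""
        symbols = self.symbols.setdefault(field_name, [])
        positions = self._positions.setdefault(field_name, {})
        indices = []
        for value in values:
            if value not in positions:
                positions[value] = len(symbols)
                symbols.append(value)
            indices.append(positions[value])
        return indices

    def stored_value_count(self):
        """Total number of values actually stored across all fields."""
        return sum(len(values) for values in self.symbols.values())


orders_customers = ["Acme Corp", "Bolt Ltd", "Acme Corp"]
invoices_customers = ["Bolt Ltd", "Acme Corp", "Cog Inc"]

# Same field name in both tables: the values share one list.
shared = FieldStore()
shared.load("Customer_Name", orders_customers)
shared.load("Customer_Name", invoices_customers)

# Different field names: each field keeps its own copy of the values.
separate = FieldStore()
separate.load("Order_Customer_Name", orders_customers)
separate.load("Invoice_Customer_Name", invoices_customers)

shared_count = shared.stored_value_count()
separate_count = separate.stored_value_count()
```

In this toy model the shared field stores three names, while the two separately named fields store five between them, because "Acme Corp" and "Bolt Ltd" are each kept twice.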

Remember that QV relates, or joins, tables together based on the fields they have in common. Two QV tables that both contain the field Customer_Name will be related based on that field. So making the field names the same to save memory will sometimes not be possible, because it would join two tables that shouldn’t be joined (joining tables also consumes memory). And the field names should be the same only if the field data is truly the same. Customer number from your company’s database is not the same as customer number from another company’s database. But customer number from your company’s 2006 sales data is probably the same as customer number from the 2007 sales data.
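
Before renaming fields to match, it can help to list which tables would end up related. The Python sketch below is a hypothetical planning aid that runs outside QlikView: it takes table names and field lists (all invented here) and reports every pair of tables that shares a field name.

```python
from itertools import combinations


def shared_fields(tables):
    """Map each pair of tables to the field names they have in common."""
    links = {}
    for (name_a, fields_a), (name_b, fields_b) in combinations(tables.items(), 2):
        common = sorted(set(fields_a) & set(fields_b))
        if common:
            links[(name_a, name_b)] = common
    return links


# Hypothetical table layouts, written down before building the load script.
tables = {
    "Orders": ["Order_ID", "Customer_Name", "Amount"],
    "Customers": ["Customer_Name", "Region"],
    "PartnerOrders": ["Partner_Order_ID", "Partner_Customer_Name"],
}

links = shared_fields(tables)
```

Here only Orders and Customers share a field name, so only they would be related. PartnerOrders stays separate because its customer field has a different name, which is what you want if the partner's customers really are different data.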
