Re: Design approach for summary table by 3 items
Posted by: Rick James
Date: January 31, 2009 11:23PM

Yes! Do a summary table. Let's dissect the typical(?) summary table:

Key fields:
* Time: truncated to, typically, hour or day
* Other dimension(s): perhaps url of page and/or userid

Subtotals:
* A COUNT(*) of number of fact table rows matching the keys (above).
* Some SUM(...) columns, say, SUM(elapsed_time)
* Do NOT include AVG(...) -- you will want to take AVG(avg_col) and this is mathematically wrong. Instead, plan on SUM(sum_col)/SUM(ct_col)

Populating -- There are several ways.

* Every hour (day?) do INSERT INTO summary_tbl SELECT ... FROM facts WHERE time BETWEEN ... GROUP BY ... -- This assumes you do not delete or update the fact table, nor do you have any late-arriving rows.

* Ignore the unit of time; allow dup rows in summary table. This requires a monotonic id on the fact table. And a way to keep track of where you "left off". Now the WHERE clause is based on id.

* In between... Have a cron job doing the first approach above; then have your UI do the second as the user is viewing a report. In fact, the same code can be done for both -- the cron job probes the UI and ignores the output. In some applications, I have found that this works very well.

The Report...
If your summary table is split on hour, you can still get a report for, say, a "week" by simply doing
SELECT SUM(ct_col) as Num_visitors,
       SUM(sum_elaps)/SUM(ct_col) as Avg_response_time
   FROM summary_tbl 
   WHERE hr >= $date AND hr < DATE_ADD($date, INTERVAL 7 DAY)
     AND ...
   GROUP BY hr, ...
   ORDER BY ...
It will be pleasingly faster than hitting the fact table.

Another bonus, assuming the fact table grows to be much bigger than will fit in RAM...  If you have lots of indexes on the fact table, INSERTs into it will get slower.  Why?  Some of the indexes will be 'random', and will have to hit disk.  With the above approach, you don't need more than the PRIMARY KEY on the table, and that is not 'random', hence very cacheable.

Options: ReplyQuote


Subject
Written By
Posted
Re: Design approach for summary table by 3 items
January 31, 2009 11:23PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.