Hello.
Thank you for the reply.
Yes, I know that rows with TEXT columns cannot use the fixed format, but then I tried to create a table with ROW_FORMAT=FIXED anyway, and phpMyAdmin showed this option in the table overview, so I thought it somehow worked, maybe just for the fixed-length columns (this is also why I changed my initial VARCHARs to CHAR). I did a little more research and realized that it's not possible to mix formats for different columns within a row. So I've now changed the row format to dynamic (just left it as the default). Thank you for this note.
About specifying MAX_ROWS to avoid the 4GB limit: it seems this can cause a performance issue on 32-bit machines, so I decided to leave it at the default, as the table size for most of our customers will not exceed 4GB. As I understand it, this can be changed later. Anyway, thanks for mentioning it.
Coming to the IP address columns: I've changed them to INT(11) UNSIGNED. I just wondered whether calling the INET_NTOA function each time to read the data might be a performance bottleneck?
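For what it's worth, the dotted-quad to integer mapping that INET_ATON/INET_NTOA perform is just base-256 arithmetic, so the per-row cost should be negligible next to disk I/O. A minimal Python sketch of the same conversion (function names are mine, for illustration):

```python
def ip_to_int(ip: str) -> int:
    """Same mapping as MySQL's INET_ATON: dotted quad -> unsigned 32-bit int."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_ip(n: int) -> str:
    """Inverse mapping, like MySQL's INET_NTOA."""
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(ip_to_int("192.168.0.1"))  # 3232235521
print(int_to_ip(3232235521))     # 192.168.0.1
```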
Normalizing the user_agent and platform columns is a great idea too, in terms of keeping the data as small as possible. I decided to move all textual data (VARCHAR and TEXT columns) to separate tables, and now I can use fixed-format rows freely, as that seems to be a performance gain judging from the articles I've read so far.
So now I have only numeric data in my stats table, which of course will dramatically decrease the space used.
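To illustrate the lookup-table approach, here is a tiny sketch using Python's sqlite3 as a stand-in for MySQL (table and column names are made up for the example): the variable-length text lives once per distinct value in a side table, and the hot stats table stores only fixed-width integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Lookup table holds each distinct text value exactly once...
cur.execute("CREATE TABLE browsers (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
# ...while the stats table stores only fixed-width integers.
cur.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, browser_id INTEGER, dateline INTEGER)")

cur.execute("INSERT INTO browsers (name) VALUES ('Firefox 3'), ('Opera 9')")
cur.execute("INSERT INTO stats (browser_id, dateline) VALUES (1, 1242000000), (2, 1242000060)")

# The text is joined back in only when a report actually needs it.
row = cur.execute(
    "SELECT b.name FROM stats s JOIN browsers b ON b.id = s.browser_id WHERE s.id = 1"
).fetchone()
print(row[0])  # Firefox 3
```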
The note on how LIMIT works is new to me too, but I didn't say in my post that I'm using LIMIT anywhere. Maybe you mean the subquery that limits the number of rows scanned in the main table (to which the others are joined); it really works, as EXPLAIN shows that only one row is scanned from that table.
I finally managed to turn off query caching on my development machine, for testing and benchmarking purposes only.
I've also added some more columns needed for my product.
So here is my current table structure and status:
SHOW CREATE TABLE sample_stats;
CREATE TABLE `sample_stats` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`banner_id` int(11) unsigned NOT NULL,
`location_id` smallint(3) NOT NULL,
`url_id` int(11) unsigned NOT NULL,
`page_id` int(11) unsigned NOT NULL,
`dateline` int(11) unsigned NOT NULL,
`ip` int(11) unsigned NOT NULL,
`browser_id` smallint(3) unsigned NOT NULL,
`platform_id` smallint(3) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `bannerid` (`banner_id`),
KEY `dateline` (`dateline`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=FIXED
SHOW TABLE STATUS LIKE 'sample_stats';
Name: sample_stats
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 0
Avg_row_length: 0
Data_length: 0
Max_data_length: 8725724278030335
Index_length: 1024
Data_free: 0
Auto_increment: 1
Create_time: 2009-05-17 21:28:49
Update_time: 2009-05-17 21:28:49
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options: pack_keys=1 row_format=FIXED
Comment:
After that I populated the table with sample data, ~3,500,000 rows, and ran the EXPLAIN query and the other queries you mentioned.
I tried to get data for six time intervals, so I'm using six self-joins.
EXPLAIN SELECT COUNT( DISTINCT stat0.id ) , COUNT( DISTINCT stat1.id ) , COUNT( DISTINCT stat2.id ) , COUNT( DISTINCT stat3.id ) , COUNT( DISTINCT stat4.id ) , COUNT( DISTINCT stat5.id )
FROM vbpua_stats AS stat
LEFT JOIN vbpua_stats AS stat0 ON ( stat0.dateline >=1242000000
AND stat0.dateline <1242086400 )
LEFT JOIN vbpua_stats AS stat1 ON ( stat1.dateline >=1242086400
AND stat1.dateline <1242172800 )
LEFT JOIN vbpua_stats AS stat2 ON ( stat2.dateline >=1242172800
AND stat2.dateline <1242259200 )
LEFT JOIN vbpua_stats AS stat3 ON ( stat3.dateline >=1242259200
AND stat3.dateline <1242345600 )
LEFT JOIN vbpua_stats AS stat4 ON ( stat4.dateline >=1242345600
AND stat4.dateline <1242432000 )
LEFT JOIN vbpua_stats AS stat5 ON ( stat5.dateline >=1242432000
AND stat5.dateline <1242518400 )
WHERE stat.id =1
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE stat const PRIMARY PRIMARY 4 const 1 Using index
1 SIMPLE stat0 range dateline dateline 4 NULL 199 Using where
1 SIMPLE stat1 range dateline dateline 4 NULL 215 Using where
1 SIMPLE stat2 range dateline dateline 4 NULL 220 Using where
1 SIMPLE stat3 range dateline dateline 4 NULL 233 Using where
1 SIMPLE stat4 range dateline dateline 4 NULL 242 Using where
1 SIMPLE stat5 range dateline dateline 4 NULL 262 Using where
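For comparison, the per-interval counts could also come from a single scan using conditional aggregation instead of six self-joins; each SUM counts rows whose dateline falls into one day bucket. A sketch with Python's sqlite3 standing in for MySQL, using two of the intervals from the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, dateline INTEGER)")
cur.executemany("INSERT INTO stats (dateline) VALUES (?)",
                [(1242000100,), (1242000200,), (1242086500,)])

# Day boundaries are 86400 seconds apart, as in the intervals above.
day0, day1, day2 = 1242000000, 1242086400, 1242172800

# SUM over a boolean condition counts matching rows in one pass.
counts = cur.execute(
    "SELECT SUM(dateline >= ? AND dateline < ?), "
    "       SUM(dateline >= ? AND dateline < ?) FROM stats",
    (day0, day1, day1, day2),
).fetchone()
print(counts)  # (2, 1)
```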
SHOW VARIABLES LIKE 'key_buffer%'
Variable_name Value
key_buffer_size 16777216
SHOW STATUS LIKE 'key%';
Variable_name Value
Key_blocks_not_flushed 0
Key_blocks_unused 14333
Key_blocks_used 14
Key_read_requests 52
Key_reads 14
Key_write_requests 0
Key_writes 0
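As a quick way to read those counters: the usual figure to watch is the key-cache miss ratio, Key_reads / Key_read_requests (physical index-block reads over logical ones). A small Python check with the numbers above:

```python
# Values from SHOW STATUS LIKE 'key%' above.
key_read_requests = 52  # logical reads of index blocks from the key cache
key_reads = 14          # physical reads that missed the cache and hit disk

# Fraction of index-block requests that went to disk.
miss_ratio = key_reads / key_read_requests
print(f"key cache miss ratio: {miss_ratio:.2%}")
```

(The ratio is high here only because the cache is cold; with so few requests it isn't meaningful yet.)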
I'm using a machine with 2GB of RAM.
I'll test the new schema with 20,000,000+ rows and post the test results here.
I really appreciate your help. Thanks.