MySQL :: Re: Get statistics using self joins

New Topic

Re: Get statistics using self joins

Posted by: Harutyun Drnoyan
Date: May 17, 2009 11:41AM

Hello.
Thank you for replay.

Yes, I know that rows having text fields can not be in fixed format, but than i tried to create table with ROW_FORMAT=FIXED and phpmyadmin showed this option in table overview, so I thought it worked somehow may be just for columns with fixed length (also this is why I changed my initial varchars to char type). I did a little more research and realize that it's not possible to use different formats for columns in row. So now I changed row format to dynamic (just leaved it as default). Thank you for this note.

About specifying MAX_ROWS to avoid 4GB limit. It seems that it can be some performance issue with 32-bit machines, so I decided to leave it default, as table size for much our customers will not be more than 4GB. As I understand it can be changed lately. Anyway, thanks to mention this.

Coming to IP address columns. I've changed it to INT (11) unsigned. I just thought may it be performance bottleneck to call INET_NTOA function each time to get data?

Normalizing user_agent and platform columns is great idea too in terms of keeping data as small as possible. I decided to to move all textual data (varchar and text columns) to separate tables, and now I can use fixed format rows freely, as it is performance gain as it seems so far from articles I've read.

So now I have only numeric data in my stats table. This of course will dramatically decrease used space.

The note on how LIMIT works is new for me too, but I've not said in my post that I'm using LIMIT anywhere. May be you are about subquery which limits number of rows scanned numbers in main table (to which overs are joined), it really works as EXPLAIN shows that only one row scanned from that table.

I finally managed to turned off query caching on my development machine for testing and benchmarking proposes only.

Also I've added some more columns needed for my product.

So here is my current table structure and status

SHOW CREATE TABLE sample_stats;

CREATE TABLE `sample_stats` (
 `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
 `banner_id` int(11) unsigned NOT NULL,
 `location_id` smallint(3) NOT NULL,
 `url_id` int(11) unsigned NOT NULL,
 `page_id` int(11) unsigned NOT NULL,
 `dateline` int(11) unsigned NOT NULL,
 `ip` int(11) unsigned NOT NULL,
 `browser_id` smallint(3) unsigned NOT NULL,
 `platform_id` smallint(3) unsigned NOT NULL,
 PRIMARY KEY (`id`),
 KEY `bannerid` (`banner_id`),
 KEY `dateline` (`dateline`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=FIXED

SHOW TABLE STATUS LIKE  'sample_stats';
Name			Engine		Version			Row_format		Rows		Avg_row_length		Data_length		Max_data_length			Index_length		Data_free		Auto_increment		Create_time		Update_time		Check_time		Collation		Checksum		Create_options												Comment
sample_stats		MyISAM		10			Fixed			0		0				0		8725724278030335		1024			0			1			2009-05-17 		21:28:49		2009-05-17 		21:28:49		NULL			latin1_swedish_ci	NULL	pack_keys=1 row_format=FIXED

After that I populated table with sample data for ~3 500 000 rows and run explain query and other queries you have mentioned.

I tried to get data for 5 time intervals, so using 5 self joins.

EXPLAIN SELECT COUNT( DISTINCT stat0.id ) , COUNT( DISTINCT stat1.id ) , COUNT( DISTINCT stat2.id ) , COUNT( DISTINCT stat3.id ) , COUNT( DISTINCT stat4.id ) , COUNT( DISTINCT stat5.id ) 
FROM vbpua_stats AS stat
LEFT JOIN vbpua_stats AS stat0 ON ( stat0.dateline >=1242000000
AND stat0.dateline <1242086400 ) 
LEFT JOIN vbpua_stats AS stat1 ON ( stat1.dateline >=1242086400
AND stat1.dateline <1242172800 ) 
LEFT JOIN vbpua_stats AS stat2 ON ( stat2.dateline >=1242172800
AND stat2.dateline <1242259200 ) 
LEFT JOIN vbpua_stats AS stat3 ON ( stat3.dateline >=1242259200
AND stat3.dateline <1242345600 ) 
LEFT JOIN vbpua_stats AS stat4 ON ( stat4.dateline >=1242345600
AND stat4.dateline <1242432000 ) 
LEFT JOIN vbpua_stats AS stat5 ON ( stat5.dateline >=1242432000
AND stat5.dateline <1242518400 ) 
WHERE stat.id =1

id	select_type	table	type	possible_keys		key		key_len		ref		rows	Extra
1	SIMPLE		stat	const	PRIMARY			PRIMARY		4		const		1		Using index
1	SIMPLE		stat0	range	dateline		dateline	4		NULL		199		Using where
1	SIMPLE		stat1	range	dateline		dateline	4		NULL		215		Using where
1	SIMPLE		stat2	range	dateline		dateline	4		NULL		220		Using where
1	SIMPLE		stat3	range	dateline		dateline	4		NULL		233		Using where
1	SIMPLE		stat4	range	dateline		dateline	4		NULL		242		Using where
1	SIMPLE		stat5	range	dateline		dateline	4		NULL		262		Using where


SHOW VARIABLES LIKE  'key_buffer%'

Variable_name		Value
key_buffer_size		16777216

SHOW STATUS LIKE 'key%';

Variable_name			Value
Key_blocks_not_flushed		0
Key_blocks_unused		14333
Key_blocks_used			14
Key_read_requests		52
Key_reads			14
Key_write_requests		0
Key_writes			0

I'm using machine with 2GB RAM.

I'll test new scheme with 20 000 000+ rows and post test results here.

I really appreciate your help. Thanks.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Get statistics using self joins

6381

Harutyun Drnoyan

May 16, 2009 10:40AM

Re: Get statistics using self joins

2101

Rick James

May 16, 2009 07:42PM

Re: Get statistics using self joins

2339

Harutyun Drnoyan

May 17, 2009 11:41AM

Re: Get statistics using self joins

2112

Rick James

May 17, 2009 12:32PM

Re: Get statistics using self joins

2208

Harutyun Drnoyan

May 17, 2009 12:48PM

Re: Get statistics using self joins

2069

Harutyun Drnoyan

May 18, 2009 03:24AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.