Re: large InnoDB table partitioning without explicit PK
Posted by:
Miko M
Date: November 22, 2014 09:07AM
> (I guess I missed this message.)
>
> > Regarding Select_scan / Com_select - (relatively
> high) number of full table scans, there was a
> select statement...
>
> I love working with you -- I point you in a
> direction; you carry the ball from there.
[Miko] The pleasure's all mine!
>
> > My idea it to change it and perform aggregations
> on MySQL directly (either on a heap table / temp
> or on a slave or both).
>
> Sure. The use of the MEMORY cache was designed
> specifically to make that work well. (BTW, my
> implementation did not have multiple threads, and
> it could batch multiple rows in a single INSERT.)
[Miko] Ok, thanks for letting me know.
> An issue with aggregation -- when doing the scan
> to collect the rows to summarize, it is best to
> have those rows consecutive. That is, they should
> be one of
> * an entire table (such as the MEMORY cache table)
> or
> * an entire partition or
> * a range of rows based on the PRIMARY KEY (not a
> secondary key!)
>
> This is because it is very efficient to do a table
> scan or a range scan; it is much less efficient to
> use a secondary key. In case you are not familiar
> with InnoDB's indexing scheme, I'll briefly
> explain it.
> * A B+Tree is efficient to scan consecutive rows.
> * The data and the PRIMARY KEY live together in
> one BTree (actually a B+Tree);
> * Each secondary INDEX lives in a separate B+Tree
> -- it contains the secondary column(s), plus the
> PK's column(s);
> * A secondary key lookup requires 2 BTree lookups;
> Thus, scanning one of your secondary keys would
> be quite inefficient.
>
> By using an AUTO_INCREMENT PK (and no cache
> table), you have a way to summarize less than the
> full table (or partition) efficiently. You just
> remember where you "left off".
>
> The aggregation would probably be something like
> INSERT INTO agg SELECT a,b, sum(c), count(*), ...
> FROM xx [WHERE (range)] GROUP BY a,b;
> That requires touching each original row 1 extra
> time, on top of the 1 INSERT that touches it now.
> I would guess that the aggregation might take
> one-fifth the CPU cost of the single-row inserts.
[Miko] Many thanks for your explanation.
So it seems that I will have to use the partition to collect the rows.
In this case do you think it is still worth to add an AUTO_INCREMENT BIGINT as the PK (together with existing partitioning key)?
> Sorry, but adding in aggregation will mess with
> your nice 23K inserts/sec. I don't really know
> how much it will degrade things -- to make the
> scan possible and/or to add an AI and/or to have
> an efficient cache table, etc. I would hope that
> you could get aggregation + 3rd party code at
> better than 10K inserts/sec.
[Miko] Aggregations will be carried out in a slightly different way.
Data will be copied based on the partitioning key into the MEMORY engine and all the aggregations will done there.
> Your COUNT(*) slow queries could, instead, run off
> some agg table.
>
> > compressed and transferred onto another system.
>
>
> Do you need replication for backup? Seems like
> this 'dump' could be your backup?
> I rarely seen a need to push "reports" to a Slave
> -- given that the Summary tables are well
> designed.
[Miko] Actually, not. Initially I thought that all the aggregations and summaries could be carried out on a slave. But MEMORY on a master seems to be even a better option.
> > I quickly noticed that all available RAM was
> eaten up and MySQL started crashing.
>
> Was that because of running out of memory outside
> MySQL? When MySQL (buffer_pool, etc) starts
> swapping, it feels like a crash.
[Miko] If I can recall it correctly this situation that often led to MySQL instance crash was occuring when large volume of data (whole hourly partition) was save into a flat file and a tmp table has been used (select * from Table1 INTO OUTFILE '/tmp/partP2014111213' where col20 between...)
It really looked strange since tmp_table_size was configured to 16777216
I am happy to perform such test again as ling as you want to know the details.
Subject
Views
Written By
Posted
4926
November 12, 2014 07:41PM
2460
November 13, 2014 06:40PM
2296
November 14, 2014 03:02AM
1737
November 15, 2014 12:05AM
1765
November 16, 2014 07:39PM
1735
November 17, 2014 04:21PM
1788
November 17, 2014 07:48PM
1768
November 18, 2014 02:00AM
1844
November 18, 2014 11:35PM
1830
November 22, 2014 06:44AM
1705
November 18, 2014 01:13AM
2155
November 19, 2014 12:27AM
1924
November 19, 2014 06:01PM
2060
November 22, 2014 08:39AM
2013
November 14, 2014 03:04AM
2054
November 14, 2014 03:05AM
2052
November 15, 2014 12:32AM
2088
November 16, 2014 08:31PM
1852
November 19, 2014 12:12AM
Re: large InnoDB table partitioning without explicit PK
1747
November 22, 2014 09:07AM
1780
November 23, 2014 09:00PM
1937
December 01, 2014 11:16AM