MySQL Forums
Forum List  »  Partitioning

Re: large InnoDB table partitioning without explicit PK
Posted by: Miko M
Date: November 14, 2014 03:02AM

Hi Rick,

Many thanks for your prompt feedback along with great hints.
As a matter of fact, I started studying MySQL partitioning from your blog.
Again, a great piece of comprehensive knowledge!

Please find my answers inline.

Rick James Wrote:
> 96 PARTITIONs -- that's about the practical limit.
> (There are inefficiencies when having too many
> partitions; 5.7 / 10.1 may fix some of them.)
[Miko] Ok, thanks for the hint.
> > Estimated number of records inserted per hour:
> 20M (peak 50M)
> 50M INSERTs/hr = 14K/sec => several performance
> issues. First, some questions:
> * Disks? Rotating? SSDs? RAIDed?
> * Batch insert? LOAD DATA? Multi-threaded?
> * How random are the indexes?
[Miko] Yes, 14k per sec at peak.
Storage: decent high-end RAID10 spread over 48 FC disks, approx. 8k+ IOPS (50/50 reads/writes, random/sequential).
I've tested the maximum insert rate: it can handle up to 18k per sec, and CPU seemed to be the bottleneck, not I/O (I have rarely seen a case where CPU rather than I/O was the bottleneck for this kind of workload).
Inserts: unfortunately not batch/multi-valued inserts, just JDBC streams from MongoDB that look like:
insert into Table1(col1,col2,..., col19) values (val1, val2,..., val19) ON DUPLICATE KEY UPDATE col1 = col1
insert into Table1(col1,col2,..., col19) values (val1, val2,..., val19) ON DUPLICATE KEY UPDATE col1 = col1
(there is no 'col20' here; its value, the partition key, is auto-generated when a record is inserted)
About indexes:
Index 2: timestamp stored as VARCHAR ('YYYY:MM:DD:HH24:MI:SS:MS')
Index 3 (planned): BIGINT stored as VARCHAR; high cardinality, 10M+ possible distinct values

> You really ought to have a PRIMARY KEY, even if
> it is an artificial AUTO_INCREMENT. (If you don't
> specify a PK, InnoDB gives you a _hidden_ 6-byte
> int-like thingie.)
[Miko] Yes! But such an artificial AUTO_INCREMENT needs to be part of the PK together with the partition key, and it has to be BIGINT (8 bytes) for such a large number of records. I am aware that the implicit 6-byte key is added as long as there is neither a UK nor a PK. In fact, I thought about such an artificial AUTO_INCREMENT key, but saw no real reason to add it (definitely my lack of experience in that area).
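To illustrate the constraint Miko mentions: MySQL/MariaDB requires every PRIMARY/UNIQUE key on a partitioned table to include all columns used by the partitioning expression. A minimal sketch (column names and partition boundaries are assumptions, not the actual schema):

```sql
-- Sketch: artificial AUTO_INCREMENT PK that also includes the partition key.
CREATE TABLE Table1 (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  col1  VARCHAR(64) NOT NULL,
  -- ... col2 .. col19 ...
  col20 TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id, col20)          -- must contain col20, the partition key
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(col20)) (
  PARTITION p20141114_00 VALUES LESS THAN (UNIX_TIMESTAMP('2014-11-14 01:00:00')),
  PARTITION pMax         VALUES LESS THAN MAXVALUE
);
```

With `id` first in the PK, AUTO_INCREMENT still works, and rows cluster by insert order within each partition.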

> > 5.2.10-MariaDB
> > Change version to mariadb10.0.x to support
> better partition pruning (it is almost not
> supported now)
> Good.
[Miko] Ok, thanks again.

> > RANGE partitioning
> Good. See my blog for more discussion:
[Miko] As I said before, great source of information, looking at it on a regular basis.

> > Add an additional index key
> Caution -- each INDEX must be updated for every
> row INSERTed. If they are random, then this could
> lead to I/O thrashing.
[Miko] Yes, I know. It definitely does not help with inserts!

> > Enable replication (and bin log)
> Fine. Note that the longer you wait, the bigger
> the dump/reload will be to initialize the Slave.
> And the gotchas may bite you.
[Miko] Yes, but my headache is how much replication slows down the insertion process. In fact, to speed up the process I decided to disable the binlog; now I have to enable it again. But that is a matter of testing and measuring the performance degradation (which I am going to do in the near future).

> > Use temp tables / heap storage engine to copy in
> data from a single partition of Table1 and play
> with inserted records (do aggregations and store
> results into other tables)
> Excellent! (I won't need to lecture on such.)
> I found this to be especially efficient:
> 1. Load batches of rows into a MEMORY table in a
> _separate_ database.
> 2. Aggregate from that table into the Summary
> tables
> 3. Load from that table into the big Fact table.
> Furthermore,
> * Use RBR (Row Based Replication)
> * binlog_ignore_database = the database with the
> MEMORY table.
> This will cut down significantly on the
> replication traffic.
[Miko] That's a great recommendation! I will test it very soon and keep you posted!
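Rick's three-step flow could look roughly like this. This is a sketch only: `staging`, `batch`, and `summary_hourly` are made-up names, and the real table has many more columns:

```sql
-- The 'staging' database would be excluded from the binlog
-- (binlog_ignore_db), so this intermediate traffic never reaches the slave.
CREATE TABLE staging.batch (
  col1  VARCHAR(64) NOT NULL,
  col20 TIMESTAMP NOT NULL
) ENGINE=MEMORY;

-- 1. Load a batch of incoming rows into staging.batch (cheap MEMORY writes).
-- 2. Aggregate from it into the summary table:
INSERT INTO summary_hourly (hr, cnt)
  SELECT DATE_FORMAT(col20, '%Y-%m-%d %H:00:00'), COUNT(*)
  FROM staging.batch
  GROUP BY 1
ON DUPLICATE KEY UPDATE cnt = cnt + VALUES(cnt);

-- 3. Copy the whole batch into the big fact table in one statement:
INSERT INTO Table1 (col1, col20) SELECT col1, col20 FROM staging.batch;
TRUNCATE TABLE staging.batch;
```

Steps 2 and 3 each run as a single statement, so autocommit costs one COMMIT per batch instead of one per row.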

> > I'm aware of costly JDBC inserts (no batch
> inserts, autocommit, etc.)
> You _will_ need to improve on that in order to
> achieve 50M inserts/hour. 90% of the cost of a
> single-row INSERT is overhead that could be
> amortized if you batch the inserts.
[Miko] I wish I could... This "JDBC stream" comes from a 3rd party; it would be hard to change anything in it as long as there is no blocking point (unfortunately for me, performance degradation is not considered a blocking point).

> If you can gather thousand(s) of rows at a time in
> the MEMORY table, then about 2 SQLs will
> * replace the IODKU, and
> * be 2 COMMITs (autocommit) instead of
> thousand(s)
> (If you want the details on the 2 queries, I'll
> scratch my head and reconstruct them. Or maybe I
> will finish my blog on such.)
[Miko] Sounds tempting. Could you please expand on that a little bit more or point me to some further reading in your blog / on the Internet?
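One plausible reconstruction of those "2 SQLs" (an assumption on my part; Rick has not posted them, and `staging.batch` is an illustrative name): since an IODKU of the form `UPDATE col1 = col1` effectively means "skip duplicates", the pair could simply be:

```sql
INSERT IGNORE INTO Table1 (col1, col20)
  SELECT col1, col20 FROM staging.batch;  -- one statement, one commit
TRUNCATE TABLE staging.batch;             -- second statement

-- Note: INSERT IGNORE only skips rows that collide on a PRIMARY/UNIQUE
-- key, so (like IODKU) it needs such a key to deduplicate anything.
```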

> > just partition drops.
> After 96 (or 36) hours?
[Miko] I've developed a piece of software: it takes input from a "management" table, creates partition(s) ahead of time, and drops old partition(s) automatically once particular thresholds are reached (table size, for example). It works well so far.
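For readers following along, the statements such a maintenance tool would issue look something like this (partition names and boundaries are illustrative, matching an hourly RANGE scheme with a `pMax` catch-all):

```sql
-- Create the next hourly partition ahead of time by splitting pMax:
ALTER TABLE Table1 REORGANIZE PARTITION pMax INTO (
  PARTITION p20141115_00 VALUES LESS THAN (UNIX_TIMESTAMP('2014-11-15 00:00:00')),
  PARTITION pMax VALUES LESS THAN MAXVALUE
);

-- Retire the oldest partition once the size threshold is reached
-- (a near-instant metadata operation, unlike a bulk DELETE):
ALTER TABLE Table1 DROP PARTITION p20141111_00;
```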

> > `col14` smallint(5) unsigned DEFAULT '0',
> Two more lectures averted -- SMALLINT (and other
> sizes), and UNSIGNED.
> What about "NOT NULL"?
[Miko] Point taken! I will add the "NOT NULL" clause. Some of the INT-type columns were previously part of the PK; that is why they are already "NOT NULL".

> > ### my.cnf
> (SHOW VARIABLES is better. Possibly not necessary
> at the moment. But if you would like tuning
> advice, provide SHOW VARIABLES and SHOW GLOBAL
> STATUS.)
> > query_cache_size=8M
> Hardly worth having. And, since you have a high
> ingestion rate, you probably have a high QC purge
> rate. Recommend
> query_cache_size=0
> query_cache_type=OFF
[Miko] Ok. Thanks!

> > lower_case_table_names=1
> Windows?
[Miko] No, it is Linux.

> > table_cache=1024
> -->
> table_open_cache=5000
> Note that each partition is effectively a "table".
[Miko] Ok. I will change it; it was set up initially for 24 partitions. Thanks!
> > innodb_flush_log_at_trx_commit=0
> Aha, you already found that one. Recommend 2 over
> 0; hardly any slower, possibly more secure.
[Miko] Ok. Thanks!
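Pulling together the my.cnf changes suggested so far in this thread (values still to be verified by testing on the actual box):

```ini
query_cache_size               = 0
query_cache_type               = OFF
table_open_cache               = 5000   # each partition effectively counts as a "table"
innodb_flush_log_at_trx_commit = 2      # Rick: prefer 2 over 0
```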

> > insert into Table1(col1,col2,..., col19) values
> (val1, val2,..., val19) ON DUPLICATE KEY UPDATE
> col1 = col1
> Ouch. IODKU needs a UNIQUE (or PRIMARY) key in
> order to check for "DUPLICATE KEY"!
[Miko] So basically, provided that there is neither a PK nor any UKs, such syntax does not result in performance degradation on inserts. Is that correct?

> Did you think about INSERT IGNORE?
[Miko] Well, as I said this stream is from 3rd party, it will be (very) hard to change anything inside.

> After you upgrade, be sure to set
> innodb_buffer_pool_instances = 8 -- or some
> value > 1
[Miko] Ok. Thanks. I need to read a little more to get a better understanding of the configuration details.

> > Average record size 209B
> > 800 million of records in the table (number of
> partitions: 36)
> > 64GB RAM,
> > innodb_buffer_pool_size=32768M
> > Buffer pool hit rate 1000 / 1000
> --> 167 GB for the table. (Plus indexes?)
> However,... Before the table gets to 32GB (re
> buffer_pool), you _may_ start being I/O bound.
> This is where the randomness of the indexes, and
> other issues, comes into play.
> The hit rate suggests that you have not hit the
> problem _yet_.
[Miko] Hmm, that sounds scary. As of now the database size is 60GB (data + indexes); I have not observed any issues, but the insert rate is indeed very low (avg 500 per sec, max 1500 per sec).

> Oh, one more question -- Are the records inserted
> in roughly timestamp order? That is, is the
> 'latest' partition hot while the others are cold?
[Miko] Yes, exactly! The partition key is always generated for each inserted record (`col20` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP)

> Another thing to note: in 5.2 (I think), INSERT
> does no PARTITION pruning. Another reason to
> upgrade. (No, I don't know for a fact that 10.0's
> INSERT will prune.)
[Miko] Definitely!
> If all the activity is in one partition, then the
> "working set" is only 1.7GB (167GB/96), which is
> much less than 32GB. This implies that even a
> 'random' index won't lead to thrashing.
[Miko] Yes! The "true" working set is relatively small.

> Too much information? Probably not. And I love
> expounding on scaling/performance of MySQL.
[Miko] No, not at all! Each and every hint is highly appreciated!

> Can any of those VARCHARs be normalized in order
> to shrink the disk footprint? The normalization
> is a couple of SQLs while the data is still in the
> MEMORY table -- very cheap and efficient.
[Miko] Yes, if normalised it would be much better in terms of footprint, I/O, and so on. As I said, the insert "stream" comes from the 3rd party.
Theoretically the data could be loaded into a temp/MEMORY table, and a trigger/procedure could check the varchars, update the "dictionary" whenever a new varchar appears, and replace the varchars in the temp/MEMORY table with the appropriate keys (from the "dictionary" table), etc.
It could be done, but I am not sure whether it makes sense.
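The dictionary approach described above can be sketched in a couple of SQLs, as Rick suggests. All names here (`dict_col5`, `col5`, `staging.batch`, `Table1n`) are illustrative, not the real schema:

```sql
CREATE TABLE dict_col5 (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  val VARCHAR(255) NOT NULL,
  UNIQUE KEY (val)
) ENGINE=InnoDB;

-- Register any new strings seen in the current batch:
INSERT IGNORE INTO dict_col5 (val)
  SELECT DISTINCT col5 FROM staging.batch;

-- Swap the strings for their ids while copying into the fact table:
INSERT INTO Table1n (col5_id, col20)
  SELECT d.id, b.col20
  FROM staging.batch b
  JOIN dict_col5 d ON d.val = b.col5;
```

The fact table then stores a 4-byte id instead of a repeated VARCHAR, which shrinks both the data and any index on that column.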
