MySQL Forums
Forum List  »  Performance

Performance of a integrity check during insertions
Posted by: Ricky Egeland
Date: March 04, 2005 05:18PM

Hello,

I am creating a database in MySQL 4.1 of large tables that store data by what is called an interval of validity. This corresponds to two timestamps (datetime type) that create a duration for which the data in the row is valid. In order to maintain data integrity, I must check upon inserting that two intervals do not overlap. The table I am working with approximately looks like this:

CREATE TABLE data (
id INT,
since DATETIME,
till DATETIME,
value FLOAT,
PRIMARY KEY (lid, since, till),
INDEX IoV (since, till),
FOREIGN KEY (id) REFERENCES otherTable (id)
) TYPE=InnoDB

Now suppose I have a new data interval with $id, $time1, $time2 to insert. I need to check in the database that this interval will not overlap with other intervals. I am doing it with a query like this:

SELECT id, since, till FROM data
WHERE id = $id
AND ((since >= $time1 AND since < $time2)
OR (till > $time1 AND till < $time2)
OR ($time1 >= since AND $time2 < till))
LIMIT 1

If this query returns a row, then there is an overlap and I should not insert the data. I do this check for every interval inserted.

Now for my problem: what is happening as I insert data is that the rate of insertion is decreasing quite quickly. I expect it to decrease somewhat as the table grows and slows down the SELECT statement check, but it seems like it could be improved in some way. The strange thing to me is that the query slows down very fast, but if I disconnect from the DB and start inserting again, it starts again at its "maximum" rate before going down again. Here are my benchmarks, every 1000 inserts I calculate the number of inserts per second:

[#insert] [date time] [ins/s]
1000 2005-03-04 16:37:03 938.014
2000 2005-03-04 16:37:05 477.557
3000 2005-03-04 16:37:08 320.648
4000 2005-03-04 16:37:12 240.917
5000 2005-03-04 16:37:18 195.115

... stop, start again ... now the table has 5000 rows to begin with

[#insert] [date time] [ins/s]
1000 2005-03-04 16:37:32 928.964
2000 2005-03-04 16:37:35 483.085
3000 2005-03-04 16:37:38 322.567
4000 2005-03-04 16:37:42 243.716
5000 2005-03-04 16:37:47 194.967

... stop, start again ... now the table has 10000 rows to begin with

[#insert] [date time] [ins/s]
1000 2005-03-04 16:38:02 926.574
2000 2005-03-04 16:38:04 489.657
3000 2005-03-04 16:38:07 328.143
4000 2005-03-04 16:38:12 246.533
5000 2005-03-04 16:38:17 196.704

Why is the speed only fast upon starting a new insertion stream? I generated this with as simple a program as possible, and my insertion program (in perl) is not accumulating memory usage in any way. I am using bound statements for both the INSERT and the SELECT check.

After looking through the manual about key use in WHERE clauses and a problem of not using keys when there is an OR in it, I tried another statement with UNIONs instead of ORs

SELECT id, since, till FROM data
WHERE id = $id
AND (since >= $time1 AND since < $time2)
UNION
SELECT id, since, till FROM data
WHERE id = $id
AND (till > $time1 AND till < $time2)
UNION
SELECT id, since, till FROM data
WHERE id = $id
AND ($time1 >= since AND $time2 < till) ];

But this change yielded worse performance (though the effect of restarting at the maximum speed was still there)

[#insert] [date time] [ins/s]
1000 2005-03-04 16:11:25 568.528
2000 2005-03-04 16:11:29 279.621
3000 2005-03-04 16:11:34 186.456
4000 2005-03-04 16:11:41 138.475

If anyone has any clues as to what is the cause of this, or any suggestions for optimizing this sort of integrity checking, I would greatly appreciate it.

thanks,
Rick

Options: ReplyQuote


Subject
Views
Written By
Posted
Performance of a integrity check during insertions
3742
March 04, 2005 05:18PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.