Performance of a integrity check during insertions
Hello,
I am creating a database in MySQL 4.1 of large tables that store data by what is called an interval of validity. This corresponds to two timestamps (datetime type) that create a duration for which the data in the row is valid. In order to maintain data integrity, I must check upon inserting that two intervals do not overlap. The table I am working with approximately looks like this:
CREATE TABLE data (
id INT,
since DATETIME,
till DATETIME,
value FLOAT,
PRIMARY KEY (lid, since, till),
INDEX IoV (since, till),
FOREIGN KEY (id) REFERENCES otherTable (id)
) TYPE=InnoDB
Now suppose I have a new data interval with $id, $time1, $time2 to insert. I need to check in the database that this interval will not overlap with other intervals. I am doing it with a query like this:
SELECT id, since, till FROM data
WHERE id = $id
AND ((since >= $time1 AND since < $time2)
OR (till > $time1 AND till < $time2)
OR ($time1 >= since AND $time2 < till))
LIMIT 1
If this query returns a row, then there is an overlap and I should not insert the data. I do this check for every interval inserted.
Now for my problem: what is happening as I insert data is that the rate of insertion is decreasing quite quickly. I expect it to decrease somewhat as the table grows and slows down the SELECT statement check, but it seems like it could be improved in some way. The strange thing to me is that the query slows down very fast, but if I disconnect from the DB and start inserting again, it starts again at its "maximum" rate before going down again. Here are my benchmarks, every 1000 inserts I calculate the number of inserts per second:
[#insert] [date time] [ins/s]
1000 2005-03-04 16:37:03 938.014
2000 2005-03-04 16:37:05 477.557
3000 2005-03-04 16:37:08 320.648
4000 2005-03-04 16:37:12 240.917
5000 2005-03-04 16:37:18 195.115
... stop, start again ... now the table has 5000 rows to begin with
[#insert] [date time] [ins/s]
1000 2005-03-04 16:37:32 928.964
2000 2005-03-04 16:37:35 483.085
3000 2005-03-04 16:37:38 322.567
4000 2005-03-04 16:37:42 243.716
5000 2005-03-04 16:37:47 194.967
... stop, start again ... now the table has 10000 rows to begin with
[#insert] [date time] [ins/s]
1000 2005-03-04 16:38:02 926.574
2000 2005-03-04 16:38:04 489.657
3000 2005-03-04 16:38:07 328.143
4000 2005-03-04 16:38:12 246.533
5000 2005-03-04 16:38:17 196.704
Why is the speed only fast upon starting a new insertion stream? I generated this with as simple a program as possible, and my insertion program (in perl) is not accumulating memory usage in any way. I am using bound statements for both the INSERT and the SELECT check.
After looking through the manual about key use in WHERE clauses and a problem of not using keys when there is an OR in it, I tried another statement with UNIONs instead of ORs
SELECT id, since, till FROM data
WHERE id = $id
AND (since >= $time1 AND since < $time2)
UNION
SELECT id, since, till FROM data
WHERE id = $id
AND (till > $time1 AND till < $time2)
UNION
SELECT id, since, till FROM data
WHERE id = $id
AND ($time1 >= since AND $time2 < till) ];
But this change yielded worse performance (though the effect of restarting at the maximum speed was still there)
[#insert] [date time] [ins/s]
1000 2005-03-04 16:11:25 568.528
2000 2005-03-04 16:11:29 279.621
3000 2005-03-04 16:11:34 186.456
4000 2005-03-04 16:11:41 138.475
If anyone has any clues as to what is the cause of this, or any suggestions for optimizing this sort of integrity checking, I would greatly appreciate it.
thanks,
Rick