MySQL Forums
Forum List  »  Performance

Re: Millions storage...
Posted by: James Day
Date: April 01, 2005 12:51AM

Your case is nice enough. So is the select a few records into a different table, drop the original table and rename back approach to bulk deletions. The golden rule applies here: do what works and what you know is best for your application - you know it better than anyone here does!

Answer 1: truncate and reload is good. Better than lots of deletes. With the number of records you're talking about it should be no problem at all, whichever database engine you use. Temporary tables are more for per-session/connection work (automatically vanishing with end of session), so it sounds as though it's not appropriate. But if you mean temporary as in create, fill, drop, repeat, no problem. You'll get a little fragmentation either on disk or in the InnoDB tablespace but it's unlikely to be a significant factor. If this data is really disposable, using InnoDB may not matter if you can just drop and reload the table after a failure.

I'd say that's not many inserts, so isn't really a reason to switch to InnoDB just for its insert properties. Still, I see about 180:1 selects:inserts/updates and prefer InnoDB, so it's not a reason not to do it either. Mine are a bit more frequent, though - about 1.2 million inserts/updates per day, 14 per second. Your inserts are only one every 86 seconds, so you have a much lower chance of having a dirty page in OS cache during a crash than I do - for me it's almost certain if I use MyISAM. That substantially reduces the benefit of the nicer InnoDB crash recovery for your situation because you're much less likely to have a failure at just the wrong time.

InnoDB is worth a try. It's easy enough to do and you'll find out one way or another whether it's better or not.

Answer 2: InnoDB caveats:

Nicer crash recovery but when it does get a corrupt page, the easiest ways to recover (the practical ones you're likely to really use) involve a complete recopy of the database from a slave or reload from backup and reaply of binary log. Visible corrupt pages shouldn't happen - they mean a failure of a RAID setup to write properly and/or a failure of fsyncs to flush properly for some other reason or just bad luck and damaged data on disk or in a move to or from RAM. But they can happen. Working with tablespaces in the 180GB range the trade still clearly favors InnoDB for me, even though that means rsync of 180GB from one slave to the damaged system if I have a major problem. InnoDB is not magic: you do still need normal backups and binary logs. You'll need them less often, but be more desperate for them when the need is there.

InnoDB can use substantially more disk and RAM space per record - even several times as much - when all fields in a record are small. There's a more compact format in MySQL 5 which reduces, but doesn't eliminate, this. Compression of pages is coming, but still, it's a cost.

If you're seek-intensive InnoDB might help because it does cache the data records in RAM better than the OS. Seek-intensive really means trying indexes and query optimizing, to try to get rid of all the seeks you can eliminate. Experimenting with my.cnf settings like the MyISAM split between OS cache and key_buffer_size or experimenting with the various buffers can help a lot. InnoDB is a bit easier because there's no need to worry about the index/OS split and you can just give most of that RAM to InnoDB for it to manage.

InnoDB doesn't yet support full-text search. It can use raw disk partitions and may be nicer for supporting many gigabytes of RAM than MyISAM, at least prior to version 4.1, which improves that quite a bit. It does have some overhead just because it does need to deal with being a transactional engine with full ACID support and that can make it slower. It's not accidental that MySQL performance experts chose MyISAM for some tables when a benchmark didn't require transaction integrity for them - it really is faster for many situations.

For day to day use, InnoDB seem significantly easier on those who have to look after the servers and keep them up as close as possible to all the time. Particularly important for high profile web sites or mission-critical applications, so it can compensate for being a bit slower sometimes, if it is.

Answer 3: You say you're seek limited. The cheapest way to get more seeks is often to add more disk drives and use RAID 1 or 10 or 0. The systems I look after generally have six drives, some RAID 10, some RAID 0 (slaves, where I can afford to lose the whole server for a while if one drive dies). Disks include 15,000 RPM SCSI and a mixture of SATA and slower SCSI.

Two systems are of particular interest to you: identical except that one has a six 7nGB drive RAID 0 10,000 RPM SATA set and the other a six 7nGB drive RAID 0 10,000 RPM SCSI set. The SATA system is generally faster. The SCSI one is faster under very heavy combined replication and query load, which shows up most obviously as shorter replication lag when it's not quite or only barely keeping up. Be wary of generalizing from that, though - remember it's just one system and application. And you shouldn't really be operating the servers so they are that close to the disks not being able to keep up. Should have purchased more power before that, because well before that point end users will be seeing speed degradation. Since the extra cost of 10k SCSI isn't that great compared to 10K SATA (in a server with two CPUs and 4-8GB of RAM, anyway) I'm going SCSI mostly, because replication lag is a significant end user annoyance and I want to reduce it. Absent that consideration, I'd go with SATA.

So, SCSI isn't dramatically better. Except that you can't buy 15,000 RPM SATA drives yet. Remember that every dollar you save on the drives is another dollar for more RAM and strike the balance which seems best for your situation. Sometimes that'll be 4 SATA drives and less RAM. Sometimes it'll be time to buy six of the 140GB 15,000RPM SCSI drives and 16GB of RAM, because that may be what you need - my place is about to do exactly that.

I suggest starting by filling your case with SATA drives in RAID 10, since you say you're seek limited. It's the most cost-effective way to get those extra seeks.

For more on server choices, see the excellent article by Peter Zaitsev: http://dev.mysql.com/tech-resources/articles/mysql_platform_selection.html .

Options: ReplyQuote


Subject
Views
Written By
Posted
1864
March 28, 2005 07:53PM
1836
March 30, 2005 12:50AM
1875
March 30, 2005 01:16AM
Re: Millions storage...
2088
April 01, 2005 12:51AM
1836
March 31, 2005 11:34PM
1898
April 01, 2005 09:07AM
1943
April 01, 2005 09:09PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.