MySQL Forums
Forum List  »  Performance

Re: Questions about performance for huge table (100M+ rows)
Posted by: Vladan Stefanovic
Date: January 03, 2009 06:02PM

Thank you for your answers.

I would like to clear up some things that weren't so well explained in my first post.

- I am using `id` because, like I said, the `name` is it's base 36 encoded value. However, because `name` is required to be unique the only way I can be sure that it is, is by having an auto_increment field to base it on. Furthermore, the `name` can also be custom (but is so somewhat rarely), and before any INSERT I'm also doing a SELECT for the requested `name` to notify of it's unavailability and fetch other relevant fields. Also, if the INSERT is done without a custom `name`, it is possible that the `name` generated from `id` already exists (since custom `name`s are allowed), so I need to insert a blank row and continue doing so until the `name` from current `id` is unique. Blank rows are deleted every 7 days.
Considering all that, SELECTs are done against the `name` only.

- The `updated` field is a timestamp, but it's actually a UNIX timestamp adjusted to GMT, which is basically an integer. I'm using that because it can then easily be converted to any needed timezone and date format.
Same thing for `added`.

- 90% of the SELECTs on this table will fetch a single row by it's `name`. But this row also needs to be updated, namely it's `updated` and `hits` fields.
The rest of the queries will fetch either 100, 1000 or 10000 rows and they need to be ordered by either `id`, `special`, `added` or `hits`. The only conditional is that the `name` is not NULL or that it specifically is NULL. I don't need these results to be real time, so I can cache them for 3 to 24 hours.

- Based on your input, I modified the structure of the table. Here it is:

`id` int(10) unsigned NOT NULL auto_increment,
`name` varbinary(50) default NULL,
`special` tinyint(1) NOT NULL default '0',
`content` text collate utf8_unicode_ci NOT NULL,
`hash` varbinary(40) default NULL,
`ip` varbinary(15) NOT NULL,
`added` int(10) unsigned NOT NULL default '0',
`updated` int(10) unsigned NOT NULL default '0',
`hits` int(10) unsigned NOT NULL default '0',
UNIQUE KEY `name` (`name`),
UNIQUE KEY `hash` (`hash`)

The main problem here is the fact that my employers say that their load varies wildly; at certain times there can be only a few dozen queries per second but a couple of hours later there could be hundreds per second on this table alone. So, I need to prepare for the worst as they need shortest possible response times in any given moment. And after the new application is done (which consists of PHP frontend and Java backend, both written by me) the average number of queries will continue to rise as will the number of rows in this table.

I'm sorry if I showed ignorance somewhere, but like I said, this is my first project with such a large database and thus I never needed to pay too much attention to optimization. Hopefully it won't be the last :)

Edited 2 time(s). Last edit at 01/05/2009 11:16AM by Vladan Stefanovic.

Options: ReplyQuote

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.