MySQL Forums
Forum List  »  Optimizer & Parser

Re: Query needs to be optimized
Posted by: Rick James
Date: July 01, 2010 09:43PM

Generally, the schema, the indexes, and the query look good. Switching to the subqueries (with their COUNT(*)s) probably sped things up some. How much?

Do you get 42800 rows of output? That could take some time.

`job` is over 8.6GB. (All the tables and indexes add to 20GB?) innodb_buffer_pool_size is 1.8G. So not everything is cached.

How much RAM? Sounds like 4GB. More RAM, plus 64-bit OS & MySQL would help some.

Fetching 42K rows may take a few disk hist; it may take 42K disk hits if the records are scattered. 42K disk hits may take about 7 minutes for ordinary disks. What to do...

* Fetch fewer rows? That is a business decision.

* "Cluster" the rows so that there are fewer disk hits.

* Vertical partitioning.

* Normalization.

* Shrink the data.

Some observations...

* I'm pretty sure the PRIMARY KEY for `job` is
UNIQUE KEY `job_id_index` (`id`).
That implies that the rows are ordered (clustered) based on id.

* A row in `job` is about 3332 bytes. So there are only a few rows per 16KB block. Based on this, I now put the lower bound for this query at more than a minute (assuming a cold cache).

Clustering... Well, with so few rows fitting in a block, there is not much hope for this. Anyway, it is probably not practical to make a PRIMARY KEY that starts with company_id.

Vertical Partitioning... The is a technique that I often use for MyISAM, but it would probably benefit `job`. The idea is to split the table into two (or more) with some columns in one table, the other columns in the other. By moving the bulky and infrequently used columns to another table makes the main table leaner, thereby more cacheable.

Optimal for this query (perhaps not optimal overall): Keep these in `job`:
requisition_number, title_en, customer, company_division, recruiter_name, location_stateprovince, location_city, location_postalcode, function,
company_id, timestamp, id
Put the other fields in `job_other` (plus `id`).

A more logical way to do this is to develop a "star schema", wherein the recruiter and all his info is moved into `recruiter` and replaced by `recruiter_id`. Ditto for company, etc. This may lead to a small number of JOINs to get the stuff back together. So, I cannot predict that this will necessarily be faster.

If, on the other hand, you had hundreds of instances of the same recruiter, hundreds of the same company, etc., then the overall bulk would decrease, leading to a performance improvement.

Shrink the data... Aside from the above comments, there is also the possibility that some TEXT fields could be compressed (typically 3:1 shrinkage). I can't tell fraction of 3332 the TEXT fields are, so I can't predict the overall benefit.

INT -> MEDIUMINT (where safe to do so). Doing this for just one column will save 2.5MB. But, looking at another way, it would save only 1/3332 of the table size.

Unrelated... Your tables are latin1, yet it feels like you have international data. Shouldn't you be using utf8?

Options: ReplyQuote

Written By
June 29, 2010 03:39AM
June 30, 2010 08:50AM
Re: Query needs to be optimized
July 01, 2010 09:43PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.