MySQL Forums
Forum List  »  Optimizer & Parser

Optimizing query with joins
Posted by: kayne
Date: June 03, 2006 12:18PM

Hello,

I'm trying to optimize a query that takes an unacceptable amount of time on MySQL. I think the number of rows that it's examining is part of the problem. We have a table with 90,000+ rows - not a large table by any measure - but it looks like the query optimizer wants to scan it.

This query comes from CiviCRM 1.4. I'm not a CiviCRM developer and I'm not a database developer by trade, but I used to work with large databases frequently. Make explicit recommendations and I'll try them, and also share them with the CiviCRM developer team.

The only queries executing on the server were the queries shown. No other users were on the box.


Using the "explain" feature of MySQL and the slow query log, I came up with the following.

This first query takes 76 seconds to execute, and is a search for "hoben" - no quotes around that.
--snip, snip--
# Query_time: 76 Lock_time: 0 Rows_sent: 2 Rows_examined: 765476 use deompojk_civicspace; SELECT civicrm_contact.id as contact_id, civicrm_contact.contact_type as contact_type, civicrm_contact.sort_name as sort_name, civicrm_contact.display_name as display_name, civicrm_individual.id as individual_id, civicrm_individual.first_name as `first_name`, civicrm_individual.middle_name as `middle_name`, civicrm_individual.last_name as `last_name`, civicrm_individual.birth_date as `birth_date`, civicrm_gender.id as gender_id, civicrm_gender.name as `gender`, civicrm_location.id as location_id, civicrm_address.id as address_id, civicrm_address.street_address as `street_address`, civicrm_address.supplemental_address_1 as `supplemental_address_1`, civicrm_address.supplemental_address_2 as `supplemental_address_2`, civicrm_address.city as `city`, civicrm_address.postal_code as `postal_code`, civicrm_address.postal_code_suffix as `postal_code_suffix`, civicrm_address.geo_code_1 as `geo_code_1`, civicrm_address.geo_code_2 as `geo_code_2`, civicrm_state_province.id as state_province_id, civicrm_state_province.abbreviation as `state_province`, civicrm_country.id as country_id, civicrm_country.name as `country`, civicrm_phone.id as phone_id, civicrm_phone.phone_type as phone_type, civicrm_phone.phone as `phone`, civicrm_email.id as email_id, civicrm_email.email as `email`, civicrm_im.id as im_id, civicrm_im.name as `im`, civicrm_contact.legal_identifier as `legal_identifier`, civicrm_contact.external_identifier as `external_identifier`, civicrm_contact.nick_name as `nick_name`, civicrm_contact.home_URL as `home_URL` FROM civicrm_contact LEFT JOIN civicrm_individual ON (civicrm_contact.id = civicrm_individual.contact_id) LEFT JOIN civicrm_gender ON civicrm_individual.gender_id = civicrm_gender.id LEFT JOIN civicrm_location ON (civicrm_location.entity_table = 'civicrm_contact' AND civicrm_contact.id = civicrm_location.entity_id AND civicrm_location.is_primary = 1) LEFT JOIN civicrm_address ON civicrm_location.id = civicrm_address.location_id LEFT JOIN civicrm_state_province ON civicrm_address.state_province_id = civicrm_state_province.id LEFT JOIN civicrm_country ON civicrm_address.country_id = civicrm_country.id LEFT JOIN civicrm_email ON (civicrm_location.id = civicrm_email.location_id AND civicrm_email.is_primary = 1) LEFT JOIN civicrm_phone ON (civicrm_location.id = civicrm_phone.location_id AND civicrm_phone.is_primary = 1) LEFT JOIN civicrm_im ON (civicrm_location.id = civicrm_im.location_id AND civicrm_im.is_primary = 1) WHERE civicrm_contact.domain_id = 1 AND ( ( LOWER(civicrm_contact.sort_name) LIKE '%hoben%' ) OR ( LOWER(civicrm_email.email) LIKE '%hoben%' ) ) AND ( 1 ) ORDER BY sort_name asc LIMIT 0, 50;
--snip, snip--

When run through EXPLAIN, it showed that it was using a filesort.

I then added an index on Domain_id and sort_name on the civicrm_contact table and ran ANALYZE TABLE for good measure.

The next search required less time - still an unacceptable amount of time - but only 52 seconds.

--snip, snip--
# Query_time: 52 Lock_time: 0 Rows_sent: 1 Rows_examined: 669790 SELECT civicrm_contact.id as contact_id, civicrm_contact.contact_type as contact_type, civicrm_contact.sort_name as sort_name, civicrm_contact.display_name as display_name, civicrm_individual.id as individual_id, civicrm_individual.first_name as `first_name`, civicrm_individual.middle_name as `middle_name`, civicrm_individual.last_name as `last_name`, civicrm_individual.birth_date as `birth_date`, civicrm_gender.id as gender_id, civicrm_gender.name as `gender`, civicrm_location.id as location_id, civicrm_address.id as address_id, civicrm_address.street_address as `street_address`, civicrm_address.supplemental_address_1 as `supplemental_address_1`, civicrm_address.supplemental_address_2 as `supplemental_address_2`, civicrm_address.city as `city`, civicrm_address.postal_code as `postal_code`, civicrm_address.postal_code_suffix as `postal_code_suffix`, civicrm_address.geo_code_1 as `geo_code_1`, civicrm_address.geo_code_2 as `geo_code_2`, civicrm_state_province.id as state_province_id, civicrm_state_province.abbreviation as `state_province`, civicrm_country.id as country_id, civicrm_country.name as `country`, civicrm_phone.id as phone_id, civicrm_phone.phone_type as phone_type, civicrm_phone.phone as `phone`, civicrm_email.id as email_id, civicrm_email.email as `email`, civicrm_im.id as im_id, civicrm_im.name as `im`, civicrm_contact.legal_identifier as `legal_identifier`, civicrm_contact.external_identifier as `external_identifier`, civicrm_contact.nick_name as `nick_name`, civicrm_contact.home_URL as `home_URL` FROM civicrm_contact LEFT JOIN civicrm_individual ON (civicrm_contact.id = civicrm_individual.contact_id) LEFT JOIN civicrm_gender ON civicrm_individual.gender_id = civicrm_gender.id LEFT JOIN civicrm_location ON (civicrm_location.entity_table = 'civicrm_contact' AND civicrm_contact.id = civicrm_location.entity_id AND civicrm_location.is_primary = 1) LEFT JOIN civicrm_address ON civicrm_location.id = civicrm_address.location_id LEFT JOIN civicrm_state_province ON civicrm_address.state_province_id = civicrm_state_province.id LEFT JOIN civicrm_country ON civicrm_address.country_id = civicrm_country.id LEFT JOIN civicrm_email ON (civicrm_location.id = civicrm_email.location_id AND civicrm_email.is_primary = 1) LEFT JOIN civicrm_phone ON (civicrm_location.id = civicrm_phone.location_id AND civicrm_phone.is_primary = 1) LEFT JOIN civicrm_im ON (civicrm_location.id = civicrm_im.location_id AND civicrm_im.is_primary = 1) WHERE civicrm_contact.domain_id = 1 AND ( ( LOWER(civicrm_contact.sort_name) LIKE '%mcgladrey%' ) OR ( LOWER(civicrm_email.email) LIKE '%mcgladrey%' ) ) AND ( 1 ) ORDER BY sort_name asc LIMIT 0, 50;
--snip, snip--

That second search - 24 seconds faster, now shows up in EXPLAIN as follows:

--snip, snip--
Host: localhost
Database: deompojk_civicspace
Generation Time: Jun 02, 2006 at 09:19 PM
Generated by: phpMyAdmin 2.8.0.2 / MySQL 4.1.19-standard-log
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE civicrm_contact ref domain_id,domain_and__sort domain_and__sort 4 const 47980 Using where
1 SIMPLE civicrm_individual ref contact_id contact_id 4 deompojk_civicspace.civicrm_contact.id 1
1 SIMPLE civicrm_gender eq_ref PRIMARY PRIMARY 4 deompojk_civicspace.civicrm_individual.gender_id 1
1 SIMPLE civicrm_location ref index_entity,index_entity_location_type index_entity 196 const,deompojk_civicspace.civicrm_contact.id 1
1 SIMPLE civicrm_address ref location_id location_id 4 deompojk_civicspace.civicrm_location.id 1
1 SIMPLE civicrm_state_province eq_ref PRIMARY PRIMARY 4 deompojk_civicspace.civicrm_address.state_province... 1
1 SIMPLE civicrm_country eq_ref PRIMARY PRIMARY 4 deompojk_civicspace.civicrm_address.country_id 1
1 SIMPLE civicrm_email ref location_id location_id 4 deompojk_civicspace.civicrm_location.id 1 Using where
1 SIMPLE civicrm_phone ref location_id location_id 4 deompojk_civicspace.civicrm_location.id 1
1 SIMPLE civicrm_im ref location_id location_id 4 deompojk_civicspace.civicrm_location.id 1
--snip, snip--

It's no longer using a filesort; the "domain_and_sort" index is the one I'd added. However, MySQL still reports that it needs to search through 47,980 records to return that ONE record in the correct sorted order.

The next search is the result of searching for "mcgladrey, kayne" - no quotes around that, mind you. As reported by the slow query log:
--snip, snip--
# Query_time: 53 Lock_time: 0 Rows_sent: 1 Rows_examined: 669790
--snip, snip--

For what it's worth, we're using InnoDB tables that were created using the create scripts that shipped with CiviCRM 1.4.

Suggestions on how to optimize this basic search function would be greatly appreciated.

Notes on the server:

--snip, snip--
SHOW VARIABLES LIKE '%query_cache%';
+------------------------------+----------+
| Variable_name | Value |
+------------------------------+----------+
| have_query_cache | YES |
| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 10485760 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
+------------------------------+----------+
6 rows in set (0.00 sec)
--snip, snip--

--snip, snip--
describe civicrm_contact
Field Type Null Key Default Extra
id int(10) unsigned PRI NULL auto_increment
domain_id int(10) unsigned MUL 0
contact_type enum('Individual','Organization','Household') YES NULL
legal_identifier varchar(32) YES NULL
external_identifier varchar(32) YES NULL
sort_name varchar(128) YES MUL NULL
display_name varchar(128) YES NULL
nick_name varchar(128) YES NULL
home_URL varchar(128) YES NULL
image_URL varchar(128) YES NULL
source varchar(255) YES NULL
preferred_communication_method enum('Phone','Email','Post') YES NULL
preferred_mail_format enum('Text','HTML','Both') YES Both
do_not_phone tinyint(4) YES 0
do_not_email tinyint(4) YES 0
do_not_mail tinyint(4) YES 0
do_not_trade tinyint(4) YES 0
hash int(10) unsigned 0
is_opt_out tinyint(4) 0
--snip, snip--

Options: ReplyQuote


Subject
Views
Written By
Posted
Optimizing query with joins
3302
June 03, 2006 12:18PM
2056
June 07, 2006 10:17AM
2047
February 12, 2007 12:07PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.