MySQL Forums
Forum List  »  General

Sophisticated Name Lookup
Posted by: Mark M
Date: September 23, 2009 11:52PM

I am trying to create a sophisticated Ajax name lookup control for a set of approximately 100,000 names. I have created the following tables:

CREATE TABLE IF NOT EXISTS `acme`.`persons` (
`id_person` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT,
`lname` VARCHAR(45) NOT NULL COMMENT 'last name' ,
`fname` VARCHAR(45) NOT NULL COMMENT 'first name',
`mname` VARCHAR(45) NOT NULL COMMENT 'middle name',
`nname` VARCHAR(45) NOT NULL COMMENT 'nick name',
<... additional fields of information about the person ...>
PRIMARY KEY (`id_person`) ,
ENGINE = MyISAM
AUTO_INCREMENT = 0;

CREATE TABLE IF NOT EXISTS `acme`.`persons_search` (
`person_id` INT UNSIGNED NOT NULL COMMENT foreign key for id_person in acme.persons,
`lname` VARCHAR(45) NOT NULL COMMENT stripped lname from acme.persons,
`fname` VARCHAR(45) NOT NULL COMMENT stripped fname from acme.persons,
`nname` VARCHAR(45) NOT NULL COMMENT stripped nname from acme.persons,
`lsname` CHAR(20) NOT NULL COMMENT metaphone value of lname,
`fsname` CHAR(20) NOT NULL COMMENT metaphone value of fname,
`nsname` CHAR(20) NOT NULL COMMENT metaphone value of fname,
`lname_fname` VARCHAR(90) NOT NULL COMMENT CONCAT(lname,fname),
`lname_nname` VARCHAR(90) NOT NULL COMMENT CONCAT(lname,nname),
`fname_lname` VARCHAR(90) NOT NULL COMMENT CONCAT(fname,lname),
`nname_lname` VARCHAR(90) NOT NULL COMMENT CONCAT(nname,lname),
`lfs` CHAR(40) NOT NULL COMMENT metaphone value of CONCAT(lname,fname),
`lns` CHAR(40) NOT NULL COMMENT metaphone value of CONCAT(lname,nname),
`fls` CHAR(40) NOT NULL COMMENT metaphone value of CONCAT(fname,lname),
`nls` CHAR(40) NOT NULL COMMENT metaphone value of CONCAT(nname,lname),
PRIMARY KEY (`person_id`) )
ENGINE = MyISAM;

The first table, "persons", is the table with all the information about the person and the "persons_search" table has some stripped data (explained later) from the first table and some other information (metaphones and concatenations of the stripped data) used in the search queries. Metaphones are the phonetic representations of the string. The metaphones will help catch typographical errors, i.e., Williams typed Wiliams. The goal here is to find people by typing in characters in an Ajax style input box and use those characters as "starting with" or "containing" characters to search the fields. Any apostrophes, spaces and hypens are stripped before being compared. O'Reilly is stored as OReilly in the acme.persons_search.lname field, but is stored as O'Reilly in the acme.person.lname field. $token is the variable holding the stripped search term from the Ajax input. The minimum number of input characters searched is two.

For example, if they type in John, I would like to find Johnson, Johnston from the lname field and John as a fname field (So a sample return set for John might be (lname, fname)--> Johnson,Robert|Johnston,William|Smith,John. If they type a comma in the string, then it would split the token into two pieces, the lname would be on the left side of the comma and fname on the right side, otherwise it will be fname lname. To make things more complicated, if there was an entry William Johnston, I would like to return that if they searched for Bill (as Bill is a common nickname for William and Bill would be in the nname field). Here is an example of a query I am proposing to use.

SELECT
ps.person_id, p.lname, p.fname, p.nickname, <... additional p. fields ...>
FROM acme.persons_search ps
INNER JOIN acme.persons p
ON ps.person_id = p.id_person
WHERE ps.lname LIKE '%$token%' OR ps.fname LIKE '%$token%' OR ps.nname LIKE '%$token%' <... additional LIKE ORs for the other fields except metaphone which would be an = ...>
ORDER BY ps.lname LIKE '$token%',ps.fname LIKE '$token%'


My Questions:
1) Should I break this into two tables as I am doing?
2) Would FULLTEXT work for me at all?
3) Should I use a UNION or SUB QUERIES to optimize the ORs?
4) Should I use InnoDB instead of MyISAM (I am using 5.1 for dev but this will go live on 5.4)?
5) Should I Index every field in persons_search (this table will be read many more times that it will be updated). Doesn't using %xxx% notation force a full table scan and indexes would be not effective?
6) Is there a better approach to this problem?

Thanks for any help or suggestions.

Options: ReplyQuote


Subject
Written By
Posted
Sophisticated Name Lookup
September 23, 2009 11:52PM
September 26, 2009 11:47AM
September 26, 2009 05:40PM
September 26, 2009 10:04PM
September 26, 2009 11:09PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.