Creating Index for Many to Many Junction Table
I'm trying to associate 'keywords' to 'products'. Obviously a keyword can belong to more than one product and vice versa. This is a 3rd party database and originally they stored all keywords for each product in its row in a single column (`all_keywords`) with delimiters for each keyword.
CREATE TABLE `catalog_products` (
`PUBLISHR` char(3) NOT NULL default '',
`NUMBER` varchar(50) NOT NULL default '',
`PROPER_TITLE` varchar(255) default NULL,
`ALL_KEYWORDS` text,
`CAT_SUBJECT` varchar(255) default NULL,
PRIMARY KEY (`publisher`,`number`),
CONSTRAINT `FK_catalog_products` FOREIGN KEY (`PUBLISHER`) REFERENCES `catalog_publisher` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 MAX_ROWS=250000 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
I wanted to normalise this so I used PHP to parse the all the unique keywords and place them in their own table. Which enabled me to create this.
CREATE TABLE `catalog_keyword` (
`keyword` varchar(250) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`keyword`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I used the same script to associate the correct keywords to the correct products in a junction table whose schema is as follows.
CREATE TABLE `catalog_product_keyword` (
`publisher` char(3) NOT NULL,
`number` varchar(50) NOT NULL default '',
`keyword` varchar(100) NOT NULL default '',
`status` char(1) NOT NULL default 'A',
PRIMARY KEY (`publisher`,`number`,`keyword`),
CONSTRAINT `FK_products_PK` FOREIGN KEY (`publisher`, `number`)
REFERENCES `catalog_products` (`publisher`, `number`)
ON DELETE CASCADE
ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=FIXED
The problem is that when I perform a search, I only know the KEYWORD on which I'm searching on. Consequently, my query will not be able to take advantage of using the index since I'm not providing it a PUBLISHER and NUMBER which make up the rest of the composite primary key for this table.
For example:
SELECT *
FROM `catalog_product_keywords_assoc`
WHERE `keyword` = 'boat';
The above query will not be able to use the index and runs very slow (10+ seconds) but returns the correct results.
SELECT *
FROM `catalog_product_keywords_assoc`
WHERE `keyword` = 'boat'
AND `publisher` = 'TFA'
AND `number` = 'abc123';
The above query here will return instantaneous however will only return one specific result.
How should I change my design around to handle searching? This is still in the developmental stage and not live anywhere so I'd be willing for a full overhaul.
Note: There are approxiamately:
160,000 products
50,000 keywords
2.4 million keyword-product associations
Edited 1 time(s). Last edit at 02/21/2008 10:20PM by Raphael Thomas.