I need help with figuring out how many self-joins I need to do. Here's my problem:
I have a table containing unique image filenames:
Quote
CREATE TABLE IF NOT EXISTS `images` (
`id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`filename` varchar(64) NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `images` (`id`, `filename`) VALUES
(1, 'face12417.jpg'),
(2, 'face15518.jpg'),
(3, 'face17123.jpg'),
...
(43229, 'face43824.jpg');
The images are of photos of people's faces. I have another table representing pairwise comparison of the photos, and those pairs that match (i.e., it's the same person appearing in both photos) are indicated with 'y', like this:
Quote
CREATE TABLE IF NOT EXISTS `pairs` (
`id1` smallint(5) unsigned NOT NULL,
`id2` smallint(5) unsigned NOT NULL,
`match` char(1) NOT NULL DEFAULT '',
PRIMARY KEY (`id1`,`id2`)
);
INSERT INTO `pairs` (`id1`, `id2`, `match`) VALUES
(1, 2, 'n'),
(1, 3, 'y'),
(1, 4, 'y'),
...
(43229,43228,'n');
I would like to write a query that, given one photo, lists all photos that match it. If a photo can match at most one other (i.e., a person's face can appear in at most two photos), then I don't need to do a JOIN; the rows of <b>pairs</b> table represents all the pairwise matches. If a person's face can appear at most in 3 photos, I'd only have to do a self join, like this:
Quote
SELECT * FROM `pairs` p1
CROSS JOIN pairs p2 ON (p1.id1=p2.id1 OR p1.id1=p2.id2) AND p1.match='y' AND p2.match='y'
WHERE p1.id1<>p2.id1 AND p1.id2<>p2.id2
But I don't know how many times a person's face can appear in the photos. Is there a general solution for N? Is there an initial query I can/should run on the <b>pairs</b> table to see how many self-joins I need to do?
Thanks.