The usual - help optimizing create/select match-up for 2-column db
Hello all,
Could someone please give me a hand optimizing this mysql schema/query duo? My interest is speed of selects only. This database is static - no inserts, updates or anything else to worry about.
The basic schema is:
CREATE theTable (
user_id smallint unsigned,
characteristic_id smallint unsigned
);
Users have provided a list of characteristics, and each of those characteristics have been mapped to a preset list of charcteristic id's. (Example, between the age of 50 and 55 = characteristic id 456; lives in New York = characteristic id 2687) There are 3,000 possible characteristic ids, but your average user would be identified by only about 1/5 of those, so each user will have ~600 entries. There are about 20,000 users.
The basic query is
"count users with characteristic x and characteristic 2"
in other words,
SELECT COUNT(*) from theTable A inner join theTable B on (A.user_id = B.user_id) where (A.characteristic_id = x and B.characteristic_id = y);
My question is, how to best apply indexes/keys to the table and how to best match query structure to that for maximum speed.
I've tried:
CREATE theTable (
user_id smallint unsigned,
characteristic_id smallint unsigned,
PRIMARY KEY (user_id, characteristic_id)
);
In this case, doing an EXPLAIN on the query mentioned above gave:
+-------+--------+---------------+---------+---------+----------------+----+------+--------------------------+
| table | type | possible_keys | key | key_len | ref + | rows | Extra |
+-------+--------+---------------+---------+---------+----------------+----+------+--------------------------+
| B | ref | PRIMARY | PRIMARY | 2 | const + | 3007 | Using where; Using index |
| A | eq_ref | PRIMARY | PRIMARY | 4 | const,B.user_id | 1 | Using where; Using index |
+-------+--------+---------------+---------+---------+----------------+----+------+--------------------------+
Then I tried
CREATE theTable (
user_id smallint unsigned,
characteristic_id smallint unsigned,
PRIMARY KEY (user_id, characteristic_id)
KEY(user_id), KEY(characteristic_id)
);
And EXPLAIN on the same query gave:
+-------+--------+-----------------------------+----------+---------+--------------------+------+--------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+-----------------------------+----------+---------+--------------------+------+--------------------------+
| A | ref | PRIMARY,respondent,response | response | 2 | const | 3372 | Using where |
| B | eq_ref | PRIMARY,respondent,response | PRIMARY | 4 | A.user_id,const | 1 | Using where; Using index |
+-------+--------+-----------------------------+----------+---------+--------------------+------+--------------------------+
Unfortunately I'm abysmal at interpreting these EXPLAIN results
I expected the second to be faster than the first, but it wasn't - which I suspect has more to do with something I'm doing wrong than the implimentation of keys themselves.
If it's of importance, I'm using Perl DBI as an interface.
Thanks,