Retrieve a random row, but fast.
Posted by:
Nils A.
Date: November 28, 2007 07:21AM
Hi Folks, I am a newbie to this forum, so please apologize for any inconveniance first.
I was trying to retrieve random rows from a table which is quite loaded (> 50000 rows). Suppose this table to be a "user" table, with a primary key id and a varchar(300). I want to pick a random user.
The construct
SELECT * FROM user ORDER BY rand() LIMIT 1
is not satisfying, since the whole table has to be read and sorted (to my opinion).
So I found out the following:
1. Getting a random "rownum" or sortof that works, by:
SELECT (CEIL(RAND() * COUNT(*))) from user
COUNT(*) is a constant and to be fetched in O(1), since I use MyISAM tables.
2. Putting that in a subquery like
SELECT * FROM user WHERE id > (
SELECT (CEIL(RAND() * COUNT(*))) from user
) LIMIT 1;
is not working, but it should, shouldn't it?
I get a constant row, which is always the same??
EXPLAIN ... delivers:
+----+----------------------+---------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | PRIMARY | user | index | NULL | PRIMARY | 4 | NULL | 3097 | Using where; Using index |
| 2 | UNCACHEABLE SUBQUERY | user | index | NULL | PRIMARY | 4 | NULL | 3097 | Using index |
+----+----------------------+---------------+-------+---------------+---------+---------+------+------+--------------------------+
And that's right. Its a subquery, and since it is computed with RAND(), one cannot cache it. But why do I get always the same row? A bug?
If I replace the star by the attribute I initially wanted to retrieve, id, the outcome is not always constant, but the variance of the values of id is much bigger than the variance of the results I get.
I can work with a temporary variable, which works perfectly.
SELECT @rand_no := (RAND() * COUNT(*)) from community_uni;
SELECT id FROM user WHERE id > @rand_no LIMIT 1;
But I don't understand why the above solution was not working.
Greets, Nils