I am having trouble working out a way to improve this query.
The problem is the group by object_id as there are about half as many object_ids as rows in the table.
The aim of the query is to count the number of objects that were accessed per person, where only the most recent person to access the object gets counted.
select person_id, count(id)
from (select id, person_id from events group by object_id) as subquery
group by person_id;
The query takes just over 2 minutes with about 1 million events.
Below is the create statement for the event table.
create table events(
id int not null auto_increment,
object_id int not null,
person_id int not null,
primary key(id),
index(object_id),
index(person_id)
) ENGINE=InnoDB;
To recreate this I used the following:
create table numbers(i int unsigned not null);
insert into numbers(i) values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);
insert into numbers(i) select a.i * b.i * c.i from numbers as a cross join numbers as b cross join numbers as c;
insert into events(object_id, person_id)
select
floor((select count(i) from numbers) / 2 * rand()),
floor(10 * rand())
from numbers as a
cross join numbers as b;
Query OK, 1020100 rows affected (4 min 2.54 sec)
select person_id, count(id)
from (select id, person_id from events group by object_id) as subquery
group by person_id;
+-----------+-----------+
| person_id | count(id) |
+-----------+-----------+
| 0 | 52 |
| 1 | 54 |
| 2 | 44 |
| 3 | 44 |
| 4 | 47 |
| 5 | 43 |
| 6 | 48 |
| 7 | 56 |
| 8 | 61 |
| 9 | 56 |
+-----------+-----------+
10 rows in set (2 min 15.38 sec)
Any help improving this would be much appreciated.