Running one query in parallel on lots of computers
Thanks a lot for the answer Jan!
I would like to use MySQL proxy to run one query in parallel on lots of computers.
My database has a huge table, which is split horizontally, each part being placed on its own node (or computer). What I specifically want to use MySQL Proxy for, is to run a query like the following in parallel on the nodes:
select count(*) from t group by c;
At the end of this post you can see an example on how this query could be run in parallel.
If it is possible to implement this using MySQL proxy, then MySQL will have its first solution for running one query in parallel across several nodes (there is a large demand for this functionality).
The critical point is whether MySQL Proxy can send queries to several nodes in parallel without blocking after sending the first query to the first node?
Example proving that a query can be run across many tables (each table could be placed on a different node) and still retreve the correct result:
mysql> create table sales_partition1 (productid int, customerid int);
Query OK, 0 rows affected (0.11 sec)
mysql> create table sales_partition2 (productid int, customerid int);
Query OK, 0 rows affected (0.11 sec)
mysql> create table sales_partition3 (productid int, customerid int);
Query OK, 0 rows affected (0.22 sec)
mysql> insert into sales_partition1 (productid, customerid) values (1, 1), (1, 2);
Query OK, 2 rows affected (0.11 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> insert into sales_partition2 (productid, customerid) values (1, 3), (2, 1);
Query OK, 2 rows affected (0.11 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> insert into sales_partition3 (productid, customerid) values (2, 2), (1, 4);
Query OK, 2 rows affected (0.11 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from sales_partition1;
+-----------+------------+
| productid | customerid |
+-----------+------------+
| 1 | 1 |
| 1 | 2 |
+-----------+------------+
2 rows in set (0.09 sec)
mysql> select * from sales_partition2;
+-----------+------------+
| productid | customerid |
+-----------+------------+
| 1 | 3 |
| 2 | 1 |
+-----------+------------+
2 rows in set (0.09 sec)
mysql> select * from sales_partition3;
+-----------+------------+
| productid | customerid |
+-----------+------------+
| 2 | 2 |
| 1 | 4 |
+-----------+------------+
2 rows in set (0.09 sec)
mysql> select productid, sum(thecount) from (
-> (select productid, count(*) as thecount from sales_partition1 group by productid) union all
-> (select productid, count(*) as thecount from sales_partition2 group by productid) union all
-> (select productid, count(*) as thecount from sales_partition3 group by productid)) as combinedResultSet group by productid;
+-----------+---------------+
| productid | sum(thecount) |
+-----------+---------------+
| 1 | 4 |
| 2 | 2 |
+-----------+---------------+
2 rows in set (0.11 sec)