Optimization of application to reduce db workload and improve performance
Posted by:
Jan Becker
Date: December 15, 2018 03:57PM
We have built an application, which runs on 40 virtual boxes (VMs) in parallel. Each virtual box performs certain tasks. The tasks are defined and prioritized though a MySQL database table. The table contains all tasks and documents, when they are done.
Thus the table looks as follows:
Row1: Task1 Priority = 1 Status = done
Row2: Task2 Priority = 1 Status = done
Row3: Task3 Priority = 1 Status = done
Row4: Task4 Priority = 1 Status = work in progress
Row5: Task5 Priority = 2 Status = work in progress
Row6: Task6 Priority = 6 Status = to do
Row7: Task7 Priority = 3 Status = to do
...
Row2,500,00: Task2,500,000 Priority = 5 Status = to do
The virtual boxes contain applications, which connect through ODBC to the database/ table.
At the moment each VM
1) catches the next available task (SELECT Row FROM ToDoList WHERE Status = "to do" ORDER BY Row LIMIT v, 1) (v = number of VM)
2) blocks the task (UPDATE ToDoList SET Status = "work in progress" WHERE Row = Rowx)
3) documents, that the task was done successfully or sets it back, if there was an error (UPDATE ToDoList SET Status = "done" WHERE Row = Rowx or ... Status = "to do" (if the task failed))
We have some performance problems. It's unclear, whether they are due to the application as such (nothing to do with ODBC or MySQL), OBDC or MySQL.
My ideas were:
1) Instead of having one large to do list, I would have a master to do list and synchronize once a day with a shorter list. The applications just run 6 a.m. to midnigth. Thus I could replicate after midnight and write the daily workload (around 40,000 tasks) to a smaller table. The smaller table would be created daily and wouldn't include tasks already done.
2) Maybe there is something wrong with the indexes of the table. Which indexes should it have? I currently have (Status, Priority), (Row) and (Priority)
3) Sometimes I see, in PHPMyAdmin, that the table is locked. Maybe I should do 1) and split the smaller table into 40 tables, so the VMs can work with one table each.
4) Maybe it is mainly an ODBC problem. I could retrieve more than 1 task, block multiple tasks and finally write the results for all tasks (use LIMIT 10*v, 10) in the first SQL statement
What do you think? I don't want to do things, which don't make sense, but create efforts. With 2,500,000 rows, if I use PHPMyadmin, the first SQL statement requires (as per PHPMyadmin) 0.004 seconds. Thus I don't see much need to further improve it. What about the 4th idea? Could ODBC be the problem and would ths fix it? Any other ideas? Comments to the ideas?