Dear All,
I've been developing a script that will be running simulations on some clusters we have at the university. As part of this study I have to run many different combinations of parameters (approx 500000+) and so I've stored a list of each parameter combination in a mysql database. Each process checks the database to find the next set of parameters that need processing. So the script looks a bit like as follows:
#!/bin/bash
declare -r WAITING_TO_RUN="0"
declare -r RUNNING="1"
declare -r FINISHED="2"
declare -r HOST="mysql.somehost.com"
declare -r USER="myusername"
declare -r PASSWORD="mypassword"
# This is the main query string that isn't working correctly
declare -r MYSQLQUERY="START TRANSACTION; SELECT RUN_ID FROM ParametersTable WHERE RUN_STATUS=$WAITING_TO_RUN ORDER BY RUN_ID LIMIT 1; UPDATE ParametersTable SET RUN_STATUS=$RUNNING WHERE RUN_STATUS=$WAITING_TO_RUN ORDER BY RUN_ID LIMIT 1; COMMIT"
read RUN_ID <<!
$(echo $MYSQLQUERY | mysql -h $HOST -u $USER -p$PASSWORD -P 3306 ParametersDatabase | tail -1 | sed 's/\r//g;')
!
# Then do some things using the parameter run ID variable RUN_ID
# Finally change the status to "finished" once the simulation is done
echo "UPDATE ParametersTable SET RUN_STATUS=$FINISHED WHERE RUN_ID=$RUN_ID LIMIT 1" | mysql -h $HOST -u $USER -p$PASSWORD -P 3306
Each process pulls off the next list of parameters that has the RUN_STATUS=$WAITING_TO_RUN and *should* at the same time change the RUN_STATUS to $RUNNING to stop other processes from trying to run the same set of parameters at the same time. However, I am getting lots of problems with the processes clashing with each other because many are trying to pull off the parameter list with the same RUN_ID. Now I understand why the following SQL wouldn't work:
SELECT RUN_ID FROM ParametersTable
WHERE RUN_STATUS=$WAITING_TO_RUN
ORDER BY RUN_ID
LIMIT 1;
UPDATE ParametersTable
SET RUN_STATUS=$RUNNING
WHERE RUN_STATUS=$WAITING_TO_RUN
ORDER BY RUN_ID
LIMIT 1;
I understand that this is because whilst one process is processing the SELECT query another could be updating the parameters table at the same time. Viola, we have multiple processes running the same parameters. However, I thought the following would run just fine:
START TRANSACTION;
SELECT RUN_ID FROM ParametersTable
WHERE RUN_STATUS=$WAITING_TO_RUN
ORDER BY RUN_ID
LIMIT 1
FOR UPDATE;
UPDATE ParametersTable
SET RUN_STATUS=$RUNNING
WHERE RUN_STATUS=$WAITING_TO_RUN
ORDER BY RUN_ID
LIMIT 1;
COMMIT;
It was my understanding that the FOR UPDATE modifier to the select statement would lock the selected rows for reading AND writing until the changes where commited. This would block out rival processes trying to run the parameters associated with the same RUN_ID. This doesn't happen with this script however. Is this due to this bug:
http://bugs.mysql.com/bug.php?id=18184 or (more likely) due to some massive flaw in my understanding of what the FOR UPDATE modifier does?
Does anyone know of any work-arounds that would achieve what I'm trying to do?
Many, many thanks in advance