Rick James Wrote:
> > Its a bit of a rant
> I would say it is a bit of a "rut". You have
> decided that I/O is the problem, and you have
> decided on how to solve it. My job is to get you
> to back up and question all of your
> > But I need more performance out of them, and the
> rows need to be on a separate disk,...
> You are assuming your OS and device drivers will
> do anything in parallel. That is a big
I'm fairly confident this is true in my case; I run other custom code in parallel.
> > at best I get 100 a second on my hdds
> Yeah, that's all you can get on standard
I currently have to stick as many tables as I can fit on my SSD as an emergency measure, and this keeps up. But I can't fit a whole lot, it's all manual moving of files, and it slows down the data-gathering process. It looks like I do about 300-500 updates a second on this SSD while gathering 5,000 a second; my 6 HDDs need to achieve this collectively to alleviate the whole issue.
> > there are millions to be updated
> Do they have to be updated _now_? Or can they be
> delayed? Or batched? Or accumulated from a log?
> Or kept in RAM?
It's actually billions of potential updates, but I estimate several hundred million will actually be updated. I load up the whole dataset, scan it, and create a table of records that need updating, which is small enough to fit on my SSD. This allows me to pause and keep track.
They don't have to be updated in real time or anything; I actually buffer them in a 3 GB buffer inside the application constructing the queries. I insert into this buffer in batches of 354, arranged by table id (0-255).
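The in-application buffer described above could be sketched roughly like this, assuming updates are grouped by table id and flushed a batch at a time (the class, names, and flush callback are hypothetical, not from my actual code):

```python
from collections import defaultdict

# Hypothetical sketch: hold pending updates in memory, grouped by
# table id (0-255), and flush each group as one batch.
BATCH_SIZE = 354  # batch size mentioned above

class UpdateBuffer:
    def __init__(self, flush):
        self.pending = defaultdict(list)  # table id -> list of pending rows
        self.flush = flush                # callback that writes one batch out

    def add(self, table_id, row):
        bucket = self.pending[table_id]
        bucket.append(row)
        if len(bucket) >= BATCH_SIZE:
            self.flush(table_id, bucket)
            self.pending[table_id] = []

    def drain(self):
        # Flush whatever is left, e.g. when pausing the gatherer.
        for table_id, bucket in self.pending.items():
            if bucket:
                self.flush(table_id, bucket)
        self.pending.clear()
```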
I may try something else: an "update" table for each table, where all I do is insert records. The inserts are actually very speedy, I can top 10,000 a second. Then I update records from this table, to keep it in the cache.
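The staging-table idea above can be illustrated like this, using Python's sqlite3 in place of MySQL (table and column names are made up for the example):

```python
import sqlite3

# Phase 1: cheap sequential inserts into a per-table staging table;
# Phase 2: one batched UPDATE applying the staged rows to the real table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val INTEGER)")
con.execute("CREATE TABLE updates_0 (id INTEGER PRIMARY KEY, val INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?)", [(i, 0) for i in range(5)])

# Phase 1: append-only inserts, fast even on spinning disks.
con.executemany("INSERT INTO updates_0 VALUES (?, ?)", [(1, 10), (3, 30)])

# Phase 2: apply the staged updates in one statement, then drain.
con.execute("""
    UPDATE data
    SET val = (SELECT val FROM updates_0 WHERE updates_0.id = data.id)
    WHERE id IN (SELECT id FROM updates_0)
""")
con.execute("DELETE FROM updates_0")
```

The point of the split is that phase 1 is sequential I/O, while phase 2 can be scheduled whenever the target table's pages are warm.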
> Have you tuned your cache?-- see
I set aside 50% of RAM for my key buffer (4 GB), and have applied as many memory optimizations as possible. I get about a 50% to 90% cache hit rate.
> InnoDB? If so, please provide SHOW STATUS LIKE
> 'innodb%'; -- some settings have a big impact on
> disk hits. Are you using transactions?
Not using InnoDB. I could try this, but only if all other options fail.
> Are the rows to be updated "clustered" at all?
> That is, could they be "near" each other on disk?
Yes and no. Due to the overhead associated with obtaining the data to update with, it's all random, for maximum parallelizability. Note that I will try a separate "update" table, as above, to combat this kind of hard-to-cache access pattern.
On another note, data striping using partitioning is not going to work; see the other conversation in this thread. I will have to implement the striping in the application itself, using separate databases on each of the drives.
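The application-side striping could be as simple as routing each table id to one of the per-drive databases; a minimal sketch, assuming round-robin placement and hypothetical mount points (the real layout may differ):

```python
NUM_DRIVES = 6  # one database per physical drive, as described above

def drive_for(table_id):
    """Map a table id (0-255) to the drive whose database holds it."""
    return table_id % NUM_DRIVES

def database_path(table_id):
    # Hypothetical mount points; the application would open the
    # connection for whichever drive the table lands on.
    return "/mnt/disk%d/stripe_db" % drive_for(table_id)
```

With round-robin by id, the 256 tables spread evenly over the 6 drives, so concurrent updates to different tables mostly hit different spindles.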
Both these thoughts should help, as the new round of updates will be enormous, unlike the previous rounds, which only touched "timed out" data records.
Thanks for your time!