MySQL Forums
Forum List  »  General

Re: Query with > 500000 entities in in where clause crashes server
Posted by: Rick James
Date: June 14, 2010 08:02PM

If you have an older version of MySQL,
key_buffer_size 8489271296
may actually be truncated to something less than 4G, since that used to be a hard limit.

If it actually is 8G, then I would argue that it is higher than it should be for a 16GB machine. The key_buffer holds only MyISAM index blocks. The rest of spare RAM is used (by the OS) for data blocks. So, you have more space reserved for indexes than data. 4G might be better. (But this probably won't explain the problem.)

Probably the EXPLAIN is taking so long because of the 374000 Entities in the IN clause.

One thing that might help (I really don't know), is to put those long lists into tables (temp tables, if appropriate), be sure to have appropriate indexes, then see if the suitable JOIN runs faster.

These might be worth increasing:
max_heap_table_size 16777216
myisam_sort_buffer_size 36700160
tmp_table_size 33554432

For "With server limits I meant something like the achitectural limits of mysql: e.g. the number of max allowed where/joins conditions, max. column length, max allowed entities in a where statement and the like." -- This is something like that:
http://forums.mysql.com/read.php?21,25724,224521

The only one you are threatening is the size of the IN and/or the size of the SELECT, which might be controlled by max_allowed_packet, which may be defaulted to 8MB. Feels like the SELECT is about 4MB long.

I don't know if this is impacting things: "group by WebsiteId, CustomerId, CampaignId, PlacementId -- Why do you have this? You don't have any aggregates; you could use DISTINCT instead; you don't have an ORDER BY."

"where a.date=20100430" vs.
"where a.Date >= 20100430 and a.Date <= 20100430"
There _could_ be a big difference between these...
If you have a compound INDEX(..., Date, ...), the latter will stop at Date, the former may continue on to the fields after that.

In general, one flavor of "best" index when you have
WHERE a=... AND b=... AND c=... AND d IN/BETWEEN/</>/GROUP/ORDER/etc
is
INDEX(a,b,c,d)
because it can do the "=" on a,b,c, then do a 'range scan' on d.

Because of that, I suggested
INDEX(CustomerId, NetworkPlacementId, SubNetworkPlacementId, Date, PlacementId)
And suggested the change to a.date=20100430"

None of your existing indexes will do as good.

Another "best" is, as you call it "closed query", wherein the query can be executed entirely in one index. The EXPLAIN would indicate by saying "Using index". That is, if _all_ the fields (including Date!) in a SELECT are in a single index. But, to make best use of such an index, it still needs the fields ordered (in the index definition) with the '=' fields first. You started with WebsiteId, so, it would effectively have to scan the entire index; that index would be far bigger than your key_buffer, or even your entire RAM, so it would take about as long as it takes to read the entire index of 742M rows (several gigabytes).

Starting with WebsiteId is good for the GROUP BY, and may let it avoid the 'filesort'. But that is not where your query is bogged down.

Alas, doing an ALTER to add index(es) will take many hours because of how big the table is.

What is your disk subsystem like? RAID striping would help. SSD would help.

How did the frontend user choose 374000 campaign ids? Sounds like there is some "category" of campaigns. Maybe that should be in the table, instead of this huge list? Or maybe a summary table?

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.