MySQL :: Re: distinct vs. group by

New Topic

Re: distinct vs. group by

Posted by: Rick James
Date: August 24, 2010 11:52PM

Tom's comments (I did not read them all) are pretty much correct for MySQL -- InnoDB or MyISAM.

Think of the semantics --
SELECT DISTINCT
means to dedup the output. It does not say anything about what order the output is in.

SELECT ... GROUP BY
is designed (in all(?) database engines) to guide the use of "aggregate" functions (COUNT, SUM, ...). Again it says nothing about order. But wait... MySQL has a slight extension. In MySQL (only), GROUP BY xx implies ORDER BY xx, unless there is an explicit ORDER BY, even if it is ORDER BY rand().

So, you ask, won't both of those be implemented by sorting the data, etc? Well, not necessarily. Suppose your engine (not MySQL's) was smart enough to process partitions in parallel. Then the output might not be sorted.

Or, what if the dedupping or GROUPing could be done by a hash table (which MySQL sometimes does). Then the output might be in random order.

I think I have seen the same query (except for a value in the WHERE clause) without an ORDER BY come out either ordered or unordered. You are at the mercy of the optimizer.

Bottom line...
* Use DISTINCT for dedupping -- that's what it tells the reader.
* Use GROUP BY for aggregates -- that's what it is for.
* Always add on an ORDER BY (even if it is redundant), unless you really don't care.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Written By

Posted

distinct vs. group by

Henrik Genssen

August 23, 2010 11:45AM

Re: distinct vs. group by

Rick James

August 24, 2010 11:52PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.