MySQL Forums
Forum List  »  Newbie

Re: distinct vs. group by
Posted by: Rick James
Date: August 24, 2010 11:52PM

Tom's comments (I did not read them all) are pretty much correct for MySQL -- InnoDB or MyISAM.

Think of the semantics --
SELECT DISTINCT
means to dedup the output. It does not say anything about what order the output is in.

SELECT ... GROUP BY
is designed (in all(?) database engines) to guide the use of "aggregate" functions (COUNT, SUM, ...). Again it says nothing about order. But wait... MySQL has a slight extension. In MySQL (only), GROUP BY xx implies ORDER BY xx, unless there is an explicit ORDER BY, even if it is ORDER BY rand().

So, you ask, won't both of those be implemented by sorting the data, etc? Well, not necessarily. Suppose your engine (not MySQL's) was smart enough to process partitions in parallel. Then the output might not be sorted.

Or, what if the dedupping or GROUPing could be done by a hash table (which MySQL sometimes does). Then the output might be in random order.

I think I have seen the same query (except for a value in the WHERE clause) without an ORDER BY come out either ordered or unordered. You are at the mercy of the optimizer.

Bottom line...
* Use DISTINCT for dedupping -- that's what it tells the reader.
* Use GROUP BY for aggregates -- that's what it is for.
* Always add on an ORDER BY (even if it is redundant), unless you really don't care.

Options: ReplyQuote


Subject
Written By
Posted
August 23, 2010 11:45AM
Re: distinct vs. group by
August 24, 2010 11:52PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.