MySQL Forums
Forum List  »  Newbie

Re: SUM problem.
Posted by: Chris Preston
Date: March 18, 2009 12:52AM

Hi Patric,

There is a clean and relatively simple solution to your problem involving plain old SQL with some variables and some joins. I have provided the solution with a detailed explanation herein. I have provided the detailed explanation because your question poses an interesting combination of concepts when coming up with a correct solution and therefore I thought it might be helpful to you as well as others (forum trolls with non-answers notwithstanding) in understanding how to solve problems like this and many other kinds of problems as well.

First of all, the basic concept is that you need to create new data in a query to go with your current data. Wouldn't it be nice if you had a "game_no" column that was in the same order as the points that each player scored in each tourney?

Well, let's see if we can do that then.

First, so as to make sure we're talking about the same problem, let me recap. We have players such that

player 1 played in tourneys 1-11
player 2 played in tourneys 3-5, 7,8,9
etc

And we have points scored in each tourney such that we would like to order to top scores within each player and then take only the top 8.

If I have that correct, then my English language answer with assumed english language table names and english language columns (well, worry, I'm sadly mono-lingual) is as follows:

select * from
(
select winners.player_name, sum(winners.points) grand_totals
from(
select
player_name,
case when @lastId!=result.player_id then
@pos:=1
else
@pos:=@pos+1
end as game_no,
@lastId:=result.player_id as player_id,
result.points
from ( select @pos:=-1 as junk1) var1
join ( select @lastId:=-1 as junk2) var2
join (select * from tourney_result order by player_id, points desc) result
join player players on players.player_id=result.player_id
) winners
where winners.game_no < 9
group by winners.player_name
) final_result
order by grand_totals desc;

Now then, that might be a bit much to consume in one bite, therefore, I shall decompose the solution for you so that you can solve this sort of thing yourself in the future.

First, we look at the inner most select:

select
player_name,
case when @lastId!=result.player_id then
@pos:=1
else
@pos:=@pos+1
end as game_no,
@lastId:=result.player_id as player_id,
result.points
from ( select @pos:=-1 as junk1) var1
join ( select @lastId:=-1 as junk2) var2
join (select * from tourney_result order by player_id, points desc) result
join player players on players.player_id=result.player_id

This does several things for us. First of all, let's start with the 'from' and work our way down and then back up since select statements begin their execution starting with the from clause. So, we have created two junk rows with variables in them (var1 and var2). To explain what this does, let me take you through the from and join arguments first.

set @pos to -1, set @lastId to -1 ... Note that I have assumed your id columns are numeric and will be using numeric values to compare with your 'id' column - this would also work for a varchar, date or other type 'id' column and we'd just have to get the initial value right (e.g. 'xxxnotplayeridxxx', '1900-01-01', etc).

Each of these variables is turned into a single column row by the from and the join. So essentially we have

row 1 from result set called var 1 contains column junk1 with value -1

this is then "joined" to the next singleton row var2 consisting of the singleton column junk2 also having a value of -1

These columns are "junk" because what we've really done is create 2 variables in memory and initialized them to the values -1 in each case. Note that this sort of auto variable creation is not present in all databases, but we are only concerned with MySql at the moment.


So, by the end of the first join, we have a single row that looks like this:
junk1 junk2
-1 -1

Now, we will join to all the values in our tourney table. What is important here is that we already want to have our tourney table "sorted" (not really, you need an index to avoid sorting, more on that in a bit) in order of player_id and points in descending order for points (i.e. 99, 98, 97, etc). So, assuming our table has values like

player_id points

8 10
8 52
9 23
9 45

our sub query -

(select * from tourney_result order by player_id, points desc)

gives us:

player_id points

8 52
8 10
9 45
9 23

now then, since this "sub select" is "joined to the preceding values, we get the following:

-1, -1, 8, 52
-1, -1, 8, 10
-1, -1, 9, 45
-1, -1, 9, 23


Next, we join with our player table that has our player_id and player_name columns such that we get rows looking like

-1, -1, 8, 52, 8, 'Frodo'
-1, -1, 8, 10, 8, 'Frodo'
-1, -1, 9, 45, 9, 'Bilbo'
-1, -1, 9, 23, 9, 'Bilbo'

Notice that our join resulted in the player_id being included twice in the rows that will result with the first player id being called result.player_id and the second player_id being called players.player_id because these are the "alias names" that I chose for my "sub query" from tourney_result and my aliasing of the table player to "players". It's a good idea to always alias everything in from arguments and joins irrespective of whether the dbms in question requires that you do so in each case (some do, some don't).

Now, we are ready to work with the data and thus we look at the actual logic in the select:

select
player_name,
case when @lastId!=result.player_id then
@pos:=1
else
@pos:=@pos+1
end as game_no,
@lastId:=result.player_id as player_id,
result.points

Alright then, player_name seems obvious enough and note that I didn't prefix it with an alias. I would consider that a stylistic on my part because it's really a good idea to always prefix column names with the appropriate alias for clarity, particularly when you're doing a number of joins. So, really players.player_name would be more stylistically correct, but I got lazy typing the query in :)

Now then, on our next line, we do a case state such that if the current player_id that we're looking at in the current row is not equal to the last player id then we set @pos to 1 and return that value, else we increment @pos and return that value and we alias this to "game_no". Next, we reset the @lastID variable to the current player_id. TAKE NOTE, the order is important here. We do not want to reset the variable @lastId before we have used it in our case statement. Notice that the order of the columns in our select statement is also the *execution* order and we can rely on that behavior programmatically as I have done here - first check @lastId, then reset it. Next, of course, we have our result.points that we want to do our calculation on.

Therefore, our result set looks like the following:

'Frodo', 1, 8, 52
'Frodo', 2, 8, 10
'Frodo', 1, 9, 45
'Frodo', 2, 9, 23


Now, we can see that we have a nice result set such that we can do a select where the game_no is less than some maximum, in your case, less than 9, so:

select winners.player_name, sum(winners.points) grand_totals
from(
select
...
) winners
where winners.game_no < 9
group by winners.player_name

The where clause, "winners.game_no < 9" is obvious enough and then we have the more important "group by winners.player_name" What our group by does is to allow us to use "aggregation functions" on a group that we define in our group by such that the number of arguments in the group by determine the set of rows against which the aggregation functions are applied thereby resulting in a subset of rows that is the moral equivalent of "totals" within categories, so to speak.

Finally, since we'd probably like our result to show the players in the order of highest to lowest points so that we can make the good players feel good and depress the weaker ones (just kidding), we do the following:

select * from
(
...
) final_result
order by grand_totals desc;

giving us the player_name and grand_totals in a single row ordered descending (high to low).

Ok, now if you recall, I said I'd mention where you need an index. In our hypothetical tourney_result table that would be similar to yours, we would likely have the following columns:

create table tourney_result(
tourney_result_id int,
tourney_id int,
player_id int,
points int,
primary key (tourney_result_id)
);

in our innermost query, we did

(select * from tourney_result order by player_id, points desc)

if you will recall. Well, when we do an order by, this is a useful hint as to where we might want to add an index. In this case, we would like to be able to fetch rows already ordered by player_id and points descending. If we don't have an index, the dbms will have to sort the rows in the select. If we do have an index, it will not have to sort. Therefore, we ought to put an index on tourney_result such as:

create idx_tourney_result_1 on tourney_result(player_id, points desc);

Now then, of course in your problem you talk about having tourneys and dates and such but you don't mention having tourneys from one year to the next. So, in reality, you'll probably want to narrow your select to a given year's worth of tourneys and you can do that by joining your tourney table with the tourney_result table in the inner most select query and providing a date range for the relevant tourneys. You could also rewrite your sql to have just the year portion of the date and do your group by on year as well as player name, so that your results would be something like:

2007 'James T. Kirk' 984
2007 'Spock of Vulcan' 944
2008 'Frodo Baggins' 1010
2008 'Bilbo Baggins' 922

(note how the Hobbits are beating the heck out of the taller humanoids in this example)

Anyway, I will leave that extra bit of SQL magic to you as a practical exercise to solve your problem in a slightly broader fashion.

Now then, there are some important lessons here:

#1 if you don't have the data in your tables that you need to solve a problem, find a way to create it using variables or temporary tables

#2 you can do programmatic style queries that avoid the use of stored procedures if you put your mind to it in most cases - seldom is the case where you really, really cannot manipulate data using a stored procedure and in most of those cases it is because the data model has problems rather than the problem being unsolvable with SQL

#3 joins are important, but like everything, they are seldom the final answer to a problem as some have spuriously suggested in their responses to you - to solve problems *well* in a database you should learn not only ANSI standard SQL, but what other features are available to you in the database you are using.

on a slightly more cynical note:

#4 if lots of people are answering more than once, they probably don't know the answer themselves but really like to respond to forum threads

#5 it's seldom useful to respond to people that aren't actually answering your question - a better tactic is to post something like "well nobody's really answered this yet, anyone out there know the actual answer?"

In any event, I hope that this solution to your problem and the explanation cause you to have some interest and curiosity in reading more about relational databases like MySql and the SQL. At the end of the day, a relational database is nothing more than a fancy tool to do set operations like the kinds of "sub set," "super set," "intersection," "disjoint set" and so forth that you probably learned in elementary school or middle school. Thus, things like "joins" are about doing intersections of various kinds. I will leave the learning of what the "kinds" of intersections are as something for your future reading list :)

I hope that I have fully answered your question ("LOLLOLOLOL" as my nephews say using their 'Arby N the Chief' voices - whatever that is, I haven't figured it out yet, hah hah) and I wish you the best of luck in whatever game it is that your data refers to (Counter-Strike? Bowling? Canasta? :) )

I used to play Counter-Strike (started playing when I was 38 :) but don't have the reflexes now, and that is the other reason that your question caught my attention. Always willing to help out a fellow gammer :)

best of luck to you,


--cbp

Patric Andersson Wrote:
-------------------------------------------------------
> Or if it's even SUM I should use..... well, on to
> describe my problem.
> Ok, to make it short so I can understand a simple
> answer how to solve this :)
> Ok, I have a database containing :
> *Userinfo
> *Tournamentinfo
> *Resultinfo
>
> Ok, what I info I have.
> In a year we have 11 tournaments.
> Not every user is in every tournament...
> I want to "print" a sorted table based on the top
> 8 (out of the 11) results of the tournaments. (I
> hope that one made sense).
>
> Lets say we have users (A,B,C,D,E... etc).
> And we have 11 torneys (1,2,3... etc)
> And then in result I save how many points each
> user gets in each torney....
> So basicly A1=5, A2=0, A3=10 etc...
>
> Now, if I wanted to SUM A1 - A11 (that's no
> problem), but what I want to do is just sum the
> top 8 out of A1 - A11, then sum the top 8 out of
> B1 - B11 and so on.
>
> Then I want to show it sorted by the sum of the
> top8 results.
>
> I have no problem sorting it by the sum of all 11,
> but how do I sort it by the sum of just the top 8
> out of the 11 ?
>
> I hope this all made sense on what I want to do.

Options: ReplyQuote


Subject
Written By
Posted
March 12, 2009 03:29AM
March 12, 2009 09:52AM
March 13, 2009 07:10PM
March 13, 2009 11:10PM
March 14, 2009 04:01PM
March 14, 2009 06:40PM
March 15, 2009 07:54AM
March 15, 2009 10:41AM
March 15, 2009 10:35PM
March 16, 2009 09:33AM
March 16, 2009 09:52AM
March 16, 2009 02:46PM
March 16, 2009 04:15PM
March 17, 2009 02:11PM
March 17, 2009 05:03PM
March 17, 2009 06:31PM
March 17, 2009 10:17PM
March 18, 2009 06:03AM
March 15, 2009 05:41PM
March 15, 2009 07:14PM
March 16, 2009 08:34AM
Re: SUM problem.
March 18, 2009 12:52AM
March 21, 2009 05:58AM
March 21, 2009 11:13AM
March 22, 2009 05:01PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.