Efficient MySQL subquery/percentage calculations

Discussion in 'Web Design and Development' started by big_malk, Apr 2, 2010.

  1. big_malk macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #1
    I've developing an online game where people can come first - fourth in each game, and I want a league table showing the people with the highest percentage of 1sts for all the games they've played.
    This is the MySQL query I've got at the moment, but it's taking ~16 seconds and it's still in private testing with about 400 tables (a table = a game), so when it [hopefully] become the most popular game ever, it'll have 1,000s of games to work with and it'll take forever.
    Is there a more efficient way to do this?

    Code:
    SELECT (
    (SELECT COUNT(*) FROM tables WHERE first=players.playerID GROUP BY first) / 
    (SELECT COUNT(*) FROM tables WHERE status='finished' and 
         (first=players.playerID or second=players.playerID or third=players.playerID or fourth=players.playerID) GROUP BY playerID)
    *100) as first_percentage, userName 
    FROM tables JOIN players WHERE 1=1 GROUP BY playerID 
    ORDER BY first_percentage DESC LIMIT 10
    I also have a table linking each player to each game that I could use to count the games, but I thought joining another table could be more resource intensive, but does each subquery use more resources?
    I also have a test player who's won 133% of the game, but I think this is just because of anomalous data before the system was working right, or is this query just not right?

    Thank you :)
     
  2. jpyc7 macrumors 6502

    Joined:
    Mar 8, 2009
    Location:
    Denver, CO
    #2
    I started considering a solution, but I realize that your table is not normalized. You'll want a table "ranking" that keeps the finished games and their player rankings.

    Instead of having columns, (first, second, third, fourth), you want a column "rank" whose values will be an integer (where they placed). Other columns are "player_id", "game_id".

    If you insert into the table only when games are finished, then you would do a query similar to:
    Code:
    SELECT r.player_id, SUM(CASE WHEN r.rank=1 then 1 else 0)/SUM(1) as pct FROM ranking r GROUP BY player_id;
    Obviously, you have to insert a set of rows for a single game in a transaction to prevent bad data. For every game that a player participated in, the player should have one row in the "ranking" which will be a unique combination of "game_id" and "player_id". So that will be your table key.

    If you want to know how often a player "ranked" in the top 4, then you expand the CASE WHEN clause into multiple OR conditions.

    Of course, I may have made errors in the above, but I hope you get the general idea.
     
  3. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #3
    Two things..

    1) rethink how your tables are designed. IF you don't understand "normal form" then just use the rule here every row in a table contains ONE real thing and not some composite of several. So a table might contain the result of a game or data for a person and NEVER a person and how he did on game. But really just read up on normal form and any DBMS text book.

    2) Never do a "Select *" Because later when you change the table (you WILL) the change breaks all you SQL code. Always list the fields out.

    3) MySQL can have rather poor performance as it scales so test a prototype early for the case of MANY users doing queries at the same time. Typically you get into issues with locks that you will never see with just testing by one user. This only happens with many people trying to both read and write to the same tables. You will need to write scripts to do this kind of testing.

    For example in your case, can data be written while those 16 second select quarries are taking place?
     
  4. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #4
    Excellent! Thanks! :)
    I do have that data saved in another table actually, but it'll be around 2 − 4 times larger so might be slower, I was totally wrong, your way is well over 100 times faster!
    There is an error in what you suggested though, if anyone finds this looking for the same solution, here what I've used:

    Code:
    SELECT players.playerID, 
    SUM(CASE WHEN playerGames.status='1st' THEN 1 ELSE 0 END)/SUM(1)*100 as first_percentage, 
    userName 
    FROM tables JOIN playerGames JOIN players ON 
    players.playerID = playerGames.playerID and tables.tableID = playerGames.tableID WHERE tables.status='finished' GROUP BY playerID 
    ORDER BY first_percentage DESC LIMIT 10;
     

Share This Page