MySQL query for top 10 rated items, average rating included

Discussion in 'Web Design and Development' started by SrWebDeveloper, May 7, 2008.

  1. SrWebDeveloper macrumors 68000

    SrWebDeveloper

    Joined:
    Dec 7, 2007
    Location:
    Alexandria, VA, USA
    #1
    I've got MySQL 4.1 and some legacy software that allows users to rate the quality of an item in a poll, rated from lowest to highest (1-5).

    I need to create a SQL query that on its own generates a top 10 list of highest rated items which returns as columns the ranking, pollID and average rating per poll (minumum 6 votes). I am seeking help on the average rating column.

    The simplified structure of the table that stores the poll results:

    Code:
       
      PollID    Rating   Votes
      -----     ------   -----
      1         5        100
      1         4        80
      1         3        20
      1         2        18
      1         1        2
      ...and so on...
    
    I did a little research and I think what I need (or close) is a Bayesian estimate such as the commonly used formula documented here.

    Problem is, I'm not sure what to plug in and where. Once I see that I can go from there (set sort order and limit, add additional select fields that exist in the real table, etc.) Needs to be a query as the software uses the query results to display the top 10 report. The real report is more detailed than this, of course.

    Thank you!

    -jim
     
  2. angryboffin macrumors newbie

    Joined:
    May 7, 2008
    Location:
    London
    #2
    SQL Without the Ranking

    Hello there,

    This should provide the top 10, without the ranking (ordered by rating first, PollID second to ensure consistency):

    Code:
    Select
    	PollID,
    	sum(Results.Rating*Results.Votes)/sum(Results.Votes)
    			as Simple_Average,
    	( sum(Results.Votes )/( Sum(Results.Votes) + 6) ) * 
    		sum(Results.Rating*Results.Votes)/sum(Results.Votes)
    	+ (6 / ( Sum(Results.Votes) + 6) ) ) * Mean.mean_vote_across_dim
    			as Bayesian_Estimate_Formula
    From
    	TableName Results,
    	(
    		Select
    			sum(Rating*Votes)/sum(Votes)
    				as mean_vote_across_dim
    		From
    			TableName
    	) Mean
    ORDER BY
    	( sum(Results.Votes )/( Sum(Results.Votes) + 6) ) * 
    		sum(Results.Rating*Results.Votes)/sum(Results.Votes)
    	+ (6 / ( Sum(Results.Votes) + 6) ) ) * Mean.mean_vote_across_dim DESC,
    	PollID
    limit 10
    
    I've recently started a job with mySQL 5.1 and I haven't got access to an editor at home, so this may not work. Thanks for the link to the Bayesian formula; I've never seen that before.

    Regards,

    Joshua

    Edit: I should add that your minimum value of 6 is hardcoded into the formula; if you wish to make the query easier to edit you could add in another scalar subquery in the From clause {e.g. (Select 6 as Min_Value From TableName limit 1) Min_Value}. Also, TableName is, obviously, the name of table.
     
  3. angryboffin macrumors newbie

    Joined:
    May 7, 2008
    Location:
    London
    #3
    While I'm at it...

    Hello again,

    This should do the whole caboodle, including returning those in the top 10 regardless of how many there are.

    Code:
    Select
    	count(Distinct B.PollID) + 1 as Ranking,
    	A.PollID,
    	A. Bayesian_Estimate_Formula
    From
    (
    Select
    	PollID,
    	( sum(Results.Votes )/( Sum(Results.Votes) + 6) ) * 
    		sum(Results.Rating*Results.Votes)/sum(Results.Votes)
    	+ (6 / ( Sum(Results.Votes) + 6) ) ) * Mean.mean_vote_across_dim
    			as Bayesian_Estimate_Formula
    From
    	TableName Results,
    	(
    		Select
    			sum(Rating*Votes)/sum(Votes)
    				as mean_vote_across_dim
    		From
    			TableName
    	) Mean
    ) A
    LEFT OUTER JOIN
    (
    Select
    	PollID,
    	( sum(Results.Votes )/( Sum(Results.Votes) + 6) ) * 
    		sum(Results.Rating*Results.Votes)/sum(Results.Votes)
    	+ (6 / ( Sum(Results.Votes) + 6) ) ) * Mean.mean_vote_across_dim
    			as Bayesian_Estimate_Formula
    From
    	TableName Results,
    	(
    		Select
    			sum(Rating*Votes)/sum(Votes)
    				as mean_vote_across_dim
    		From
    			TableName
    	) Mean
    ) B
    ON	A.Bayesian_Estimate_Formula < B.Bayesian_Estimate_Formula
    GROUP BY
    	A.PollID,
    	A. Bayesian_Estimate_Formula
    HAVING
    	count(Distinct B.PollID) + 1 <= 10
    ORDER BY
    	A. Bayesian_Estimate_Formula DESC,
    	A.PollID
    
    It's not pretty, but it'll do unless you have many poll IDs. I know you weren't looking for the whole solution, but I find it's easier for me to hammer out the entire thing than to try and work out which part of something wasn't understood.

    Regards,

    Joshua
     
  4. SrWebDeveloper thread starter macrumors 68000

    SrWebDeveloper

    Joined:
    Dec 7, 2007
    Location:
    Alexandria, VA, USA
    #4
    :cool:

    Thank you very, very much, Joshua. I could not ask for better advice than a complete working sample. I see what you did with the handling of the formula using subqueries as well as the recent changes and it is VERY much appreciated. I can now take relevant parts and optimize for my needs since your query shows me the overall context.

    I'm going to work on it today on my dev server and I will post a followup to let you know how it came out or if any questions along the way. Again, thank you for the above and beyond effort, you've inspired me.

    Cheers.

    -jim

    ps: Congratulations on your new project! :)
     
  5. SrWebDeveloper thread starter macrumors 68000

    SrWebDeveloper

    Joined:
    Dec 7, 2007
    Location:
    Alexandria, VA, USA
    #5
    After a few minor adjustments here is your query tested and working perfectly on my dev server using actual data:

    Code:
    Select
        count(Distinct B.PollID) + 1 as Ranking,
        A.PollID,
        round(A.Bayesian_Estimate_Formula,2) as Rating,
        A.Total_Votes as 'Total Votes'
    From
    (
    Select
        PollID, sum(Results.votes) as Total_Votes,
        ( sum(Results.votes )/( Sum(Results.votes) + 6) ) * 
            sum(Results.answers*Results.votes)/sum(Results.votes)
        + (6 / ( Sum(Results.votes) + 6) )  * Mean.mean_vote_across_dim 
                as Bayesian_Estimate_Formula
    From
        poll_answers as Results,
        (
            Select
                sum(answers*votes)/sum(votes)
                    as mean_vote_across_dim
            From
                poll_answers as pa
        ) as Mean
        group by PollID
    ) as A
    LEFT OUTER JOIN
    (
    Select
        PollID,
        ( sum(Results.votes )/( Sum(Results.votes) + 6) ) * 
            sum(Results.answers*Results.votes)/sum(Results.votes)
        + (6 / ( Sum(Results.votes) + 6) )  * Mean.mean_vote_across_dim
                as Bayesian_Estimate_Formula
    From
        poll_answers Results,
        (
            Select
                sum(answers*votes)/sum(votes)
                    as mean_vote_across_dim
            From
                poll_answers
        ) as Mean
        group by PollID
    ) as B
    ON    A.Bayesian_Estimate_Formula < B.Bayesian_Estimate_Formula
    GROUP BY
        A.PollID,
        A.Bayesian_Estimate_Formula
    HAVING
        count(Distinct B.PollID) + 1 <= 10 
        and A.Total_Votes > 1
    ORDER BY
        A.Bayesian_Estimate_Formula DESC,
        A.Total_Votes desc,
        A.PollID
    LIMIT 10
    
    I had to adjust a few minor syntax errors with the parenthesis, add group by statements in A and B sub queries and customized a tad, i.e. sort order, limit and rounding off of the rating. In the production version more information from other tables to create the report plus most static settings (min votes in Bayesian, min total votes and limit, sort order based on top/bottom report) will be replaced by variables.

    I'm sure others will find this a great starting point for their projects, so once again - thank you. :cool:

    Below is example result of a top 10 report (rating 1-5, 5=best) -- lookin' good...
     

    Attached Files:

    • sql.gif
      sql.gif
      File size:
      3.3 KB
      Views:
      4,727
  6. angryboffin macrumors newbie

    Joined:
    May 7, 2008
    Location:
    London
    #6
    You're very welcome. Looking at that list of my minor mistakes I see that IDEs are more helpful than I realised.

    Regards,

    Joshua

    P.S. Thanks for the congratulations; it's more of a full time job than a new project. There are a team of three of us in a race against time; we have to replace the data warehouse before we go mad maintaining their legacy systems.
     

Share This Page