A web developer's blog. PHP, MySQL, CakePHP, Zend Framework, Wordpress, Code Igniter, Django, Python, CSS, Javascript, jQuery, Knockout.js, and other web development topics.

Getting the Bayesian Average for rankings (PHP / MySQL)

When you need to correctly display the leader-board based on ratings, you can’t just display the average rating for each entry.

------------------------------------
Restaurant A | 1 Vote(s) | Rating 10
------------------------------------
Restaurant B | 3 Vote(s) | Rating 6
                         | Rating 5
                         | Rating 4
------------------------------------
Restaurant C | 2 Vote(s) | Rating 3
                         | Rating 10

To correctly rank the data above you need to get the Bayesian Average for each restaurant. We have to take into account the number of votes for each restaurant. More votes would push it up the ranking while less votes would have little weight. This means that we if 1 restaurant had 1 vote and with rating of 10, it would not be the number one. (If we just got the average, it would be at the top! – and that would be wrong)

//In PHP
$avg_num_votes = 2;   //The average number of votes for all restaurants (1+3+2)/3 = 2
$avg_rating = 6.3333; //The average rating for all restaurants (10+6+5+4+3+10)/6 = 6.3333
$this_num_votes = 3;  //The number of votes for current restaurant (Restaurant B)
$this_rating = 5;     //The average rating for current restaurant (Restaurant B: (6+5+4)/3 = 5)
 
$bayesian_average = (($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating)) / ($avg_num_votes + $this_num_votes);

Using the formula above, we would have the following ratings for each restaurant:

Restaurant A would have 7.55553 = ((2*6.3333) + (1*10)) / (2+1)
Restaurant B would have 5.53332 = ((2*6.3333) + (3*5)) / (2+3)
Restaurant C would have 6.41665 = ((2*6.3333) + (2*6.5)) / (2+2)

A quick solution using MySQL View Tables

Create View `ratings` AS
SELECT
    restaurant_id,
    (SELECT count(restaurant_id) FROM ratings) / (SELECT count(DISTINCT restaurant_id) FROM ratings) AS avg_num_votes,
    (SELECT avg(rating) FROM ratings) AS avg_rating,
    count(restaurant_id) as this_num_votes,
    avg(rating) as this_rating
FROM
    ratings
GROUP BY 
    restaurant_id

To get the ratings for the restaurants:

SELECT 
    restaurant_id, 
    ((avg_num_votes * avg_rating) + (this_num_votes * this_rating)) / (avg_num_votes + this_num_votes) as real_rating 
FROM `ratings`

Would output something like:

restaurant_id    real_rating
1                7.555533333333
2                5.533320000000
3                6.416650000000

Sources:
http://blog.linkibol.com/2010/05/07/how-to-build-a-popularity-algorithm-you-can-be-proud-of/
http://snipplr.com/view/22559/

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>