A web developer's blog. PHP, MySQL, CakePHP, Zend Framework, Wordpress, Code Igniter, Django, Python, CSS, Javascript, jQuery, Knockout.js, and other web development topics.

Average rating solution

Quoted from evanmiller.org:

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

Say what: We need to balance the average rating with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” average rating is at least what? Wilson gives the answer. For simplicity we suppose that there are only positive ratings with value 1 and negative ratings with value 0. Then this lower bound on the average rating is given by:
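$$\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p}) + \frac{z_{\alpha/2}^2}{4n}}{n}}}{1 + \frac{z_{\alpha/2}^2}{n}}$$

(For a lower bound use minus where it says plus/minus.) Here $\hat{p}$ is the observed fraction of positive ratings, $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $n$ is the total number of ratings. If that doesn't make sense to you, maybe this Ruby code will: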

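require 'statistics2'

# Lower bound of the Wilson score interval for pos positive ratings out of n.
def ci_lower_bound(pos, n, power)
    if n == 0
        return 0
    end
    z = Statistics2.pnormaldist(1-power/2.0)  # about 1.96 when power is 0.05
    phat = 1.0*pos/n                          # observed fraction of positive ratings
    (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end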

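pos is the number of positive ratings, n is the total number of ratings, and power is the significance level α (one minus the confidence level; the name is a loose use of "statistical power"): I would pick 0.05, which gives the 95% bound.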
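As a quick sanity check, here is the arithmetic worked by hand for two made-up items. It assumes z = 1.96, the usual rounded value of the 0.975 standard normal quantile that a power of 0.05 corresponds to. The item counts are illustrative only.

```latex
% Assumption: z = 1.96, so z^2 = 3.8416.

% Item A: 1 positive rating out of 1 (\hat{p} = 1, n = 1)
\frac{1 + \frac{3.8416}{2} - 1.96\sqrt{\frac{0 + \frac{3.8416}{4}}{1}}}{1 + 3.8416}
  = \frac{1 + 1.9208 - 1.9208}{4.8416} \approx 0.207

% Item B: 90 positive ratings out of 100 (\hat{p} = 0.9, n = 100)
\frac{0.9 + 0.019208 - 1.96\sqrt{\frac{0.09 + 0.009604}{100}}}{1 + 0.038416}
  \approx \frac{0.9 + 0.0192 - 0.0619}{1.0384} \approx 0.826
```

Item A has a perfect average but only one observation, so its lower bound is far below that of item B, whose 90% average is backed by 100 ratings.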

Now for any item that has a bunch of positive and negative ratings, use that function to arrive at a score appropriate for sorting on, and be confident that you are using a good algorithm for doing so.
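A minimal sketch of that sorting step, under a few assumptions: the helper name wilson_lower_bound and the item list are made up for illustration, and z is fixed at 1.96 (the 95% case) so the example runs without the statistics2 gem.

```ruby
# Hypothetical helper: the same lower-bound formula, with z defaulting to 1.96
# (an assumed, rounded 0.975 normal quantile) instead of calling statistics2.
def wilson_lower_bound(pos, n, z = 1.96)
  return 0.0 if n == 0
  phat = pos.to_f / n
  (phat + z*z/(2*n) - z * Math.sqrt((phat*(1 - phat) + z*z/(4*n)) / n)) / (1 + z*z/n)
end

# Made-up items: [name, positive ratings, total ratings]
items = [
  ["one perfect rating", 1, 1],
  ["90 of 100 positive", 90, 100],
  ["6 of 10 positive", 6, 10],
  ["no ratings yet", 0, 0],
]

# Sort descending by score: negate the lower bound so the best item comes first.
ranked = items.sort_by { |_name, pos, n| -wilson_lower_bound(pos, n) }

ranked.each do |name, pos, n|
  puts format("%-20s %.3f", name, wilson_lower_bound(pos, n))
end
```

With these numbers the item with 90 of 100 positive ratings ranks first and the single perfect rating lands below the 6-of-10 item, which is the behaviour a plain average would get wrong.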
