Average rating solution

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

Say what: We need to balance the average rating with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” average rating is at least what? Wilson gives the answer. For simplicity we suppose that there are only positive ratings with value 1 and negative ratings with value 0. Then this lower bound on the average rating is given by:

\[ \frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p}) + \frac{z_{\alpha/2}^2}{4n}}{n}}}{1 + \frac{z_{\alpha/2}^2}{n}} \]

(For a lower bound use minus where it says plus/minus.) Here p̂ is the observed fraction of positive ratings, z_{α/2} is the (1-α/2) quantile of the standard normal distribution, and n is the total number of ratings. If that doesn’t make sense to you, maybe this Ruby code will:
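require 'statistics2'

# Lower bound of the Wilson score interval for the fraction of positive ratings.
def ci_lower_bound(pos, n, power)
  return 0 if n == 0  # no ratings yet, so there is nothing to estimate
  # z is the (1 - power/2) quantile of the standard normal distribution;
  # dividing by 2.0 avoids integer division if power is passed as an integer
  z = Statistics2.pnormaldist(1 - power / 2.0)
  phat = 1.0 * pos / n  # observed fraction of positive ratings
  (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat) + z*z/(4*n))/n)) / (1 + z*z/n)
end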
pos is the number of positive ratings, n is the total number of ratings, and power is, despite its name, the significance level α rather than the statistical power: I would pick 0.05, which corresponds to a 95% confidence level.
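
To see what the formula does with very little data, here is a worked case for a single positive rating (p̂ = 1, n = 1). The value z = 1.96 is an assumed approximation of the 95% quantile, not something the code above hard-codes:

```latex
\[
\frac{1 + \frac{z^2}{2} - z\sqrt{\frac{z^2}{4}}}{1 + z^2}
= \frac{1 + \frac{z^2}{2} - \frac{z^2}{2}}{1 + z^2}
= \frac{1}{1 + z^2}
\approx \frac{1}{4.8416}
\approx 0.2065
\]
```

So an item with one positive rating and nothing else scores about 0.21 rather than a perfect 1.0.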

Now for any item that has a bunch of positive and negative ratings, use that function to arrive at a score appropriate for sorting on, and be confident that you are using a good algorithm for doing so.
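
As a rough sketch of that sorting step, the snippet below ranks a few items by their lower bound. The item names and counts are made up for illustration, wilson_lower_bound and rank_items are hypothetical helper names, and z is hard-coded to an approximate 95% value (1.96) so the example runs without the statistics2 gem:

```ruby
# Assumption: a fixed z for a 95% interval, instead of Statistics2.pnormaldist.
Z_95 = 1.96

# Same formula as ci_lower_bound, but taking z directly.
def wilson_lower_bound(pos, n, z = Z_95)
  return 0.0 if n == 0
  phat = pos.to_f / n
  (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat) + z*z/(4*n))/n)) / (1 + z*z/n)
end

# Hypothetical items: name => [positive ratings, total ratings]
ITEMS = {
  'one_rave'   => [1, 1],
  'ten_votes'  => [9, 10],
  'many_votes' => [90, 100]
}

# Returns item names ordered from highest to lowest lower bound.
def rank_items(items)
  items.sort_by { |_name, (pos, n)| -wilson_lower_bound(pos, n) }.map(&:first)
end

rank_items(ITEMS).each do |name|
  pos, n = ITEMS[name]
  puts format('%-10s avg=%.2f score=%.4f', name, pos.to_f / n, wilson_lower_bound(pos, n))
end
```

Sorting by the plain average would put one_rave first; sorting by the lower bound puts it last, because a single rating says very little.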
