For quite a while, I've been wondering what to make of movies or games which
have been rated by only a small number of people, such as the ones found on IMDb
or BoardGameGeek: users can give a 0 – 10 rating to items, and the users'
average is then shown. Is an obscure movie rated 8 by 42 people actually worth
an 8?
Putting aside the actual usefulness of those ratings, I wanted to know how
much uncertainty there is in those values, knowing only the arithmetic mean
and the number of people who have voted.
Given a population (i.e., all the users who have seen the film or played the
game and who could vote), when we take a sample (i.e., the users who have
rated the item) of size n, we get a sample mean m. But if another sample
had been taken, we would have gotten a slightly different sample mean. How
different would these two values be? In other words, how close is
this value to the value we would get if the whole population had
rated the item?
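
To get a feel for this sampling variation, here is a small simulation sketch in Python. The population of ratings is entirely made up (uniformly random integers from 0 to 10), and the helper name `sample_means` is hypothetical; the only point is to show that repeated samples of 42 voters give noticeably different means.

```python
import random
import statistics

def sample_means(population, n, trials, seed=0):
    """Draw `trials` random samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]

# Hypothetical population: 10,000 ratings between 0 and 10 (made up for illustration).
rng = random.Random(42)
population = [rng.randint(0, 10) for _ in range(10_000)]

means = sample_means(population, n=42, trials=1_000)
print(f"population mean: {statistics.fmean(population):.2f}")
print(f"sample means range from {min(means):.2f} to {max(means):.2f}")
print(f"standard deviation of the sample means: {statistics.stdev(means):.2f}")
```

The last figure printed is an empirical estimate of how much the sample mean varies from one sample to the next.
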
The standard error of the mean
indicates how much the sample mean m can vary by calculating its standard
deviation s_m = s/√n. The problem is that it is
based on the standard deviation s of the population, which is unknown. Given
that we do not know the users' individual ratings, we cannot calculate
the standard deviation of the sample either.
All is not lost, however: since the ratings are always between 0 and 10, we can
estimate the maximum value of the standard deviation. My best guess is that
this value is maximal when half the ratings are 0 and the other half are 10,
leading to a standard deviation of 5.
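
This guess can be checked numerically. The sketch below uses Python's `statistics.pstdev` (population standard deviation) on a half-0, half-10 set of ratings, and on a second, made-up set that is less polarised. The guess also agrees with Popoviciu's inequality, which bounds the variance of values in [0, 10] by (10 − 0)²/4 = 25.

```python
import statistics

# Half the ratings at 0 and half at 10: the most spread-out case on a 0-10 scale.
extreme = [0] * 21 + [10] * 21
extreme_sd = statistics.pstdev(extreme)

# Hypothetical, less polarised ratings with the same mean, for comparison.
mixed = [0] * 10 + [5] * 22 + [10] * 10
mixed_sd = statistics.pstdev(mixed)

print(f"half 0, half 10: {extreme_sd:.2f}")
print(f"less polarised:  {mixed_sd:.2f}")
```

Note that this is the population standard deviation; the sample version (dividing by n − 1) would come out slightly above 5 for small n.
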
Given that the means of about 95% of the samples of size n fall within
±2s_m of the population's mean, and that s_m is at most 5/√n, we know that the
mean of the population is very likely to be between
m − 10/√n (but no less than 0) and
m + 10/√n (but no more than 10). So this obscure movie
rated 8 by 42 users is worth between 6.5 and 9.5. Given that on IMDb movies
rated above about 7 are reasonably good, the obscure movie may be worth
watching.
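
The whole calculation fits in a few lines of Python. This is only a sketch of the reasoning above: the function name `rating_interval` and its parameters are my own, and it assumes the worst-case standard deviation of half the rating scale.

```python
import math

def rating_interval(m, n, low=0.0, high=10.0):
    """Worst-case interval of two standard errors around the mean m of n votes."""
    max_sd = (high - low) / 2            # largest possible standard deviation (5 on a 0-10 scale)
    margin = 2 * max_sd / math.sqrt(n)   # two standard errors, i.e. 10/sqrt(n) on a 0-10 scale
    # Clamp to the rating scale: the true mean cannot leave [low, high].
    return max(low, m - margin), min(high, m + margin)

lo, hi = rating_interval(8, 42)
print(f"{lo:.1f} to {hi:.1f}")
```

Because the interval uses the largest possible standard deviation, it is wider than what the individual ratings would give if they were known.
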