I've been flipping around the idea of how to compare posts (or stories, or other objects) with given ratings, but different numbers of moderations. I think I've got a possible solution.
Percentiles.
This comes out of wanting something akin to Slashdot's HoF (hall of fame), but meaningful. So, say, the top-rated comments for a day (or the prior 24 hours or other specified period). Or (more significantly), top diary entries -- diaries right now are pretty much a mess, in that you have to wade through them to find the ones worth reading. Being able to find recent, highly-ranked diary entries would be a Good Thing.
Percentiles figure in as follows. Assume that the distribution of scores is uniform: a given moderation has an equal chance of being any one value. The figures below take that chance as 1 in 5 (five possible values; a 0-5 scale strictly has six, which would make it 1 in 6). So, for a comment with a rating of '5', the probability of getting there by chance, by number of moderations, is:
- 1: 20%
- 2: 4%
- 3: 0.8%
- 4: 0.16%
- 5: 0.03%
...or, inverting to get percentiles, 80th, 96th, 99.2nd, 99.84th, and 99.97th, respectively. Note that this assumes a uniform distribution of moderation values, which is not likely in the real world. In actuality, you'd take the objects within a cohort (those having an equal number of moderations) and compute the percentage of them having a score lower than the given value. In practice, you'd probably want to compute specific percentile cutoffs.
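A rough sketch of both calculations, in Python. The uniform case just raises 1/5 to the number of moderations; the empirical case counts, within a cohort of objects with the same number of moderations, the share scoring below a given value. The function names and the sample cohort are made up for illustration, not taken from any real moderation system.

```python
def uniform_percentile(n_moderations, n_values=5):
    """Percentile of a top-rated object, assuming every moderation is
    equally likely to be any of n_values scores (the 1-in-5 assumption)."""
    p_all_top = (1.0 / n_values) ** n_moderations
    return 100.0 * (1.0 - p_all_top)


def empirical_percentile(cohort_scores, score):
    """Percentage of objects in a cohort (same number of moderations)
    whose score is strictly lower than the given score."""
    if not cohort_scores:
        raise ValueError("cohort is empty")
    lower = sum(1 for s in cohort_scores if s < score)
    return 100.0 * lower / len(cohort_scores)


# Uniform case, for one to five moderations.
for n in range(1, 6):
    print(n, round(uniform_percentile(n), 2))

# Hypothetical cohort: scores of ten objects with three moderations each.
cohort = [2.0, 2.33, 3.0, 3.0, 3.67, 4.0, 4.0, 4.33, 4.67, 5.0]
print(empirical_percentile(cohort, 5.0))
```

With real data the cohort list would come from whatever store holds the ratings, and the empirical numbers would replace the uniform ones entirely.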
Percentiles would only be used to differentiate objects having equal scores. So, a given '5' moderated object, whether it has one moderation or 50, always rates above a '4.99'.
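As a sketch of that ordering in Python: sort on score first and let the percentile act only as a tie-breaker. The tuples are invented sample data, and the percentile field is assumed to have been computed already for each object's cohort.

```python
# Hypothetical sample data: (title, score, percentile within its cohort).
posts = [
    ("A", 5.0, 80.0),    # a '5' with a single moderation
    ("B", 4.99, 99.9),   # slightly lower score, high percentile
    ("C", 5.0, 99.97),   # a '5' with several moderations
]


def rank(posts):
    """Order by score, using percentile only to separate equal scores."""
    return sorted(posts, key=lambda p: (p[1], p[2]), reverse=True)


ranked = rank(posts)
print([title for title, _, _ in ranked])
```

Both '5' posts land above the 4.99 regardless of its percentile, and the percentile only decides which '5' comes first.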
Hmmm...OK, while percentiles would be interesting, the practical upshot is that, for an equivalent moderation score, you're always placing the post with more moderations higher up the ranking than the one with fewer. For ranking, the algorithm suggested in the article I'm responding to is a simpler and less computationally intensive approach.