Here’s a post from Hitsville raising an interesting point about Metacritic scores. The case looks pretty clear-cut: movie critics are harsher and more discerning than music ones.

Harsher, yes. More discerning? Well. Allow me to get all research wonkish for a minute (or don’t, and scroll down to the section headed “WHY?”). If you look at the lists of current movies and music on Metacritic, it’s true that the average ratings are much higher for music, and it’s also true that the movie reviews “use” a wider range of scores. But not that much wider. The music reviews use a 53-point range: just over half the scale. The movie ones use a 67-point range: two thirds of the scale. That’s still a lot of scale going unused!

This is a problem of rating scales. If you have a 10-point rating scale, then ten points means good and one point means bad, so six means “OK”. Right? Not necessarily. What actually happens is a cultural skew effect: nearly always, you’ll find marks clustering at the top end of the scale, and in some places the bottom end is hardly used.

A direct comparison of Dutch and Chinese rating scale answers, for instance, wouldn’t be a lot of use: Dutch respondents use the lower end a whole lot more. This type of comparison is exactly what Metacritic is doing, though: its red-yellow-green overlay straddles the middle of the scale, which the movie critics use a lot and the music critics don’t. So the colour scheme exaggerates the extent to which the film guys use the whole scale.

So how do you deal with comparisons across cultures – those Dutch and Chinese respondents, for instance? Partly, you look at the distribution as well as the mean, which lets you normalise the results.

For an experiment I took all the scores – 47 for movies, 123 for music – on the front pages of Metacritic, took an average for each list, and indexed every score on its list’s average, so that an exactly average film or album scores 100. The indices go higher and lower at their extremes for movies – as you’d expect, since the movie critics use more of the scale. But this gives you a better at-a-glance reading of an album’s or film’s relative standing within its art form than the raw rating data.
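For anyone who wants to try the indexing themselves, here is a rough sketch of it in Python. The function name and the two short lists of scores are made up for illustration; they are not the real front-page data, and the rounding to one decimal place is my own choice.

```python
from statistics import mean

def index_scores(scores):
    """Express each score as a percentage of its own list's mean (mean = 100)."""
    avg = mean(scores)
    return [round(100 * s / avg, 1) for s in scores]

# Hypothetical metascores, invented for this example only.
movies = [30, 45, 60, 75, 90]   # wider spread, lower average
music = [60, 70, 75, 80, 90]    # narrower spread, higher average

movie_index = index_scores(movies)
music_index = index_scores(music)

print(movie_index)  # [50.0, 75.0, 100.0, 125.0, 150.0]
print(music_index)  # [80.0, 93.3, 100.0, 106.7, 120.0]
```

Both lists now centre on 100, so a 75 for an album and a 60 for a film read as the same thing: bang average for its medium.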

But what about the colour scheme – which is an even better at-a-glance guide? Well, the magic of stats gives us a way to reimpose red-yellow-green in a way that works better. If you take the standard deviation of the index scores, and give red (bad) to anything more than 1sd below 100 and green (good) to anything more than 1sd above, you’ve got yourself a method which makes the colours a bit more meaningful.
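The recolouring can be sketched in a few lines too. This is only an approximation of the method described above: the function name is hypothetical, the scores are invented, and I am assuming the sample standard deviation (what Python's `statistics.stdev` computes), since the choice between sample and population sd is not something the method pins down.

```python
from statistics import mean, stdev

def colour_bands(scores):
    """Index scores on their mean, then colour by distance from 100 in sd units."""
    avg = mean(scores)
    indexed = [100 * s / avg for s in scores]
    sd = stdev(indexed)  # assumption: sample standard deviation
    bands = []
    for value in indexed:
        if value < 100 - sd:
            bands.append("red")      # more than 1 sd below average
        elif value > 100 + sd:
            bands.append("green")    # more than 1 sd above average
        else:
            bands.append("yellow")   # within 1 sd of average
    return bands

# Hypothetical metascores, invented for this example only.
print(colour_bands([30, 45, 60, 75, 90]))
# ['red', 'yellow', 'yellow', 'yellow', 'green']
```

Because the cut-offs come from each list's own spread, the same function can be run separately on the movie scores and the music scores without one medium's habits bleeding into the other's colours.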

What does this do to the scores? Here’s an Excel spreadsheet to demonstrate! Around two-thirds of both films and music end up in yellow: music critics are slightly less likely to push an album into green but the distribution is pretty similar across both media. The movies list becomes better at discriminating between true stinkers and populist mediocrities: Beverly Hills Chihuahua and Paul Blart: Mall Cop sit at the bottom end of yellow, while Bride Wars remains firmly in red. The music list on the other hand improves its discrimination between fine and excellent records – of the four artists Hitsville raises its eyebrow over, only Chris Isaak remains in green (so maybe he should check him out!).


WHY?

None of which answers Hitsville’s original question of why music critics seem to have a different culture than movie reviewers. Hitsville blames it on our old friend “popism” – which he characterises as journalists giving favourable reviews to popular stuff because that’s where the money is. I’m not sure about this, because it doesn’t look to me like that many “big” and popular releases are getting glowing metascores here. The answer for me lies in the huge imbalance between the number of movies and the number of albums that are released.

As I understand it, most ‘tenured’ film critics are expected to see and review a high proportion of films on general release: this just isn’t the case for music critics, who tend to work by pitching or being assigned stuff they’re interested in or expert in. Therefore they are far less likely to involuntarily encounter a record they think is crap – disappointment rather than rage is their primary negative emotion. And this is reflected in the skew of their marks.

There’s another important question though, which is: do readers care about the skew? My hunch is that a regular reader of a review source will have a pretty good idea of what each grade means. At Pitchfork, for instance, when I give a record less than 7.0 I will often get links or mail or tweets treating it as effectively a pan – even though a P4K score of 6.1-6.9 is green (good) in the Metacritic colour scheme! The readers know where the average is, even if they haven’t sat down and calculated it.