---
title: "Analysing Pitchfork using Pandas"
date: 2013-12-24
tags: post
---

I spend a likely-unhealthy amount of time on [Pitchfork](http://www.pitchfork.com/): it’s where I get my music news, and I can usually rely on their reviews to decide whether or not an album is worth a listen. Still, they often come under fire as being — amongst other things — self-serious and overly critical: allegations have been made that their albums are graded on a too-harsh scale, and that their reviews are motivated by commercial concerns as much as musical ones.

So, naturally, I downloaded [all of them.](https://classic.scraperwiki.com/scrapers/pitchfork_review_data/)

* * *

I decided to load the thing into Python (using the wonderful [pandas](http://pandas.pydata.org/) library) and poke around.

```
import pandas as pd

DATE_INDEX = -2
review_data = pd.read_csv('./pitchfork_review_data.csv', parse_dates=[DATE_INDEX])
```

The immediate curiosity for me was the score distribution: Pitchfork grades on a 0.0–10.0 scale, so one would expect the average to sit around 5.0, right?

Well, let’s take a look:

```
review_data['score'].describe()
```

|       | score        |
| ----- | ------------ |
| count | 14919.000000 |
| mean  | 6.969562     |
| std   | 1.356199     |
| min   | 0.000000     |
| 25%   | 6.400000     |
| 50%   | 7.200000     |
| 75%   | 7.800000     |
| max   | 10.000000    |

**Out of all 14,919 reviews, the average score is 6.97 — talk about grading on a curve.** Additionally, half of all reviews fall between a 6.4 and a 7.8 — a strikingly narrow window considering the general sense of outrage directed at reviews that score below a 5.0, and the ‘king-making’ power of a Best New Music accolade (generally given to albums that score an 8.2 or higher).
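That interquartile window is easy to poke at directly. A minimal sketch on a toy stand-in for `review_data['score']` (the numbers here are made up for illustration; the real column works the same way):

```python
import pandas as pd

# Toy stand-in for review_data['score'] -- made-up scores for illustration.
scores = pd.Series([3.0, 6.4, 6.9, 7.0, 7.2, 7.5, 7.8, 8.2, 8.6, 10.0])

# Fraction of reviews inside the 6.4-7.8 interquartile window...
in_window = scores.between(6.4, 7.8).mean()

# ...and the fraction clearing the informal 8.2 Best New Music bar.
bnm_rate = (scores >= 8.2).mean()

print(in_window, bnm_rate)
```

On the real data, `between(6.4, 7.8)` should land right around 50%, since those bounds are the quartiles reported above.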

Actually, speaking of Best New Music, let’s take a look at that.

```
review_data[review_data.accolade == ' Best New Music '].describe()
```

|       | score      |
| ----- | ---------- |
| count | 500.000000 |
| mean  | 8.619400   |
| std   | 0.328602   |
| min   | 7.800000   |
| 25%   | 8.400000   |
| 50%   | 8.500000   |
| 75%   | 8.800000   |
| max   | 10.000000  |

```
# head() will give us the five lowest-scoring reviews
review_data[review_data.accolade == ' Best New Music '].sort('score').head()
# tail() will give us the five highest
review_data[review_data.accolade == ' Best New Music '].sort('score').tail()
```

Looks like the lowest score given to a BNM is 7.8 (given to [!!!’s Me and Giuliani Down by the Schoolyard](http://pitchfork.com/reviews/albums/1766-me-and-giuliani-down-by-the-school-yard-a-true-story-ep/), a groan-inducing name if I’ve ever heard one). Conversely, the three highest scores handed down to new music are 9.6, 9.7, and a controversial 10.0, awarded to _The Fiery Furnaces_, _Arcade Fire_, and _Kanye West_ respectively.

Back to the overall score distribution, though: percentile data only gives us one perspective on the data. Plotting a histogram of the scores yields some interesting results:

```
import matplotlib.pyplot as pyplt
pyplt.hist(review_data['score'])
pyplt.show()
```

![](http://i.imgur.com/uahiyEM.png)

As expected, there’s a clustering of reviews in the 6-8 range, with a long tail approaching 0 and a steep drop off to 10. But if we increase the granularity:

```
pyplt.hist(review_data['score'], bins=20)
pyplt.show()
```

![](http://i.imgur.com/Gd8IYC3.png)

```
pyplt.hist(review_data['score'], bins=50)
pyplt.show()
```

![](http://i.imgur.com/C6wnpJW.png)

We get a much more interesting perspective. In particular, Pitchfork loves their 7.5s and 8.2s. Also revealing is the relative frequency of perfect scores: mainly reserved for Beatles and jazz reissues, one can imagine the backlash if a reviewer deems _Kind of Blue_ less than perfect.
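That fondness for particular scores can be made explicit without any binning at all: `value_counts()` tallies each distinct score, most common first. A sketch on made-up numbers standing in for `review_data['score']`:

```python
import pandas as pd

# Made-up scores standing in for review_data['score'].
scores = pd.Series([7.5, 8.2, 7.5, 6.8, 7.5, 8.2, 7.0, 7.5])

# value_counts() sorts by frequency, so the house favourites float to the top.
top_scores = scores.value_counts()
print(top_scores.head())
```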

* * *

Another charge often levied at _Pitchfork_ is that their standards have diminished as they’ve gained a larger viewership. We can try simply plotting the reviews against their publish date, but it’s not much help:

```
daily_data = review_data.groupby("publish_date")['score'].mean()
daily_data.plot()
pyplt.show()
```

![](http://i.imgur.com/SsVUbka.png)

There’s too much noise to get a good impression of any overall trends. While it looks like things tend to oscillate around the 7.0 mark, we can plot the mean review score of each month to get a clearer picture.

```
monthly_data = daily_data.resample('M', how='mean')
monthly_data.plot()
pyplt.show()
```

![](http://i.imgur.com/Ra3kjXg.png)

Quite a bit clearer: we can attribute the early flux to the fact that in Pitchfork’s first few years they published only one or two reviews per week, as opposed to five a day. It looks like averages have been relatively steady, with a slight dip from 2007 to 2010, but we can run a regression to make sure:

```
monthly_data.plot()
monthly_frame = monthly_data.reset_index()
total_points = len(monthly_data)
model = pd.ols(y=monthly_frame['score'], x=pd.DataFrame(range(0, total_points)), intercept=True)
```

Wow: with an RMSE of 0.6757 (not great, but not awful), we get a line with an intercept of **6.977** and a slope of **0.000037** — as in, barely any change at all.
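For the curious, the same straight-line fit can be sketched with plain NumPy’s `polyfit` — here against made-up monthly means rather than the real `monthly_data`:

```python
import numpy as np

# Made-up monthly means standing in for monthly_data.values.
y = np.array([7.1, 6.9, 7.0, 7.05, 6.95, 7.0])
x = np.arange(len(y))

# Least-squares fit of y = slope * x + intercept (degree-1 polynomial).
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)
```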

* * *

Lastly, let’s take a look at the reviewers themselves: it’s not exactly out of the realm of possibility that certain critics are sticklers and others are more generous (I mean, anyone who gave _Merriweather Post Pavilion_ a 9.6 can’t have the highest standards, right?).

```
reviewer_data = review_data.groupby('reviewer')['score']
aggregated_reviewers = reviewer_data.mean()
aggregated_reviewers.sort()
```

Skipping over group reviews, the strongest authors at either extreme:

| reviewer | average score |
| --- | --- |
| Bob O. McMillan | 3.5 |
| [Alan Smithee](http://en.wikipedia.org/wiki/Alan_Smithee) | 4.0 |
| Adam Ohler | 4.2 |
| … | … |
| Carl Wilson | 8.5 |
| Philip Welsh | 8.6 |
| Drew Daniel | 8.6 |
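One caveat on these extremes: a critic with a single review will sit at the top or bottom of the ranking forever. A hedged sketch of filtering out low-volume reviewers first, on a toy frame with the same `reviewer` and `score` columns (using the newer `sort_values` spelling):

```python
import pandas as pd

# Toy frame mirroring review_data's reviewer/score columns.
toy = pd.DataFrame({
    'reviewer': ['A', 'A', 'A', 'B', 'C', 'C', 'C'],
    'score':    [7.0, 7.5, 8.0, 3.5, 6.0, 6.5, 7.0],
})

# Per-reviewer mean alongside review counts.
stats = toy.groupby('reviewer')['score'].agg(['mean', 'count'])

# Keep only reviewers with at least three reviews before ranking them.
regulars = stats[stats['count'] >= 3].sort_values('mean')
print(regulars)
```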

* * *

That’s all I’ve got for now — I hope you found it interesting, either from a programming perspective or a musical one! Feel free to download the [csv](https://classic.scraperwiki.com/scrapers/pitchfork_review_data/) and play around with it yourself — if there are any questions you’d like me to answer (or suggestions for further analysis), please let me know via email or comment.
