Goals for Analysis

With the data collected for my previous blog post, I saw where in the country the most Premier League goals have been scored so far this season. One surprising thing I noticed was that by county, and possibly overall too, Leicester did not come out on top and Aston Villa did not come out at the bottom. For that reason, and out of general interest, I decided to use the same data to see how home goals relate to a team's position in the league.

I downloaded the fusion table as a CSV file and opened RStudio. I used the read.csv command to read the file into R. I then installed the ggplot2 package and used qplot to make the following graph.
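For anyone wanting to try the same thing, here is a rough sketch of those steps. The file name and the column names (Position, HomeGoals) are my assumptions for illustration, since the original CSV layout isn't shown here, and the two helper functions are hypothetical names. Recent ggplot2 releases flag qplot as deprecated, although it should still run.

```r
# One-off install, if ggplot2 is not already available:
# install.packages("ggplot2")

# Hypothetical helper: read the exported CSV into a data frame.
# Assumes the file has (at least) Position and HomeGoals columns.
read_goals <- function(path) {
  read.csv(path)
}

# Hypothetical helper: quick scatter plot of home goals by league position.
plot_goals <- function(goals) {
  ggplot2::qplot(Position, HomeGoals, data = goals,
                 xlab = "League position",
                 ylab = "Home goals scored")
}

# Example usage (file name is a placeholder):
# goals <- read_goals("home_goals.csv")
# print(plot_goals(goals))
```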


Going across from left to right we have the team top of the league in position number 1 right down to the team at the bottom of the league in position 20. There are certainly some interesting things here. Both 2nd and 4th are ahead of 1st; in fact, 4th is ahead of everyone by some margin (Man. City again).

The two obvious ones for me are that the team in 12th is second highest in the graph, and that the team in 19th (second last) actually has more than several teams above them, including the team in 3rd.

With some surprising results, I was very interested to see if there was any correlation between goals scored at home and position in the league. I used the cor.test function in R to do this. The result was the following:


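As a general rule, if the p-value is below 0.05, the correlation between the two variables is considered statistically significant. At 0.009, the p-value is well below that threshold, so there is strong evidence of a correlation. Note that the p-value measures how confident we can be that a correlation exists; how strong the correlation is comes from the correlation coefficient instead. It's not the biggest surprise but it's important to confirm.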
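As a minimal sketch of that test: cor.test takes the two columns and returns, among other things, the correlation coefficient and the p-value. The wrapper function name is hypothetical, and I'm assuming the default Pearson method here; method = "spearman" is an alternative for ranked data such as league position.

```r
# Hypothetical helper: run the correlation test and keep the two
# numbers discussed above.
home_goal_correlation <- function(position, home_goals) {
  result <- cor.test(position, home_goals)  # Pearson by default
  list(
    estimate = unname(result$estimate),  # correlation coefficient (strength and direction)
    p_value  = result$p.value            # evidence that a correlation exists
  )
}

# Example usage, assuming a data frame `goals` with
# Position and HomeGoals columns:
# home_goal_correlation(goals$Position, goals$HomeGoals)
```

A negative coefficient would mean that teams with more home goals tend to have a lower position number, i.e. sit higher in the table.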

So home goals are indeed important. How are the teams doing with that in mind? Although you can also use the graph above, I decided to make a bar graph to analyse the point further. I made a bar graph with the data, worked out that the 20 teams average 20.9 goals scored, and then added a line representing the mean to the graph.

To push this one step further I used R to work out the standard deviation, which is 6.42, and added lines to the graph one standard deviation above and one below the mean. This was the result:


It gives a reasonable indication of where the teams stand in this area. Most teams fall between the two deviation lines, with 2 teams above the upper line and 3 teams (one narrowly) below the lower line. More than anything, it highlights some of the extreme cases we saw in the original graph.
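Here is a rough sketch of how the mean and deviation lines could be worked out and drawn, assuming the lines sit at the mean plus and minus one standard deviation. With a mean of 20.9 and a standard deviation of 6.42, that puts them at roughly 14.48 and 27.32. The function names and the Position and HomeGoals columns are assumptions for illustration, and I've used ggplot here rather than whatever the original graph was built with.

```r
# Hypothetical helper: mean and the lines one standard deviation either side.
# sd() in R is the sample standard deviation (divides by n - 1).
deviation_bands <- function(home_goals) {
  m <- mean(home_goals)
  s <- sd(home_goals)
  c(lower = m - s, mean = m, upper = m + s)
}

# Hypothetical helper: bar graph with the three horizontal lines added.
plot_goal_bars <- function(goals) {
  bands <- deviation_bands(goals$HomeGoals)
  ggplot2::ggplot(goals, ggplot2::aes(x = factor(Position), y = HomeGoals)) +
    ggplot2::geom_col() +
    ggplot2::geom_hline(yintercept = unname(bands), linetype = "dashed") +
    ggplot2::labs(x = "League position", y = "Home goals scored")
}

# Example usage, assuming a data frame `goals` as described above:
# deviation_bands(goals$HomeGoals)
# print(plot_goal_bars(goals))
```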

So, that’s it, or at least so far. I think this data is very interesting, and if I have the time I would like to do a similar analysis on away results or even the normal table. Equally, I could see how goals conceded compare to goals scored, and then compare all this data at the end.

I’ll probably do that next but I have other ideas as well.

While doing my other maps, I had the idea to do one for average attendance for each club/county.

As you can guess from the ranges, the percentage attendance tends to be really high, with 5 counties over 99%. It's also not hard to figure out the reasons for the other ones. Aston Villa occupy the orange Under 90% county, and Newcastle and Sunderland occupy the yellow county furthest north. They are currently the bottom 3 in the league.

I may analyse this more later. I might research ticket prices related to this, and maybe calculate value for money based on price per goal.



