When I first pulled up WPRDC (the Western Pennsylvania Regional Data Center), I had no idea what I was going to do for my project. Luckily, one of the first datasets that came up was Pittsburgh arrest records. I am an avid football and hockey fan, and after being uprooted from my hometown, I find myself in a lot of heated sports arguments here at Pitt. In addition, as a college student I know how wild tailgates can get, and I have many friends who have received citations. Together, these two things helped me quickly decide to analyze arrest records for trends related to the outcomes of Steelers games over the past three seasons. I was unsure what I would find, since sports fans can feel just as invincible after a win as they feel frustrated after a loss.
With Pittsburgh being such a large city, my first task was homing in on which arrests were relevant and which were not. For location, I mainly focused on police zones 1, 2, and 3, which correspond to the North Shore, Downtown, and the South Side. Together they cover the area around the stadium and popular places to watch games (restaurants, bars, etc.). For timing, I only considered arrests made from 9am on game day until about 1am the next day, which to me roughly spans the start of tailgating to when fans finish drinking at bars at the end of the night. The hardest part was deciding which offenses were relevant. I mainly focused on assaults, possession of substances, evading police, and public urination: essentially, things I would imagine a drunk fan doing. Some borderline cases came down to my judgment, but I tried to be as consistent as possible so that every data point plotted includes the same kinds of arrests.
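The filtering rules above can be sketched in a few lines of Python. This is a minimal illustration, not the method actually used for this project: the `Arrest` record, its field names, and the offense keyword stems are hypothetical stand-ins, since the real WPRDC export has its own column names and charge wording that would need to be checked by hand.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


# Hypothetical record: the real WPRDC arrest export uses different column names.
@dataclass
class Arrest:
    when: datetime   # date and time of the arrest
    zone: str        # police zone, e.g. "1"
    offenses: str    # free-text charge description


# Zones 1, 2, and 3: North Shore, Downtown, and South Side.
GAME_DAY_ZONES = {"1", "2", "3"}

# Illustrative keyword stems only; real charge text would need a hand-checked list.
OFFENSE_KEYWORDS = ("assault", "possession", "elud", "urinat")


def in_game_window(when: datetime, game_day: date) -> bool:
    """True if `when` falls between 9am on game day and 1am the next day."""
    start = datetime.combine(game_day, time(9, 0))
    end = datetime.combine(game_day + timedelta(days=1), time(1, 0))
    return start <= when < end


def is_relevant(arrest: Arrest, game_day: date) -> bool:
    """Apply the zone, time-window, and offense filters together."""
    charges = arrest.offenses.lower()
    return (
        arrest.zone in GAME_DAY_ZONES
        and in_game_window(arrest.when, game_day)
        and any(keyword in charges for keyword in OFFENSE_KEYWORDS)
    )


def game_day_arrests(arrests: list[Arrest], game_day: date) -> list[Arrest]:
    """Keep only the arrests that pass every filter for one game."""
    return [a for a in arrests if is_relevant(a, game_day)]
```

Keyword matching is the weakest link here: it cannot make the judgment calls described above, so any automated filter like this would still need a manual review of borderline charges.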
I decided to use multiple visualizations to help readers digest the data. I used plot.ly for a simple bar graph showing the raw difference between arrests after a win and after a loss. I broke this down by season because the data varied from season to season. To show individual games, I made a scatter plot in Excel, since it’s the graphing tool I know best and I knew how to make it look the way I wanted.
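The totals behind a season-by-season win/loss bar graph amount to a simple group-by. The sketch below assumes a hypothetical list of `Game` records, each holding a season, an outcome, and the number of filtered arrests for that game; it is an illustration of the aggregation step, not the workflow actually used with plot.ly or Excel. Ties are skipped, in the same spirit as leaving out the 2018 tie with Cleveland.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical per-game summary produced after filtering arrests.
@dataclass
class Game:
    season: int    # e.g. 2016
    outcome: str   # "W", "L", or "T"
    arrests: int   # game-day arrests that passed the filters


def totals_by_season(games: list[Game]) -> dict[int, dict[str, int]]:
    """Sum arrests per season, split by win and loss; ties are skipped."""
    totals: dict[int, dict[str, int]] = defaultdict(lambda: {"W": 0, "L": 0})
    for game in games:
        if game.outcome not in ("W", "L"):
            continue  # e.g. a tie, which is left out of the comparison
        totals[game.season][game.outcome] += game.arrests
    return dict(totals)
```

The returned dictionary maps each season to its win and loss totals, which is the shape a grouped bar chart needs.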
As I finished collecting data, I realized that it can only show correlation, not causation. There is no way for me to know which arrests were even remotely related to the Steelers, but I hope the parameters I set made my data more reliable. The data also does not include fans arrested inside the stadium or by other police departments, such as the Pitt police; that was a limitation of the dataset.
I believe my investigation would have benefited from including more seasons, but the dataset was only consistent from the beginning of 2016. With only 16 regular-season games per season plus a few playoff games, I could obtain only 51 data points (I omitted the 2018 tie with Cleveland as an outlier). The Steelers have been mostly successful over the last three seasons, so losses make up a small share of those games, and more seasons would have helped establish a clearer trend.
My main takeaway is that no matter how big a fan you are, it’s best not to get arrested on game day. Feel free to check out my graphics and write-up on my website here.