Steven's Blog

Statement for Pittsburgh Dataset Project

By Steven Barash

January 30, 2019

When I started to look through WPRDC’s collection of Pittsburgh datasets, I was overwhelmed by the sheer number of datasets to work with. Every single one seemed to have a promising story behind it, and I wasn’t sure where to even start. One of my first ideas was to take the ever-popular dog ownership database and merge it with datasets that show the incidence of health issues such as diabetes and depression within Allegheny County. Unfortunately, the health datasets did not contain any location data to draw conclusions from, so I scrapped this idea. I knew that air quality was a pressing issue for Pittsburgh, and I had been reminded of it firsthand that morning as I waited for the bus and was engulfed in the stench of bus exhaust. I hoped to pair the issue of Pittsburgh’s poor air quality with an interesting dataset.


I chose to combine the Allegheny County Air Quality dataset with the Port Authority’s Bus Stop dataset in order to examine the impact that Pittsburgh’s large fleet of diesel-powered buses has on its environment. I ran into some issues with Tableau not allowing me to merge the two datasets, as they didn’t have any fields in common. As a compromise, I displayed the two datasets side by side. I was also fascinated by the sheer complexity of Pittsburgh’s bus system, which has 7000 bus stops and over 750 buses. It was also interesting to see that one can plot out streets using the bus stop locations given by the dataset, and to see the grid-like arrangement of some of the bus stops in Pittsburgh’s downtown area.
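One way around the missing shared field, which I did not try for this project, would be to relate the two datasets by location instead. Below is a minimal Python sketch that counts how many bus stops fall within a given radius of each air quality monitor. The field names (`site`, `lat`, `lon`), the 1 km radius, and the sample rows are all made up for illustration and do not come from the WPRDC datasets.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def count_stops_near_monitors(stops, monitors, radius_km=1.0):
    """Return {monitor site: number of bus stops within radius_km}."""
    counts = {}
    for m in monitors:
        counts[m["site"]] = sum(
            1 for s in stops
            if haversine_km(m["lat"], m["lon"], s["lat"], s["lon"]) <= radius_km
        )
    return counts


# Hypothetical sample rows, NOT real data from either dataset.
stops = [
    {"lat": 40.4410, "lon": -79.9960},
    {"lat": 40.4420, "lon": -79.9950},
    {"lat": 40.5000, "lon": -79.9000},
]
monitors = [
    {"site": "monitor_a", "lat": 40.4406, "lon": -79.9959},
    {"site": "monitor_b", "lat": 40.5000, "lon": -79.9005},
]

stop_counts = count_stops_near_monitors(stops, monitors)
print(stop_counts)
```

The resulting per-monitor stop counts could then be placed next to each monitor's average readings, which would give the two datasets a common column to compare on.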


During my research into Pittsburgh’s air quality issues, I was both surprised and disappointed to find out that Pittsburgh ranks as one of the worst cities in the United States for air quality, with much of Pittsburgh falling within the “Moderate” air quality range on average according to the EPA. I was also surprised to learn that 50% of Pittsburgh’s pollution comes from outside of Pittsburgh, and in the end, I realized that Pittsburgh’s buses have comparatively little impact on Pittsburgh’s environment.


While the results of my project weren’t exactly what I expected, I still found it very interesting to look into the nuances of Pittsburgh’s bus system as well as the reasons behind Pittsburgh’s poor air quality. It was also interesting to see that there does appear to be some form of correlation between bus stop density and pollution. For my website, I decided to use a grey color scheme, with a smoky background for the title, as I felt that it best conveyed Pittsburgh’s air quality. As for future projects, I hope to somehow combine Python and D3.js, as I’m interested in the more technical side of data visualization. In addition, I hope to become more proficient in Tableau so that I can learn how to overlay two relatively unrelated datasets on the same map, as that will probably help reveal interesting trends in the future.


My data visualization can be found on