The basic idea is to look for patterns in late-night T usage, which started in March of this year, and compare activity to the same time period from the previous year. Through that we also hope to find any effects that late-night T service may have had on the taxi industry. Part 1 focused on which T stops were most popular at night, and broke down aggregate T and taxi traffic by hour. Here, we are going to focus on where that late-night T and taxi activity happens.
To do that, we're going to need a couple of things:
- A properly partitioned data set. Since late-night service started on March 28, 2014, our comparison window starts on March 28 in both years. And since taxi usage data was released through May 31, 2014, we will end our comparison on May 31 in both years. This matches the time range we analyzed previously.
- Position data: the released taxi data provides lat/lon location of each pickup, and we are going to add in a shapefile of Boston neighborhoods and a set of lat/lon locations for our T stations.
- Libraries to process/plot the new data. We're going to use four existing Python libraries: numpy for general data wrangling, matplotlib for plotting, as well as the geospatial libraries pyshp and pyproj to make sense of the raw shapefile data.
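As a rough sketch of the partitioning step, here is one way to keep only the records that fall inside the March 28 to May 31 window in each year. The function names and the sample dates are made up for illustration; the real pickup and T transaction records would need their timestamps parsed into dates first.

```python
from datetime import date

# Comparison window from the text: March 28 through May 31, in both years.
WINDOW_START = (3, 28)
WINDOW_END = (5, 31)


def in_window(d, year):
    """Return True if date `d` falls inside the comparison window of `year`."""
    return date(year, *WINDOW_START) <= d <= date(year, *WINDOW_END)


def partition_by_year(dates, years=(2013, 2014)):
    """Group dates into per-year lists, dropping anything outside the window."""
    return {y: [d for d in dates if in_window(d, y)] for y in years}


# Hypothetical dates, for illustration only.
sample = [date(2013, 3, 27), date(2013, 4, 15), date(2014, 3, 28), date(2014, 6, 1)]
parts = partition_by_year(sample)
```

Both endpoints are inclusive here, so a pickup on March 28 or May 31 counts toward the window.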
Part 1 showed by name which stops were particularly popular for late-night service, so now let's look at their locations. Taking all of our position data and sizing our plot points in proportion to our aggregate transaction metrics from last time gives the following:
Most T activity is concentrated downtown as expected, and Red Line activity is highest at Harvard. A small caveat here: some of the Green Line data was listed in the T challenge generically as "Green Line B" or "Green Line C", which won't match up well with our station-specific positions. The upshot is that late-night popularity is not accurately represented for Green Line stops west of Kenmore. Rather than try to come up with an allocation method to divide up that traffic, I just plotted them at default size.
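For the curious, the point sizing can be sketched roughly like this. The function name, the `max_size` and `default_size` values, and the use of NaN to mark stations without a station-level count are all assumptions for illustration, not the exact numbers behind the plot.

```python
import numpy as np


def marker_sizes(counts, max_size=300.0, default_size=20.0):
    """Scale marker sizes in proportion to transaction counts.

    Stations with no station-level count (NaN) get `default_size`,
    mirroring the treatment of Green Line stops west of Kenmore.
    """
    counts = np.asarray(counts, dtype=float)
    sizes = np.full(counts.shape, default_size)
    known = ~np.isnan(counts)
    if known.any() and np.nanmax(counts) > 0:
        # The busiest station gets max_size; the rest scale linearly.
        sizes[known] = max_size * counts[known] / np.nanmax(counts)
    return sizes
```

The result can be passed as the `s` argument of matplotlib's `scatter`, which interprets it as marker area in points squared, so the area of each point (not its radius) ends up proportional to the count.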
One of the main conclusions from part 1 of this analysis was that taxi activity is down compared to 2013, but arguably not because of T service, since the drop is uniform in time. Here we want to explore where the changes in activity happened, so I made a heatmap of all 2014 pickup locations in the months of interest:
The heatmap is in log scale, so portions of the grid with a color index of 3 have 10 times the number of pickups as those with a color index of 2. I like that you can see street structure clearly here: Commonwealth Ave and the loop around the North End, for example, or the airport, as well as Columbus/Tremont/Washington streets in the South End. It's too bad that the data do not extend into Cambridge/Somerville, and that pickup locations in the non-commercial zones were anonymized, but we'll work with what we have.
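A minimal sketch of how a log-scale pickup grid like this can be built with numpy follows. The function name, the bin count, and the `extent` argument are illustrative assumptions; the plot above may have used a different grid.

```python
import numpy as np


def log_pickup_grid(lons, lats, bins=200, extent=None):
    """Bin pickup locations into a 2D grid and take log10 of the counts.

    `extent` is [[lon_min, lon_max], [lat_min, lat_max]], or None to use
    the range of the data. Empty cells come back as NaN so that a plot
    can leave them blank.
    """
    counts, lon_edges, lat_edges = np.histogram2d(lons, lats, bins=bins, range=extent)
    with np.errstate(divide="ignore"):
        log_counts = np.where(counts > 0, np.log10(counts), np.nan)
    return log_counts, lon_edges, lat_edges
```

A cell with 1000 pickups lands at 3 and a cell with 100 pickups at 2, which is the factor-of-10 step per color index described above.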
The above plot shows 2014 activity only; what we really want to know is how activity changed year-over-year. Taking the above heatmap, and subtracting out the equivalent map from 2013, leaves us with the difference year-over-year:
Results are again in log scale, with blue representing an increase in 2014 taxi activity relative to 2013, and red representing reduced 2014 traffic. Taxi pickups are down all along Commonwealth Ave including points west after it becomes Beacon, and also on the three arteries through the South End mentioned earlier. Even the airport has less taxi activity overall. Downtown looks like it has some increased activity, but it's hard to tell at this zoom. Zooming in:
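As an aside on the difference map: a plain logarithm cannot handle negative values, so a year-over-year difference needs some kind of signed log. One plausible choice is sketched below; it is an assumption, not necessarily the transform behind these plots, and the function name is made up. Both grids must be binned on identical edges for the subtraction to make sense.

```python
import numpy as np


def signed_log_diff(counts_2014, counts_2013):
    """Signed log10 of the per-cell change in pickup counts.

    Positive values mean more pickups in 2014, negative values fewer.
    Using log10(1 + |diff|) keeps unchanged cells at exactly zero.
    """
    diff = np.asarray(counts_2014, dtype=float) - np.asarray(counts_2013, dtype=float)
    return np.sign(diff) * np.log10(1.0 + np.abs(diff))
```

With this transform a cell that gained 99 pickups maps to +2 and a cell that lost 99 maps to -2, so a diverging blue/red colormap centered on zero lines up with the increase/decrease reading.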
Performance is more varied downtown; there are areas of large decrease (e.g. the area near MGH), but also some pockets where pickups are up. Let's combine this with our MBTA station data from earlier:
Although at a glance I would say taxi activity increases are generally near popular late-night T stations, I don't see an immediate relationship. Maybe if we zoom in further on downtown:
Interesting, but still no clear link. We need to express this as a metric. For each grid square on the whole graph with any taxi traffic, we can calculate the distance in miles to the nearest T stop and see how that compares with the traffic change for that grid square. Plotting all of these points at once would make for an indecipherable cloud, so how about a heatmap counting the number of times that we see a given T-stop-distance/taxi-change combo:
The biggest changes in taxi activity do happen near T stops, but in both directions - increases as well as decreases.
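The distance metric behind that last heatmap could be computed along these lines. This sketch uses the haversine great-circle formula on raw lat/lon with a mean Earth radius of about 3958.8 miles; projecting to planar coordinates first (for example with pyproj) would work just as well at city scale. The function names and bin count are illustrative assumptions.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def nearest_stop_miles(cell_lons, cell_lats, stop_lons, stop_lats):
    """Great-circle distance in miles from each grid cell to its nearest T stop."""
    # Shape cells as a column and stops as a row so broadcasting
    # produces a (cells x stops) distance matrix.
    lon1 = np.radians(np.asarray(cell_lons, dtype=float))[:, None]
    lat1 = np.radians(np.asarray(cell_lats, dtype=float))[:, None]
    lon2 = np.radians(np.asarray(stop_lons, dtype=float))[None, :]
    lat2 = np.radians(np.asarray(stop_lats, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))
    return dist.min(axis=1)


def distance_change_histogram(distances, changes, bins=50):
    """Count how often each (distance, taxi change) combination occurs."""
    return np.histogram2d(distances, changes, bins=bins)
```

The full distance matrix is fine for a few hundred stations and a few thousand occupied grid cells; a much finer grid would call for a spatial index instead.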
We've now looked independently at changes by time and changes by location, but have yet to combine them. Stay tuned...