A team of graduate and undergraduate fellows at the University of British Columbia (UBC) has created a suite of data visualization tools to analyze bus route performance and demographics in Vancouver, Canada, with a focus on the City of Surrey.
Saeid Allahdadian, Lap-Tak Chu, Mina Park and William Qi worked with Surrey Transportation Planner Don Buchanan as their project lead to build an interactive public map of demographic and bus data, generate graphs to characterize and model bus networks, and analyze social media data to assess route volume and customer opinions. The data visualizations showing bus frequency and connectivity can be used to inform route planning and transit expansion in the region. The final report, Exploiting Open Data for Public Transportation Analysis, details the data cleaning and analysis processes used.
The collaboration was one of four projects completed through the Data Science for Social Good (DSSG) summer program launched this year at UBC’s Data Science Institute. The program complements the DSSG program run by the University of Washington’s eScience Institute. The Surrey Transportation Project was featured with the UW Cruising Traffic Analysis Project in a Microsoft blog post.
Surrey, the fastest-growing municipality in metropolitan Vancouver and the second-largest city in British Columbia, has six town centers and a mix of urban, rural, residential, commercial and agricultural zones. The city has more than 50 bus routes serving more than 100,000 passengers, and transit expansion is underway. One data source was the TransLink Open API, which incorporates transit and scheduling data published in the General Transit Feed Specification (GTFS). GTFS is a standard format that transit agencies use to feed data to Google's trip-planning applications and other systems. The team also used data from the Surrey Engineering Department, collected at bus entrance and exit doors by Automated Passenger Counters.
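GTFS data is plain CSV, so selecting weekday service takes only a few lines. The sketch below is a minimal illustration, not the team's code. It assumes the standard GTFS calendar.txt and trips.txt files and uses a few made-up rows in place of TransLink's feed. It also ignores the holiday exceptions that calendar_dates.txt can add.

```python
import csv
import io
from collections import Counter

# Made-up excerpts in GTFS layout; a real feed ships these as
# calendar.txt and trips.txt inside a zip archive.
CALENDAR_TXT = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
WKDY,1,1,1,1,1,0,0,20170703,20170903
SAT,0,0,0,0,0,1,0,20170703,20170903
"""

TRIPS_TXT = """route_id,service_id,trip_id
R1,WKDY,t1
R1,WKDY,t2
R1,SAT,t3
R2,WKDY,t4
"""

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday")


def weekday_service_ids(calendar_text):
    """Return the service_ids that run on every weekday (Monday to Friday)."""
    rows = csv.DictReader(io.StringIO(calendar_text))
    return {r["service_id"] for r in rows if all(r[d] == "1" for d in WEEKDAYS)}


def weekday_trips_per_route(trips_text, service_ids):
    """Count trips per route_id whose service_id is a weekday service."""
    rows = csv.DictReader(io.StringIO(trips_text))
    return Counter(r["route_id"] for r in rows if r["service_id"] in service_ids)


services = weekday_service_ids(CALENDAR_TXT)
print(dict(weekday_trips_per_route(TRIPS_TXT, services)))  # {'R1': 2, 'R2': 1}
```

Filtering by service_id first matches the focus on weekday schedules. The same filtered trip IDs could then be used to select rows from stop_times.txt.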
In addition to creating the map, the team used Python and NetworkX to build a graph model of the Surrey bus system based on weekday schedules, for use in network analysis. Built from stops and routes, the model measures degree of connectivity, centrality and the average clustering coefficient, which reflects how easily passengers can move within a network. Representing the system as a graph makes these measures computable, so networks and sub-networks can be benchmarked against each other. To measure connectivity and complexity, the team used a modified version of the methods outlined in a 2014 paper by the UBC Department of Civil Engineering. Based on the resulting graphs, the fellows found that Cloverdale, which is surrounded by agricultural land, is the town center with the fewest bus routes and the lowest connectivity and complexity, while City Centre and Whalley showed the highest connectivity.
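A graph model of this kind can be sketched in NetworkX as follows. This is an illustration under stated assumptions, not the team's code. It treats each stop as a node and joins consecutive stops on a route with an edge, and the stop sequences are hypothetical. It computes standard NetworkX measures (degree, betweenness centrality and average clustering), not the modified measures from the 2014 paper.

```python
import networkx as nx

# Hypothetical stop sequences for three routes; not real TransLink data.
ROUTES = {
    "A": ["s1", "s2", "s3", "s4"],
    "B": ["s5", "s2", "s6"],
    "C": ["s3", "s6", "s7"],
}


def build_stop_graph(routes):
    """Undirected graph: one node per stop, an edge between consecutive stops."""
    g = nx.Graph()
    for route_id, stops in routes.items():
        for a, b in zip(stops, stops[1:]):
            if g.has_edge(a, b):
                # Segment already present: record the extra route serving it.
                g[a][b]["routes"].add(route_id)
            else:
                g.add_edge(a, b, routes={route_id})
    return g


g = build_stop_graph(ROUTES)
degree = dict(g.degree())                   # number of adjacent stops
betweenness = nx.betweenness_centrality(g)  # share of shortest paths through each stop
avg_clustering = nx.average_clustering(g)   # how often a stop's neighbors are linked

print(max(degree, key=degree.get))          # s2
print(round(avg_clustering, 3))             # 0.119
```

To benchmark a sub-network such as a single town center, the same measures can be run on g.subgraph(stops_in_area). Here stops_in_area is a hypothetical list of that area's stop IDs.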
Building on the graphs, the team used a Bayesian model to analyze bus routes in the Frequent Transit Network (FTN), which run every 15 minutes from morning until night. The goals were to quantify differences in transit performance and service utilization between FTN and other routes, and to compare the eight FTN routes in Surrey with those in other areas of Vancouver. Data for selected routes, including annual service cost and peak passenger load for 2011-2015, was manually extracted with Tabula from TransLink's Transit Service Performance Review. The data was cleaned using Pandas in Python and analyzed in R. The areas along Surrey's FTN routes had significantly lower employment density than those in the rest of Vancouver, despite similar population density. This may indicate opportunities for economic development along Surrey's FTN corridors. FTN routes in Surrey also had fewer passenger boardings and higher costs per passenger, but better on-time performance and a higher average speed (30.07 versus 19.74 kilometers per hour). The team observed that routes with low utilization play an important role in connecting Surrey to other municipalities.
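The article does not specify the team's Bayesian model, so the sketch below uses a deliberately simple stand-in to show the idea. Under a flat prior, with each group's sample variance plugged in, the posterior for a group mean is approximately normal. The probability that one group's mean exceeds the other's then follows directly. The cost-per-boarding figures are hypothetical, and the team's actual analysis was done in R. A t-distribution would be more exact for small samples.

```python
from statistics import NormalDist, mean, variance

# Hypothetical cost per boarding (dollars) for two groups of routes;
# illustrative values only, not figures from TransLink's review.
surrey_ftn = [2.9, 3.4, 3.1, 3.8, 3.3, 3.0, 3.6, 3.5]
other_ftn = [1.9, 2.2, 2.5, 2.1, 2.4, 2.0, 2.3, 2.6]


def posterior_of_mean(values):
    """Approximate posterior of a group mean: Normal(xbar, s^2 / n).

    Assumes a flat prior and plugs in the sample variance s^2 for the
    unknown variance (a simplification of the exact t posterior).
    """
    return NormalDist(mean(values), (variance(values) / len(values)) ** 0.5)


def prob_mean_greater(a, b):
    """Posterior probability that group a's mean exceeds group b's."""
    # The difference of two independent normals is normal; NormalDist supports '-'.
    diff = posterior_of_mean(a) - posterior_of_mean(b)
    return 1.0 - diff.cdf(0.0)


p = prob_mean_greater(surrey_ftn, other_ftn)
print(f"P(mean cost, Surrey FTN > other FTN) = {p:.3f}")
```

A fuller model would set informative priors and fit per-route or per-year effects. The comparison logic stays the same, however: summarize the posterior of the difference rather than a single p-value.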
To explore low-cost methods for assessing commuter transit patterns, travel time and customer opinions, the team used the Twitter API to collect data over three weeks in July and August 2017. The dataset consisted of 30,000 geotagged public Tweets posted by 3,440 people in Vancouver, plus 4,800 Tweets directed at @Translink, from which stop IDs and route numbers were extracted with filters. Although only a subset of the population is represented, the team recommends this method for showing relative demand on routes throughout the day and for highlighting problems in real time. Combining the data with historical Tweets, they created a map to estimate traffic volume. Using a clustering method together with transit times from Google Maps, the team observed that residents of Langley Township appear to have greater intra-regional transit than surrounding areas. Finally, they recommended three areas for improved connectivity: between Surrey and Langley, between Coquitlam/Port Coquitlam and Surrey/Langley, and between the Newton area of Surrey and New Westminster.
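Stop IDs and route numbers can be pulled out of Tweet text with simple regular-expression filters, as in the sketch below. The patterns are assumptions for illustration: a five-digit stop number after "stop", and a one- to three-digit route number after "route" or "bus". The sample Tweets are invented. Real Tweets would need broader patterns to catch misspellings or routes referred to by name.

```python
import re
from collections import Counter

# Assumed patterns (hypothetical): a five-digit stop ID after "stop", and a
# one- to three-digit route number after "route" or "bus"; "#" is optional.
STOP_RE = re.compile(r"\bstop\s*#?\s*(\d{5})\b", re.IGNORECASE)
ROUTE_RE = re.compile(r"\b(?:route|bus)\s*#?\s*(\d{1,3})\b", re.IGNORECASE)


def extract_mentions(tweet):
    """Return (stop_ids, route_numbers) mentioned in one Tweet's text."""
    return STOP_RE.findall(tweet), ROUTE_RE.findall(tweet)


def route_counts(tweets):
    """Count how often each route number is mentioned across Tweets."""
    counts = Counter()
    for text in tweets:
        _, routes = extract_mentions(text)
        counts.update(routes)
    return counts


sample = [
    "@TransLink route 321 hasn't shown up at stop 54321 for 30 min",
    "Bus 502 packed again this morning",
    "@TransLink is route 321 detoured today?",
]
print(dict(route_counts(sample)))  # {'321': 2, '502': 1}
```

Bucketing these counts by each Tweet's timestamp gives the kind of relative, time-of-day demand signal the team describes, with the same sampling caveats.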