July 30, 2017
By Emily F. Keller
CUAC Summit Day 1 - Data Science for Social Good Presentations

The growing prevalence of data generated through diverse sources and methods, such as regulated government statistics, sporadically produced social media feeds, variably distributed geolocation and travel sensors, and qualitative on-the-ground surveys, paints a rich and complex picture of urban systems in the Cascadia region. The combined data presents new opportunities to understand and improve quality of life and services for residents and visitors at the neighborhood level. However, data frequency and availability also present new risks of misinterpretation through information gaps, inconsistent methodologies and unintended biases in data collection.

This summer, student fellows and data scientists at the University of Washington (UW) and the University of British Columbia (UBC) partnered with local government agencies and organizations to apply data science methods to information on travel and transit patterns, early childhood education, tourism asset utilization, neighborhood equity, food security and economic investment opportunities. The Data Science for Social Good (DSSG) program participants at the UW eScience Institute and the UBC Data Science Institute partnered with agency representatives from Seattle, Surrey, B.C. and the B.C. Government to create interactive portals, data visualizations and granular data sets showing local trends. The results may be used to refine data gathering processes, establish pilot projects for further research, or contribute to urban planning, transit network decisions, or investment priorities for businesses and attractions.

The project teams presented their work at the first Cascadia Urban Analytics Cooperative (CUAC) Summit at UBC on July 13-14, 2017. The DSSG programs, modeled after those at the University of Chicago and Georgia Tech, are an integral part of the CUAC initiative to advance the application of data science to urban issues in the Cascadia region by connecting faculty, researchers, students, local stakeholders and government agencies in the Seattle and Vancouver, B.C. metropolitan areas. The partnerships are designed to generate data science that is actionable and applicable to ground-level urban issues. Establishing the DSSG program at UBC to collaborate with the existing UW program was one of the first activities of the Microsoft-funded initiative launched in February 2017 as part of the Cascadia Innovation Corridor.

In pursuit of CUAC’s goal to bring wide-ranging perspectives to complex urban issues, the graduate and undergraduate fellows represent fields such as mechanical and civil engineering, social sciences, psychology, mathematics, economics, statistics, computer science, physics, biology, public health and international relations. They were joined by faculty, researchers, students, and representatives of Microsoft and local agencies for a total of sixty participants.

In the teams’ midterm presentations on the first day of the Summit, some common themes and approaches emerged:

  • Aggregating data from government open data portals, Census statistics, social media, demographics, and socioeconomic indicators to look for patterns across populations, neighborhoods and regions.
  • Creating multi-layered tools for visualization, mapping and analysis of nuanced data about human activity, and offering easily comparable information displayed according to priorities selected by the user.
  • Partnering with local government and organizations to generate data products that have contextual relevance in collaboration with university data science expertise.
  • Devoting significant resources to data cleaning and processing to identify and account for inconsistencies or differing standards across resources; and providing suggestions for improving future data collection methods in addition to research findings.
  • Seeking additional resources when data limitations are discovered, such as finding summaries rather than raw data in open data portals, or finding PDFs instead of an accessible and comprehensive database.

== Transportation and Traffic ==

Traveling via public transit or private vehicle leaves data traces collected through a variety of methods: sensors that detect traffic flow through anonymized connections to personal WiFi devices; electronic payments made when entering or exiting a bus or train; overhead sensors that count passengers as they board a vehicle; social media posts with geolocation data; and transit agencies’ data feeds showing schedules, routes and geolocation information.

Using publicly available data, UBC fellows Saeid Allahdadian, Lap-Tak Chu, Mina Park and William Qi partnered with City of Surrey Transportation Planner Don Buchanan to build a data visualization tool for transit planners and the public. The team, led by Buchanan, aggregated data from the National Census Journey to Work, TransLink, the Google Maps API, Twitter and City traffic counts. The tool is being designed to show stop locations and transportation availability, with service frequency expressed in relative terms so that areas across the region can be compared. Data challenges have included discovering inconsistencies in sidewalk documentation over time and determining that correcting the inventory through a network analysis would be a larger project. Long-term goals are to support the City in carrying out a data-driven approach for identifying priority routes, planning service upgrades, conducting capital planning work, and making the case for better data management internally.

Even robust data sources may miss groups of travelers or types of trips, showing the importance of data processing and cleaning. UW fellows Daniel Dylewsky, Mayuree Binjolkar, Andrew Ju and Wenonah Zhang are studying One Regional Card for All (ORCA) data from seven regional transportation agencies across the Puget Sound area. Along with project leads Mark Hallenbeck and Michael Wolf, and data scientists Jake VanderPlas and Bryna Hazelton, the fellows have two goals: examining bias in the collection of data from electronic fare purchases that omit cash payments made by low-income passengers and others; and distinguishing “real transfers” from “financial transfers,” or trips that utilize the ORCA card’s two-hour free transfer window for purposes such as round-trips or errands. Challenges include working with bus data that counts passengers as they board but not when they disembark, leaving the team to infer passengers’ exit locations. The overarching project goal, in collaboration with the UW Transportation Data Collaborative, is to provide nuanced information on travel patterns that transit agencies can use for route planning.
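To make the transfer distinction concrete, here is a minimal Python sketch of how boardings might be labeled. It is not the team's method. The two-hour window comes from the description above; everything else is an assumption made for illustration: the `Tap` record, the idea that the window opens at the first tap of a trip, and the 30-minute `MAX_REAL_GAP` cutoff used to separate a connection from a later, separate journey. Because bus data records boardings only, the gap is measured between boardings, not from exit to boarding.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TRANSFER_WINDOW = timedelta(hours=2)   # two-hour free transfer window (from the article)
MAX_REAL_GAP = timedelta(minutes=30)   # ASSUMPTION: hypothetical cutoff, not from the article


@dataclass
class Tap:
    """Hypothetical boarding record; real ORCA fields will differ."""
    card_id: str
    route: str
    time: datetime


def classify_taps(taps):
    """Label each tap 'new_trip', 'real_transfer' or 'financial_transfer'."""
    labels = []
    window_start = {}  # card_id -> time the current fare window opened
    previous = {}      # card_id -> most recent Tap seen for that card
    for tap in sorted(taps, key=lambda t: (t.card_id, t.time)):
        start = window_start.get(tap.card_id)
        prev = previous.get(tap.card_id)
        if start is None or tap.time - start > TRANSFER_WINDOW:
            # Outside any open window: a new fare is paid.
            label = "new_trip"
            window_start[tap.card_id] = tap.time
        elif tap.route != prev.route and tap.time - prev.time <= MAX_REAL_GAP:
            # ASSUMPTION: a quick change to a different route is a connection.
            label = "real_transfer"
        else:
            # Inside the window, but it looks like a separate journey.
            label = "financial_transfer"
        previous[tap.card_id] = tap
        labels.append((tap, label))
    return labels


day = datetime(2017, 7, 13)
sample = [
    Tap("A", "40", day.replace(hour=8, minute=0)),
    Tap("A", "62", day.replace(hour=8, minute=20)),
    Tap("A", "40", day.replace(hour=9, minute=30)),
    Tap("A", "40", day.replace(hour=10, minute=30)),
]
for tap, label in classify_taps(sample):
    print(tap.time.strftime("%H:%M"), tap.route, label)
```

The sample taps are invented. A rule this simple would misclassify slow connections and quick errands alike, which is one reason inferring exit locations matters for the real analysis.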

UW fellows Brett Bejcek, Anamol Pundle, Orysya Stus and Mike Vlah are working to detect private or for-hire vehicles circling for parking or between customers as a portion of total traffic congestion. With data scientists Valentina Staneva and Vaughn Iverson and project lead Stephen Barham from the City of Seattle’s Department of Transportation, the Cruising team fellows are using traffic sensor data in downtown Seattle (with identifying information scrubbed daily), combined with OpenStreetMap data on intersection connectivity and speed limits. As the traffic sensor data catches only a portion of passing vehicles and identifies their presence within a two-block radius rather than pinpointing their exact location, the team is creating algorithms to estimate probable vehicle paths that indicate “cruising”. The final results will be represented by a traffic congestion heat map.
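As a rough illustration of path estimation from sparse detections, the Python sketch below stitches fewest-hop routes between consecutive sensor sightings on a small street graph, then flags paths that revisit intersections. This is an assumed, simplified stand-in and not the team's algorithm: the graph, the function names and the `min_revisits` threshold are hypothetical, and the sketch ignores both speed limits and the two-block uncertainty in each detection.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the fewest-hop node list, or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                if nxt == goal:
                    # Walk parent links back to the start, then reverse.
                    path = [goal]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None


def probable_path(graph, detections):
    """Join fewest-hop legs between consecutive detections."""
    if not detections:
        return []
    path = [detections[0]]
    for a, b in zip(detections, detections[1:]):
        leg = shortest_path(graph, a, b)
        if leg is None:
            return None  # detections cannot be connected on this graph
        path.extend(leg[1:])
    return path


def looks_like_cruising(path, min_revisits=2):
    """ASSUMPTION: repeated visits to the same intersections suggest cruising."""
    return len(path) - len(set(path)) >= min_revisits


# Hypothetical one-way loop around a single block: A -> B -> C -> D -> A.
block = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
circling = probable_path(block, ["A", "C", "A", "C"])
direct = probable_path(block, ["A", "C"])
print(circling, looks_like_cruising(circling))
print(direct, looks_like_cruising(direct))
```

A fewest-hop path is only one plausible guess. A fuller treatment would weigh several candidate routes by travel time and by how well they match each sensor's detection area.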

== Access to Resources, Education and Investment ==

UBC fellows Patricia Angkiriwang, Elba Gomez Navas Acevedo, Patrick Laflamme and Shenyi Pan partnered with United Way Avenues of Change and project lead Stacey Rennie at the City of Surrey to create the Early Child Education Project. The team is examining the relationship between early childhood development indicators, such as physical, social, emotional and communication skills, and indicators in the City’s data to assess vulnerability by neighborhood. They are measuring childhood development trends across B.C., and integrating school satisfaction and residential turnover data into their analysis, which uses the Guildford West neighborhood as a pilot. The project will provide data aimed at increasing access to services, information and support; supporting evidence-based improvements to quality and outcomes; and improving decision-making in planning, investment and research.

UW fellows Hillary Dawkins, Yahui Ma and Jacob Rich are working with professors Rachel Berney and Gundula Proksch, and data scientists Bernease Herman and Amanda Tan, to create the Equity Modeler. This project examines affordability and access to opportunity by neighborhood following unprecedented population growth and increased housing prices in the Seattle area. The project links data on demographics, socioeconomic profiles, housing and development, quality of life, income and education with proximity to public schools, parks, transit and libraries. The team is creating a structural equation model to generate predictions about the relationships among various indicators, and building an interactive graphic tool to visualize trends by neighborhood.

The popularity and use of tourist destinations in British Columbia are tracked by multiple government agencies and through social media. The B.C. Tourism Resources Project is aggregating these disparate sources to present a detailed picture of tourism. The project utilizes B.C. public data, Census statistics and social media photos with natural language processing applied to captions to shed light on the popularity of destinations and their proximity to transportation and infrastructure. Challenges have included discovering an Instagram cache limited to 100,000 pictures, which omits winter destinations when searched during the summer, leading the team to search for supplemental data. UBC fellows Raphaël Roman, Halldór Þórhallsson, Hailey Wu and Gary Zhu are working with project lead Ben Clark from the B.C. Government to create a value for each tourist destination. This value can be used to allocate emergency services and investments, such as upgrading tourist assets with lower utilization.

The Investment and Intergovernmental Project is a collaboration with the City of Surrey to develop an interactive heat map, or profile of the City, highlighting the economic competitiveness of individual communities, with features weighted by the user. The project combines City data with business licenses, job postings, property assessments and Canada’s National Household Survey. UBC fellows Rashed Hoque, Tony Hui, Natasha Mattson and Sarah Neubauer are working with Research Analyst Kristine Garrucho and Economic Development Manager Stephen Wu from the City of Surrey. The project is aimed at supporting job creation, innovation and competitiveness, and informing the City’s strategy for attracting investments according to each community’s unique attributes.
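One simple way to picture user-weighted features is a composite score: rescale each indicator to a common 0-1 range across communities, then take a weighted average using the weights the user selects. The Python sketch below shows that idea only. The scoring formula, the function name, the community labels and all numbers are assumptions made for illustration, not details of the Surrey project.

```python
def composite_scores(indicators, weights):
    """Min-max normalize each feature across communities, then weight-average.

    indicators: {community: {feature: value}}
    weights:    {feature: non-negative weight chosen by the user}
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    features = list(weights)
    lo = {f: min(row[f] for row in indicators.values()) for f in features}
    hi = {f: max(row[f] for row in indicators.values()) for f in features}
    scores = {}
    for name, row in indicators.items():
        score = 0.0
        for f in features:
            span = hi[f] - lo[f]
            # A feature with no variation cannot separate communities.
            norm = (row[f] - lo[f]) / span if span else 0.0
            score += weights[f] * norm
        scores[name] = score / total
    return scores


# Invented values for three placeholder communities.
indicators = {
    "Community A": {"business_licenses": 100, "job_postings": 50},
    "Community B": {"business_licenses": 200, "job_postings": 10},
    "Community C": {"business_licenses": 150, "job_postings": 30},
}
weights = {"business_licenses": 3, "job_postings": 1}
scores = composite_scores(indicators, weights)
print(scores)
```

Changing the weights re-ranks the communities without touching the underlying data, which is what lets each user view the map according to their own priorities.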

== Sustainable Agriculture ==

The Vital Signs project is part of a larger effort by the organization Conservation International to improve data resources and optimize agricultural development and risk management decisions for smallholder farmers in Kenya, Ghana, Tanzania, Rwanda and Uganda. UW fellows Cara Arizmendi, Mitchell Goist, Krista Jones and Robert Shaffer are working with Vital Signs project leads Matt Cooper and Tabby Njunge, and data scientists Anthony Arendt and Joe Hellerstein. The team utilizes data from national governments along with socioeconomic surveys, groundwater data from NASA satellites, and information collected directly from farmers showing agricultural practices and yields, soil quality, biodiversity and human well-being. The data collection and analysis are being used to determine methods for increasing agricultural resilience to weather and climate events, such as improving seeds and training for farmers, and to build online platforms displaying interactive data.

== Next Steps ==

With the DSSG presentations setting the stage for discussions, the second day of the CUAC Summit featured lightning talks by faculty and agency representatives, along with group brainstorming sessions about long-term research ideas, detailed in the CUAC Summit Day 2 blog post. CUAC, which is run by the UW Urbanalytics unit and the UBC Data Science Institute, will resume discussions at a Fall Symposium at UW planned for September 2017.