During the summer of 2017, I worked with Professor Eric J. Miller from the Department of Civil Engineering to perform transit data analysis for his research.
Over the 16-week term, my work centered on path analysis: modeling and investigating the individual trips taken in a computer traffic simulation of the Greater Toronto Area (GTA), so that they closely resemble the actual trips observed in the Transportation Tomorrow Survey (TTS). The goal is for the computer model system to generate a set of trips that matches the observed trip data as closely as possible and, where the computer-generated trips differ from the observed trips, to investigate the reason. The eventual goal is to use the computer-generated trip data and the model system to estimate future transit demand, thus aiding the planning of transit infrastructure.
My work involved developing data analysis programs to analyze and compare the path data. These programs are packaged as a toolbox in the Emme software and can be called repeatedly from the path estimation model system. The model system is built with software called XTMF, which organizes each function as a module and can call and run Emme tools to perform tasks. One of my programs extracts the computer-generated trip data from Emme into a format that can be easily analyzed. Another program organizes the observed trip data and compares them to the computer-generated trip data using three different methods. The path estimation model system was built so that it repeatedly runs transit assignments and generates trips, with each assignment using different parameter values. The system then learns to generate better trip data (i.e., more similar to the observed data) through particle-swarm optimization. After a number of generations, the similarity should converge. Extensive testing and revision were carried out to ensure correctness and improve runtime efficiency.
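To make the optimization loop more concrete, here is a minimal sketch of particle-swarm optimization in plain Python. It is not the XTMF or Emme implementation: the function names, the coefficient values, and the `toy_dissimilarity` objective are all assumptions made for illustration. In the real system, the objective would involve running a transit assignment with the candidate parameters and scoring the generated trips against the observed ones.

```python
import random


def particle_swarm_minimize(objective, bounds, n_particles=20, n_generations=50,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic particle swarm.

    All coefficient defaults are illustrative assumptions, not values
    taken from the project.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Start each particle at a random point inside the bounds, at rest.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Track the best point each particle has seen, and the best overall.
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=lambda i: personal_best_score[i])
    global_best = personal_best[best_index][:]
    global_best_score = personal_best_score[best_index]

    for _ in range(n_generations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move, then clamp the coordinate back into the bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))

            score = objective(positions[i])
            if score < personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score < global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score

    return global_best, global_best_score


def toy_dissimilarity(params):
    """Hypothetical stand-in for 'run an assignment and compare to observed trips'."""
    target = (2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_params, best_score = particle_swarm_minimize(
    toy_dissimilarity, bounds=[(0.0, 5.0), (0.0, 5.0)]
)
```

Each generation corresponds to one round of assignments, one per particle. Because every objective evaluation would be a full transit assignment in practice, the number of particles and generations largely determines the total runtime.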
After a number of model runs, even the best computer-generated trip data were found to differ significantly from the observed trip data. Further investigation was therefore needed to determine whether the cause was an error in the model system or an issue with how the computer generates trips. Computer-generated and actual trips for certain origin-destination pairs were compared extensively, both manually and through numerical and programmatic analysis. Various other experiments were conducted by changing the Emme network of the GTA. These, along with many discussions with Prof. Miller and other data analysts at TMG, led to the conclusion that the discrepancy is indeed a deficiency in how the Emme software generates trips.
Since the Emme network of the GTA consists of zones with centroids, routes, and centroid connectors that link the centroids to the routes, each trip can only begin and end at a zone centroid. The observed trip data likewise include only the start and end zones. Since a zone can cover an area of several km², the exact locations of the origin and destination are unknown. The way the centroid connectors link the centroids to the routes also limits the number of possible paths. As a result, Emme struggles to generate many of the observed paths. Possible solutions include using a coordinate system and setting the start and end locations of trips to coordinates instead of zones, or decreasing the size of zones by adding more zones in an area. These solutions can be explored in the future.
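The loss of location detail can be illustrated with a small Python sketch. The zone, its centroid coordinates, and the two trips below are entirely hypothetical and are not taken from the GTA network or the TTS data; they only show how two trips that start far apart inside one zone become indistinguishable once both are assigned to the zone centroid.

```python
from math import hypot

# Hypothetical 2 km x 2 km zone with its centroid at the centre (units: km).
ZONE_CENTROIDS = {"zone_a": (1.0, 1.0)}


def modelled_origin(zone_id, centroids=ZONE_CENTROIDS):
    """Return where a zone-based model places the start of a trip."""
    return centroids[zone_id]


def centroid_offset_km(true_point, zone_id, centroids=ZONE_CENTROIDS):
    """Straight-line distance between a true location and its zone centroid."""
    cx, cy = centroids[zone_id]
    return hypot(true_point[0] - cx, true_point[1] - cy)


# Two made-up trips that start at opposite corners of the same zone.
trip_1 = {"true_origin": (0.1, 0.1), "zone": "zone_a"}
trip_2 = {"true_origin": (1.9, 1.9), "zone": "zone_a"}

# Both trips are modelled as starting from the same point.
same_start = modelled_origin(trip_1["zone"]) == modelled_origin(trip_2["zone"])

# Each true origin is more than a kilometre from where the model starts it.
offsets = [centroid_offset_km(t["true_origin"], t["zone"]) for t in (trip_1, trip_2)]
```

With real travellers, an offset of that size can decide which stop or route is nearest, which is one way a zone-based model could miss paths that were actually observed.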
For more detailed information about my work, a project report documenting my theories, experiments, and observations is available upon request, as well as documentation for my data-analysis programs.
Code for some of the programs and the final report can be found here.