Reproducibility in Transportation Research: A Hands-on Tutorial


This half-day tutorial (afternoon session) is dedicated to reproducible research (RR) in transportation. As transportation researchers, it has been our experience that research in transportation is hard to reproduce. Needless to say, this holds back the scientific progress of the field; every time a student needs to re-implement another paper or collect a similar dataset, that is time that could have been spent on new research. Fortunately, tools and best practices supporting RR are maturing, so it is the perfect time for the ITS community to engage with RR. We hope that this tutorial will help to move the needle on reproducibility in transportation, so that our research collectively achieves greater impact.

This is the first of hopefully many tutorials on RR in transportation; as such, we want your feedback on your RR needs and interests! Please do not hesitate to get in touch.

Background

Reproducibility is a cornerstone of scientific research, providing the foundation for validating results and advancing knowledge. In research fields where computation-based scientific publication is pervasive, researchers have warned of a credibility crisis. RR is gaining extensive attention in various fields, such as remote sensing, medicine, and data science, as a way to ensure the validity and reliability of scientific findings.

In the field of traffic and transportation, Intelligent Transportation Systems (ITS) represent the most computationally intensive, fast-growing area of research, and its outcomes have significant implications for the general population. In ITS, the rapid evolution of technologies and methodologies has emphasized the need for RR practices. Reproducibility can refer to computational reproducibility (the focus of this tutorial), which means that the same data and analysis steps produce consistent results. Even when it is infeasible to replicate an entire scientific study, achieving reproducibility sets a minimum standard of scientific rigor by allowing others to validate and build upon the findings. With ITS being inherently interdisciplinary and data-driven, reproducibility forms the backbone of credible, reliable research outputs.
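To make the idea of computational reproducibility concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the travel times are made up, and `run_analysis` and `fingerprint` are hypothetical helpers, not part of the tutorial material. The sketch fingerprints the input data and fixes the random seed, so that two runs of the same analysis steps on the same data give the same result.

```python
import hashlib
import random
import statistics


def fingerprint(values):
    """Return a SHA-256 digest of the values, formatted to a fixed precision."""
    text = ",".join(f"{v:.6f}" for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_analysis(data, seed):
    """Hypothetical analysis: bootstrap estimate of the mean travel time."""
    rng = random.Random(seed)  # local generator, so global random state cannot leak in
    means = []
    for _ in range(200):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        means.append(statistics.fmean(sample))
    return statistics.fmean(means)


# Made-up travel times in minutes, for illustration only.
travel_times = [12.4, 15.1, 9.8, 20.3, 14.7, 11.2]

first = run_analysis(travel_times, seed=42)
second = run_analysis(travel_times, seed=42)

print("data fingerprint:", fingerprint(travel_times))
print("results match:", first == second)
```

Publishing the data fingerprint and the seed alongside the result lets another researcher check that they started from the same inputs before comparing outputs.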

The objectives of the tutorial

  • Introduce the fundamental concepts of RR and their importance to the domain of ITS
  • Provide hands-on training in the use of tools and software that facilitate reproducibility
  • Guide researchers on how to properly document and organize data and project outcomes to foster open science
  • Encourage collaboration among tutorial participants and the broader ITS community to build a culture that values and practices reproducible research

Topics we will cover

  • Foundations of reproducible research (RR): What does it mean for research to be reproducible? Why is it important? Why is it worthwhile to engage in RR? What makes reproducibility hard, even when the exact dataset and code of another research paper are provided?
  • RR challenges in ITS: What are the challenges of RR in the field of ITS? Are they general across many fields or specific to this one? What are examples of (non-)reproducibility in ITS research?
  • The state of RR in ITS: Through a live participant survey, we will get a taste of the RR attitudes and needs of the ITS community.
  • Documenting code and data for RR: What kinds of missing metadata can be responsible for reproducibility failures? How can version control tools like Git and GitHub be used for RR? How can project files be organized for readability?
  • Hands-on activities: Through two hands-on activities, participants will each attempt to create a small reproducible project. Another participant will then try to reproduce it. Will they succeed? The first activity will focus on a simple report. The second activity will use a small project with code and data.
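One kind of metadata that often goes unrecorded is the computing environment itself. As a minimal sketch, assuming a Python project and using only the standard library, the snippet below collects the interpreter version, the operating system, and the installed package versions, and writes them to a JSON file that can be committed next to the code. The function names and the `environment.json` filename are illustrative choices, not tools prescribed by the tutorial.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Collect basic facts about the interpreter, OS, and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


def write_environment(path="environment.json"):
    """Write the captured environment to a JSON file (hypothetical filename)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(capture_environment(), handle, indent=2)


if __name__ == "__main__":
    env = capture_environment()
    print(f"Python {env['python_version']} on {env['platform']}")
    print(f"{len(env['packages'])} installed packages recorded")
```

Calling `write_environment()` at the start of an analysis script leaves a record that a later reader can compare against their own setup when results differ.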

Participant requirements

Participants should bring laptops with the required software installed in order to take part in the hands-on sessions.

Tentative Schedule

Time (MDT, UTC-6) Event
13:30 - 13:40 Introductory Remarks
Pre-tutorial survey
13:40 - 14:55 Session 1: Introduction to Reproducible Research
Speakers/Contributors: Bidisha Ghosh, Zuduo Zheng
Lecture 1: Introduction to reproducible research
Hands-on activity 1: Is your research reproducible?

👉 Please download the following two CSV files for this activity:
14:55 - 15:30 Session 2: Documentation of Data and Code for Reproducibility
Speakers/Contributors: Cathy Wu, Nicholas Saunier
Lecture 2: How to badly document a research project

Learning objectives:
  • Fundamentals of organizing projects, code, & data
  • Working knowledge of GitHub
  • (Advanced skills) Automatically extract the computing environment for reproducibility
  • (Advanced skills) VS Code and powerful extensions (tools for Jupyter, Markdown, CSV, linting)
Hands-on activity 2: Can you reproduce my simulation results in 5 minutes?

👉 Please make sure to go through the participant requirements prior to this session.

Learning objectives:
  • Main goal: Create an ITS project that someone can reproduce in less than 5 minutes.
  • Stretch goal: Organize the project and make it understandable.
15:30 - 16:00 Coffee Break and Office Hours
16:00 - 17:25 Session 2: Documentation of Data and Code for Reproducibility (continued)
17:25 - 17:30 Concluding Remarks
Post-tutorial survey

We are also grateful to the REROUTE project funded by Horizon Europe Marie Skłodowska-Curie Actions (MSCA) for their help in organizing the tutorial.

Potential future topics

We simply did not have enough time to cover many important topics in the area! If you have suggestions for what you need to succeed in your research, we are all ears. Here are a few potential future topics:

  • Data sharing and management for RR: What are best practices for storing data? How to choose a data format, and does it matter? How to participate in RR in spite of data with sensitive information?
  • Reproducible Analysis: Our analysis pipelines are getting more and more sophisticated. What are best practices for leveraging Jupyter or Google Colab notebooks to perform and share interactive analysis?
  • Reproducible Reporting: What are dynamic documents? Reproducible reports? How does someone create one?
  • Best Practices for Peer Review and Publication: What role do reviewers and publication policies play in reproducible research? How can we move the needle forward? What are open science checklists? How well are they working?
  • Reproducibility in Machine Learning and AI: How can we standardize and manage ML experiments for reproducibility?
  • Addressing non-technical barriers to reproducible research: The elephant in the room is the pressure to ‘publish or perish’; researchers simply respond to the incentive structure. What or who can make a difference in RR?
  • Case Studies in ITS: Do we understand the extent of the RR issue in the field of ITS? How can we find out? What are the success stories of RR in ITS? What are the horror stories?
  • Hands-on RR workshop: Participants bring a research project and work on making it reproducible (lectures, work sessions, collaborative sessions, discussion).

Speakers

Speaker Bio

Bidisha Ghosh (Member, IEEE) is an Associate Professor in the School of Engineering and a fellow at Trinity College Dublin. She is a member of the IEEE Intelligent Transportation Systems Society. Prof. Ghosh has authored over 175 peer-reviewed conference and journal papers. Her research has been extensively cited in policy documents related to cycling and sustainable transport by the World Health Organization (WHO) and government bodies. She has been an investigator in multiple national and EU projects in the field of traffic & transportation and environmental modeling.

Cathy Wu is an Associate Professor at MIT in LIDS, CEE, and IDSS. She holds a Ph.D. from UC Berkeley, and B.S. and M.Eng. from MIT, all in EECS, and completed a Postdoc at Microsoft Research. Her research aims to leverage machine learning to solve hard optimization problems for next-generation mobility systems. She is broadly interested in leveraging modern computing and AI to advance decision making. Cathy has received a number of awards, including the NSF CAREER, PhD dissertation awards, and publications with distinction. She serves on the Board of Governors for the IEEE ITSS, is a Program Co-chair for RLC 2025, and is an AC/AE for ICML, NeurIPS, and ICRA. She is also helping spearhead efforts towards reproducible research in transportation.