GLOBAL PROGRAM
           RESILIENT HOUSING




Detecting
Urban Clues
for Road Safety
Leveraging Big Data
and Machine Learning
in World Bank
Transport Projects




DECEMBER 2021

Table of Contents


List of Figures and Tables
Acknowledgments
Objective, Audience and Structure
Abbreviations
Introduction
PART 1: The Demand for Data to Assess Risks and Conduct Safety Assessments
             1.1 Assessing Road Safety Across the Project Cycle
             1.2 Demand for Data to Assess Road Safety
                   Assessing Overall Project Traffic and Road Safety Risk (OPTRSR)
                   Traffic and Road Safety Assessments
                   Results Frameworks and Monitoring Plans
                   Key Challenges with Current Approaches to Road Safety Analysis
PART 2: Big Data and Machine Learning to Strengthen Road Safety in Transport Projects
             2.1 New Data (and Big Data) in Road Safety Analysis
                   How to Access Big Data
                   Key Considerations for Selecting the “Right” Big Data Source
             2.2 Machine Learning in Road Safety Analysis
                   How to Use Machine Learning
                   Key Considerations for Using Machine Learning
             2.3 Big Data, Machine Learning and the Future of Road Safety Assessments
PART 3: Case Studies: Applying Big Data and Machine Learning to Assess Road Safety
             3.1 Objectives of the Case Studies
             3.2 Methodology
             3.3 Case Study 1: Bogotá, Colombia
             3.4 Case Study 2: Padang, Indonesia
             3.5 Findings
Conclusion
Annex 1: Most Relevant Big Data Types for Road Safety Analysis
Annex 2: Overview of Big Data Sources
Annex 3: Hotspots and Heatmaps: Uncovering Data Patterns for Road Safety
Annex 4: Classes Detected Using Mapillary Vistas Dataset in RIC Model and Input Classes for the RRE Model
Annex 5: Average Precision of the Bounding Box Detection and Classification
Glossary of Terms
References


List of Figures


Figure 1: Road safety is a serious concern in low- and middle-income countries
Figure 2: Potential applications of big data and ML in road safety projects
Figure 3: Key road safety activities across the project cycle
Figure 4: Street view and OSM
Figure 5: Hotspot analysis of major crashes reported by Waze application users
Figure 6: ML lifecycle
Figure 7: Categories of ML and the tasks they can perform
Figure 8: ANN structure
Figure 9: ML algorithms and street view
Figure 10: Labeling a crosswalk in Padang, Indonesia using the Computer Vision Annotation Tool (CVAT)
Figure 11: Framework for automatic road safety analysis and management powered by ML
Figure 12: Training phase for road safety segment analysis using ML
Figure 13: Deployment phase to predict road safety
Figure 14: RIC and RRE applied to predict road segment risk
Figure 15: Image segmentation in Bogotá
Figure 16: Six study areas and crash frequency in Bogotá
Figure 17: Confusion matrix showing the accuracy of the RRE model
Figure 18: Road risk prediction in Bogotá
Figure 19: Road risk prediction in Padang




List of Tables


Table 1: Methods for calculating OPTRSR and identifying risk factors
Table 2: Methods for assessing OPTRSR and identifying risk factors
Table 3: The World Bank Road Safety Screening and Assessment Tool
Table 4: Overview of primary tools for traffic and road safety assessments
Table 5: Example of indicators that may be included in the Results Framework
Table 6: SWOT analysis of using big data in road safety analysis
Table 7: Overview of potential big data sources for Methods I-VII
Table 8: Method I - Crash data-based risk assessment
Table 9: Method II - iRAP Star Rating (alternative data sources using big data)
Table 10: Method III - Estimating road infrastructure risk without crash or iRAP data
Table 11: Method IV - RSSAT
Table 12: Example of big data sources for road safety indicators in the Results Framework
Table 13: Categories of ML and algorithms
Table 14: ML and DL algorithms
Table 15: Frequently used ML techniques for road safety analysis
Table 16: SWOT analysis of using ML in road safety analysis
Table 17: Potential applications of big data and ML in Methods I to VII
Table 18: Data used for case study in Bogotá, Colombia
Table 19: Data used for case study in Padang, Indonesia


Acknowledgments


This Guidance Note was prepared by a team from the Global Program for Resilient Housing at the
World Bank. The team was led by Sarah Elizabeth Antos (Data Scientist) and Luis Miguel Triveno
Chan Jan (Senior Urban Development Specialist). Overall managerial support was provided by
Francis Ghesquiere (Practice Manager, Urban EAP) and Radoslaw Czapski (Senior Transport Specialist).
The core team included Jessica Gosling-Goldsmith, Charles Wang, Bushra Syed Shafat Ali, and
Sebastian Anapolsky.
The Global Program for Resilient Housing supports safe and resilient housing by creating new,
cost-saving tools to evaluate homes from the air and the street to help identify those vulnerable to
natural and health hazards. While the program focuses on housing, it developed a methodology to
extract urban clues from street view imagery with multiple applications, including those related to
urban mobility and road safety.
The note incorporates valuable input and review from Holly Krambeck (Program Manager), Said
Dahdah (Lead Transport Specialist), Satoshi Ogita (Senior Transport Specialist), Veronica Ines Raffo
(Senior Infrastructure Specialist), Li Qu (Senior Transport Specialist), and Glenn S. Morgan (ESF
Consultant).
During the drafting of this note, several industry experts were interviewed. The team would like to
express gratitude for the external inputs of: Anthony Germanchev (Principal Professional Leader,
Advanced Technologies Lab, Australian Road Research Board), David Hynd (Chief Scientist, TRL),
Monica Olyslagers (Safe Cities and Innovation Specialist, iRAP), Professor George Yannis (National
Technical University of Athens), and Spencer Rigler (Account Director, TRL).
Design was done by Xavier Conesa.
This note would not have been possible without generous support from the Global Road Safety
Facility and UK Aid.




Objective, Audience and Structure


The purpose of this Guidance Note is to provide concrete guidance on how big data and machine
learning (ML) can be leveraged in road safety analysis. The document presents opportunities to use
these new technologies to improve current road safety assessment procedures across the project
cycle, in accordance with the World Bank’s latest Environmental and Social Framework (ESF)
guidelines.
This Guidance Note is for World Bank task teams who are interested in using new data sources
and analytical methods for road safety analysis across various types of projects. Researchers,
road safety experts, data scientists, and government agencies responsible for road safety
assessments, transportation management, and infrastructure development may also find this
document useful for understanding how these new technologies can be implemented across World Bank
investment projects.
This document consists of three parts. Part 1 discusses the World Bank’s current guidelines for
incorporating road safety analysis across the project cycle, examines existing data and approaches,
and identifies opportunities to improve current methods using big data and ML. Part 2 provides an
overview of these new technologies and concrete guidance on how they can be integrated into World
Bank projects. Part 3 presents case studies on two regions of interest – Bogotá, Colombia and Padang,
Indonesia – to demonstrate how ML can be implemented to evaluate road safety. The document
concludes with recommendations for using big data and ML in road safety assessments in the future.




Abbreviations


ADB	     Asian Development Bank
API	     Application Programming Interface
DDP	     Development Data Partnership
DL	      Deep Learning
DRIVER	 Data for Road Incident Visualization, Evaluation and Reporting
ESCP	    Environmental and Social Commitment Plan
ESF	     Environmental and Social Framework
FSI	     Fatalities and Serious Injuries
GRSF	    Global Road Safety Facility (World Bank)
ICR	     Implementation Completion Report
IoT	     Internet of Things
iRAP	    International Road Assessment Programme
ITS	     Intelligent Transport System
LMICs	   Low- and Middle-Income Countries
ML	      Machine Learning
OPTRSR	 Overall Project Traffic and Road Safety Risk
OSM	     OpenStreetMap
PCN 	    Project Concept Note
PDO	     Project Development Objective
RIC	     Road Information Collector
ROI	     Region of Interest
RRE	     Road Risk Evaluator
RSA	     Road Safety Audit
RSI 	    Road Safety Inspection
RSIA	    Road Safety Impact Assessment
RSO	     Road Safety Observatory
RSSAT	   Road Safety Screening and Appraisal Tool
SDGs	    Sustainable Development Goals
TTL	     Task Team Leader
UAV	     Unmanned Aerial Vehicle




Introduction


Transportation services and infrastructure connect people, businesses, and places. They allow
citizens to access opportunities such as jobs, education, health services, and recreation, and they
enable the movement and distribution of goods. As a result, transport services and infrastructure
are key to the economic development of cities and regions.1
While the development of transportation systems and infrastructure is vital to economic growth,
it is also important to evaluate and mitigate its potential negative externalities and costs to
society.2 According to the World Health Organization (WHO), around 1.25 million people are killed on
the world’s roads every year and between 20 and 50 million are seriously injured. These costs are
disproportionately high in low- and middle-income countries (LMICs), which are estimated to account
for 93 percent of the world’s road fatalities despite having 60 percent of the world’s vehicles
(figure 1).3 According to a 2019 study of select countries, road crashes cost World Bank client
countries an estimated 7 percent to 22 percent of their GDP over a 24-year period.4
Road fatalities and injuries are predictable and preventable.5 Research indicates that roughly 70
percent of serious crashes are due to simple and unintentional errors of perception or judgement.6
The most vulnerable road users are pedestrians, bicyclists, and motorcyclists, who account for more
than 50 percent of reported fatalities in LMICs.7 Effective transport planning and management
carefully considers and incorporates measures to address safety risks.8 Speed reductions and the
design of infrastructure to promote safer streets have demonstrated clear results in Colombia and
India. In Bogotá, Colombia, the speed management program resulted in a 21 percent decrease in
traffic fatalities compared to the average for the three preceding years (2015-18).9 In India, Pune has
become a regional leader in complete streets, an approach in which streets are designed to give
pedestrians, cyclists, motorists, and transit riders safe access, rather than being designed only
for cars.10
The World Bank is a key supporter of the United Nations (UN) Decade of Action for Road Safety
and related Sustainable Development Goals (SDGs). These include SDG 3.6, which seeks to reduce
deaths and injuries from road crashes by 50 percent, and SDG 11, which focuses on making cities and
human settlements inclusive, safe, resilient, and sustainable. The World Bank is also a proponent of

1
  World Bank, Mobile Metropolises: Urban Transport Matters: An IEG Evaluation of the World Bank Group’s Support for Urban
Transport (Washington, DC: World Bank, 2017).
2
  World Bank, Making Roads Safer (Washington, DC: World Bank, 2014).
3
  WHO (World Health Organization), Global Status Report on Road Safety 2018 (Geneva: World Health Organization, 2018), 4.
4
  World Bank, The High Toll of Traffic Injuries: Unacceptable and Preventable (Washington, DC: World Bank, 2017).
5
  Makhtar Diop, “All Road Deaths Are Preventable. We Can Make It Happen,” World Bank, accessed May 14, 2021,
https://blogs.worldbank.org/transport/all-road-deaths-are-preventable-we-can-make-it-happen
6
  International Transport Forum, Zero Road Deaths and Serious Injuries: Leading a Paradigm Shift to a Safe System (Paris: OECD
Publishing, 2016). https://doi.org/10.1787/9789282108055-en
7
  World Bank, Good Practice Note on Road Safety (Washington, DC: World Bank, 2019).
https://pubdocs.worldbank.org/en/648681570135612401/Good-Practice-Note-Road-Safety.pdf
8
  International Transport Forum, “Best Practice for Urban Road Safety: Case Studies,” International Transport Forum Policy
Papers, no. 76 (2020).
9
  International Transport Forum, “Best Practice for Urban Road Safety: Case Studies.”
10
   Institute for Transportation and Development Policy, “Pune, India Wins 2020 Sustainable Transport Award,” last modified
June 27, 2019, https://www.itdp.org/2019/06/27/pune-india-wins-2020-sustainable-transport-award/


the Sustainable Mobility for All (SM4A) initiative, which highlights safety as one of the pillars of
sustainable mobility.11
The World Bank hosts the Global Road Safety Facility (GRSF) to provide funding, knowledge, and
technical assistance to help developing countries create safer roads. The Facility addresses road
safety issues across a wide range of projects, from infrastructure design and vehicle safety to
traffic law enforcement, post-crash response systems, data collection, and institutional
strengthening. Since its inception in 2006, the Facility has disbursed a total of USD 44.6 million
to improve road safety in 64 countries.
It is important, and often required, to incorporate road safety management procedures in transport
projects to identify and mitigate risks in a timely manner. Governments, international development
organizations, and other agencies have established various tools and systems to facilitate road
safety analysis. However, the absence of valid, representative data presents significant challenges
to developing a good understanding of road safety risks and reducing crash fatalities and injuries
through data-driven, evidence-based interventions.12

FIGURE 1: Road safety is a serious concern in low- and middle-income countries
93% of road fatalities occur in low- and middle-income countries, despite these countries having
60 percent of the world’s vehicles.
SOURCE: Original figure for this publication, based on data from WHO.

11
   World Bank, Good Practice Note on Road Safety, 1.
12
   World Bank, Guide for Road Safety Opportunities and Challenges: Low and Middle Income Country Profiles (Washington, DC:
2020). https://openknowledge.worldbank.org/handle/10986/33363


New technologies such as big data and machine learning (ML) provide promising opportunities to
improve existing data sources and methods for road safety analysis. From analyzing anonymized
GPS data to understand traffic flows in the Philippines to partnering with data providers that
crowdsource information about crash sites in Kenya, governments, World Bank task teams, and other
stakeholders are adopting innovative approaches to identify, monitor, and mitigate fatalities and
injuries in high-risk areas.13 Unsupervised learning techniques have been applied in Lima, Peru, using
records of different crash types to identify safe areas along routes and safer pedestrian pathways,
decreasing the likelihood of pedestrians being involved in a crash.14 The Urban Traffic Modeling and
Control project at the National University of Medellín has been using deep learning (DL) techniques
to classify traffic and identify motorbike usage. In Cartagena, Colombia, data mining and ML
classification algorithms were used to analyze road records and predict the severity of traffic
crashes.15 Figure 2 provides an overview of the potential uses of big data and ML in road safety
analysis that will be discussed in this note.

FIGURE 2: Potential applications of big data and ML in road safety projects
Big data or spatial data source: machine/deep learning application
Street view imagery: identify road conditions, barriers, crosswalks, pedestrian paths, street signs, traffic lights
Satellite and aerial imagery: delineate road curvature, complex intersections, road gradient; provide car and truck count
Internet of Things: analyze vehicle and population movement
Incident reports: identify road crash patterns and develop prediction models
Natural phenomena: find patterns in weather and time of day
Social media: extract traffic or road condition data
SOURCE: Original figure for this publication.
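
As a rough illustration of the incident-report use case in figure 2, the following Python sketch counts crash reports per grid cell and ranks the busiest cells. It is a minimal sketch under stated assumptions, not the method used in the studies cited above: the function name hotspot_cells, the 0.01-degree cell size, and the sample coordinates are all hypothetical, and an actual analysis would likely rely on dedicated spatial statistics and validated crash records.

```python
from collections import Counter
from math import floor


def hotspot_cells(crashes, cell_size_deg=0.01, top_n=3):
    """Count crash reports per grid cell and return the busiest cells.

    crashes: iterable of (latitude, longitude) pairs in decimal degrees.
    cell_size_deg: assumed grid resolution (illustrative value only).
    Returns a list of ((row, column), count) tuples, busiest first.
    """
    counts = Counter(
        (floor(lat / cell_size_deg), floor(lon / cell_size_deg))
        for lat, lon in crashes
    )
    return counts.most_common(top_n)


# Made-up coordinates for illustration only; these are not real crash records.
sample_reports = [
    (4.6512, -74.0831),
    (4.6518, -74.0835),
    (4.6533, -74.0802),
    (4.7105, -74.0721),
    (4.6097, -74.0817),
]

busiest = hotspot_cells(sample_reports, top_n=1)
print(busiest)
```

With the sample points above, three of the five reports fall in the same cell, so that cell is returned as the single busiest one. A simple count like this only flags where reports cluster; it does not adjust for traffic volume or reporting bias.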




13
   World Bank, “Open Traffic Data to Revolutionize Transport,” last modified December 19, 2016,
https://www.worldbank.org/en/news/feature/2016/12/19/open-traffic-data-to-revolutionize-transport;
Guadalupe Bedoya Arguelles, et al., “Smart and Safe Kenya Transport (SMARTTRANS)” (Washington, DC: World Bank, 2019),
https://documents1.worldbank.org/curated/en/723411574361015073/pdf/Smart-and-Safe-Kenya-Transport-SMARTTRANS.pdf
14
   Jesús Lovón-Melgarejo et al., “Identification of Risk Zones for Road Safety through Unsupervised Learning Algorithms,” in
16th LACCEI International Multi-Conference for Engineering, Education, and Technology: Innovation in Education and Inclusion,
http://www.laccei.org/LACCEI2018-Lima/full_papers/FP413.pdf
15
   Holman Ospina-Mateus et al., “Using Data-Mining Techniques for the Prediction of the Severity of Road Crashes in
Cartagena, Colombia,” in Applied Computer Sciences in Engineering, eds. J. Figueroa-García et al., vol. 1052 (2019): 309-20,
https://doi.org/10.1007/978-3-030-31019-6_27


PART 1:
The Demand for Data to Assess Risks and Conduct
Safety Assessments

1.1 Assessing Road Safety Across the Project Cycle

World Bank projects follow a project cycle that covers their design, preparation, implementation,
support, and evaluation. The project cycle identifies six stages between project identification and
project completion (see figure 3).16 Bank staff work closely with developing country borrowers
throughout the project cycle to ensure that projects meet relevant World Bank economic, financial,
procurement, and environmental and social standards.
The World Bank has adopted seven pillars to identify key priorities for road safety interventions.
These pillars, which aim to prevent road crashes, fatalities, and injuries across all projects, are:
Road Safety Management, Safer Roads and Mobility, Safer Vehicles, Safer Road Users, Post-Crash
Response, Safer Speeds, and Reduced Exposure. The first five pillars come from the UN Global Plan
for Road Safety; the last two were added for the Good Practice Note on Road Safety (Road Safety
GPN).17 Road safety objectives of World Bank projects should be aligned with these pillars, and
performance indicators must track progress against them.
All World Bank investment projects are required to follow the World Bank’s Environmental and
Social Framework (ESF), which went into effect on October 1, 2018. The ESF is a set of operational
policies and procedures designed to ensure that projects are economically, financially, socially, and
environmentally sound. The ESF includes protections for people and the environment from potential
adverse risks and impacts that could arise from Bank-financed projects and promotes sustainable
development. Within the ESF, ten Environmental and Social Standards (ESS) set out a range of
responsibilities for Borrowers designed to help them manage project risks and impacts. In addition, the
standards aim to improve environmental and social performance, consistent with good international
practice and national and international obligations.
The World Bank’s ESF calls for road safety risks to be considered in all investment projects. As
relevant, Borrowers are required to undertake technical assessments and implement operational
measures to avoid or minimize community exposure to project-related traffic and road safety risks. In the
context of the ESF, road safety assessments are carried out as part of a project’s Environmental and
Social Assessment (ESA). The overall approach to ESA is defined in the standard on Environmental
and Social Assessment (ESS1), which describes the requirements for project risk assessment, the
expectations for stakeholder engagement, and the requirements for establishing grievance mechanisms.
Details describing road safety requirements are provided in the standard on Community Health and
Safety (ESS4). The standard on Labor and Working Conditions (ESS2) would also apply in situations
where traffic management measures are necessary to address the safety of workers and local
communities in and around construction worksites.


16
   The World Bank’s Guidance Note on preparing the Project Appraisal Document for investment project finances may be
useful to prepare its content.
17
   World Bank, Road Safety Indicators for Project Monitoring (Washington, DC: World Bank, 2021).

ESS4 anticipates that project activities, equipment, and infrastructure can increase community
exposure to risks and impacts. To manage this risk, transport or transport-related projects must
“identify, evaluate and monitor the potential traffic and road safety risks to workers, affected
communities and road users throughout the project life-cycle.” The ESF requires Borrowers to “incorporate
technically and financially feasible road safety measures into the project design” to minimize road
safety risks and impacts.18 Where appropriate, the Borrower will initiate a road safety assessment
for each phase of the project, monitor incidents, and prepare regular reports reviewing outcomes and
observations.
The ESF standard on Stakeholder Engagement (ESS10) will also play an important role in addressing
road safety issues in most projects. The participation of road users of all types in planning
and decision making can provide essential user perspectives, information, and insights on all aspects
of road safety, especially if users are expected to play an active role in implementing project activities
related to monitoring, incident reporting, and grievance and dispute resolution. ESS10 requires the
preparation of a Stakeholder Engagement Plan, which systematically identifies project stakeholders
and defines approaches and methods for meaningful engagement throughout the project cycle.
Stakeholders that could be affected by road safety include all road users; project workers
involved in construction; affected communities; and vulnerable groups within those communities and
user groups. ESS10 also requires the preparation of project Grievance Mechanisms, which could be
structured as one or more channels for raising concerns about road safety, contractor performance,
or overall project implementation.
A Good Practice Note on Road Safety accompanies the ESF to support its implementation and
to address road safety in World Bank-financed operations.19 The World Bank’s Road Safety GPN
guides Borrowers and World Bank task teams in meeting the ESS4 road safety requirements by
implementing the Safe System approach. Based on the guidelines recommended by the Global Plan for
the UN Decade of Action for Road Safety, the Safe System approach considers risks to all types of
road users, including drivers, motorcyclists, passengers, pedestrians, bicyclists, and commercial and
heavy vehicle drivers. The Safe System framework recognizes that while a certain degree of human
error and crash risk is always likely, it is possible to prevent crashes that lead to death or serious
injury. The Road Safety GPN recommends strategies and technical approaches that take this holistic
view of road safety, considering the interactions among roads and roadsides, travel speeds,
vehicles, and road users. The document’s guidelines on evaluating risks across the project cycle in
various types of projects, and the data requirements of these procedures, are discussed in the
following section.




18
   World Bank, Environmental and Social Framework for IPF Operations, ESS4: Community Health and Safety (Washington, DC:
World Bank, 2018).
19
   World Bank, Good Practice Note on Road Safety.


1.2 Demand for Data to Assess Road Safety

The Road Safety GPN recommends a variety of data-driven tools and methods to evaluate road
safety risks and determine mitigation measures across the project cycle. Comprehensive road
safety evaluation tools and procedures require both crash and non-crash data to identify issues and
measure their associated risks. The variety, quantity, and quality of the available data largely
determine which tools can be used to measure and analyze road safety indicators.
This section provides an overview of the primary road safety assessment tools that can be used at
different stages of the project cycle as well as their data requirements. Figure 3 summarizes the
primary road safety activities that may need to be included in the project cycle (depending on the
type of project and potential level of road safety risk). A brief description of road safety assessment
procedures and tools across the project lifecycle can be found in Table 1. This brief review of existing
approaches informs the suggestions for improving data collection and analysis for road safety
evaluation procedures through big data and machine learning (ML).




FIGURE 3: Key road safety activities across the project cycle

Project stages 1 (Identification), 2 (Preparation), and 3 (Appraisal)
KEY ACTIVITIES
• Assess project risks (tables 1 and 2)
• Conduct road safety assessments and develop mitigation measures (table 3)
• Prepare ESF documents
• Include road safety indicators in the results framework
• Collect baseline data and define targets for indicators in the results framework

Project stages 4 (Negotiations & Board Approval) and 5 (Implementation Support)
KEY ACTIVITIES
• Prepare Implementation Status and Results (ISR) report
• Regular reporting in Aide Memoire, Memos, and minutes

Project stage 6 (Completion & Evaluation)
KEY ACTIVITIES
• Prepare Implementation Completion Report: assess if targets for indicators in the results framework were achieved

SOURCE: Modified from Remote Project Supervision and Construction Management of IPF Projects. World Bank (2020).



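TABLE 1: Methods for calculating OPTRSR and identifying risk factors

TYPE OF ASSESSMENT: Crash data-based risk assessment
   When to use (project stage): Preparation, Implementation, Post-Project Operations
   When to use (project activity): Pre-Planning and Design, Monitoring and Evaluation, Error Correction and Hazard Elimination
   Relative cost (High, Medium, Low, Depends): Depends, low-cost models are available
   Data requirements (High, Medium, Low, Depends): Depends
   Examples of tools: Crash frequency, crash risk factors, crash severity analysis

TYPE OF ASSESSMENT: Road Safety Impact Assessment (RSIA)
   When to use (project stage): Preparation
   When to use (project activity): Pre-Planning and Design
   Relative cost (High, Medium, Low, Depends): Low
   Data requirements (High, Medium, Low, Depends): Low
   Examples of tools: (none listed)

TYPE OF ASSESSMENT: Road Safety Audit (RSA)
   When to use (project stage): Preparation, Implementation
   When to use (project activity): Planning and Design, Construction and Pre-Opening
   Relative cost (High, Medium, Low, Depends): Medium to High
   Data requirements (High, Medium, Low, Depends): Medium/Depends
   Examples of tools: iRAP Road Safety Audit Toolkit, Austroads Road Safety Audit Toolkit (currently unavailable), ADB Road Safety Audit Toolkit

TYPE OF ASSESSMENT: Road Safety Inspection (RSI)
   When to use (project stage): Implementation, Post-Project Operations
   When to use (project activity): (none listed)
   Relative cost (High, Medium, Low, Depends): High
   Data requirements (High, Medium, Low, Depends): High
   Examples of tools: iRAP

TYPE OF ASSESSMENT: Road Assessment Program (RAP)
   When to use (project stage): Preparation, Post-Project Operations
   When to use (project activity): Planning and Design, Independent Assessment
   Relative cost (High, Medium, Low, Depends): High
   Data requirements (High, Medium, Low, Depends): High
   Examples of tools: iRAP, EuroRap, usRAP

SOURCE: Modified from Remote Project Supervision and Construction Management of IPF Projects (Washington, DC: World Bank, 2020).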




Assessing Overall Project Traffic and Road Safety Risk (OPTRSR)

At the identification stage of the project, Task Teams are required to assess the Overall Project
Traffic and Road Safety Risk (OPTRSR). Road safety risks arise from the interaction of many
different elements, including the road and roadside design, engineering, travel speeds, the extent and
type of road use, road user behavior, vehicle safety features (both active and passive), and post-crash
response. The OPTRSR estimates potential traffic and road risks; the associated risk level will
inform project Preparation and help define the Borrower’s responsibilities. Assessing OPTRSR also
requires the identification of road safety risks that could arise as a result of project activities, for
example, from changes in vehicular or pedestrian traffic patterns, flows, or speeds, or from the
use of construction equipment or vehicles. This assessment should also identify stakeholder groups
that could be affected (project workers, affected communities, and road users, including vulnerable
road users) and institutional risks (e.g., lack of regulations, technical knowledge, or capacity).
Operational road safety risks should be addressed at this stage, in the context of not only project
implementation and construction but also long-term project operation. The OPTRSR will identify the
road safety risk level of the project as Low, Moderate, Substantial or High.20
The Road Safety GPN recognizes four different types of World Bank transport projects that require
estimating the OPTRSR. Type A projects include operations that involve road construction or
rehabilitation (such as urban transport projects) or any project that affects existing infrastructure or
requires the creation of new transport infrastructure such as bus rapid transit lines, metro lines, ports,
railways, and aviation infrastructure. Type B projects encompass other transport initiatives that
do not finance transport infrastructure directly but introduce policy changes or management
measures intended to promote road safety. These may include measures such as changes to traffic
speed; regulations on allowable traffic mix or volume; protections for vulnerable road users
(pedestrians, bicyclists, motorcyclists); or other changes affecting vehicles, routes or facilities (e.g., vehicle
import regulations). Type C projects primarily involve transport infrastructure construction with
road safety impacts during the construction period only. Type D projects involve vehicle
procurements, such as procurement of bus fleets or even project vehicles. Traffic and road safety risks
can arise in any project as a result of the road infrastructure, operating speeds (km/h), road user
behavior, vehicle standards, and/or post-crash trauma care.21
Different methods may be implemented for assessing the OPTRSR for each project type at the
project identification stage. Depending on data availability and project type, the assessment of risk
should consider all of these factors: road infrastructure, operating speeds, road users, vehicle standards,
and post-crash trauma care (in particular, response time and readiness of emergency care staff). Three
methods may be used for identifying the potential traffic and road risks and their associated level in a




20
   The principal purpose of this report is to emphasize and explain the OPTRSR risk rating and the methodologies for
estimating those risks. The reader should take care to note that the OPTRSR risk rating is distinguished from the project’s
overall Environmental and Social risk ratings which are required for every project under the World Bank’s ESF. While the
overall E&S risk rating uses similar terminology, its purpose is to define the entire project risk profile taking into account all
environmental and social risks and impacts. The overall project E&S risk rating takes account of the OPTRSR rating but there
is not necessarily a direct correlation between them (i.e., a high OPTRSR rating may not necessarily be categorized as high
E&S risk and vice versa). Each investment project will make the final determination of overall E&S risk rating and the OPTRSR
rating on a case-by-case basis.
21
   According to Annex 3 in the Road Safety GPN, the Borrower and task team should ensure that the scope of the assessment
is proportional to the potential risks and estimated Fatalities and Serious Injuries (FSI) for the project. This may vary for
different project types. The OPTRSR process helps determine what further assessments will be relevant to the project.


project. The Road Safety GPN recommends identifying ratings and risk levels for each user group as
Low, Moderate, Substantial or High.22 Table 2 provides an overview of these methods.
Method I: Crash data-based risk assessments are the most reliable method for estimating the
OPTRSR for Type A projects. This method effectively captures the first three criteria (infrastructure,
users and speeds), and will also reflect the other two criteria (vehicle standards and post-crash
trauma care). It is the preferred method when reasonable crash data from the previous three to five
years is available for the road, or can be estimated from data available for similar roads in the
country, and can be used to inform the expected risk levels in the project. Crash data is evaluated
along with an assessment of vehicle standards and post-crash trauma care to calibrate the overall risk.
Method II: When reasonable crash data is not available but an iRAP analysis of the existing road is
available, iRAP results and estimated risks for other factors can be used. Dedicated to saving lives
through safer roads, the International Road Assessment Programme (iRAP) provides tools and
training to help countries make roads safe. iRAP Star Ratings are an objective measure of the likelihood
and severity of road crashes. iRAP results are often used to deliver broad network-level analysis that
provides road authorities and others with a risk assessment. The focus is on identifying and recording
road attributes that influence the most common and severe types of crashes, based on scientific,
evidence-based research. This approach determines the risk level of a specific road segment or network
without requiring detailed crash data, which is advantageous for developing countries where data
may be limited. One-star (black) roads are the least safe – a person’s risk of death or serious injury is
highest on these roads – while five-star (green) roads are the safest.23
Method III: When crash data and iRAP Star Ratings are unavailable, subjective estimates of road
infrastructure risk and estimated risks for other factors should be used. In the absence of sound
crash data, exposure and relative risk can be estimated based on, in particular, WHO estimates for
countries, volume by transport mode, and well-established relationships between risk and operating
speeds and other road design and operating features. Road infrastructure risk can also be estimated by
analyzing attributes of the existing infrastructure, such as the extent of separation of pedestrians from
traffic and crossing locations, the extent of median separation, and the presence of roadside safety
barriers and dedicated bicycle or motorcycle lanes. For both Methods II and III and for Type B, C or D
projects, the OPTRSR is estimated as the weighted average of each of the identified risks.
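
The weighted-average step can be illustrated with a minimal Python sketch. The numeric scale (Low = 1 to High = 4), the factor names, the weights, and the rule for mapping the average back to a risk level are all assumptions made for illustration; the text above does not specify them, and the Road Safety GPN should be consulted for the actual procedure.

```python
from typing import Dict

# Hypothetical numeric scale for the four risk levels (an assumption,
# not taken from the Road Safety GPN).
LEVEL_SCORES = {"Low": 1, "Moderate": 2, "Substantial": 3, "High": 4}


def weighted_optrsr(risks: Dict[str, str], weights: Dict[str, float]) -> str:
    """Return the risk level closest to the weighted average of the inputs.

    risks maps a factor name to a risk level; weights maps the same
    factor names to illustrative, non-negative weights.
    """
    total_weight = sum(weights[name] for name in risks)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    average = sum(
        LEVEL_SCORES[level] * weights[name] for name, level in risks.items()
    ) / total_weight
    # Map the average back to the nearest level; ties go to the higher level.
    return min(
        LEVEL_SCORES,
        key=lambda lvl: (abs(LEVEL_SCORES[lvl] - average), -LEVEL_SCORES[lvl]),
    )


# Illustrative inputs only; none of these ratings come from a real project.
example_risks = {
    "road_infrastructure": "High",
    "operating_speeds": "Substantial",
    "road_users": "Moderate",
    "vehicle_standards": "Moderate",
    "post_crash_care": "Low",
}
equal_weights = {name: 1.0 for name in example_risks}
print(weighted_optrsr(example_risks, equal_weights))
```

Resolving ties toward the higher level is a deliberately conservative choice in this sketch; a project team could equally apply fixed cut-off values.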




22
   The Directive for implementing the Environmental and Social Policy for Investment Project Financing (October, 2018)
Section III C defines these risks with regard to crashes as: High: “high probability of serious adverse effects to human
health…”; Substantial: “there is medium to low probability of serious adverse effects to human health … and there are known
and reliable mechanisms available to prevent or minimize such incidents”; Moderate: “low probability of serious adverse
effects to human health”; and, Low: “if its potential adverse risks to and impacts on human populations … are likely to be
minimal or negligible.”
23
   iRAP (International Road Assessment Programme), iRAP Star Rating and Investment Plan Implementation Support Guide
(London: iRAP, March 2017).


TABLE 2: Methods for assessing OPTRSR and identifying risk factors

METHOD: Crash data-based risk assessment
RISK FACTORS
• FSI crashes
DATA REQUIREMENTS
• Crash data from the previous 3–5 years or estimated from data available from similar roads in the country
• Assessment of vehicle standards (safe vehicles)
• Post-crash trauma care (response time, quality of attention)

METHOD: iRAP Star Rating
RISK FACTORS
• iRAP Star Rating – existing conditions for vehicle occupants
• iRAP Star Rating – existing conditions for motorcyclists (if motorcycles are present on the road or likely to be present post-project)
• iRAP Star Rating – existing conditions for bicyclists (if bicycles are present on the road or likely to be present post-project)
• iRAP Star Rating – existing conditions for pedestrians (if pedestrians are present on the road or roadside or likely to be present post-project)
• Assessment of non-infrastructure risks: operating speeds, road users, vehicle standards, and post-crash trauma care
DATA REQUIREMENTS
• iRAP scores (Low, Medium, Substantial, High)
• Estimates for non-infrastructure risks

METHOD: Estimating road infrastructure risk without crash or iRAP data
RISK FACTORS
• Extent of separation of pedestrians from traffic with provision of safe walking spaces and crossing locations (if pedestrians are present on the road or roadside or likely to be present post-project)
• Extent of roadside safety barriers (omit this factor from consideration if the operating speed is <40 km/h)
• Extent of median separation (omit this factor from consideration if the operating speed is <60 km/h for a rural road and <40 km/h for an urban road)
• Extent of separate well-designed motorcycle lanes (if motorcycles are present on the road or roadside or likely to be present post-project)
• Extent of separate off-road bicycle lane (if bicycles are present on the road or roadside or likely to be present post-project)
• Assessment of non-infrastructure risks: operating speeds, road users, vehicle standards, and post-crash trauma care
DATA REQUIREMENTS
• Subjective estimates of road infrastructure risk for each of the risk criteria
• Estimates for non-infrastructure risks

SOURCE: Road Safety GPN.
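
The applicability conditions attached to the third method in Table 2 (speed thresholds and the presence of particular road users) can be written as a short screening function. The sketch below is one possible reading of those conditions; the function name, parameter names, and factor labels are hypothetical, and each presence flag is meant to cover users who are present now or likely to be present post-project.

```python
from typing import List


def applicable_infrastructure_factors(
    operating_speed_kmh: float,
    is_urban: bool,
    pedestrians_present: bool,
    motorcycles_present: bool,
    bicycles_present: bool,
) -> List[str]:
    """List the Table 2 risk factors to consider when no crash or iRAP data exist."""
    factors = []
    if pedestrians_present:
        factors.append("pedestrian separation and crossing locations")
    # Roadside barriers are omitted when the operating speed is below 40 km/h.
    if operating_speed_kmh >= 40:
        factors.append("roadside safety barriers")
    # Median separation is omitted below 40 km/h (urban) or 60 km/h (rural).
    median_threshold_kmh = 40 if is_urban else 60
    if operating_speed_kmh >= median_threshold_kmh:
        factors.append("median separation")
    if motorcycles_present:
        factors.append("separate motorcycle lanes")
    if bicycles_present:
        factors.append("separate off-road bicycle lanes")
    # Non-infrastructure risks are assessed in every case.
    factors.append("non-infrastructure risks")
    return factors


# Hypothetical rural road with a 50 km/h operating speed.
rural_example = applicable_infrastructure_factors(
    operating_speed_kmh=50,
    is_urban=False,
    pedestrians_present=True,
    motorcycles_present=False,
    bicycles_present=True,
)
print(rural_example)
```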



The Road Safety Screening and Appraisal Tool (RSSAT), developed by the Transport Global Practice,
is required for all World Bank transport projects (Type A) and is also recommended for other projects
that may involve road safety risks. The RSSAT (Method IV) considers the likely fatality rate with and
without the project and is designed to undertake a quick road safety screening of World Bank projects
during the concept and preparation stages. It evaluates the safety effects of different design options
and conducts a cost-benefit analysis of the project’s impact on road safety, estimating the change in
potential Fatalities and Serious Injuries (FSI) due to the project. At the identification stage, RSSAT
should be applied and the results reported in conjunction with the OPTRSR.24 RSSAT does not require
crash data to identify the likely change in FSI risk, and it is now required for all World Bank-financed
transport projects to estimate the economic cost of road crashes on project roads. Type A projects
should demonstrate a Project Safety Impact of 1 or below for all road segments before approval.
Table 3 summarizes the data requirements for RSSAT.
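
The segment-level threshold check can be sketched as follows. The sketch assumes, for illustration only, that the Project Safety Impact of a segment is the ratio of estimated fatalities with the project to estimated fatalities without it; the text above does not define the measure, so the RSSAT documentation should be consulted for the actual calculation. The segment names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    fatalities_without_project: float  # estimated, per year
    fatalities_with_project: float     # estimated, per year


def project_safety_impact(segment: Segment) -> float:
    """Assumed definition: with-project fatalities divided by without-project fatalities."""
    if segment.fatalities_without_project <= 0:
        raise ValueError("without-project estimate must be positive")
    return segment.fatalities_with_project / segment.fatalities_without_project


def segments_above_threshold(segments: List[Segment], threshold: float = 1.0) -> List[str]:
    """Return the names of segments whose impact exceeds the threshold."""
    return [s.name for s in segments if project_safety_impact(s) > threshold]


# Hypothetical estimates for three road segments.
segments = [
    Segment("Segment A", fatalities_without_project=4.0, fatalities_with_project=3.0),
    Segment("Segment B", fatalities_without_project=2.0, fatalities_with_project=2.5),
    Segment("Segment C", fatalities_without_project=1.0, fatalities_with_project=1.0),
]
print(segments_above_threshold(segments))
```

Under this assumed definition, a segment with a value of exactly 1 passes the check, which matches the "1 or below" wording.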



24
     World Bank, Good Practice Note on Road Safety.


TABLE 3: The World Bank Road Safety Screening and Appraisal Tool

METHOD: RSSAT
PROJECT SAFETY COST/BENEFIT IMPACT
• Project safety impact analysis and safety impact model
DATA REQUIREMENTS
Baseline and projected estimates for:
• Fatalities by mode
• Speeds by fleet type
• Segment characteristics and road features
• Traffic flows

SOURCE: Road Safety GPN.



Traffic and Road Safety Assessments

During the project Preparation stage, the Borrower may need to conduct more in-depth
assessments to identify and evaluate potential traffic and road safety risks. When traffic and road safety
issues are likely to be significant for the community or road users, the objective of the road safety
assessment is to consider these risks in more detail to determine the most appropriate mitigation
(control) measures that can be implemented in the project. The assessment should consider the Safe
System principles to confirm that all opportunities to minimize risks have been realized. The Safe
System approach addresses the interacting elements of the road system (roads and roadsides, travel
speeds, vehicles, and road users) in an integrated manner and emphasizes shared accountability
between designers and users of the road network to achieve road safety targets.25
Assessments prepared early in the project cycle help to identify and evaluate potential traffic and
road safety risks that may arise from the project activities and/or their implementation. Such
assessments are intended to help the Borrower mobilize appropriate resources, analyze risks in detail,
and identify and adopt the most appropriate mitigation measures. They also guide the
preparation of the environmental and social documents, such as the Environmental and Social
Impact Assessment (ESIA), Environmental and Social Management Plans (ESMP), and the Environmental
and Social Commitment Plan (ESCP).
For projects with High or Substantial road safety risks, assessments should be completed before
the project is fully appraised to inform project objectives, components and activities, and the
results framework.26 Type A projects, and Type B and C projects with major construction activities,
require more robust or detailed assessments. Substantial- and High-risk projects should, as a minimum,
include intermediate indicators related to traffic and road safety risk mitigation. Table 4 summarizes
the different types of assessment tools (Methods V–VII) that can be used for this purpose as well as
their data requirements.
One or more of these assessments may be conducted at once or at different phases of project
Preparation. Road Safety Audits (RSA) and Road Safety Impact Assessments (RSIA) involve
examining a traffic project, which may involve new construction or altering an existing road, to improve
traffic and road safety performance. An RSA is a formal procedure to assess the crash risk potential
and expected safety performance of a design for a road or traffic scheme. An RSIA is a strategic
assessment of the impact of different planning options. Safe System Assessments (SSA) evaluate the design
against Safe System principles to confirm that all opportunities to mitigate risks and maximize road
safety have been realized.



25
   Tony Bliss and Jeanne Breen, “Meeting the Management Challenges of the Decade of Action for Road Safety,” IATSS Res., 35
(2012): 48–55, https://doi.org/10.1016/j.iatssr.2011.12.001
26
   For projects with Moderate or Low safety risks, the Borrower and the Bank may agree on more flexible timelines for the
completion of road safety assessments and/or mitigation or management measures. Such agreements would be specified in
the project’s ESCP.


These assessment procedures enable road safety engineering and crash analysis to be used for the
prevention of crashes on new or modified roads. They can be conducted at different stages of the
project cycle to identify key road safety challenges to guide designers; confirm that safety elements
are correctly captured; check for any unsafe features not apparent at previous stages; check that
all the design details have been correctly implemented; identify deficiencies that need to be
corrected; or evaluate the road’s performance with traffic and determine areas that require further
attention. The earlier road safety risks are assessed within the design and development process, the
easier it is to ensure that safety is fully integrated into all elements of the project’s infrastructure,
with minimal risk of redesign or physical rework at a later stage.
The main data needed to perform these types of assessments are FSI, traffic flows, and road
features. Data analyses, modelling or estimates quantify and forecast traffic volumes and road crash
FSI. Depending on data availability, these would aim to identify crash locations and crash types,
at-risk individuals and groups, and key risk factors influencing exposure to risk, crash involvement,
crash severity and post-crash outcomes. Even in the absence of sound crash data, exposure and
relative risk can be estimated based on estimates for countries, volume by transport mode,
well-established relationships between risk and operating speeds, and other road design and operating
features. Capacity reviews to assess the efficiency and effectiveness of road safety measures can be
relevant when the project involves road safety policy change.




TABLE 4: Overview of primary tools for traffic and road safety assessments

METHOD: Road Safety Audits (RSA) (performed by an independent team of specialists)
OBJECTIVES: Identify safety concerns. It audits the safety of the specific design of the chosen scheme.
DATA REQUIREMENTS: Analysis of project designs and interventions: specialists assess road options, such as intersections, signs, crossings; design standards, and the relationship of this intervention to main network. Main data needed includes:
• Scheme plans
• Crash and FSI data
• Traffic mix and volumes
• Road features (e.g., design elements, such as bypasses, cycle routes, junction improvements, installation of traffic signals, roundabouts, traffic calming, bend realignment, safety fence schemes and pedestrian crossing facilities)

METHOD: Road Safety Impact Assessments (RSIA) (performed by members of the project design team with road design and road safety auditing experience)
OBJECTIVES: Assess the impact of each of the planning options on the safety performance of the current road network. It estimates the impact of possible schemes on safety for an entire geographic area at the strategic level.
DATA REQUIREMENTS: The evaluation of each alternative is based on several factors, some of which include:
• The scheme objectives
• Crash and FSI data
• Traffic mix and volumes
• Road features
• Categorization of roads and streets of that network

METHOD: Safe System Assessment (SSA)
OBJECTIVES: Assess how closely road design and operation align with the Safe System objectives, and to clarify which elements need to be modified to achieve closer alignment with these objectives.
DATA REQUIREMENTS: The core of the SSA approach is the “Safe System Matrix” framework, which is essentially a risk assessment. The assessment is done by scoring the risk exposure, likelihood and severity from 0–4. The Austroads approach can be used to perform this type of assessment. Data needed includes:
• Traffic mix and volumes
• Road features

SOURCE: Road Safety GPN.



Since the key objectives of these assessments (i.e., identifying risk elements and estimating crash
exposure, likelihood, and severity for different road users) are complex and not standardized,
the scoring system is subjective. This can complicate comparisons between sites, especially when
these have been assessed by different individuals or teams. Such scoring is, therefore, usually most
suitable for comparing options at a single site, identifying sources of risk, and identifying solutions,
rather than for comparing different sites.
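
A Safe System Matrix of the kind described in Table 4 can be sketched as a small scoring routine that compares design options at a single site. The sketch assumes that the three 0–4 scores for each crash type are multiplied together and then summed across crash types, which is how the Austroads framework is commonly described; the crash type names and the scores themselves are hypothetical.

```python
from typing import Dict, Tuple

# Each crash type maps to (exposure, likelihood, severity), each scored 0-4.
Matrix = Dict[str, Tuple[int, int, int]]


def ssa_scores(matrix: Matrix) -> Dict[str, int]:
    """Score each crash type as exposure x likelihood x severity (assumed rule)."""
    scores = {}
    for crash_type, (exposure, likelihood, severity) in matrix.items():
        for value in (exposure, likelihood, severity):
            if not 0 <= value <= 4:
                raise ValueError(f"scores must be between 0 and 4: {crash_type}")
        scores[crash_type] = exposure * likelihood * severity
    return scores


def ssa_total(matrix: Matrix) -> int:
    """Sum the crash-type scores; a lower total indicates closer Safe System alignment."""
    return sum(ssa_scores(matrix).values())


# Two hypothetical design options for the same site.
option_existing = {"pedestrian": (3, 3, 4), "head-on": (2, 1, 4)}
option_with_crossing = {"pedestrian": (3, 1, 2), "head-on": (2, 1, 4)}

print(ssa_total(option_existing), ssa_total(option_with_crossing))
```

Because the individual scores are judgment calls, the totals are best read as a ranking of options assessed by the same team at the same site, in line with the caution above.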

Results Frameworks and Monitoring Plans

In addition to these assessments, a Results Framework that articulates the expected outcomes and
impact of the project on road safety should also be developed before project Appraisal. A Results
Framework is a management tool that presents how the development objective(s) of an operation will
be evaluated, measured and monitored, based on the results chain (outputs, outcomes, and impacts).
The Results Framework is based on the Project Development Objective (PDO) that indicates expected
project outcomes. Depending on project design, intermediate indicators for each project component
can be used to track implementation progress including the units of measurement, baselines, and
final target for each indicator. Such details are typically provided in the project’s Monitoring Plan.




The Results Framework and Monitoring Plan should include a road safety indicator with baseline
and target values. The Transport Global Practice has committed to including a road safety indicator
in all road projects and to increase the road safety focus of urban mobility projects. All substantial
and high-risk projects should include at least one indicator that addresses road safety in the Results
Framework or as a Disbursement Linked Indicator, as relevant.27
There are two types of indicators that should be considered. The first kind are intermediate
indicators, which mark the progress toward fulfilling the development objectives before the final project
outcomes are achieved (these may also measure progress in project outputs). Some examples of
intermediate indicators that may be relevant to transport projects include the number of speed managing
devices installed and safety audit compliance. The second are outcome indicators, which evaluate the
uptake, adoption, and use of outputs by the target group within the project period. FSI is considered
the most important indicator for monitoring the outcome of road safety interventions.28 Table 5
provides some examples of indicators that can be included in the Results Framework, as well as the type
of data that can be collected to monitor and evaluate them.

TABLE 5: Example of indicators that may be included in the Results Framework

EXAMPLE OF INDICATOR: Reduction of road crashes
TYPE OF DATA THAT CAN BE COLLECTED: Crash data

EXAMPLE OF INDICATOR: Speed reductions
TYPE OF DATA THAT CAN BE COLLECTED: Traffic flows

EXAMPLE OF INDICATOR: Increased use of helmet and seat belts
TYPE OF DATA THAT CAN BE COLLECTED: Number of helmet and seat belt users

SOURCE: Original table for this publication.

Change in FSI is the most frequently tracked metric for evaluating the impact of projects and
road safety interventions. In cases where data cannot be obtained, other methodologies to estimate
safety risks can also be used. Projects need to undertake baseline data collection not only to
establish the appropriate project interventions to address road safety risks, but also to assess
whether the project will improve or worsen the situation. Target values are used to measure progress
on a particular indicator. For example, an indicator may track the number of workers killed (zero
baseline because the project has not started, and zero target because the objective should always
be to avoid fatalities). The chosen indicator should be based on one or more of the World Bank’s
seven road safety pillars.
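As an illustration of how baseline and target values can be used to track an indicator, the following minimal Python sketch expresses progress as the share of the planned baseline-to-target change achieved so far. The formula, the class name and the numbers are assumptions made for illustration only; they are not prescribed by this Guidance Note or by World Bank templates.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    """A results-framework indicator with a baseline and a target (hypothetical structure)."""
    name: str
    baseline: float
    target: float

    def progress(self, current: float) -> float:
        """Share of the planned baseline-to-target change achieved so far."""
        planned_change = self.target - self.baseline
        if planned_change == 0:
            # Maintenance-type indicator (e.g., zero worker fatalities):
            # it counts as met only while the value stays at the target.
            return 1.0 if current == self.target else 0.0
        return (current - self.baseline) / planned_change


# Hypothetical values: FSI falls from a baseline of 120 towards a target of 90.
fsi = Indicator("FSI per year (hypothetical)", baseline=120, target=90)
print(f"{fsi.name}: {fsi.progress(105):.0%} of the planned reduction achieved")

# Zero baseline and zero target, as in the worker fatality example.
workers = Indicator("Workers killed", baseline=0, target=0)
print(f"{workers.name}: target met = {workers.progress(0) == 1.0}")
```

A value above 1.0 would indicate that the target has been exceeded, and a negative value that the situation has worsened relative to the baseline.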
During the Implementation phase, the focus shifts toward executing planned activities, and mon-
itoring and evaluating indicators. Activities that are included in the Project Appraisal Document
(PAD) are to be carried out during this phase. When key information or data for indicators included
in the results matrix must be collected, it is important that procurement processes and supervision
activities are planned and executed in a timely fashion to achieve expected results. It is also vital
that the project design includes close monitoring of the safety performance until the project closes.
In some cases, impact evaluations may also be required to monitor the long-term effects of imple-
mented interventions. For example, the results matrix would identify the extent of progress towards
achieving a particular milestone, like enumerating the number of physical features to separate traffic
(e.g., footpaths, cycle lanes, traffic signals) installed in the project to address the safety of vulnerable
road users, such as pedestrians, bicyclists, or motorcyclists.
The Implementation Completion Report (ICR) addresses the targets achieved at the completion of
the project. At project completion, the ICR carries out an ex-post analysis of project interventions,
and measures outcome and intermediate indicators from the results framework to assess wheth-
er targets were achieved during implementation. The ICR will collect the indicators for the results
framework for the last time to evaluate whether PDO and intermediate indicators meet their targets.


27 World Bank, Road Safety Indicators for Project Monitoring (Washington, DC: World Bank, 2021).
28 World Bank, Road Safety Indicators for Project Monitoring.


If, for example, a PDO is to contribute to the reduction of road traffic injuries and fatalities in selected
corridors, with intermediate indicators to track progress towards some interventions, like imple-
menting a certain number of physical features to separate traffic, the ICR will quantify this at the end
of the project cycle so it can be compared with baseline indicators and expected targets.

Key Challenges with Current Approaches to Road Safety Analysis

Since data is the cornerstone of all road safety assessments, the availability of high quality, reli-
able data is key to extracting useful, actionable insights and improving road safety conditions.
Without quality information, it is difficult to estimate crash locations and crash types, at-risk individ-
uals and groups, and key risk factors influencing exposure to risk, crash involvement, crash severity,
and post-crash outcomes. Meeting data requirements for road safety assessments can be a challenge
for various reasons, such as the lack of open data, or data collection costs.
There can be a lack of adequate crash data or road ratings in data scarce countries and regions
for identifying risk factors (Methods I to III). Governments often lack adequate and reliable data
to identify road safety risks and perform road safety assessments. In addition, road crashes tend to
be underreported, especially in LMICs. There may also be significant gaps in the data in terms of
geographic or temporal coverage, or the data may be missing important variables and categories.
Access to data can also be limited for certain data types, or the process of obtaining the data may be
too complex, costly, and time-consuming.


Collecting data on road safety attributes through manual detection or special equipment can be
expensive, time-consuming, and complex.29 Budgeting for data collection can be a challenge for
both Borrowers and World Bank task teams, especially for Methods I to IV which are required at
the project identification stage. In these cases, data is most often estimated through existing road
designs or by local transportation agencies. For Methods V to VII, the most cost-effective method for
data collection is the installation of cameras and sensors that record street imagery, speed informa-
tion and other data. Images and video are then analyzed by road safety experts to identify relevant
attributes, assess road conditions and identify potential risks. Commissioning equipment and hiring
resources to manually collect data on road features and design may be a hindrance, especially for
smaller-scale projects where the opportunity to benefit from economies of scale is low.
In addition to the quality and availability of data, preparing and analyzing road safety data can also
be costly, resource-intensive, and technically demanding. Most road safety assessments require
data to be combined from various sources, which often involves aggregating, cleaning and preparing
the data. Additional resources and specialist expertise may be necessary for this process, and also to
analyze the data and extract useful insights using methods such as clustering and developing spatial
models. Conventional statistical techniques can also be limited in their ability to identify complex
correlations and underlying factors that may contribute to road safety risks across various projects.
The purpose of this Guidance Note is to identify new methods for the collection and analysis of
road safety data that could overcome the limitations of existing approaches, and also improve
their efficacy in identifying risks and opportunities to mitigate crashes. Conducting road safety
assessments is a required component of most road investment and infrastructure development proj-
ects. Advanced technologies such as big data and ML have the potential to not only supplement
existing methods, but also significantly reduce costs while improving the efficacy of road safety as-
sessments in identifying risks and opportunities to mitigate crashes.
The following section explains how big data and ML can be practically implemented by Borrowers
and World Bank task teams for various road safety assessment procedures that are required by
World Bank investment projects at various stages of the project cycle. It introduces these methods
and provides an overview of big data sources and ML techniques that are useful for road safety as-
sessments (tables 6 to 10). Part 2 also discusses best practices and key considerations that are vital to
implementing these new methods effectively. A framework for integrating these technologies in road
safety assessments is also proposed, and subsequent sections demonstrate how this framework can
be applied in LMICs through two original case studies.




29 OECD (Organisation for Economic Co-operation and Development)/ITF (International Transport Forum), Big Data and Transport: Understanding and Assessing Options (Paris: OECD/ITF, 2015), https://www.itf-oecd.org/sites/default/files/docs/15cpb_bigdata_0.pdf


PART 2:
Big Data and Machine Learning to Strengthen Road
Safety in Transport Projects

The World Bank and Global Road Safety Facility are keen to use new technologies, such as big
data and ML, in data collection and analysis for road safety to overcome the limitations of existing
approaches. As these technologies become more sophisticated and accessible, a growing body of re-
search indicates their potential to complement, and eventually even surpass conventional methods.
World Bank teams have demonstrated various applications of big data and ML in road safety and
other transport and infrastructure projects over the past few years. For example, a task team de-
veloped an open data platform in 2015 based on a pilot in Cebu City, Philippines, which sourced data
from a taxi company to generate insights for traffic management.30 Another team has developed a
“Simplified Methodology” to implement ML in video analysis to extract data on road attributes. The
new tool was piloted across over 500 kilometers of road in Mozambique and Liberia in 2019.31 The
World Bank, in collaboration with the Philippine government, has also launched the Data for Road
Incident Visualization Evaluation and Reporting (DRIVER) system to facilitate data sharing for road
safety analysis. This free web-based, open-source platform connects traffic crash data from multiple
agencies through a standardized reporting system. DRIVER also provides tools to geo-spatially an-
alyze road crash data, predict blackspots, estimate the economic costs of crashes, and evaluate the
effectiveness of various interventions to support investments and policy-making for improved road
safety.32
World Bank teams are increasingly turning to data partnerships to obtain crash, traffic, and oth-
er types of data for road safety analysis. For example, in Kenya, the WHO estimates that up to 75
percent of crashes go unreported.33 SmarTTrans – a collaboration between the Kenyan government
and the World Bank – has worked to fill this gap by bringing together crash information both from
administrative records and from bystander crash reports from Twitter.34 In addition, the team has
leveraged the Development Data Partnership (DDP) to access Waze API data and Uber congestion and
speed information for all 6,200 km of Nairobi's road network. Using all of these data sources, the
smarTTrans team is creating near real-time analytics to facilitate the identification of crash hotspots,
speeding, and congestion patterns.
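When administrative records and crowdsourced reports are brought together, the same crash may appear in both sources. The sketch below shows one simple way to combine two such lists in Python: a crowdsourced report is treated as a duplicate when it falls within a chosen distance and time window of an official record. The field names, thresholds, and sample coordinates are hypothetical, and this is not a description of the method used by the smarTTrans team.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def merge_reports(official, crowdsourced, max_m=200, max_gap=timedelta(minutes=30)):
    """Return official records plus crowdsourced reports that match none of them.

    The thresholds are assumptions; crowdsourced reports are compared only
    against official records, not against each other.
    """
    merged = list(official)
    for report in crowdsourced:
        duplicate = any(
            haversine_m(report["lat"], report["lon"], rec["lat"], rec["lon"]) <= max_m
            and abs(report["time"] - rec["time"]) <= max_gap
            for rec in official
        )
        if not duplicate:
            merged.append(report)
    return merged


# Hypothetical sample records.
official = [
    {"source": "police", "lat": -1.2864, "lon": 36.8172, "time": datetime(2021, 3, 1, 8, 0)},
]
crowdsourced = [
    # About 16 m and 10 minutes from the police record: treated as the same crash.
    {"source": "twitter", "lat": -1.2865, "lon": 36.8173, "time": datetime(2021, 3, 1, 8, 10)},
    # More than 2 km away: kept as an additional crash.
    {"source": "twitter", "lat": -1.3000, "lon": 36.8000, "time": datetime(2021, 3, 1, 8, 5)},
]
merged = merge_reports(official, crowdsourced)
print(len(merged), "distinct crash records")
```

In practice, the distance and time thresholds would need to be calibrated against the geolocation accuracy of each source.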




30 World Bank, Open Traffic: Easing Urban Congestion (Washington, DC: World Bank, n.d.), https://olc.worldbank.org/system/files/WBG_BD_CS_OpenTraffic_1.pdf
31 World Bank, Innovative Road Safety Risk Assessment Tool with Automated Image Analysis Technology (Washington, DC: World Bank, 2019).
32 World Bank, GRSF DRIVER Completion Report (Washington, DC: World Bank, 2019), https://documents1.worldbank.org/curated/en/245151560919065747/pdf/Data-for-Road-Incident-Visualization-Evaluation-and-Reporting-Lowing-the-Barriers-to-Evidence-Based-Road-Safety-Management-in-Resource-Constrained-Countries.pdf
33 WHO, Global Status Report on Road Safety 2018.
34 Sveta Milusheva et al., “Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning,” PLoS ONE 16, 2 (2021), https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244317


2.1 New Data (and Big Data) in Road Safety Analysis

Big data is generally understood as extremely large datasets that are generated by a wide range of
data sources, including machines, sensors and other Internet of Things (IoT) devices. Big data can
also be captured over the internet through social media and other types of applications, especially
those that track locational or transactional data.
The large volume of such data is one of many characteristics that make big data especially useful
for road safety and other applications in transport and infrastructure development. For example,
big data can be generated at immense velocity, especially as more such data is collected in real time
and for large populations. It also occurs in a variety of data formats, from structured databases to
unstructured text documents, emails, video, audio, stock ticker data and financial transactions. Big
data is also characterized by a high degree of variability since data flows can change over time, de-
pending on seasons, off-peak hours or availability of collection methods across an entire population
under study. Table 6 provides a SWOT analysis of the use of big data in road safety analysis.
For transport, the increasing use of personal mobile devices and vehicle sensors to collect traffic
and location data presents a significant opportunity to augment traditional sources of transport
data. Annex 1 discusses the most relevant big data types for road safety analysis. It also provides
guidance on the potential applications of these sources for evaluating road safety, and the advantages
and disadvantages of each source. The following sections discuss how big data can be used for the
various road safety assessment methods and tools discussed in Part 1.




TABLE 6:   SWOT analysis of using big data in road safety analysis

STRENGTHS
• Recent and broad geographic coverage allows researchers to dive deeper into transport issues and get a comprehensive and current picture of risks.
• Can help obtain real-time data and track up-to-the-minute changes in traffic flows and other important variables.
• May be faster and easier to obtain and process, compared to manual collection.
• Can offer higher spatial and temporal resolution than conventional sources.
• Can be more affordable and easier to scale.
• Vast quantities of data can limit bias from outliers and other sources of “noise” since data gets aggregated across vast populations.
• Can help improve data quality since it often covers a large geographic and/or temporal scope, also allowing for comparison against “control” datasets and scenarios.

WEAKNESSES
• Requires investment in expertise, software and computing power to store, access and process big data.
• Availability of data can vary significantly by geography and context.
• Coverage can be inconsistent or exclude important segments of the population.
• Most big data sources are not set up to support road safety assessments; the data was often collected for other purposes and then repurposed for road safety analysis. This can lead to the data being biased, incomplete and/or difficult to incorporate in road safety analysis.
• Need to consider the interoperability of different datasets (i.e., how easy it is to combine different datasets for complex road safety assessment models).
• Changes in privacy laws and other relevant policies can impact quality, consistency and coverage of data.

OPPORTUNITIES
• Provides an alternative approach to road safety data collection and analysis that may complement or supplement traditional approaches or datasets. For example, big data sources may be able to collect more accurate crash data.
• Big data analysis can uncover new dynamics, complex behavioral patterns and relationships, and correlations that conventional statistical methods and data may not be able to detect.
• Growing interest in autonomous vehicles is generating more data about road systems, vehicles, and vulnerable users that can be integrated into road safety analysis.
• Rising momentum for the creation of a “big data platform” where data providers can sell or share data.

THREATS
• Privacy concerns: data should be de-identified and anonymized before use.
• Data providers may be reluctant to share data.
• Governments, local municipalities, and other stakeholders must invest in technological infrastructure to support big data collection and analysis.
• Need to enforce quality control to limit risk of data bias.
• Licensing constraints: most private companies, such as Google, provide limited licenses for data use.

SOURCE: Original table for this publication.



Big data, especially when combined with ML, which is discussed in the following section, can
enhance the capabilities of current systems and road safety assessment tools. The increasing use
of IoT devices, which range from smartphones to vehicle sensors, as well as Intelligent Transport
Systems (ITS), is making it possible to collect, access and utilize real-time data about a large range
of variables that are relevant to road safety analysis. This includes traffic flows, crash sites, peak
timings, travel times and road usage by pedestrians, bicyclists, and motorists. The availability of
such extensive data creates new possibilities for crash risk modelling, especially to predict the out-
comes of various types of road safety interventions as well as possible impacts of road infrastructure
projects.
As mobile phone use rises globally, smartphones have become a prominent source of big data,
though there are many other sources to consider. In addition to the location and velocity of road
travelers collected passively through mobile devices, transportation projects can take advantage of
street view, aerial, and satellite imagery, traffic monitoring systems, and connected vehicles for road
safety analysis, as well as crowdsourced data provided by the community through mobile devices.35
Annex 2 provides an overview of the most relevant and accessible big data sources for World Bank task
teams and is a useful starting point to find relevant data sources. TTLs are advised to look for relevant


35 Alex Neilson et al., “Systematic Review of the Literature on Big Data in the Transportation Domain: Concepts and Applications,” Big Data Res. 17 (2019): 35-44. https://doi.org/10.1016/j.bdr.2019.03.001


local and regional data providers based on the region(s) of interest that concern their project(s). As big
data infrastructure advances globally and new companies and startups begin data collection for var-
ious purposes, it is likely that the list of available big data sources in World Bank member countries
will expand significantly in coming years.
Street view imagery can complement or potentially substitute for manual or commissioned road surveys
to collect data on road safety attributes for various types of assessments. For example, street view
imagery can help obtain baseline data for RSIA more quickly and cheaply, especially if the data
is not already readily available. By applying ML algorithms to street view images, road attributes
and other data that are important for road safety assessments can be detected. Similarly, there may
be instances where satellite or aerial imagery, such as that collected by an unmanned aerial vehicle
(UAV) or drone, can be analyzed to detect road or road user attributes. Figure 4 shows the same
crosswalk in satellite imagery, in street view imagery, and as mapped in OpenStreetMap (OSM). ML is
discussed in greater detail in the next section.

FIGURE 4:   Street view and OSM
Road safety data, such as road markings and signs, types of road users, and designated paths for vulnerable users, can be
extracted from images. Each image and its relevant attributes are geolocated for further analysis. In this instance, the crosswalk
identified in OSM can be verified in street view imagery.




                                                            SOURCE: Original figure for this publication derived from OSM, Mapillary, and Maxar Technologies.



Mobile applications and telematics can provide data related to vehicle movement to identify road
infrastructure risks. This data includes current and historical average speeds along road segments
as well as irregularities, like traffic jams and incidents. This data is useful for most proactive road
safety assessment tools, including RSIA, RSA, and RSI. It can be geographically visualized and ana-
lyzed, such as through heatmaps or hotspot analysis as shown in figure 5 (see Annex 3 for additional
examples and descriptions). Telematics data has also been used to assess driver behavior, facilitate
the prediction of crash-prone locations and create geographic visualizations, as discussed in inter-
views with researchers at the ARRB and Professor George Yannis from the National Technical Uni-


versity of Athens. However, data privacy is an especially important concern when it comes to the use
of telematics data.36
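Hotspot maps with confidence bands, such as the one in figure 5, are commonly produced with the Getis-Ord Gi* statistic, although this Guidance Note does not name the method used. The Python sketch below is a minimal illustration under that assumption: it computes a Gi* z-score for each location from crash counts, using binary weights for all locations within a chosen radius. The grid, the counts, the radius and the rounded z-score cut-offs (1.65, 1.96 and 2.58) are illustrative choices.

```python
import numpy as np


def getis_ord_gi_star(values, coords, radius):
    """Gi* z-score per location, using binary weights within `radius` (self included).

    Assumes the values are not all equal and that no neighbourhood covers
    every location; otherwise the denominator is zero.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = x.size
    # Pairwise Euclidean distances; w[i, j] = 1 when j lies within radius of i.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = (dist <= radius).astype(float)
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


def classify(z):
    """Label a z-score with hot/cold spot confidence bands."""
    for threshold, label in ((2.58, "99%"), (1.96, "95%"), (1.65, "90%")):
        if abs(z) >= threshold:
            return f"{'Hot' if z > 0 else 'Cold'} Spot - {label} Confidence"
    return "Not Significant"


# Hypothetical 5 x 5 grid of crash counts with a cluster in one corner.
grid = np.ones((5, 5))
grid[0:2, 0:2] = 10
coords = [(i, j) for i in range(5) for j in range(5)]
z = getis_ord_gi_star(grid.ravel(), coords, radius=1.0)
print(classify(z[0]), "|", classify(z[-1]))
```

This sketch does not correct for multiple testing or spatial dependence between neighbouring cells, which dedicated GIS tools may handle.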

FIGURE 5:   Hotspot analysis of major crashes reported by Waze application users

[Map of Bogotá, Colombia. Legend: Waze major crash reports classified as Cold Spot (95% and 90% confidence), Not Significant, or Hot Spot (90%, 95% and 99% confidence).]




                 SOURCE: Original figure for this publication (data provided by Waze App; learn more at waze.com). Basemap provided by Esri, HERE, Garmin, METI/NASA, USGS.



Mobile applications are helping overcome underreporting of road crashes by crowdsourcing inci-
dent reports. For example, in Kenya, road crashes have been shown to be largely underreported, es-
pecially in areas where incident reporting mechanisms are lacking or underdeveloped.37 Navigation
applications such as Waze are providing a valuable new source of crash and traffic data by allowing
users to report incidents through their smartphone applications. Each incident report submitted by
a user is geolocated and timestamped, which allows it to be combined with other geospatial data to
identify segments of a road that are experiencing major or minor crashes, light to standstill traffic
jams or hazardous conditions (hazards on the road or on the shoulder, weather alerts or dangerous
road surfaces). Additionally, social media platforms like Twitter are used by many people on the
ground to report on crashes and traffic conditions and can be leveraged using machine learning al-
gorithms to produce additional data on crashes, as was done by the smarTTrans team in Nairobi.38
Lastly, mobile application data can be generated in real-time to assist with monitoring or collected
and analyzed over time to develop models.
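Because each crowdsourced report carries a location and a timestamp, reports can be aggregated in space and time before being combined with other geospatial data. The Python sketch below counts reports per grid cell, hour of day and report type. It is a simplified illustration: the field names and sample records are hypothetical rather than taken from any provider's data format, and a grid is used in place of matching reports to road segments.

```python
import math
from collections import Counter
from datetime import datetime


def bin_reports(reports, cell_deg=0.005):
    """Count reports per (grid cell, hour of day, report type).

    `cell_deg` is the cell size in decimal degrees (an assumed value,
    roughly 500 m near the equator).
    """
    counts = Counter()
    for r in reports:
        cell = (math.floor(r["lat"] / cell_deg), math.floor(r["lon"] / cell_deg))
        counts[(cell, r["time"].hour, r["type"])] += 1
    return counts


# Hypothetical incident reports.
reports = [
    {"lat": 4.6512, "lon": -74.0831, "time": datetime(2021, 4, 12, 8, 5), "type": "crash"},
    {"lat": 4.6518, "lon": -74.0836, "time": datetime(2021, 4, 12, 8, 40), "type": "crash"},
    {"lat": 4.7021, "lon": -74.0412, "time": datetime(2021, 4, 12, 18, 15), "type": "jam"},
]
counts = bin_reports(reports)
for (cell, hour, kind), n in counts.most_common():
    print(cell, f"{hour:02d}:00", kind, n)
```

Counts of this kind can feed monitoring dashboards in near real time, or be accumulated over longer periods as inputs to models.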

36 Anthony Germanchev (Principal Professional Leader, Advanced Technologies Lab, Australian Road Research Board) and Professor George Yannis (School of Civil Engineering, National Technical University of Athens), in discussion with the authors, April 2021.
37 Guadalupe Bedoya Arguelles, et al., “Smart and Safe Kenya Transport (SMARTTRANS).”
38 Sveta Milusheva et al., “Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning.”


A growing number of countries and regions are focusing on developing a big data infrastructure
to collect official incident reports. Collecting comprehensive and accurate information about road
incidents is an important objective for government transportation agencies. There is growing interest
in gathering and analyzing the information in big data formats to provide deeper and more comprehensive
insight into road safety risks and the impact of different interventions. Collecting real-time data
would also serve this purpose, and handling the information as big data would be the most realistic
and feasible way to collect, store, and analyze it.

How to Access Big Data

Big data for road safety generally falls into two categories: public sector and private sector. Tra-
ditionally governments have collected and provided data for road safety analysis, such as police re-
ports of crash incidents. However, alternative sources are becoming increasingly available as mobile
apps are used to crowdsource reports of roadside incidents and companies aggregate traffic speeds
from proprietary mobile applications. Often data quality from such sources can vary significantly by
location, with certain sources being more effective, reliable, and better developed in some regions
compared to others. Task teams are advised to use the list provided in Annex 2 as a starting point and
find the most relevant data providers for their project’s region(s) of interest.
This Guidance Note focuses on big data sources that are most easily and readily accessible to
World Bank task teams. Different sources require different approaches to obtaining relevant data
quickly and efficiently. It is important to understand the licensing restrictions that accompany each
source. For example, even though a dataset is crowdsourced, it may have licensing restrictions. It is
best to consult the World Bank Legal team and data provider to clarify terms of use when necessary.
Public sector. Governments can collect, manage, and share data relating to transport, infrastruc-
ture, and mobility. Many governments, whether at the national level or even local municipalities, are
establishing open data platforms where datasets can be accessed by running a simple search query.
Such platforms have already been created in the Philippines as well as in Australia and the United
States.39 In other instances, particularly where the data infrastructure is not as advanced, data may
have to be requested through the relevant department. It is often possible to obtain datasets relating
to crash histories or collected by road sensors from government sources which are extensive enough
to be processed as big data in road safety analysis.
The World Bank’s Road Safety Observatories (RSO) initiative also has the potential to become an
important source of government-generated big data in the future. The Observatories provide a for-
mal network of government representatives to share and exchange road safety data and experience
in order to improve road safety throughout the region. The World Bank established its first RSO in
Latin America (OISEVI), before introducing the initiative in Africa (ARSO) and Asia-Pacific (APRSO).
By enhancing road safety data and information systems, the Observatories play a pivotal role in help-
ing countries monitor, evaluate and develop more impactful road safety policies and interventions.40
In other cases, publicly available datasets with a global reach may be considered. A good example

39 Australian BITRE (Bureau of Infrastructure and Transport Research Economics), “Australian Road Deaths Database (ARDD),” Australian BITRE, updated May 13, 2021, https://data.gov.au/data/dataset/australian-road-deaths-database; ODPH (Open Data Philippines), “Open Data Philippines,” ODPH, accessed June 3, 2021, https://data.gov.ph/; US NHTSA (United States National Highway Traffic Safety Administration), “Data,” US NHTSA, accessed May 28, 2021, https://www.nhtsa.gov/data
40 World Bank, “Better Data for Safer Roads: The Powerful Mission of Road Safety Observatories,” last modified November 5, 2020, https://www.worldbank.org/en/news/video/2020/11/05/better-data-for-safer-roads-the-powerful-mission-of-road-safety-observatories


of this is OSM, which offers freely available geographic data generated by volunteers who trace satel-
lite images around the world to create and update the map consisting of road networks (detailing road
types, bridges, tunnels, direction of traffic flow), among other features. OSM data can be combined
with other datasets for road safety analysis. While OSM provides an overview of the road geometry,
the recency and accuracy of the data require validation. Due to variability in quality and coverage,
OSM data should be considered a starting point and is not recommended for detailed assessments.
Private sector. Mobility datasets are generated through ride-hailing services, delivery services, so-
cial media, and other mobile applications that collect user location and movement. Companies in the
transportation and logistics sector use smartphone applications to digitize their operations and take
advantage of higher quality, real-time data to improve efficiency as well. Other companies provide
telematics software to track vehicle movement and safety features. Companies and start-ups invest-
ing in autonomous vehicle research are providing valuable sources of big data for road safety analy-
sis. Some companies also provide APIs that allow developers to access these datasets (often on a lim-
ited basis). However, proprietary or commercial data may have to be purchased in some instances, or
data partnerships need to be established to access such data. It is also crucial to understand how the
data is licensed and can be legally used for different types of analysis. For example, Google restricts
digitizing and tracing information as well as using applications to analyze and extract information
from street view images, although annotation and labelling are permitted.41
Data Partnership Agreements. World Bank task teams can apply for access to various datasets for
road safety analysis through the Development Data Partnership (DDP), which is a formal collabora-
tion of private sector companies and international organizations to use third-party data in research
and international development.42 It is accessible to all World Bank staff and partners. Once a task
team submits a proposal through the DDP site and signs a licensing agreement, companies provide datasets
relevant to road safety, such as human movement (Orbital Insight, Unacast, and Veraset), traffic
speed (Mapbox and Waze), social media (Twitter), and weather (tomorrow.io). In addition, the site
shares guidance on accessing the datasets and contains a searchable inventory of Development Part-
ner projects. DDP provides a seamless, efficient, and secure manner for World Bank teams to access
data from a broad range of data providers across various regions of interest. It includes templates of
data license agreements, access to multi-disciplinary teams for end-to-end support and a centralized
IT architecture and processes for ingesting, storing, and pre-processing data, as well as for coding
collaboration. Task teams can also benefit from extensive, up-to-date documentation that provides
guidelines, code snippets and examples from data partners’ products and services to facilitate their
project.43 DDP datasets are primarily intended for experimental purposes. If proven successful, gov-
ernments may consider implementing a five-year agreement directly with the company to continue
to use the data for road safety analysis. It is also possible to benefit from the platform by becoming a
World Bank Data Fellow.
Waze for Cities is one example of a data sharing agreement that can be leveraged using the DDP
platform. The program allows cities to utilize data standards designed by Waze for closure and inci-
dent reporting to reduce data fragmentation and promote transport and government data aggrega-
tion. It now has more than 500 global partners including city, state and country government agen-
cies, nonprofits and first responders. Moovit, an app focused on public transport, offers Mobility as a


41
   Google, “Google Maps, Google Earth, and Street View,” accessed May 14, 2021, https://about.google/brand-resource-center/products-and-services/geo-guidelines/
42
   Development Data Partnership, https://datapartnership.org/
43
   Development Data Partnership Documentation, https://docs.datapartnership.org/pages/documentation.html


Service (MaaS) solutions for cities, providing personalized apps, payment solutions, real-time transit
information, and other analytics.
In many cases, data providers help local governments by exchanging data. For example, the city of
Tokyo in Japan has partnered with a private firm to develop a smartphone-compatible app, Zenryoku
Annai!. The app analyzes nearly 360 million observations every second to generate real-time
information on the shortest and least-congested travel routes. A similar intelligent transport system
(ITS) in Denmark, Copenhagen Connecting, was implemented to promote transport sustainability through
real-time digital traffic control and weather adaptation options. World Bank task teams should
consider seeking the support of local governments to establish data partnership agreements,
particularly if the provider is not already part of the DDP.
Data marketplaces. Business leaders are keen to explore the value of the big data they collect as a
tradable commodity. This has given rise to data marketplaces, which are essentially online platforms
dedicated to the buying and selling of data. These marketplaces can provide a more cost-effective
source of data than other data mining techniques. Dedicated marketplaces for traffic and
transport data have also emerged in recent years, although their coverage of LMICs tends to be low.
As part of its efforts to establish an artificial intelligence tool for road safety analysis (called Ai-
RAP), iRAP is seeking to establish a data marketplace where public and private data providers can
trade data for road safety analysis. The data marketplace will focus on three types of data products,
according to Monica Olyslagers (Safe Cities and Innovation Specialist at iRAP), who was interviewed
for this Guidance Note.44 The first is raw datasets that need to be processed to extract relevant in-
formation. The second is datasets that have been at least partially cleaned up and processed by data

44
     Monica Olyslagers (Safe Cities and Innovation Specialist, iRAP), in discussion with the authors, April 2021.


providers or Ai-RAP and are ready to be plugged into road safety assessments. The third is pre-
pared-for-purpose datasets that are specifically commissioned for road safety assessments in differ-
ent types of projects. This data marketplace model is currently being piloted in Africa, as part of a
project to set up a regional road safety observatory there in collaboration with the World Bank.
The new data marketplace will initially focus on aggregating and trading conventional datasets.
However, the project team plans to bring on big data providers and incorporate ML in the Ai-RAP tool
to allow for more sophisticated analysis in road safety assessment procedures. Borrowers and TTLs
are advised to search data marketplaces as a lower-cost alternative to commissioning data collection
for their projects.

Key Considerations for Selecting the “Right” Big Data Source

This section provides an overview of how different big data sources can be used. The data sources
covered in the tables for each method or assessment type should be viewed as guides rather than
definitive, all-inclusive lists. The most appropriate choice of data sources should ultimately be
determined by weighing the costs and benefits of each source. Factors that may be useful to
consider for this purpose are discussed toward the end of this section. It is also worth noting that
while big data may not be a feasible alternative to conventional data for every project or assessment
(if only at present), it can still complement and supplement current approaches or be used to validate
their outcomes and analyses.45
As discussed in Part 1, assessing OPTRSR is a procedure that must be conducted at the project
Identification stage to inform design and other assessments at the Preparation stage. Table 7 pro-
vides an overview of potential big data sources for the road safety assessment procedures discussed
in Part 1 (Methods I-VII). Tables 8 to 10 discuss data sources that could be useful for each of the three
primary methods for estimating OPTRSR, based on their respective data requirements.




45
  Holly Krambeck, Magreth Kakoko, and Mireille Raad, Using Computer Vision to Automatically Detect Road Features for Road
Safety Audits and Assessments: Inception Report (Washington, DC: World Bank, 2019).


TABLE 7:   Overview of potential big data sources for Methods I-VII
TYPE OF DATA REQUIRED        WHICH METHODS                    POTENTIAL BIG DATA SOURCE                               EXAMPLES
                             IT’S USED FOR

Crash data from 3–5 years    Methods I, V and VI              Government                                              Government portal or contact
                                                              Mobile applications and telematics                      Waze
                                                              Crowdsourced                                            Waze
Operating speeds             Methods II to IV                 Mobile applications and telematics                      Mapbox, Waze
Road features (road          Methods III, V, VI, and VII      Street view imagery                                     Mapillary
markings, signs, traffic                                      Crowdsourced                                            OSM
calming measures, etc.)
                                                              Aerial and satellite imagery                            Maxar, UAV
Road type (urban road,       Methods III, V, VI, and VII      Street view imagery                                     Mapillary
pedestrian area, etc.)                                        Crowdsourced                                            OSM
                                                              Aerial and satellite imagery                            Maxar, UAV
                                                              Mobile applications                                     Orbital Insight
Vehicle fleet mean speed     Methods III to VII               Mobile applications and telematics                      Mapbox, Waze
Traffic flow                 Methods IV to VII                Traffic imagery                                         Mapillary
                                                              Aerial and satellite imagery                            Maxar, UAV
                                                              Mobile applications and telematics                      Mapbox, Waze
                                                                                                                  SOURCE: Original table for this publication.



For crash data-based risk assessments (Method I), at least three years of historical crash data is required
to cover three assessment criteria: infrastructure, road users, and speeds. Government data can be sup-
plemented with data from mobile applications and telematics software, which may also have crowdsourc-
ing capabilities, such as Waze. However, it may be a challenge to access three or more years of historical
mobile or crowdsourced data. Table 8 summarizes the different data sources that can be used, although it
does not include sources for two assessment criteria (vehicle standards and post-crash trauma care).

TABLE 8:   Method I — Crash data-based risk assessment
REQUIREMENTS        DATA SOURCE                   COMMENTS

Crash data from     Government                    May be underreported; see Road Safety GPN
3–5 years           Mobile applications and       Companies providing mobile map apps or crash-related
                    telematics                    data within apps could be a resource for crash data
                    Crowdsourced                  Waze incident reports (minor or major crash)
                                                  Incident reports from delivery drivers
                                                  Social media text analysis, such as from Twitter
                                                                             SOURCE: Original table for this publication.
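
The three-year requirement for Method I can be checked per source before any analysis begins. The
sketch below is illustrative only: the record layout, the source names, and the dates are
hypothetical, and it assumes that "three years" means three consecutive calendar years with at
least one recorded crash.

```python
from datetime import date


def years_covered(crash_dates):
    """Return the sorted calendar years in which at least one crash was recorded."""
    return sorted({d.year for d in crash_dates})


def has_min_history(crash_dates, min_years=3):
    """True if the records cover at least `min_years` consecutive calendar years."""
    longest = run = 0
    previous = None
    for year in years_covered(crash_dates):
        # Extend the current run if this year directly follows the previous one.
        run = run + 1 if previous is not None and year == previous + 1 else 1
        longest = max(longest, run)
        previous = year
    return longest >= min_years


# Hypothetical records: (source, crash date). Not real data.
records = [
    ("government", date(2018, 3, 14)),
    ("government", date(2019, 7, 2)),
    ("government", date(2020, 11, 23)),
    ("crowdsourced", date(2020, 5, 9)),
    ("crowdsourced", date(2020, 12, 1)),
]

by_source = {}
for source, crash_date in records:
    by_source.setdefault(source, []).append(crash_date)

# Which sources satisfy the minimum history on their own?
coverage = {source: has_min_history(dates) for source, dates in by_source.items()}
```

In this made-up example the crowdsourced feed alone does not meet the requirement, which mirrors the
point above that such data is better treated as a supplement to government records.
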



If crash data is not available, Method II uses iRAP Star Ratings of the existing road to evaluate
road infrastructure risk, together with an assessment of the other criteria. Big data can be used to
evaluate road features, traffic flows and road users’ behavior, and to complement iRAP Star Ratings.
Table 9 highlights alternative big data sources that can be used to assess non-infrastructure risk.
iRAP is also exploring the use of big data, such as geo-located crash data, to produce iRAP Risk Maps
of historical crashes per kilometer, to analyze road attributes, traffic flows, and speed data, and
to map safety performance and Star Ratings.46 Such a methodology would also require the use of ML,
which is discussed in the next section.


46
   Omdena, “Rating Road Safety Through Machine Learning to Prevent Road Accidents,” accessed May 28, 2021, https://omdena.com/projects/ai-road-safety/


TABLE 9:   Method II — iRAP Star Rating (alternative data sources using big data)
REQUIREMENTS                          DATA SOURCE                             COMMENTS

Road users (behavior)
Seat belt use for front passengers    Traffic imagery                         Road surveillance images have been used to
                                                                              monitor front-row passengers wearing seat belts;
                                                                              potential to apply this to images (or video).
Child restraint and rear seat         N/A                                     N/A
passenger seat belt use
Motorcycle helmet use                 Street view imagery                     Potential to identify helmet use among
                                                                              motorcyclists.
Operating speeds (km/h) during non-peak hours (not speed limits) for each road type
Traffic video                         Government data or collected by team    Video images can be used to calculate traffic
                                                                              flows and speeds.
Operating speeds                      Mobile applications and telematics      Often provided as average speed per road
                                                                              segment in varying temporal resolutions.
                                                                                               SOURCE: Original table for this publication.



Big data can also be used to evaluate road safety risk without crash or iRAP data (Method III). Road
infrastructure, operating speeds and other risks to road users may be estimated using various
sources of big data (table 10). Combined with ML, clustering and other advanced analytical techniques,
these data sources can also be used to model high-risk crash sites and to project crash probability,
frequency, and severity. This is discussed further in the following section.




TABLE 10:   Method III – Estimating road infrastructure risk without crash or iRAP data
REQUIREMENTS                                    DATA SOURCE                    COMMENTS

Road infrastructure
Extent of separation of pedestrians from        Street view imagery            Identify safe walking paths and crosswalks, traffic lights
traffic with provision of safe walking spaces                                  and signals
and crossing locations (if pedestrians are      Crowdsourced                   OSM footways, intersections
present or likely to be present post-project)
                                                Aerial and satellite imagery   Identify walking paths and crosswalks
Extent of roadside safety barriers (omit this   Street view imagery            Identify barriers
factor from consideration if the operating      Crowdsourced                   OSM (e.g., cable barriers or guard rails)
speed is <40 km/h)
                                                Aerial and satellite imagery   Depending on image resolution and type of barrier in
                                                                               the ROI
Extent of median separation (omit this          Street view imagery            Identify road medians
factor from consideration if the operating      Crowdsourced                   OSM (e.g., cable barriers)
speed is <60 km/h for a rural road and <40
km/h for an urban road)                         Aerial and satellite imagery   Depending on image resolution and type of median in
                                                                               the ROI
Extent of separate well-designed                Street view imagery            Identify motorcycle lanes
motorcycle lanes (if motorcycles are            Crowdsourced                   OSM (e.g., motorcycle lanes)
present on the road or roadside or likely to
be present post-project)                        Aerial and satellite imagery   Identify motorcycle lanes

Extent of separate off-road bicycle lane (if    Street view imagery            Identify bicycle lanes
bicycles are present on the road or roadside    Crowdsourced                   OSM (e.g., cycleways)
or likely to be present post-project)
                                                Aerial and satellite imagery   Identify bicycle lanes
Road users
Seat belt use for front passengers              Street view imagery            Road surveillance images have been used to monitor
                                                                               front-row passengers wearing seat belts; potential to
                                                                               apply to images or video
Child restraint and rear seat passenger seat    N/A                            N/A
belt use
Motorcycle helmet use                           Street view imagery            Potential to identify helmet use among motorcyclists
Operating speeds (km/h) during non-peak hours (not speed limits) for each road type
Operating speeds                                Mobile applications and        Often provided as average speed per road segment in
                                                telematics                     varying temporal resolutions
Road type (pedestrian area; urban area          Street view imagery            Identify pedestrian and non-pedestrian areas, open
without pedestrians; open road, not median                                     roads, and medians
separated; open road, median separated)         Crowdsourced                   OSM roadways, footways, cable barriers or guard rails
                                                Aerial and satellite imagery   Pedestrian area, area without pedestrians, medians
                                                Mobile applications            Foot traffic, such as from Orbital Insight
                                                                                                        SOURCE: Original table for this publication.




Projects that require reporting RSSAT results (Method IV) in addition to the OPTRSR (such as
Type A projects, see Part 1) can turn to the big data sources highlighted in table 11 as an alternative
or complement to traditional sources. Where existing data may be scarce or of poor quality, these
sources may provide faster, more comprehensive and reliable data to estimate baseline risks.
Similar big data sources can be used for road infrastructure evaluations that involve Methods
V-VII. Speed limits may be provided by the government. Roadside attributes, intersections, and mid-
block attributes can be detected by ML algorithms applied to street view images.

TABLE 11:   Method IV – RSSAT
REQUIREMENTS                        DATA SOURCE                     COMMENTS

Crash data from 3–5 years           Government                      May be underreported; see Road Safety GPN
(annual fatalities, serious         Mobile application              Companies providing mobile map apps or traffic-related data
injury/fatality ratio; fatalities                                   within apps might be a resource for crash data
by vehicle occupant,
motorcyclist, bicyclist, or         Crowdsourced                    Waze incident reports (minor or major crash)
pedestrian)                                                         Incident reports from delivery drivers
                                                                    Social media text analysis, such as from Twitter
Vehicle fleet mean speed            Mobile applications and         Often provided as average speed per road segment in varying
                                    telematics                      temporal resolutions
Segment characteristics             Street view imagery             Number of lanes, lane width, paved shoulder width, terrain type,
(number of lanes per travel                                         median type, road marking and signs, pedestrian and bicycle
direction; lane width, paved                                        facilities, service road
shoulder width, terrain type,       Crowdsourced                    OSM
median type; road marking and
signs; pedestrian and bicycling     Aerial and satellite imagery    Number of lanes, lane width, paved shoulder width, terrain type,
facilities, service road)                                           median type, road marking, pedestrian and bicycle facilities,
                                                                    service road; road signs will be a limitation
Dominant roadside object            Street view imagery             Safety barriers, static roadside objects, minor hazards; in some
(safety barrier; minor hazards;                                     cases, cliff or steep drop may be possible
slope; trees, poles, and fixed      Crowdsourced                    OSM (e.g., barriers)
objects; cliff or steep drops)
                                    Aerial and satellite imagery    Elevation for slope and steep drops; in some cases, static
                                                                    roadside objects, minor hazards or safety barriers
Speed management or traffic         Street view imagery             Identify physical speed inhibitors
calming measures (percentage        Crowdsourced                    OSM traffic calming features by type
of road length)
                                    Aerial and satellite imagery    Identify physical speed inhibitors
Intersection characteristics        Street view imagery             Grade separated, roundabout, signalized junction, unsignalized
(grade separated, roundabout,                                       junction
signalized junction,                Aerial and satellite imagery    Grade separated, roundabout
unsignalized junction)
Pedestrian crossing (grade          Street view imagery             Grade separated, signalized crossing, marked crossing
separated, signalized crossing,     Crowdsourced                    OSM pedestrian crossing features by type
marked crossing)
                                    Aerial and satellite imagery    Grade separated, marked crossing
Traffic flow (motorized             Street view imagery             Static camera at a set location is preferable
and non-motorized; both             Aerial and satellite imagery    Presents temporal limitations
directions, per day)
                                    Mobile applications and         Provides temporal granularity
                                    telematics
                                                                                                         SOURCE: Original table for this publication.



Big data sources may also be useful to monitor and evaluate indicators for the Results Framework.
Table 12 provides examples of a few big data sources that could be used for the indicators covered in
table 5.




TABLE 12:   Example of big data sources for road safety indicators in the Results Framework
EXAMPLE OF INDICATOR                       TYPE OF DATA THAT CAN BE COLLECTED     EXAMPLE OF BIG DATA SOURCE

Reduction of road crashes                  Crash data                             Government, open source data, Waze
Speed reductions                           Traffic flows                          Video images, telematics, mobile applications
Increased use of helmet and seat belts     Number of helmet and seat belt users   Street images and security video
                                                                                                 SOURCE: Original table for this publication.



As a broader variety of big data sources becomes available, Borrowers and TTLs are advised to
carefully consider the trade-offs involved when collecting data from various sources. The list below
sets out factors to consider, with some guidance on how each can affect project outcomes and
constraints. It is not exhaustive. Some factors may be more relevant to some projects
than others, while additional considerations may be required for certain projects. In some cases,
data from existing sources may not be available and will need to be collected using cameras, sensors,
and/or other tools. The World Bank Data Lab provides resources to find, collect, manage, and gain
insights from data, including access to Lab Leads who can give project-specific advice.47
It is worth noting that many of these factors are also interrelated. For example, the types and
quantity of data required could impact costs of obtaining and processing it. Costs can also vary by
region, as can the availability of resources to process and analyze the data. This list may be used
in tandem with Annex 2, which provides an overview of the most relevant big data sources for road
safety analysis as well as their relative costs, data attributes and formats, and possible limitations.
•	 Type of road safety assessment or procedure. As discussed in Part 1, a broad range of tools and
   procedures are used for road safety assessments across World Bank projects. Each tool has its
   own specific data requirements. It is important to consider these before determining appropriate
   big data sources to complement analysis.
•	 Context/Region(s) of Interest. The types and variety of big data sources available can vary greatly
   from region to region, from country to country, or even between provinces or localities within the
   same country. For example, Waze crowdsourced crash data is more useful in densely populated urban
   areas than in rural regions.
•	 Type of data required. As more big data sources become available for road and traffic data, the
   task team should carefully consider which variables and data types are most relevant to its
   model before selecting a source. For example, Google offers a number of APIs that may be useful
   for road safety analysis, including Google Maps, Google Traffic and Google Street View. It is
   also important to consider the quantity, duration, and extensiveness of the data required. For
   example, some data sources include time-series information, while others do not. Some may include
   specific road features or road user data, while others may focus only on traffic flows.
•	 Data formats. Big data is collected, stored, and transmitted in a wide range of formats. It is
   important to consider the usability of available big data formats as well as their interoperability
   with other types of data. Since many big data sources that are currently available are not custom
   designed for road safety analysis, task teams should be prepared to commit expertise and resources
   to extract, aggregate, clean, and convert the data into a format that can be combined with
   other data and/or used with analytical tools and models.
•	 Cost. Given the size of big datasets, costs can arise from accessing, storing, handling, process-
   ing, and analyzing the data. The cost may be in the form of data licenses, software licenses or

47
     World Bank Data Lab, https://wbdatalab.org/


   equipment (if the data is being collected specifically for the project at hand). Besides the cost of
   obtaining the data, it is also important to consider the cost of using it, including the cost of
   the necessary expertise, software tools and processing power for analysis. Annex 2 discusses the
   relative costs associated with using different big data sources.
•	 Resources required to make data usable. In addition to relevant data sources and the costs that
   may be associated with accessing them, other resources could also be required to utilize the data
   in road safety assessment and analysis. This includes technical skills and expertise required to
   handle and analyze the data.
•	 Time constraints. Some big data sources deliver data faster than others. For example, open data
   platforms allow users to run a search query and obtain relevant datasets instantly. Other avenues,
   such as data sharing agreements, may take longer to deliver the required data. It is important to
   consider the project timeframe to determine which data source may be more useful for road safety
   analysis at a given stage.
•	 Licensing constraints. Any official and legitimate data source is accompanied by licensing reg-
   ulations that outline the terms of use of the provided dataset. Big data sources are no exception.
   Different data sources have different licensing agreements associated with them. Some, such as
   open data platforms, may have minimal licensing restrictions. Others, such as APIs and datasets
   obtained through data partnership agreements, can have more restrictive terms of use. It is im-
   portant to carefully consider these limitations before choosing a source. TTLs are advised to con-
   sult the World Bank’s legal team or the data provider to fully understand licensing restrictions
   associated with different big data sources to avoid legal ramifications.
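
The "Data formats" factor above can be made concrete with a small sketch. It assumes two hypothetical
feeds, a CSV of speed observations and a JSON list of crowdsourced incident reports, and converts
both into one table keyed by road segment. The field names (segment_id, avg_speed_kmh, segment, type)
and all values are invented for illustration; real providers will use their own schemas.

```python
import csv
import io
import json

# Hypothetical speed feed: CSV with one row per segment observation.
SPEED_CSV = """segment_id,avg_speed_kmh
A1,42.5
A1,47.5
B2,61.0
"""

# Hypothetical incident feed: JSON list of crowdsourced reports.
INCIDENT_JSON = """[
  {"segment": "A1", "type": "minor_crash"},
  {"segment": "B2", "type": "major_crash"},
  {"segment": "B2", "type": "minor_crash"}
]"""


def load_mean_speeds(csv_text):
    """Average the observed speeds for each segment in the CSV feed."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        seg = row["segment_id"].strip()
        total, count = totals.get(seg, (0.0, 0))
        totals[seg] = (total + float(row["avg_speed_kmh"]), count + 1)
    return {seg: total / count for seg, (total, count) in totals.items()}


def load_incident_counts(json_text):
    """Count incident reports per segment in the JSON feed."""
    counts = {}
    for report in json.loads(json_text):
        seg = report["segment"].strip()
        counts[seg] = counts.get(seg, 0) + 1
    return counts


def combine(speeds, incidents):
    """Join both sources on segment ID into one list of uniform records."""
    segments = sorted(set(speeds) | set(incidents))
    return [
        {
            "segment_id": seg,
            "mean_speed_kmh": speeds.get(seg),  # None if no speed data exists
            "incident_count": incidents.get(seg, 0),
        }
        for seg in segments
    ]


table = combine(load_mean_speeds(SPEED_CSV), load_incident_counts(INCIDENT_JSON))
```

The sketch joins on a shared segment identifier for simplicity. In practice, sources rarely share
identifiers, and matching records to road segments by location is often the more demanding step.
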

2.2 Machine Learning in Road Safety Analysis

ML is a branch of artificial intelligence. It involves creating algorithms that “learn” patterns, trends
and behaviors from data and improve their accuracy over time without further programming. As figure 6
illustrates, the lifecycle of an ML model can typically be divided into two phases: training and
deployment. In the training phase, training data is fed into the algorithm to obtain a trained model.
In the deployment phase, new input data is fed into the trained model to predict the output.

FIGURE 6:   ML lifecycle




[Figure: Training data → Training the algorithm → Trained model; New input data → Trained model → Prediction]
                                                                         SOURCE: Modified from https://randomtrees.com/data-science
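
The two phases can be sketched in a few lines of code. The example below uses ordinary least-squares
regression as a stand-in for "the algorithm"; this choice, the function names, and the made-up
speed and crash figures are assumptions for illustration, not part of the lifecycle itself.

```python
def train(xs, ys):
    """Training phase: fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The "trained model" is simply the pair of learned parameters.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Deployment phase: apply the trained model to new input data."""
    return model["intercept"] + model["slope"] * x


# Made-up training data: mean operating speed (km/h) and yearly crashes per segment.
training_speeds = [30.0, 40.0, 50.0, 60.0]
training_crashes = [1.0, 2.0, 3.0, 4.0]

model = train(training_speeds, training_crashes)

# New input data that was not part of the training set.
prediction = predict(model, 70.0)
```

The point of the split is that `train` runs once on historical data, while `predict` can then be
called repeatedly on new segments without refitting.
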




As shown in figure 7, ML algorithms can be divided into three categories: supervised learning,
unsupervised learning, and reinforcement learning. Table 13 lists the tasks each category can
perform and the algorithms most widely used for them. One significant difference between these
categories is the format and source of the training data.

FIGURE 7:   Categories of ML and the tasks they can perform

[Figure: Machine learning divides into three branches.
 Unsupervised learning: dimensionality reduction (meaningful compression, structure discovery,
 feature elicitation, big data visualization) and clustering (recommendations, customer
 segmentation, targeted marketing).
 Supervised learning: classification (fraud detection, image classification, customer retention,
 diagnostics) and regression (weather forecasting, advertising popularity predictions, estimating
 life expectancy, market forecasting).
 Reinforcement learning: real-time decisions, game, robot navigation, skill acquisition.]
                                 SOURCE: Modified from https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d



Supervised learning is a family of algorithms that learn from previous data to map an input (X) to an
output (Y). For example, a supervised learning algorithm can be used to predict the risk level or crash
frequency (Y) of a road segment given its characteristics (X). “Supervised” means the training data is
labelled (i.e., it consists of X-Y pairs, where each Y is called a label).
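The X-to-Y mapping described above can be illustrated with a minimal sketch, assuming the simplest possible supervised model: a straight line fitted by ordinary least squares. The feature (traffic volume), the labels (crashes per year), and all numbers are hypothetical and chosen only so the arithmetic is easy to follow; they are not real crash data.

```python
# Minimal supervised-learning sketch: learn y = slope * x + intercept from labelled pairs.
# All values are made up for illustration; they are not real road or crash data.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error over the X-Y pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned mapping to a new input."""
    slope, intercept = model
    return slope * x + intercept

# X: hypothetical average daily traffic (thousands of vehicles) per road segment.
# Y: hypothetical label, crashes per year on that segment.
traffic = [1.0, 2.0, 3.0, 4.0, 5.0]
crashes = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(traffic, crashes)   # training on labelled data
estimate = predict(model, 6.0)       # prediction for a segment not in the training data
```

Real road safety models would use many input variables and one of the algorithms named in the tables that follow, but the training data takes the same labelled X-Y form.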
Unsupervised learning algorithms find structures in a dataset in order to group or cluster data points
based on their similarity. As the name suggests, these algorithms do not require “supervision” or
human intervention in the training phase. This means that, unlike supervised learning, the training
data for unsupervised learning algorithms has no labels (Y). These algorithms learn to group X based
on similar characteristics. The most common unsupervised learning task is clustering. For example,
given the characteristics of a road segment, an unsupervised learning algorithm can assign it to
a group of similar segments. It does not need to understand what the group represents to complete
this task.
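As a rough illustration of clustering without labels, the sketch below implements a basic K-means loop in plain Python. The two features per segment (lane count and speed limit) and their values are hypothetical, and the initialization (first k points) is a simplification chosen to keep the example deterministic.

```python
# Minimal K-means sketch: group unlabelled points by similarity.
# Feature values are hypothetical; real analyses would normally scale features first.

def kmeans(points, k, iterations=20):
    # Simplified, deterministic initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Each tuple: (number of lanes, speed limit in km/h) for a hypothetical road segment.
segments = [(2, 40), (2, 50), (4, 100), (4, 110), (2, 45), (6, 120)]
labels, centers = kmeans(segments, k=2)
```

The algorithm returns group numbers only. Deciding that one group corresponds to, say, low-speed local streets is an interpretation made afterwards by the analyst.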
Reinforcement learning trains a software agent to make decisions that maximize rewards from
interactions with an external environment.48 As opposed to supervised learning and unsupervised
learning, which require training data to be prepared before training, reinforcement learning generates
the training data during the training phase. The data is generated when the agent interacts with
the environment. For example, reinforcement learning can be used to train an agent to control traffic
lights based on traffic conditions.



48
     This agent is a piece of software that makes a decision based on the environment.


TABLE 13:   Categories of ML and algorithms*
                              ALGORITHMS                    TASKS
Supervised Learning           SVM, DT, RF, KNN, ANN         Classification
                                                            Regression
Unsupervised Learning         K-means, PCA, ANN             Clustering
                                                            Dimensionality Reduction
Reinforcement Learning        Q-Learning, DQN               Robotics/Decision-making
*The algorithms listed in this table are not exhaustive.
SVM: support vector machine
DT: decision trees
RF: random forest
KNN: k-nearest neighbors
ANN: artificial neural networks
PCA: principal component analysis
DQN: deep Q-network, which includes an ANN in its algorithm
                                                         Source: Original table for this publication.



Artificial neural networks (ANNs) are a family of ML algorithms inspired by the human
brain. ANN is the most versatile ML algorithm; it can be used for supervised learning, unsupervised
learning, and reinforcement learning. As shown in figure 8, an ANN structures the data and
the computation in layers. Every layer adds more depth to the algorithm; therefore, the more
layers a network has, the “deeper” it is. ANNs with many layers are called deep neural networks, deep ANNs, or DNNs.
ML algorithms that use deep ANNs are called deep learning (DL) algorithms. Therefore, from another
perspective, ML algorithms can be divided into conventional ML and DL (table 14).
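The layered computation can be sketched in a few lines, assuming a small network with three inputs, one hidden layer, and one output. The hidden layer size, the sigmoid activation, and every weight value below are illustrative assumptions; a real network learns its weights from training data.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through each layer in turn; more layers means a deeper network."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Hypothetical, hand-picked weights (not trained): 3 inputs -> 2 hidden neurons -> 1 output.
hidden_layer = ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1])
output_layer = ([[1.0, -1.0]], [0.0])
network = [hidden_layer, output_layer]

result = forward([1.0, 0.0, 1.0], network)
```

Appending further `(weights, biases)` pairs to `network` is all it takes to make this sketch deeper, which is the sense of "depth" used in the text.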

FIGURE 8:   ANN structure
[Diagram: three inputs (Input 1, Input 2, Input 3) enter the INPUT LAYER, pass through a HIDDEN LAYER, and produce Output 1 at the OUTPUT LAYER.]
                                                                      SOURCE: Original figure for this publication.



TABLE 14:   ML and DL algorithms
                                    CONVENTIONAL ML*                           DL

Supervised Learning                 SVM, DT, RF, KNN, shallow ANN              Deep ANN
Unsupervised Learning               K-means, PCA                               Deep ANN
Reinforcement Learning (RL)         RL without deep ANN                        RL with deep ANN
*The conventional ML algorithms listed in this table are not exhaustive.
                                                            SOURCE: Original table for this publication.



Most ML algorithms are conventional ML. Conventional supervised learning algorithms, such as
support vector machine (SVM), can be used for classification or regression, for example,
classifying the risk level of a road segment based on its characteristics. Conventional unsupervised
learning algorithms, such as K-means clustering, automatically identify spatial patterns in
datasets, which can be applied to locate clusters or areas with recurring road crashes. Conventional
ML works well for small, low-dimensional datasets. DL, by contrast, is a subset of ML that learns
complex patterns from high-dimensional data (e.g., images) and large quantities of data (e.g., big data).
Supervised, unsupervised, and reinforcement learning algorithms that use deep ANNs belong
to the DL category. DL’s first successful application was in computer vision. For example,
image classification is a supervised learning task that uses deep neural networks to classify images
into different classes (e.g., cars, pedestrians, etc.).

How to Use Machine Learning

The use of ML methods in road safety analyses is being widely explored.49 As ML methods become
more advanced, economical, and accessible, their potential applications in various disciplines continue
to grow and become more feasible. In road safety analyses, ML has great potential to overcome
the limitations of traditional statistical models in crash analysis and crash probability modeling. The
applications of ML in road safety analyses are discussed under three categories: conventional ML,
DL, and reinforcement learning, as listed in table 15. Although reinforcement learning algorithms
that use deep ANNs also belong to DL, all reinforcement learning techniques are discussed
separately.

TABLE 15:   Frequently used ML techniques for road safety analysis*
ML CATEGORIES            SUBCATEGORIES           ALGORITHMS                      TASKS                                                EXAMPLES

Conventional ML          Supervised Learning     SVM, DT, RF, KNN, shallow ANN   Classification                                       Predict risk level based on road characteristics.
                                                                                 Regression                                           Crash frequency prediction based on road characteristics.
                         Unsupervised Learning   K-means                         Clustering                                           Group road segments by characteristics similarity; group drivers based on their driving behaviors.
                                                 PCA                             Dimensionality Reduction                             Identify critical factors of road safety.

DL                       Supervised Learning     CNN                             Image Classification/Object Detection/Segmentation   Detect road features from images.
                         Unsupervised Learning   GAN                             Clustering/Dimensionality Reduction                  Find the hidden features related to road safety from map and satellite images of the road environments.

Reinforcement Learning   N/A                     Q-Learning, DQN                 Robotics/Decision-making                             Control traffic lights based on traffic conditions.
*The algorithms and examples listed in this table are not exhaustive.
CNN: convolutional neural network, a type of deep ANN
GAN: generative adversarial networks, a type of deep ANN
                                                                                                               SOURCE: Original table for this publication.



A growing body of research explores various ML techniques to predict the probability of road
crashes and assess their severity by training on historical datasets that encompass diverse factors.
Conventional ML algorithms are the most frequently used ML algorithms for this purpose.
They are summarized in table 15. ML-based approaches to road safety analysis can be used to complement,
supplement or even potentially substitute conventional road safety assessments.
Conventional supervised learning algorithms learn functions that take vectors of variables as input
to predict the output. Most conventional supervised learning algorithms that are frequently
used in data science have been used in road safety analyses, including but not limited to: decision


49
   Philippe Barbosa Silva, Michelle Andrade, and Sara Ferreira, “Machine Learning Applied to Road Safety Modeling: A
Systematic Literature Review,” Journal of Traffic and Transportation Engineering (English Edition), 7, no. 6, (2020),
https://www.sciencedirect.com/science/article/pii/S2095756420301410


trees (DT), random forest (RF), support vector machine (SVM), k-nearest neighbors (KNN), and artificial
neural networks (ANN).50 It should be noted that there is no “best” algorithm. Determining which
algorithm may be most appropriate for an ML-based road safety analysis is essentially a data science
problem for which there are usually no set rules. An algorithm may perform well on one dataset but
poorly on another. It is common practice for data scientists to try different algorithms in order to
find a suitable one for a specific problem. When using the aforementioned conventional supervised
learning algorithms for road safety assessments, the problem is often framed as a classification or
regression problem. The output (Y) of the ML algorithm is either a class (e.g., risk level or
severity: low, moderate, substantial or high) or a scalar (e.g., crash probability, crash frequency), and
the input (X) could be any parameter related to the output (including but not limited to weather, time,
road factors, and human factors). For example, one way to calculate OPTRSR
is to frame it as a classification problem, in which the output of the model is the OPTRSR risk
level, while the input is a vector of variables describing road features and typical vehicle operating
speeds, or other factors that could be used for evaluating the OPTRSR risk level. Any of the aforementioned
conventional supervised learning algorithms would be suitable for this example.
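The classification framing can be sketched with k-nearest neighbors, one of the algorithms named above. This is a minimal illustration, not the OPTRSR methodology: the three input variables, the training examples, and their risk labels are all hypothetical, and only three of the four risk levels appear, for brevity.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the most common label among the k training points closest to the query."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical inputs (X): (operating speed in km/h, number of lanes, sidewalk present 0/1).
# In practice the variables would be scaled so that no single one dominates the distance.
train_x = [
    (30, 2, 1), (35, 2, 1), (40, 2, 1),
    (60, 2, 0), (65, 4, 0), (70, 4, 0),
    (90, 4, 0), (100, 4, 0), (110, 6, 0),
]
# Hypothetical labels (Y): risk level assigned to each training segment.
train_y = [
    "low", "low", "low",
    "moderate", "moderate", "moderate",
    "high", "high", "high",
]

slow_street = knn_predict(train_x, train_y, (38, 2, 1))
fast_road = knn_predict(train_x, train_y, (95, 4, 0))
```

Because the input and output formats stay the same, swapping KNN for a decision tree, random forest, or SVM and comparing the results is the kind of trial described in the text.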
Conventional unsupervised learning algorithms are mainly used for clustering and dimensionality
reduction purposes. In road safety analyses, K-means can be used for grouping tasks that help
find clustering patterns in the data. For example, it can be used to group road segments by similar
characteristics or group drivers based on their driving behaviors, so that dangerous road segments
or drivers can be identified based on the similarity. In another application of unsupervised learning,
principal component analysis (PCA) is used to reduce the dimensions of the input data and identify
the most critical factors that affect road safety.
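To give a sense of how dimensionality reduction points to influential variables, the sketch below estimates the first principal component of a small dataset using power iteration on the covariance matrix. Power iteration is one simple way to obtain the leading component and is used here as an assumption for brevity; the three variables and their values are hypothetical.

```python
def first_principal_component(rows, iterations=200):
    """Estimate the direction of greatest variance using power iteration."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = sum(v * v for v in nxt) ** 0.5  # zero only if the data has no variance
        vec = [v / norm for v in nxt]
    return vec

# Hypothetical variables per segment: (curvature index, speed variance, lane width in m).
# Lane width is constant here, so it should carry no weight in the component.
rows = [(1.0, 2.0, 5.0), (2.0, 4.0, 5.0), (3.0, 6.0, 5.0), (4.0, 8.0, 5.0)]
component = first_principal_component(rows)
```

The entries of `component` are the loadings of each variable. Variables with larger absolute loadings contribute more to the variation in the data, which is one rough signal of which factors merit closer attention.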
DL has been applied in various disciplines and achieved impressive performance. DL technologies
have progressed significantly over the past few years, especially in image analysis and computer
vision, the method’s first successful application. The core technique in this domain is the deep convolutional
neural network (CNN), which is the state-of-the-art approach for object detection, semantic
segmentation, and instance segmentation of images. Object detection is a task in which, given an
image, the model outputs a bounding box for each detected object (figure 9). Semantic segmentation is a
task in which, given an image, the model classifies every pixel into predefined classes (e.g., road lane,
traffic light, etc.). Instance segmentation is a task in which, given an image, the model groups the pixels
belonging to each individual instance of an object.
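The output of an object detector can be pictured as a list of labelled boxes with confidence levels. The sketch below assumes such a list is already available and shows two common post-processing steps: discarding low-confidence predictions and measuring how much two boxes overlap using intersection over union (IoU), a standard measure that the text does not discuss. The record layout, box coordinates, and the 0.5 threshold are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def keep_confident(detections, threshold=0.5):
    """Keep only the predictions whose confidence meets the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]

# Hypothetical detector output for one street view image (pixel coordinates).
detections = [
    {"label": "car", "confidence": 0.98, "box": (10, 20, 50, 60)},
    {"label": "person", "confidence": 0.72, "box": (70, 10, 90, 60)},
    {"label": "commercial sign", "confidence": 0.45, "box": (0, 0, 30, 10)},
]

kept = keep_confident(detections)
overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
```

The choice of threshold is a trade-off: a higher value removes more false detections but also drops genuine objects that the model was less sure about.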




50
     Silva, Andrade, and Ferreira, “Machine Learning Applied to Road Safety Modeling: A Systematic Literature Review.”


FIGURE 9:   ML algorithms and street view
After applying an object detection algorithm to a street view image, a bounding box surrounds each predicted object, which also
contains a confidence level for each prediction.
[Street view image, BOGOTÁ, COLOMBIA, with labeled bounding boxes: Logo 90%; Buildings 85%; Window 72% and 75%; Merchandise 77%, 71%, 83% and 83%; Commercial sign 85% and 45%; Street sign 69%; Door 96%; Person 72%, 81%, 78% and 96%; Car 69% and 98%; Truck 92%.]
                                                                                                                              SOURCE: World Bank Global Program for Resilient Housing.



DL-based image analysis has been successfully used in various industries for applications ranging
from facial recognition to autonomous driving. It has great potential to be used in road safety
analysis to automatically analyze images and infer road attributes that are relevant to road safety
assessments. Large sets of images with annotations such as road lanes, traffic lights, speed limit
signs, and pedestrians can be compiled for training deep CNNs so that they learn to recognize these
objects in images that the models have not previously encountered. If successful, this approach
should equip the model to detect road attributes at a regional scale.
The detected information can then be used for safety and risk analysis. For example, if the DL model
can infer the road segment characteristics (e.g., number of lanes, terrain type, road markings and
signs, and pedestrian, bicycling, and motorcycling facilities), the inferred information can readily be
used as input for the RSSAT tool (Method IV). This would allow the process of detection and analysis
to become fully, or at least significantly, automated and scalable at a low cost.
DL can also provide a lower-risk alternative to manual detection of certain road attributes and
other important variables in road safety analysis. For example, a team used imagery from Baidu
Street View to provide a practical, automated alternative to the manual detection of street cracks,
which can be labor-intensive, hazardous and difficult to conduct on a large scale. The authors use the
Deeplabv3+ network model, a DL neural network, to develop an automated road crack identification
system and demonstrate its practicality as a method to generate information about road cracks
more quickly, accurately and efficiently, and at lower cost, than manual detection.51



51
  Min Zhang et al., “Research on Baidu Street View Road Crack Information Extraction Based on Deep Learning Method,”
Journal of Physics: Conference Series, no. 1616 (2020). https://iopscience.iop.org/article/10.1088/1742-6596/1616/1/012086/pdf


Reinforcement learning is widely used to design intelligent control and decision-making systems.
In road safety and traffic management, reinforcement learning is most commonly employed to develop
intelligent signal control algorithms. A typical reinforcement learning-based traffic light system
makes decisions based on specific input traffic parameters, such as the length of time for which vehicles
wait at the intersection, the cumulative delay caused by waiting at the intersection, and the length of
time for which the light stays green for each signal head. The output of the system would be the
next color of the light and the length of time for which it should remain switched on. Designing traffic
systems using reinforcement learning can help save time and improve safety standards.
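A heavily simplified version of such a signal controller can be sketched with tabular Q-learning. Everything about the environment below is an assumption made for illustration: the state is only which of two approaches has the longer queue, the action is which approach receives the green light, the reward is +1 for serving the longer queue and -1 otherwise, and the next state is drawn at random. A real controller would use richer inputs, such as the waiting times and delays listed above, and would also decide the duration of each phase.

```python
import random

def train_signal_agent(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Learn Q-values for a toy two-approach intersection (hypothetical environment)."""
    rng = random.Random(seed)
    # q[state][action]; state: 0 = approach A has the longer queue, 1 = approach B.
    # action: 0 = give green to approach A, 1 = give green to approach B.
    q = [[0.0, 0.0], [0.0, 0.0]]
    state = rng.randrange(2)
    for _ in range(steps):
        # Epsilon-greedy choice: explore occasionally, otherwise act on current estimates.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if q[state][0] >= q[state][1] else 1
        # The training data (state, action, reward, next state) is generated by interaction.
        reward = 1.0 if action == state else -1.0
        next_state = rng.randrange(2)
        # Q-learning update toward the reward plus the discounted best future value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

q_table = train_signal_agent()
policy = [row.index(max(row)) for row in q_table]  # best action for each state
```

After training, the learned policy gives the green light to whichever approach has the longer queue, which is the behavior this toy reward was designed to encourage.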

Key Considerations for Using Machine Learning

Road safety can be evaluated explicitly using rule-based reasoning systems, such as iRAP star score
and RSSAT. However, developing such systems can be complex if there are many input variables.
Compared with rule-based evaluation systems, ML algorithms are data-driven and do not require
rules to be developed; therefore, they are relatively inexpensive to implement. ML algorithms are also more suitable
for high-dimensional inputs. As a broader spectrum of ML algorithms becomes available, TTLs are
advised to carefully consider the trade-offs involved when applying them to road safety analysis. This
section discusses various factors that task teams must consider before deciding to use an ML algorithm
for road safety analysis in their project. Again, this is not an exhaustive list. Some factors may
be more relevant to some projects than others, while additional considerations may be required for
certain projects. It is worth noting that many of these factors are also interrelated. For example, the
feasibility of using ML for a project can be affected by time and budget constraints, the availability of
data, and the anticipated resource intensiveness of the data preparation process. Table 16 provides a
SWOT analysis of the use of ML in road safety analysis.

TABLE 16:   SWOT analysis of using ML in road safety analysis
STRENGTHS
•	 Offers tools and techniques to process big data that may be more precise compared to traditional methods.
•	 Especially effective for feature learning, parameter optimization, and processing large amounts of big data.
•	 ML algorithms tend to perform better than traditional statistical techniques in cases where high-dimensional and highly nonlinear data is involved.
•	 As the technology develops, novel techniques create new opportunities to understand complex relationships between multiple, interrelated variables and predict outcomes with greater accuracy.
•	 ML algorithms can be improved continuously as more data is generated or made available for training.

WEAKNESSES
•	 Algorithms can be limited in their applicability; models may not perform well on data that is different from the training data’s distribution.
•	 Large amounts of data are needed to train the models and yield more accurate models, which may be difficult in data-scarce contexts.
•	 Some ML algorithms (e.g., ANN) work like a black box and can be hard to interpret; therefore, an ML algorithm usually requires thorough validation and test processes before it can be deployed in the real environment and assist decision-making.
•	 The technology still needs further development before it can be mainstreamed for use in road safety assessments.

OPPORTUNITIES
•	 May eliminate the need for manual coding of road safety data in the future, making the process less labor-intensive and time consuming.
•	 Possible to train models on data from one location or for one purpose and use them for another.
•	 Provides a powerful method for complex crash risk modelling and other types of predictive analytics in road safety.
•	 As the technology develops, a platform powered by ML could be used across geographies for road assessments.
•	 As more and more data is generated and collected every day, this could be potentially analyzed with ML algorithms to discover new patterns and insights.

THREATS/CHALLENGES
•	 Requires specialist expertise, tools, and knowledge which may make its usefulness limited in some contexts, especially in developing countries.
•	 May require additional investment in computer power and analytical software.
•	 Complexity of ML algorithms can make them difficult to implement and analyze.
•	 Ethical considerations, such as bias in ML systems.
•	 As a data-driven approach, ML relies on high-quality data for training. Significant bias in the training data could lead to the failure of model training. Quality control of training data could be difficult, especially when annotating the data requires professional knowledge.
                                                                                                        SOURCE: Original table for this publication.



Feasibility with project objectives and client requirements. Before deciding to use ML for any project,
teams must ascertain whether ML is suitable for it. Some ML algorithms, such as neural networks,
are not interpretable; they work like a black box. Clients may not have confidence in using
them for significant decision-making unless their predictions can be sufficiently validated.
Preparing data to train ML algorithms. ML is a data-driven approach. Therefore, as with any data-related
project, it is important to plan the data collection and preparation process. To facilitate this
process, make sure to clearly define the inputs and outputs of the model at the outset of the
project. Section 2.1 provides guidance on how to select data sources, especially where big data may
be involved. It is common that, during the training stage, an ML team finds that there is not enough data
to train a model with satisfactory performance. In this case, more data needs to be collected. In terms
of data preparation, teams should be aware of the need to aggregate, clean and annotate data before
it can be used for ML modelling. Annotation of data is especially necessary for supervised learning
algorithms and entails manually identifying an object, drawing a box or polygon around it, and giving
it a label such as “pothole” or “crosswalk” (figure 10).




FIGURE 10:   Labeling a crosswalk in Padang, Indonesia using the Computer Vision Annotation Tool (CVAT)




                                                                                    SOURCE: World Bank Global Program for Resilient Housing.



Teams are advised to incorporate a quality control process to ensure data being used for any ML
model, especially test data, is of good quality and truly valid and representative of the population
or situation under study. For an ML-based project, steps include: (i) identifying data required for the
model; (ii) data collection, cleaning, and annotation; (iii) trial-and-error training; (iv) validation; and (v) deployment.
Task teams should estimate the duration of these tasks, considering their expected complexity
and potential challenges (which can vary by context and availability of resources such as expertise
and processing power). This will help them determine if ML is feasible for their project, how it compares
to traditional methods and how incorporating ML can impact project timelines. It is worth noting
that, once deployed in the production environment, ML can significantly accelerate the
whole process. For example, DL-based image analysis can substantially reduce the time needed to collect
data to be used in the road risk estimation.
A challenge for most ML algorithms is generalization, or how well a model performs on
test data (also called unseen data). Models may not perform well on unseen data that is different
from the training data’s distribution. For example, a model that is trained on images collected on
rural roads in an arid climate may not achieve the same level of performance on images of urban
roads in another country. The transferability of the model depends on how similar the features in
the images are. Therefore, before training ML algorithms, it is prudent to consider the diversity of
the training data, especially in terms of where, how and when it was collected. It is worth noting that
some researchers have found that artificial intelligence and ML algorithms can be easily and accurately
applied to different types of urban networks within the same city.52
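The effect of a shift between training and test data can be demonstrated with a deliberately small example. The sketch assumes a nearest-centroid classifier with a single numeric feature; the feature, the two "contexts", and all values are hypothetical and were chosen so that the second context is a shifted version of the first.

```python
def fit_centroids(samples):
    """Compute the mean feature value per label from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def accuracy(centroids, samples):
    """Share of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in samples if predict(centroids, x) == y)
    return correct / len(samples)

# Context A (hypothetical): the data the model is trained on, plus held-out test data.
train_a = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
test_a = [(1.5, 0), (2.5, 0), (7.5, 1), (8.5, 1)]
# Context B (hypothetical): same classes, but every feature value is shifted upward.
test_b = [(5.5, 0), (6.0, 0), (6.5, 0), (11.0, 1), (12.0, 1), (13.0, 1)]

model = fit_centroids(train_a)
accuracy_same_context = accuracy(model, test_a)
accuracy_shifted_context = accuracy(model, test_b)
```

The model scores perfectly on held-out data from its own context but misclassifies half of the shifted data, which is why evaluating on data from each intended deployment context is worthwhile before relying on a model there.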
To determine if using ML fits a budget or can even deliver a cost-advantage, it is important to understand
associated costs. Costs of using ML can arise from the hiring of experts to develop and program
models, as well as from the data collection and preparation process (which includes cleaning


52
   Apostolos Ziakopoulos and George Yannis, “Using AI for Spatial Predictions of Driver Behavior” (presentation, ITF
International Transport Forum Roundtable on Artificial Intelligence in Road Traffic Crash Prevention, 2021).
https://www.nrso.ntua.gr/geyannis/conf/cp450-using-ai-for-spatial-predictions-of-driver-behavior/


and annotation). The cost of storing data (on local hardware or on the cloud) should also be accounted
for, especially if the inputs involve big data. Depending on the model and quantity of data being input,
and especially if a DL model is employed, teams may also need to invest in additional computational
resources (graphics processing unit-equipped local computers or nodes on the cloud). Front-end and
back-end systems may also need to be established for automatic analysis services.
Deploying ML algorithms requires specialized expertise, often in the form of dedicated team
members who are ML experts. TTLs can choose to hire experts and manage the process internally
or acquire resources externally. An in-house, “do-it-yourself” approach ensures more control over
every aspect of the process, which may be especially important where significant customization or
trial and error may be required. However, this approach requires labor and time, and may be more
costly in the long run. Using an external resource or tool, on the other hand, may be a faster option
but can come at the expense of some visibility and control over the development of the model. It is
important to consider these trade-offs to ensure the team is adequately resourced to use ML effectively
in the project.

2.3 Big Data, Machine Learning and the Future of Road Safety Assessments

Artificial intelligence presents many exciting possibilities for automation and analysis in transport
and infrastructure development. ML is increasingly used for road safety analysis. ML’s inherent
capability of managing uncertainties in data and models makes it extremely suitable for solving
road safety-related issues. Uncertainty is a defining element of crash risk modelling and, in fact, a
source of complexity that has thus far limited the usefulness of traditional statistical models. Moreover,
ML algorithms such as deep ANN can capture nonlinear patterns in data, making them the first
choice for processing road safety big data. Table 17 provides a summary of possible applications of big
data and ML in road safety analysis given the current state of the technologies.


TABLE 17:   Potential applications of big data and ML in Methods I to VII
POTENTIAL APPLICATIONS                HOW BIG DATA CAN HELP                      HOW ML CAN HELP

Estimating Road Infrastructure Risk   Video and photo images, APIs, satellite    •	 Process images to evaluate road attributes
(Methods III, V, VI, and VII)         imagery and/or crowdsourced images         •	 Identify road features that could cause crashes
                                                                                 •	 Identify risk factors contributing to crash
                                                                                    occurrence
                                                                                 •	 Identify safety conditions in infrastructure
Traffic Flows                         APIs, aerial imagery, open-source          •	 Process images to detect and classify vehicles,
(Methods IV to VII)                   traffic data, road sensor data, wireless      identify congestion hotspots, or estimate speeds
                                      technology, street cameras, GPS data,      •	 Assess traffic flows
                                      mobile devices, real-time traffic data     •	 Develop risk maps
                                                                                 •	 Map the safety performance and Star Rating
                                                                                 •	 Predict traffic flows
Crash Risk Assessment                 Meteorology data, geo-located              •	 Create crash prediction models
(Methods III to VII)                  crash data, video and photo images,        •	 Develop risk maps
                                      APIs, open-source traffic data, road       •	 Analyze different conflict scenarios and high-risk
                                      sensor data, historical crash data,           behavior
                                      crowdsourced crash data (e.g., Waze)
Incident Reporting/Crash Data         Video recording, crash data, photo         •	 Identify hotspots through clustering techniques
(Methods I, V and VI)                 images, crowdsourced data (Google
                                      Maps, Waze)
Analyzing Crash Severity              Video and photo images, sensor data        •	 Process images to evaluate road attributes
(Methods III to VII)                                                             •	 Develop crash prediction models
                                                                                                     SOURCE: Original table for this publication.



Combining big data and ML can provide an integrated framework for automatic road safety analysis
and management. This framework, illustrated in figure 11, uses platforms (such as Mapillary) that
provide geo-tagged street level imagery as input to a DL model, which infers useful information
(e.g., road characteristics). The DL-inferred data is then combined with multi-source big datasets
(e.g., region-specific historical crash data) for better analysis and management of road safety. For
example, the combined information can readily be used as the input to Methods I to VII for estimating
the OPTRSR. Moreover, ML algorithms (e.g., ANN) have the potential to substitute for traditional
methods and tools (iRAP, RSSAT, etc.) for evaluating risks and safety indicators like OPTRSR.




FIGURE 11:   Framework for automatic road safety analysis and management powered by ML
[Figure: Geo-tagged street level images are passed to a deep learning model for image analysis, which
produces DL inferred information (lanes, shoulder, street lighting, pedestrians crossing, …). This is
combined with (big) data sources (APIs, in-house data, third-party data, …) and complementary
information (road curvature, historical crash, baseline fatalities, …) as input to methods/tools
(iRAP Star Rating Score, RSSAT, RSA, RSIA, SSA, ML models, …).]
SOURCE: Original figure for this publication.



At present, much of the research and innovation in the use of ML for advanced road safety and risk
modelling is being driven by universities and other research institutions. As other stakeholders,
such as governments, developers of road safety tools, and international organizations such as the
World Bank, look to apply ML in their projects, there is an opportunity to create dedicated tools that
would harness big data and ML for road safety analysis. Such applications have the potential to
reduce the risk of human error and allow road safety assessments to be mostly, if not fully, automated.
The following section presents practical examples of how big data and ML can assess urban road
safety. It applies an integrated framework introduced in section 2.3 to explore the opportunities
and limitations of new data sources and assess the ML models. To evaluate the robustness of the
proposed framework, the Integrated Framework for Road Risk Prediction was applied in two cities
that differ in size, region, and data availability: Bogotá, Colombia, a rapidly urbanizing
metropolis in Latin America, and Padang, Indonesia, a secondary city in East Asia. The study found
that ML applied to street view imagery identified relevant road (and road user) characteristics to gen-
erate a model that predicts road risk with 72.5 percent accuracy in Bogotá. This framework was ap-
plied in Padang to test its replicability; preliminary results are encouraging for its potential to predict
road safety for areas with limited crash data. The section concludes with a reflection and guidance
for replicability.




PART 3
Case Studies: Applying Big Data and Machine
Learning to Assess Road Safety

3.1 Objectives of the Case Studies

This section presents how the Integrated Framework for Road Risk Prediction can be applied in two
different cities of interest: Bogotá, Colombia and Padang, Indonesia. The study examines how useful
ML is in evaluating road safety and how easily the integrated framework can be replicated. All code
is freely available for other teams to use and develop further.53
The objectives of the case studies are to:
1.	 Learn how well big data and ML can be used to identify road features, estimate road safety, cate-
    gorize road segments based on their risk level, and identify high-risk segments.
2.	 Evaluate the utility of several big data sources that are freely available for road safety analysis in
    diverse geographic areas.54
3.	 Assess the replicability of the proposed approach.
Located on two different continents, the selected locations offer an opportunity to apply the frame-
work on paved, urban roads in contrasting environments, particularly related to data availability
and usability. For example, the government of Bogotá has made significant efforts to increase crash
data collection and dissemination. The government offers a public online portal showing the location
of each crash over the past year. In addition, there was high coverage for data derived
from mobile phones, such as crowd-reported crashes. In contrast, information on the crash locations
for Padang could not be found online, and methods for data collection are largely manual or paper
based.55 In addition, crowdsourced crash reports from mobile applications were scarce. As a result,
Padang offers the opportunity to explore the utility of ML when data coverage is limited.




53
   The code for the Integrated Framework for Road Risk Prediction is open source and accessible on GitHub:
https://github.com/datapartnership/IntegratedFrameworkForRoadSafety. However, some datasets require partnership with
DDP to access.
54
   Freely available meaning at no cost; however, some data sources are not publicly available and require a license.
55
   World Bank, Indonesia Public Expenditure Review 2020: Spending for Better Results (Washington, DC: World Bank, 2020).
https://openknowledge.worldbank.org/handle/10986/33954


     BOGOTÁ AND PADANG: BACKGROUND AND CONTEXT
     With a population of more than 7 million, the capi-
     tal district of Bogotá is Colombia’s largest city. As a
     critical economic hub with a growing population, Bo-
     gotá stands out as one of the most congested cities
     in the world.56 The government has prioritized road
     safety and achieved significant gains over the past
     few decades, reducing the city’s traffic fatality rate by
     more than 60 percent between 1996 and 2006 alone.57
     More recent interventions during the UN Decade of
     Action for Road Safety include establishing a National
     Road Safety Plan and a National Road Safety Agency (Agencia Nacional de Seguridad Vial)
     featuring a National Road Safety Observatory in collaboration with the World Bank.58 In addition, in
     2017, the city’s government launched “Vision Zero,” which aimed to implement a range of speed
     management strategies to eliminate pedestrian and driver fatalities. The program has delivered
     measurable results, such as a 27 percent reduction in fatalities across corridors where speed limits
     have been introduced, and further interventions are planned to sustain its impact.59 Despite these
     initiatives and road safety improvements in Bogotá, challenges remain, and new policies would
     benefit from timely and affordable analytics on road safety.
     Padang is the capital of the Indonesian province
     of West Sumatra, with a population of around 1
     million. The government of Indonesia introduced
     various initiatives to address road safety during the
     UN Decade of Action for Road Safety. Established in
     2011, the National Road Safety Master Plan achieved
     a 10 percent reduction in annual road fatalities be-
     tween 2013 and 2016. However, data collection and
     management systems that rely on manual screen-
     ing significantly challenge the country’s progress in
     road performance and safety.60 Initiatives such as the establishment of the Integrated Road Asset
     Management System and the World Bank’s new Asia-Pacific Road Safety Observatory present a
     valuable opportunity for the country to improve its road safety data systems.61 For this case study
     in Padang, crash data from alternative sources was scarce. Padang therefore offers the opportunity to
     explore the utility of the pre-trained ML models in a new region with limited data coverage.


56
   INRIX 2018 Global Traffic Scorecard. In 2018, drivers lost 272 hours in road congestion.
57
   ODI (Overseas Development Institute), “Bogotá,” ODI: Think Change. Accessed October 12, 2021, from
https://odi.org/en/about/features/bogot%C3%A1/
58
   World Bank, Colombia - Programmatic Productive and Sustainable Cities Development Policy Loans (Washington, DC: World
Bank, 2020). http://documents.worldbank.org/curated/en/426591583968971309/Colombia-Programmatic-Productive-and-
Sustainable-Cities-Development-Policy-Loans
59
   Darío Hidalgo and Claudia Adriazola-Steil, “Bogotá’s Vision Zero Road Safety Plan Is Saving Lives,” TheCityFix, last modified
September 26, 2019, https://thecityfix.com/blog/bogotas-vision-zero-road-safety-plan-saving-lives-dario-hidalgo-claudia-
adriazola-steil/
60
   World Bank, Indonesia Public Expenditure Review 2020: Spending for Better Results.
61
   DT Global, “Indonesia: Establishment of Integrated Road Asset Management Systems,” accessed October 4, 2021,
https://dt-global.com/projects/irams-dc


3.2 Methodology

The ML-based framework implemented in these case studies was developed to provide a quick screening
of road safety. The framework derives road characteristics that are traditionally collected or
annotated and uses them to provide a road safety prediction. Two ML models were developed
specifically for this framework during these case studies: one to extract road characteristics from
street view images and one to determine road risk based on the derived road characteristics. The
models first needed to be trained to extract road characteristics and to determine road risk based on
crash data. They could then be applied to make predictions in new areas without crash data. The
framework therefore had two phases: a training phase to train the models (figure 12) and a
deployment phase to make new predictions with the models (figure 13). Each phase had three steps,
beginning with data collection and preparation. OpenStreetMap (OSM), Waze, and Mapillary were used
to develop this framework (additional examples of these datasets and related analysis can be found
in Annex 3).
                                                   The OSM road network provided the foundation for analysis. It is free-
                                                   ly available and scalable. OSM uses lines to represent roads and points
                                                   to represent links among the roads. In OSM, the geometric road lines
                                                   are split into road segments (called ways) that are connected by the
                                                   points (called nodes). No modifications were made to the OSM geom-
                                                   etry to maintain its synchronicity with other big datasets referencing
                                                   OSM ways and nodes.



                                                   The Waze crash data consists of coordinates representing the location
                                                   where users of the Waze application are when they see and report a
                                                   crash.62 The Waze crash points were joined to the nearest OSM road
                                                   segment (within 20 meters). For each road segment, the crash frequen-
                                                   cy, or crash per meter, was calculated to normalize the frequency of
                                                   crashes. Since OSM road segments vary in length and there could be
                                                   multiple reports per crash, calculating the crash frequency provided
                                                   crash trends. To identify road segments with more frequent crashes per
                                                   meter, the crash frequency was split into high and low risk.
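
The join and normalization described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: it assumes that road segments and crash reports have already been projected to planar coordinates in meters, that every segment has a non-zero length, and the function names (`point_segment_distance`, `crash_per_meter`) are hypothetical.

```python
import math
from collections import Counter


def point_segment_distance(p, a, b):
    """Distance from point p to the straight piece a-b (planar coordinates, meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the piece itself.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, coords):
    """Shortest distance from point p to a road segment given as a list of (x, y)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(coords, coords[1:]))


def polyline_length(coords):
    """Length of a road segment in meters."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def crash_per_meter(segments, crashes, max_dist=20.0):
    """Join each crash to its nearest segment (if within max_dist) and normalize by length.

    segments: {segment_id: [(x, y), ...]}, crashes: [(x, y), ...]
    """
    counts = Counter()
    for crash in crashes:
        seg_id, dist = min(
            ((sid, distance_to_polyline(crash, coords)) for sid, coords in segments.items()),
            key=lambda pair: pair[1],
        )
        if dist <= max_dist:  # reports farther than 20 meters from any road are dropped
            counts[seg_id] += 1
    return {sid: counts[sid] / polyline_length(coords) for sid, coords in segments.items()}
```

For a city-scale network, a spatial index would normally replace the exhaustive search over segments used here.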
                                                   Mapillary was used to obtain street view images, which were primari-
                                                   ly collected by the World Bank’s Global Program for Resilient Housing.
                                                   Since many images are captured along a street, and many images can
                                                   be linked to a single road segment, the image closest to the centroid of
                                                   the road segment was selected. The radius for this selection was with-
                                                   in three meters of the centroid. This approach standardizes the image
                                                   selection and classification: one image represents the scene of one road
                                                   segment. For each OSM road segment, a street view image taken near
                                                   the centroid of the segment was downloaded using Mapillary API v4.
  SOURCE: Original examples for this publication
  based on data from OSM, Waze, and Mapillary.
Copyright OpenStreetMap contributors, Microsoft,
   Esri Community Maps contributors. Basemap
    from Esri, HERE, Garmin, METI/NASA, USGS.
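
The image selection rule (one image per road segment, taken within three meters of the centroid) could look like the sketch below. It assumes image positions and the centroid are in planar coordinates in meters; the function name `select_image` and the input layout are illustrative and not part of the Mapillary API.

```python
import math


def select_image(centroid, images, max_dist=3.0):
    """Return the id of the image closest to the centroid, or None if none is within max_dist.

    centroid: (x, y); images: list of (image_id, (x, y)) in the same planar coordinates.
    """
    best_id, best_dist = None, max_dist
    for image_id, position in images:
        d = math.dist(centroid, position)
        if d <= best_dist:
            best_id, best_dist = image_id, d
    return best_id
```

Segments for which the function returns `None` have no usable image and would have to be skipped or handled separately.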



62
     Data provided by Waze App. Learn more at waze.com.


The Training Phase
The training phase consisted of a data preparation step followed by two steps powered by ML: one to
extract information from street view images and one to predict the risk level from the extracted
data. Each ML step had a model at its core that needed to be trained on data. The training phase
therefore had three steps.
Step 1. Select the region of interest and prepare data
A generalized polygon of the region of interest was used to collect data from OSM, Waze, and Mapil-
lary. The road network database was prepared, and the street view images closest to the centroid of
the road segment were downloaded as inputs for the models.

FIGURE 12: Training phase for road safety segment analysis using ML
[Figure: Geo-tagged street level images from Mapillary are analyzed by a deep learning model, the
Road Information Collector (RIC), which outputs DL inferred information (lanes, shoulder, street
lighting, pedestrians crossing, …). (Big) data sources (OSM, Waze) provide the road network and the
road network (crash frequency) database. Both feed a neural network classifier, the Road Risk
Evaluator (RRE), which outputs low risk or high risk.]
SOURCE: Original figure for this publication.



Step 2. Develop ML model for identifying road characteristics
The first custom ML model developed for this case study was the Road Information Collector (RIC),
shown in figure 12. It is a deep convolutional neural network, Mask R-CNN, which can classify and
count objects detected in images.63 The RIC model was trained with images from the updated Map-
illary Vistas Dataset (initially released in 2017), which provides detailed characteristics for types of
road markings and barriers, traffic lights and signs, and vulnerable road users such as pedestrians,
motorcyclists, and bicyclists.64 Other identifiable characteristics include flat terrain, which charac-
terizes road gradient, and the presence of potholes, which could indicate paved, urban road quality.
The RIC takes street view images as the input and can detect more than 100 classes of objects as the
output (for a complete list of the features the RIC model detects, refer to Annex 4). The model can


63
   Kaiming He et al., “Mask R-CNN,” 2017 IEEE International Conference on Computer Vision (2017): 2980-2988.
64
   G. Neuhold et al., “The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes,” 2017 IEEE International
Conference on Computer Vision (ICCV) (2017): 5000-5009, doi: 10.1109/ICCV.2017.534


detect and classify some road features better than others (for the precision score in detecting and
classifying the objects, see Annex 5).
Step 3. Develop ML model for evaluating road risk
The second ML model developed was the Road Risk Evaluator (RRE). The RRE is a neural network
classifier with two hidden layers of 50 neurons each. The RRE was trained using paired data for
each road segment: the road attributes from the RIC and the assigned road risk from the road network
database. Similar work was conducted by a team using a neural network to predict the crash
frequency of road segments.65
The Deployment Phase
Once the two ML models are trained, they can be added to an automated workflow in the deployment
phase. This means the trained ML models can now predict the risk level for any road segment with
the required input data – a street view image. Crash data is not required in the deployment phase.
FIGURE 13:   Deployment phase to predict road safety
[Figure: For the region of interest, the road network is taken from OSM and street level images for
each road segment are taken from Mapillary. The images are analyzed by a deep learning model, the
Road Information Collector (RIC), which outputs DL inferred information (lanes, shoulder, street
lighting, pedestrians crossing, …). This feeds a neural network classifier, the Road Risk Evaluator
(RRE), which outputs low risk or high risk.]
SOURCE: Original figure for this publication.



The deployment phase uses three steps to predict risk within an automated workflow (figure 13).
Step 1. Select the region of interest and download data
For the selected region of interest, the code will download the road network from OSM and calculate
the centroid of each road segment. The code will then download from Mapillary API a street view
image taken near the centroid of the road segment.
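
The report does not spell out how the centroid of a road segment is computed. One common definition for a line geometry is the length-weighted average of the midpoints of its straight pieces, sketched below under the assumption of planar coordinates in meters. The function name `polyline_centroid` is hypothetical.

```python
import math


def polyline_centroid(coords):
    """Length-weighted centroid of a road segment given as a list of (x, y) points."""
    total = cx = cy = 0.0
    for (ax, ay), (bx, by) in zip(coords, coords[1:]):
        length = math.hypot(bx - ax, by - ay)
        cx += length * (ax + bx) / 2.0
        cy += length * (ay + by) / 2.0
        total += length
    if total == 0.0:
        return coords[0]  # degenerate segment: all points coincide
    return (cx / total, cy / total)
```

For a curved segment this point may lie slightly off the road itself, which matters when only images within a few meters of the centroid are accepted.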




65
   Qiang Zeng et al., “Rule Extraction from an Optimized Neural Network for Traffic Crash Frequency Modeling,” Accident
Analysis & Prevention 97 (2016): 87-95.


Step 2. Identify road characteristics
For each road segment, the downloaded image will be fed into the RIC to extract road characteristics.
For each image, the RIC will output the numbers of detected objects for each class (refer to Annex 4
for classes). These numbers are put together to form a vector for each image.
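
Turning the detections for one image into a count vector can be sketched as follows. The class list here is a short illustrative subset written in the Mapillary Vistas naming style; the full ordered list is the one given in Annex 4, and the function name `detections_to_vector` is hypothetical.

```python
from collections import Counter

# Illustrative subset only; the real vector has one position per class in Annex 4.
CLASS_NAMES = [
    "construction--flat--road",
    "object--vehicle--car",
    "object--vehicle--motorcycle",
    "object--street-light",
]


def detections_to_vector(detected_labels, class_names=CLASS_NAMES):
    """Count detected objects per class, in a fixed class order."""
    counts = Counter(detected_labels)
    return [counts[name] for name in class_names]
```

Keeping the class order fixed matters because the RRE interprets each position in the vector as the count for one specific class.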
Step 3. Evaluate road risk
Each vector produced by the RIC will be fed into the RRE to calculate the risk level: high or low. To
illustrate the automated workflow of the deployment phase, figure 14 shows the risk prediction for a
road segment. The RIC detected a flat road, a car, and a motorcycle, and from these counts the RRE
predicted the road segment as low risk. In the deployment phase, this framework requires no historical
crash data to identify high- or low-risk roads.

FIGURE 14:   RIC and RRE applied to predict road segment risk
[Figure: A street view image is processed by the RIC, which detects construction--flat--road x 1,
object--vehicle--car x 1, and object--vehicle--motorcycle x 1. The RRE then assigns Risk level: Low.]
SOURCE: Original figure for this publication, based on data from Mapillary and annotated with classifications from the model.



The two case studies presented illustrate the training and deployment phases.
The training phase was conducted in Bogotá, where data was collected to train the ML model RRE,
while the RIC model was trained on the Mapillary Vistas Dataset. Then the models were applied in the
deployment phase to predict the risk level for each road segment in Bogotá, Colombia.
The second case study was in Padang, Indonesia. The RIC and RRE models trained in the previous
case study were applied directly (i.e., without re-training) in a deployment phase to predict road risk
in Padang. This demonstrates that, ideally, there is no need to re-run the training phase for future
applications since the RIC and RRE are already trained.




3.3 Case Study 1: Bogotá, Colombia

The Training Phase

Step 1. Select the region of interest and prepare data
In Bogotá, a road network database was created to prepare training data for the ML models. First, a
generalized polygon of the region was used to retrieve roads from OSM and six months of crash re-
ports from Waze (July–December 2020). The crashes were joined to the nearest OSM road segment
within 20 meters. The crash frequency, or crash per meter, was calculated and road segments were
divided into high risk (crash frequency >0.5) and low risk (crash frequency <=0.5) in the road network
database. A crash frequency of 1 therefore represents one reported crash per meter of road over the six
months of Waze data collected. Street view imagery was downloaded using the Mapillary API to collect images
close to the centroid of each road segment. Table 18 provides an overview of the data sources for this
case study.
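
The labeling rule for the road network database can be written out directly. The sketch below uses the 0.5 threshold stated above and the 0/1 encoding used for the RRE output. The function names and the example numbers are illustrative.

```python
def crash_frequency(crash_count, segment_length_m):
    """Crashes per meter over the observation period (six months of Waze reports here)."""
    return crash_count / segment_length_m


def risk_label(frequency, threshold=0.5):
    """1 for high risk (frequency > threshold), 0 for low risk (frequency <= threshold)."""
    return 1 if frequency > threshold else 0


# Hypothetical example: 12 reports joined to a 20-meter segment give 0.6 crashes per meter.
example_label = risk_label(crash_frequency(12, 20.0))
```

Because several users may report the same crash, the frequency reflects reporting intensity as well as crash occurrence.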
TABLE 18:   Data used for case study in Bogotá, Colombia
DATA SOURCES           ATTRIBUTES                                             REMARKS
ROAD NETWORK
OSM                    Road network (road segment length)                     Provided through an open license.
CRASHES
Waze                   Road alerts (crashes reported by users, coordinates)   Obtained through DDP.
ROAD CHARACTERISTICS
Mapillary              Street view image detections (crosswalk, curb,         Selection of image annotation tags used
(images and tags)      guard rail, human, marking, pothole, sidewalk, sign,   in study; more available through Mapillary
                       streetlight, traffic sign, utility pole)               Traffic Sign and Vistas. Multiple detections
                                                                              per image are possible.
                                                                                             SOURCE: Original table for this publication.



Step 2. Develop ML model for identifying road characteristics
The RIC was developed and trained to perform instance segmentation. It is a deep convolutional
neural network that identifies the classes, or objects, in an image and provides a count of these
classifications. The model was trained on the Mapillary Vistas Dataset with a total of 124 classes
(Annex 4).66 The resulting output is a count of the objects of each class identified by the bounding
boxes, shown in figure 15, and is represented as a series of integers.

                                     Training data: Mapillary Vistas Dataset (124 classes)
                                     Input: Street view image near the centroid of a road segment
                                     Output: A vector of integers (each element represents the
                                     count of detected objects that belong to a class)


Figure 15 depicts the RIC in action on an image from Bogotá. The bounding boxes surrounding each
object in the image indicate classes the model identified. Confidence levels are provided next to the
name of the object segmented by the bounding box. The closer the confidence level is to 1, the higher
the confidence in the prediction. Looking at the center of the image, the bicyclist was identified with
0.5 confidence, and other vulnerable road users were recognized, such as a motorcyclist (0.84) and
pedestrian (0.75). Vehicles were segmented with high confidence for the bus (0.7), motorcycle (0.88),


66
     G. Neuhold et al., “The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes.”


and car (0.99). The RIC segmented traffic signs, support and utility poles, flat road, and road mark-
ings as well.

FIGURE 15:   Image segmentation in Bogotá




                                                      SOURCE: Original figure for this publication, based on data from Mapillary.



The sample image shows favorable results for image segmentation. The performance of the RIC mod-
el in terms of the average precision of the bounding box detection and classification for each class
is provided in Annex 5. In the next step, road attribute data extracted through the RIC were inputs
for the prediction model to link the road characteristics with the likelihood of a crash in the road
networks examined.
Step 3. Develop the ML model RRE for evaluating road risk
To develop the RRE, six study areas in Bogotá, Colombia were selected to reduce computational load.
These study areas were drawn to include a wide variety of neighborhoods (poor, rich) and placed
throughout the city. They also contain high and low crash frequency road segments and comprehen-
sive street view image coverage. Figure 16 shows the six study areas along with the crash risk from
the road network database, high risk (crash frequency >0.5) and low risk (crash frequency <=0.5).




The low- and high-risk road segments in these areas were the training data for the model. Based
on the segment risk derived from the road network database and the characteristics for each
road segment derived from the RIC, the model was trained to evaluate a road segment as high
or low risk.

    Training data: The following input-output pairs obtained from
    road segments in six study areas in Bogotá, Colombia.
    Input: A vector of integers, which is the output of RIC*
    Output: 0 (low risk) or 1 (high risk)

* Only 106 out of 124 classes are used as the input to RRE. A total of 18 classes irrelevant to
road characteristics, such as sky, bird, etc., were removed from the vector before entering into
the RRE.

In searching for an optimal architecture of the neural network, the number of layers and neurons
were tested for the best performance. Testing showed that more layers or neurons do not
significantly improve the performance on this dataset. The RRE was used to evaluate whether
a road segment was low or high risk based on a street view image.

FIGURE 16: Six study areas and crash frequency in Bogotá
[Figure: Map of the six study areas. Legend: Crash per meter, 0.5 - 3.2 and 0.0 - 0.5.]
SOURCE: Original figure for this publication, based on data from OSM and data provided by the Waze App. Learn more at waze.com.


Overall performance of the ML
The RRE correctly classified 70 percent of low-risk road segments and 75 percent of high-risk
road segments (figure 17). The mean accuracy and F1-score were both 72.5 percent. The closer the
accuracy and F1-score are to 100 percent, the better the performance of the model. Because a random
guess in a binary classification would be correct 50 percent of the time, these results are
promising. They suggest the model would perform well in contexts similar to Bogotá. If needed,
the model could be fine-tuned for increased accuracy and precision in other areas.

FIGURE 17: Confusion matrix showing the accuracy of the RRE model

                     Prediction
True value       Low        High
Low              0.7        0.3
High             0.25       0.75

SOURCE: Original figure for this publication.




 TIPS FOR INTERPRETING ML PERFORMANCE
 The performance of an ML model can be evaluated using accuracy, precision, recall, and the F1-score. These are derived by counting
 the correct predictions (true positives and true negatives) and incorrect predictions (false positives and false negatives).

           accuracy = correct predictions / all predictions

           precision = true positives / (true positives + false positives)

           recall = true positives / (true positives + false negatives)

           F1-score = 2*((precision * recall) / (precision + recall))

 A confusion matrix shows how well the model performed in predicting road risk through a comparative chart of the true positives,
 true negatives, false positives, and false negatives.
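
The formulas above can be checked against the confusion matrix in figure 17 with a short Python sketch. It rests on two assumptions that the note does not state: that the matrix is normalized by row (each true class sums to 1), and that the two classes are equally frequent, so the row values can be treated as counts. The reported F1-score is assumed here to be the average over both classes.

```python
def precision_recall_f1(tp: float, fp: float, fn: float):
    """Apply the precision, recall, and F1-score formulas from the box above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * ((precision * recall) / (precision + recall))
    return precision, recall, f1


# Figure 17: rows are the true value, columns are the prediction.
cm = {
    "low": {"low": 0.70, "high": 0.30},
    "high": {"low": 0.25, "high": 0.75},
}

# Mean accuracy: average share of correctly classified segments per true class.
mean_accuracy = (cm["low"]["low"] + cm["high"]["high"]) / 2

# Treat each class in turn as the "positive" class.
_, _, f1_high = precision_recall_f1(
    tp=cm["high"]["high"], fp=cm["low"]["high"], fn=cm["high"]["low"]
)
_, _, f1_low = precision_recall_f1(
    tp=cm["low"]["low"], fp=cm["high"]["low"], fn=cm["low"]["high"]
)
macro_f1 = (f1_high + f1_low) / 2

print(f"mean accuracy = {mean_accuracy:.3f}, macro F1 = {macro_f1:.3f}")
```

Under these assumptions both values come out at roughly 0.725, which is consistent with the 72.5 percent reported for the RRE.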



Bogotá Results

Following the three-step workflow of the deployment phase described in section 3.2, road risk
was predicted for the entire road network in Bogotá. In total, 98,488 images were processed to
make the predictions shown in figure 18. Road segments without an image within 3 meters
were not predicted. Overall, high crash frequency from Waze and high-risk predictions exhibited
similarity along some segments, particularly on arterial roads; however, the model tended to
moderately overpredict high risk.

FIGURE 18: Road risk prediction in Bogotá
Legend: Risk level (High, Low, No data)
SOURCE: Original figure for this publication, based on data from Mapillary, OSM and Waze.
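
The 3-meter rule mentioned above can be sketched as a point-to-segment distance check. This is an illustration only: the note does not describe how the distance was measured, so the snippet assumes that image locations and road segments are in a projected coordinate system with units of meters and that each segment is a straight line between two nodes. All names and sample coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (x, y) in a projected coordinate system, in meters
MAX_DISTANCE_M = 3.0          # segments with no image within this distance get no prediction


def point_to_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def images_near_segments(
    segments: Dict[str, Tuple[Point, Point]], images: Dict[str, Point]
) -> Dict[str, List[str]]:
    """List, for each segment, the images located within MAX_DISTANCE_M of it."""
    matches: Dict[str, List[str]] = {seg_id: [] for seg_id in segments}
    for img_id, location in images.items():
        for seg_id, (a, b) in segments.items():
            if point_to_segment_distance(location, a, b) <= MAX_DISTANCE_M:
                matches[seg_id].append(img_id)
    return matches


# Hypothetical data: two parallel segments and two image locations.
segments = {"seg_a": ((0.0, 0.0), (50.0, 0.0)), "seg_b": ((0.0, 100.0), (50.0, 100.0))}
images = {"img_1": (10.0, 2.0), "img_2": (25.0, 40.0)}
matches = images_near_segments(segments, images)
status = {seg_id: ("predict" if imgs else "No data") for seg_id, imgs in matches.items()}
print(status)
```

In this toy case only seg_a has an image within 3 meters, so seg_b would fall into the "No data" class of figure 18. For a city-scale network, a spatial index would replace the nested loop.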




3.4 Case Study 2: Padang, Indonesia

The Deployment Phase

The model built in Bogotá was applied in Padang. As in Bogotá, the road network was
accessed through OSM, and street view images were downloaded using the Mapillary API. Waze
crash data was joined to the OSM road network to compare with risk predictions. Padang had limited
geospatial crash data to validate the model. Table 19 provides a description of the datasets.

TABLE 19:   Data used for case study in Padang, Indonesia
DATA SOURCES         ATTRIBUTES                                             REMARKS
ROAD NETWORK

OSM                  Road network (road segment length)                     Provided through an open license.
CRASHES
Waze                 Road alerts (crashes reported by users, coordinates)   Obtained through DDP.
ROAD CHARACTERISTICS
Mapillary            Street view image detections (crosswalk, curb,         Selection of image annotation tags in study;
(images and tags)    guard rail, human, marking, pothole, sidewalk, sign,   more available through Mapillary Traffic Sign
                     streetlight, traffic sign, utility pole)               and Vistas. Multiple detections per image are
                                                                            possible.
                                                                                          SOURCE: Original table for this publication.



Padang Results

In Padang, preliminary results pointed to the framework’s potential in scanning roads for safety.
Figure 19 shows predictions where arterial road segments were predominantly designated as high
risk (red lines). Residential areas were interspersed with low- and high-risk road segments. Broadly,
the pattern was one of high-risk predictions along arterial roads and a mix of low and high risk
along residential and tertiary road segments.

FIGURE 19: Road risk prediction in Padang
Legend: Risk level (High, Low)
SOURCE: Original figure for this publication, based on data from OSM and data provided by the Waze App. Learn more at waze.com. Drone imagery provided by the World Bank Global Program for Resilient Housing.



In general, high-risk road segments were predicted where crashes were reported by Waze.
These preliminary results were encouraging; however, verifying them was difficult because
there was insufficient data. While the deployment of the framework in Padang requires further
validation with more data, ML-based approaches such as this show promise for providing initial
road safety scans.
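
One simple way to express the comparison described above is the share of segments with at least one Waze-reported crash that the model flagged as high risk. The sketch below is not the authors' validation method; the function name, segment identifiers, and values are hypothetical.

```python
from typing import Dict


def share_of_crash_segments_flagged(
    predictions: Dict[str, int], crash_counts: Dict[str, int]
) -> float:
    """Share of segments with a reported crash that were predicted high risk (1)."""
    crash_segments = [
        seg_id for seg_id, n in crash_counts.items() if n > 0 and seg_id in predictions
    ]
    if not crash_segments:
        raise ValueError("no segments with both a prediction and a reported crash")
    flagged = sum(1 for seg_id in crash_segments if predictions[seg_id] == 1)
    return flagged / len(crash_segments)


# Hypothetical predictions (1 = high risk, 0 = low risk) and Waze crash counts.
predictions = {"s1": 1, "s2": 0, "s3": 1, "s4": 1}
crash_counts = {"s1": 3, "s2": 1, "s3": 0, "s4": 2}
share = share_of_crash_segments_flagged(predictions, crash_counts)
print(f"{share:.2f} of segments with reported crashes were predicted high risk")
```

A measure like this says nothing about segments flagged as high risk where no crash was reported, which matters given the tendency to overpredict high risk observed in Bogotá.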

3.5 Findings

The Integrated Framework for Road Risk Prediction demonstrates the strength of ML to identify road
segment safety with substantial accuracy (72.5 percent) in Bogotá. Preliminary results in Padang
support replicating the framework with further validation in other areas. Using advanced ML tech-
niques, the framework applied a streamlined approach that relied on road characteristics and crash
frequency to determine crash risk in the training phase. Then the ML models applied in the deployment
phase could predict road risk based on road characteristics without historical crash data.

The alternative data sources used to train the models were robust – thousands of annotations,
high-resolution images, and crash data joined to extensive road networks – and of suitable quality for
the models to provide a road safety scan. To identify road characteristics, the RIC was trained using
the Mapillary Vistas Dataset, which has a breadth and depth of annotations from different contexts,
providing geographic diversity. The RRE was trained using a pairing of the road characteristics and a
road network database created from OSM road segments and Waze crash data. OSM road segments
offered global scalability and were sufficient for a coarse assessment in these case studies. Waze data
availability was dependent on the area (and the users of the app). Given the potential for duplicate
crash reports, Waze data was not relied on for accurate crash data in Bogotá; instead, it was used to
identify crash patterns of high- and low-risk road segments.

The framework is not suitable for detailed road assessments. However, it can be applied to screen
roads for safety without historical crash data if the RIC model is enhanced with more training data
and calibrated for the local street view context; the RRE model can be modified and enhanced with
fine-grained training data. It is replicable in other areas with the following recommendations, which
are applicable for developing other ML-based frameworks for road safety.

Incorporate training data to fine-tune the model for a specific location. Typically, ML models trained
on data collected from one region do not work well for a new region. This is called domain shift: the
testing data has a different distribution than the training data. In this case, including data collected
from the new region in the training phase will usually help. It is important to evaluate the data and
consider whether the collection method could introduce bias into the project. For example, introducing
local crash data to train the RRE would help validate and potentially improve the model’s application
in the location of interest. Both the RIC and RRE can be continually trained with newly obtained data
so that the knowledge learned from previous data carries over to new regions while the model remains
applicable to the previous regions.

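
A rough way to see whether a new region differs from the training region is to compare how often each road characteristic appears in the two sets of RIC vectors. The sketch below uses the total variation distance between the normalized class frequencies. The note does not prescribe this diagnostic; the function names and the three-class toy vectors are hypothetical.

```python
from typing import List, Sequence


def class_distribution(vectors: Sequence[Sequence[int]]) -> List[float]:
    """Sum per-class counts over all vectors and normalize them to sum to 1."""
    totals = [sum(column) for column in zip(*vectors)]
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no detections in any vector")
    return [total / grand_total for total in totals]


def total_variation_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """0 means identical class frequencies; 1 means no overlap at all."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy RIC vectors with three classes for a training region and a new region.
training_region = [[4, 1, 0], [3, 2, 0]]
new_region = [[1, 1, 3], [0, 2, 3]]

shift = total_variation_distance(
    class_distribution(training_region), class_distribution(new_region)
)
print(f"total variation distance = {shift:.2f}")
```

A large value suggests that street scenes in the new region look different from those used in training, which is a signal to add local data before relying on the predictions.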
It is essential to ensure that models are based on sufficient, high-quality training data. In general,
at least a few thousand annotations are recommended to identify objects from images with simple
context, depending on the characteristics of the object. Whether the street view images are obtained
through big data platforms such as Mapillary or collected by the team, street view imagery covering
different geographical regions makes the trained object detection model, like the RIC, more robust.
Since street-level images capture the visual scene (road characteristics and road users) at a single
point in time, it is important to consider how the time of day, day of week, and season of that
snapshot affect what the images show. Relatedly, a road characteristic may be covered or occluded in a
street view image, for instance, when a passing truck blocks a sign. Imagery collected at short
intervals, such as every two meters, permits greater flexibility to analyze the road scene and predict
risk using the RIC and RRE. OSM road networks require review for recency and accuracy, and possibly
editing, to ensure suitable quality and coverage in other areas. If high-quality, granular crash data
shows a clear pattern of more than two risk classes, additional classes could be predicted: for
example, high, medium, and low risk.
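
For teams planning their own image collection, the two-meter spacing mentioned above can be turned into a list of capture points along a road centerline. The sketch below assumes the road is a polyline in a projected coordinate system with units of meters; the function name and the sample geometry are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) in a projected coordinate system, in meters


def sample_along_polyline(vertices: List[Point], spacing_m: float = 2.0) -> List[Point]:
    """Return points spaced spacing_m apart along the polyline, starting at its first vertex."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    if not vertices:
        return []
    samples = [vertices[0]]
    distance_to_next = spacing_m   # distance still to travel before the next sample
    for (ax, ay), (bx, by) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(bx - ax, by - ay)
        travelled = 0.0
        while seg_len - travelled >= distance_to_next:
            travelled += distance_to_next
            t = travelled / seg_len
            samples.append((ax + t * (bx - ax), ay + t * (by - ay)))
            distance_to_next = spacing_m
        # Carry the unused remainder of this piece over to the next one.
        distance_to_next -= seg_len - travelled
    return samples


# Hypothetical L-shaped road: 5 m east, then 5 m north.
capture_points = sample_along_polyline([(0.0, 0.0), (5.0, 0.0), (5.0, 5.0)], spacing_m=2.0)
print(len(capture_points))
```

Distances are measured along the road rather than in a straight line, so the spacing stays constant through bends.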




Conclusion


Big data and ML offer promising opportunities to improve current road safety assessment proce-
dures for sustainable development. Road safety assessments are often required for new transport
and infrastructure developments to be approved or as part of their monitoring and evaluation once
they are completed. However, conducting road safety assessment procedures can be expensive and
time-consuming. Alternative data sources and ML can optimize this process by identifying patterns
using complex predictive models. The Integrated Framework for Road Risk Prediction offers one approach
using street view imagery that can be accessed through Mapillary or collected by the team to provide
a road safety scan. With further training, this framework has the potential to provide detailed road
safety assessments, mitigating the need for manual annotations (or years of historical crash data). In
addition to the pilots and studies conducted by the researchers and representatives of road safety or-
ganizations interviewed for this note, there are many ML models contributing to road safety efforts,
which typically outperform statistical models in predicting road safety.67

Integrate alternative data sources and ML into road safety assessments with care. Finding valid,
representative data can be a significant challenge in evaluating risks and reducing crash fatalities and
injuries through data-driven, evidence-based interventions. Teams can directly partner with private
companies and data providers to retrieve alternative sources of data. Data sharing platforms, such
as DDP, also offer streamlined solutions. However, commercial data sources are not typically established
to collect data for road safety analysis, and their data may be inadequate for road safety assessment
methods and procedures. Data can be biased, incomplete, and challenging to synchronize with con-
ventional analytical tools. The implications of collecting and analyzing big data using ML require thor-
ough consideration. Data privacy and security are central concerns; data needs to be de-identified and
anonymized and stored according to institutional guidelines.68 Data and models need to be screened
for biases that can affect their outcomes. For example, imbalanced access to smartphones or social
media may amplify gender or community bias.69 Teams can adhere to best practices and data policies
and make their ML models and results transparent and openly shared. Resources such as “A Frame-
work for Understanding Sources of Harm throughout the Machine Learning Life Cycle” and “The
Ethics of Artificial Intelligence” may be helpful for teams implementing ML in their projects.70

The approach used for the case studies in this note can be extended to evaluate specific measures of
road safety. For example, while the framework uses the crash frequency and may identify the number of
relevant road users in a street view image, it does not thoroughly consider the number of (vulnerable) road
users nor does it consider the probability of a crash causing fatalities or serious injuries. The approach could
also be extended using complementary data such as road geometry, traffic flow, traffic volume, traffic speed,
weather, season, and other factors affecting visibility along the road or road surface conditions. The case
studies illustrate the potential of big data and ML to reduce the manual inspection of roadways and provide
road safety insight where otherwise the information is in short supply, thereby contributing to safer roads.

67 Philippe Silva, Michelle Andrade, and Sara Ferreira, “Machine Learning Applied to Road Safety Modeling: A Systematic
Literature Review,” Journal of Traffic and Transportation Engineering 7, no. 6 (2020): 775-790,
https://doi.org/10.1016/j.jtte.2020.07.004
68 World Bank, World Development Report 2021: Data for Better Lives (Washington, DC: World Bank, 2021).
doi:10.1596/978-1-4648-1600-0
69 World Bank, Use of AI Technology to Support Data Collection for Project Preparation and Implementation: A ‘Learning-by-doing’
Process (Washington, DC: World Bank, 2021).
70 Harini Suresh and John Guttag, “A Framework for Understanding Sources of Harm throughout the Machine Learning Life
Cycle” in Proceedings of Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ‘21),
https://doi.org/10.1145/3465416.3483305; Nick Bostrom and Eliezer Yudkowsky, “The Ethics of Artificial Intelligence,” in The
Cambridge Handbook of Artificial Intelligence, ed. Keith Frankish and William M. Ramsey (Cambridge: Cambridge University
Press, 2014): 316-334.

For big data to be fully leveraged for road safety analysis, governments, road safety advocates,
and international development organizations will want to consider investing in platforms and tools
that specialize in collecting and analyzing data for road safety. Ongoing efforts to establish regional
road safety data observatories provide an opportunity to gather data providers and create a data mar-
ketplace specifically for road safety analysis, especially where alternative or traditional sources are
scarce. Government regulations and initiatives to encourage private companies to share data could
further integrate big data in international development projects, including road safety. It is essential
for key stakeholders in road safety assessment to collaborate closely with pioneers of these technologies
to realize their potential in road safety analysis.71 The Artificial Intelligence in Road Traffic Crash
Prevention Roundtable, hosted by the International Transport Forum (ITF) in early 2021, is an example of
one such opportunity. Conversations with World Bank team leaders and transport specialists reveal
high demand for a single, easy-to-use tool to access and use big data for road safety analysis. There
is potential to automate some of
the processing and analysis for which specialist expertise is currently required, and initiatives such as
Ai-RAP and the World Bank Simplified Methodology suggest that practical, scalable solutions could be
a reality soon.72 As big data and ML become more accessible, and as their adoption accelerates world-
wide, road safety practitioners, governments, road safety advocates, and international organizations
can unlock their immense potential to improve the quality and efficiency of road safety assessments.

71 Subasish Das and Greg P. Griffin, “Investigating the Role of Big Data in Transportation Safety,” Transportation Research
Record 2674, no. 6 (2020): 244–52, https://doi.org/10.1177/0361198120918565
72 Monica Olyslagers (Safe Cities and Innovation Specialist, iRAP) and Satoshi Ogita (Senior Transport Specialist, World Bank),
in discussion with the authors, April 2021.


ANNEX 1:
Most Relevant Big Data Types for Road Safety Analysis

DATA COLLECTION: Street view imagery
POTENTIAL SOURCES:
• Apple Look Around
• Google Street View
• KartaView
• Mapillary
• Collected by team
POTENTIAL APPLICATIONS: Identify road attributes for road safety assessments.
ADVANTAGES:
• Provides objective evidence of conditions in the field.
• Can be used in regions where government data is not available.
LIMITATIONS:
• Coverage is incomplete, particularly in rural and low-income areas.
• Licensing restrictions for ML application.

DATA COLLECTION: Mobile applications and telematics
POTENTIAL SOURCES:
• Mobile application data
• Telematic companies
• Rideshare companies
POTENTIAL APPLICATIONS: Identify vehicle movement, traffic flows and road use by various types of users
for crash risk identification and road safety assessments.
ADVANTAGES:
• App data is usually low cost and current.
• Telematic data could show risky driving behavior.
LIMITATIONS:
• Coverage is lighter in rural areas or cities where use of app is low.
• Often requires data sharing agreements with private companies.

DATA COLLECTION: Crowdsourced
POTENTIAL SOURCES:
• Waze
• Delivery drivers
• OSM
• Social media
POTENTIAL APPLICATIONS: Obtain crash data and information related to road use, such as types of road
users and their relative density at a specific location. Can help to identify road risks.
ADVANTAGES:
• Can supplement government data, particularly if incidents are underreported or government provided
  road networks are unavailable.
LIMITATIONS:
• Requires app use in the region of interest.
• Needs coordination and resources to collect reports from delivery drivers.
• Data quality may be low.
• Social desirability bias can occur, where users feel inclined to share specific types of information
  to reinforce a positive or negative perspective.

DATA COLLECTION: Government
POTENTIAL SOURCES:
• Government transport agencies
• Road safety observatories
POTENTIAL APPLICATIONS: Most frequently used to obtain crash data, including statistics related to crash
severity, crash frequency as well as fatalities and injuries statistics.
ADVANTAGES:
• Data often has many attributes or details that have been manually added.
• Data often has been collected for many years in the same manner, allowing for temporal analysis.
LIMITATIONS:
• Data can be messy (human error).
• Data often not shared.

DATA COLLECTION: Aerial and satellite imagery
POTENTIAL SOURCES:
• Earth observation agencies
• Private companies
POTENTIAL APPLICATIONS: Identify road attributes for road safety assessments.
ADVANTAGES:
• Covers large geographic area.
LIMITATIONS:
• Requires balancing the cost with recency and granularity of imagery.

DATA COLLECTION: Meteorological sensors
POTENTIAL SOURCES:
• Meteorological agencies
• Local universities
• Private companies
POTENTIAL APPLICATIONS: Review weather conditions that may affect road safety, such as crashes.
ADVANTAGES:
• Infer driving conditions (i.e., if road surface conditions are not available in government crash data).
LIMITATIONS:
• There are varying levels of granularity.

SOURCE: Original table for this publication.




ANNEX 2:
Overview of Big Data Sources

Data sources accessible through DDP are indicated as free for World Bank task teams.
STREET VIEW IMAGERY

ATTRIBUTES (shared across the street view imagery sources): Requires processing to derive physical
features related to road safety, such as: crosswalks, speedbumps, painted lines, roads, road shoulders,
sidewalks, streetlights, traffic signs and others specific to region of interest.

DATA: Apple Look Around
ACCESS: Early stages; contact company
RESOLUTION AND FORMAT: Image
COST: N/A
COMMENTS: Offers extremely limited geographic coverage.

DATA: Google Street view
ACCESS: Not accessible according to license
RESOLUTION AND FORMAT: 360 photos must be at least 4K (image)
COST: N/A
COMMENTS: Global coverage is fairly extensive.

DATA: KartaView
ACCESS: Open license
RESOLUTION AND FORMAT: Depends on camera (image)
COST: Free
COMMENTS: Images are free, though image processing is required (see street view training data);
global coverage is variable.

DATA: Mapillary
ACCESS: DDP
RESOLUTION AND FORMAT: Depends on camera (image)
COST: Free
COMMENTS: Images are free, though image processing is required (see street view training data);
global coverage is variable.

DATA: Collected by team
ACCESS: Requires permission and coordination with local government
RESOLUTION AND FORMAT: Depends on camera (image or video)
COST: High
COMMENTS: Collection every two meters recommended for images. Images or video require processing;
see street view training data.

STREET VIEW TRAINING DATA

RESOLUTION AND FORMAT (Mapillary Traffic Sign and Mapillary Vistas): Resolution can be very high or
very low. The model performs best on images with the same resolution level of the training dataset. (image)

DATA: Mapillary Traffic Sign
ACCESS: DDP
ATTRIBUTES: Traffic signs
COST: Free
COMMENTS: More than 300 traffic sign classes covering six continents.

DATA: Mapillary Vistas
ACCESS: DDP
ATTRIBUTES: Physical features related to road: crosswalks, speedbumps, painted lines, roads, road
shoulders, sidewalks, streetlights, traffic signs (others possible)
COST: Free
COMMENTS: Coverage spans six continents.

DATA: Annotation by team
ACCESS: Hire a team
ATTRIBUTES: Physical features related to road, specific to region of interest: crosswalks, speedbumps,
painted lines, roads, road shoulders, sidewalks, streetlights, traffic signs (others possible)
COST: High
COMMENTS: Consider collaborating with stakeholders in a region of interest to label images using a
Computer Vision Annotation Tool (CVAT) or a labeling team with training. 2,000 labels per class is
recommended for a simple classification.

DATA: World Bank’s GRSF Road Risk Assessment software*
ACCESS: Open source
ATTRIBUTES: Physical features related to road: road grade and curvature, pedestrian crossings,
delineation, roadside severity, lane width, and number of lanes
COST: Free
COMMENTS: Video analysis produces a richer dataset. Piloted in Liberia and Mozambique.

* The software is included in this section as video training data is limited in World Bank countries.
Contact Satoshi Ogita (World Bank), for access.




DATA              ACCESS           ATTRIBUTES                         RESOLUTION         COST     COMMENTS
                                                                      AND FORMAT

MOBILE APPLICATIONS AND TELEMATICS
Grab              Contact          Contact company                    N/A                N/A      Coverage offered in Cambodia,
                  company                                                                         Indonesia, Malaysia, Myanmar,
                                                                                                  Philippines, Singapore,
                                                                                                  Thailand, Vietnam.
HERE              Not accessible   Traffic                            Every minute       N/A      Detailed road network coverage
                  according        current and historical speeds,     (text, number)              in more than 200 countries and
                  to standard      jams, crashes, road closures                                   comprehensive traffic speeds
                  license          and road construction                                          in more than 80 countries.
Mapbox Movement   DDP              Movement activity index;           Aggregated daily Free       See Mapbox Movement
                                   driving activity index available   or monthly at 100           data processing guidelines
                                   in select locations                meter resolution            for recommendations and
                                                                      (text, number)              considerations when using this
                                                                                                  dataset.
Mapbox Traffic    DDP              Traffic (typical speed) each       Typical speed per  Free
                                   road segment, identified by a      road segment
                                   start and end node, has 2,016      in five-minute
                                   typical speed predictions (7       increments over
                                   days × 24 hours × 12 five-         a week (text,
                                   minute periods)                    number)
Moovit            Contact          Urban transit                      Contact company    N/A
                  company          (public and on-demand)
Ola Cabs          DDP              Travel time and potholes           Contact DDP        Free     Coverage provided in India.


Orbital Insight   DDP              Foot traffic                       Each minute; 2019  Free     Foot traffic using mobile
                                   time of day, day of week,          to present (text,           location data in region of
                                   velocity (stationary, walking),    number)                     interest, subject to data
                                   dwell time                                                     availability per country.
TomTom            Contact          Traffic                            Every minute per   Free to  Global coverage is variable.
                  company          current and historical speeds,     road segment       Medium
                                   jams, crashes, road closures       (text, number)
                                   and road construction
Uber Movement     Contact          Traffic                            Average travel     Free     Geographic coverage is limited
                  company          travel times between zones,        time, average               to a selection of major cities.
                                   average speed per segment          speeds per hour,            Currently no API.
                                   and traffic density                time of day or              Was previously part of DDP.
                                                                      quarter of year
                                                                      (text, number)
Unacast           DDP              Human movement                     Coordinates,        Free    Mobile Location Data Inventory
                                                                      horizontal                  for geographic coverage
                                                                      accuracy,                   available through the DDP
                                                                      timestamp, time             website.
                                                                      zone (text, number)
Veraset           DDP              Human movement                     Coordinates,       Free
                                                                      horizontal
                                                                      accuracy,
                                                                      timestamp (text,
                                                                      number)
Waze              DDP              Traffic (alerts, jams,             Every minute;      Free     Includes weather alerts and
                                   irregularities)                    location provided           major and minor crashes by
                                   major and minor crashes;           as coordinates,             application users; see Waze
                                   severity of congestion or          road segment,               under Crowdsourced section.
                                   irregularities; current and        street name (text,
                                   typical speed on jammed            number)
                                   segments; coordinates, road
                                   segment (start and end node),
                                   street name; road type; driving
                                   direction (NSEW); turn type;
                                   alerts (construction, road
                                   closure and weather)





WhereIsMy            DDP            Informal transit network            Determined in         Medium  Specializes in producing
Transport                                                               collaboration         to High informal transit data according
                                                                        with team                     to General Transit Feed
                                                                                                      Specifications (GTFS).
                                                                                                      Supports team in collecting and
                                                                                                      processing data in exchange
                                                                                                      for the team covering in-field
                                                                                                      costs of data collection and
                                                                                                      facilitating engagement with
                                                                                                      local transport authorities.
CROWDSOURCED
OSM                  Open license   Road segments                       Centerline of road Free        May include additional road
                                    road type, length, and features     segments, referred             attributes: lanes, name,
                                                                        to as ways and                 smoothness, surface, speed
                                                                        relations                      limit, and width, and other
                                                                        (text, number)                 information such as overtaking
                                                                                                       permitted or lighting.
Twitter              DDP            Road incidents                      User-dependent;   Free
                                    tweeted                             can be associated
                                                                        with a place or
                                                                        location
                                                                        (text, number)


Waze                 DDP            Road incidents                      Every minute;      Free
                                    reported using app                  location provided
                                                                        as coordinates,
                                                                        road segment,
                                                                        street name (text,
                                                                        number)
Delivery drivers     Coordinated by Road incidents                      Depends on            High
                     team           reported using app                  collection (text,
                                                                        number)
GOVERNMENT
Government or road   Government     Incidents                           XY coordinate         Free to Processing requires standard
safety observatory   contact or     (date, time, severity, type)        per incident (text,   Low     GIS software such as ArcGIS or
                     open data                                          number)                       QGIS (free).
                     platform                                                                         Storage is small, typically <1GB
                                                                                                      per urban area over multiple
                                                                                                      years.
                                    Road segments                       Road segments         Low
                                    (type, width, speed limit)          (text, number)
                                    Traffic lights                      XY coordinate per Low          May include intersection type
                                    (intersection type)                 traffic light (text,           (pedestrian, bicyclist, for
                                                                        number)                        example).
SATELLITE AND AERIAL IMAGERY (AND OTHER REMOTE SENSING)
Maxar Technologies   Contact        Elevation and roads                 Less than 1m          High     Requires processing to derive
                     company                                            (image)                        road networks.
                                                                                                       Was previously part of DDP.
Orbital Insight      DDP            Car and truck count; roads          Car and truck       Free       Car and truck count derived
                                                                        count: high                    from satellite imagery.
                                                                        resolution,                    Limited Geospatial Intelligence
                                                                        2013 to present;               Platform credits to derive
                                                                        roads: medium                  roads in region of interest; not
                                                                        resolution, 2016 to            for routable road networks;
                                                                        present                        not suitable for narrow roads
                                                                        (image, number)                in urban areas or dirt or
                                                                                                       mountainous roads in rural
                                                                                                       areas.





Security or traffic   Collected        Traffic density and volume         Depends on        Medium
cameras               by team or                                          camera (image     to High
                      through external                                    or video)
                      resource

Unmanned aerial       Collected by    Elevation, roads, traffic density   Depends on        Medium  Recent research suggests
vehicle (UAV)         team            and volume                          camera (image     to High traffic density and volume are
                                                                          or video)                 possible to calculate.
METEOROLOGICAL SENSORS
OpenWeather           Contact         Weather                             40-year historical Low      Price is economical for the
                      company         (weather type, temperature,         archive for any             40-year history of a single
                                      wind speed and direction,           coordinates by the          coordinate or city.
                                      cloud coverage; rain and snow       hour; or by city or         Contact provider for details on
                                      volume by hour and per 3            1 km, 5 km, 10 km           pricing and to download many
                                      hours)                              or customized grid          locations.
                                                                          (text, number)
Tomorrow.io           DDP             Weather                             500m radius with Free
                                      (weather type, temperature          precipitation
                                      and humidity; wind speed,           recordings as low
                                      direction, gust; precipitation      as 30 feet off the
                                      type, intensity; snow and ice       ground; time steps
                                      accumulation; visibility, moon      range from one day
                                      phase)                              to one minute
                                                                          (text, number)
                                                                                                      SOURCE: Original table for this publication.




ANNEX 3: Hotspots and Heatmaps: Uncovering Data Patterns for Road Safety

This annex presents data visualizations for the case study regions, drawn from alternative data
sources such as OSM, Mapbox, and Waze, and from a select government dataset.

Bogotá, Colombia

Temporal data visualizations show road safety patterns between years, seasons, months, weeks,
days, and times of day. The Waze crash data used to train the ML model covered six months,
from July through December 2020. The pandemic was expected to affect the number of Waze crash
reports, and potentially traffic patterns, because crashes reported by the government decreased
noticeably compared to prior years (figure 3.1). The government dataset showed fewer incidents
starting in March 2020, suggesting that the pandemic affected the number of crashes, though the
speed limit was also reduced from 60 km/h to 50 km/h in May 2020 (figure 3.2). With these caveats
in mind, the Waze data was used to identify road safety trends.

FIGURE 3.1:   Road crashes with damage, injury or death in Bogotá, 2016–2020

               2016      2017      2018      2019      2020
With damage    23,530    23,775    22,606    21,260    12,874
With injury    10,412    10,096    11,857    11,799     8,015
With death        567       536       485       491       371

SOURCE: Original figure for this publication, based on data from Datos Abiertos Secretaría Distrital de Movilidad.



FIGURE 3.2:   Road crashes per month in Bogotá, 2016–2020
[Radial chart: months (Jan–Dec) around the circle, one ring per year (2016–2020). Color classes for road crashes per month: ≤ 1,500; ≤ 2,000; ≤ 2,500; ≤ 3,000; ≤ 3,256]
SOURCE: Original figure for this publication, based on data from Datos Abiertos Secretaría Distrital de Movilidad.




Hotspot analysis groups crash locations to determine statistically significant clusters of crashes.
Government and Waze datasets were analyzed over the same six-month window (figure 3.3). Both
datasets showed similar hotspots near Avenida Boyacá and Calle 6 along the highway in the south,
Avenida Norte-Quito-Sur (NQS). Overall, Waze had more hotspots than the government dataset. Some
minor road incidents captured by Waze may have gone unreported to the police. This pattern can be
seen in a cluster of minor collisions further north in the city that does not appear in the
government data. Instead, clusters of government-reported crashes with only damage (no injury or
fatality) appear in a central band. Hotspot results depend on the approach used, including the
clustering method and the size, shape, and search area of neighboring hotspots.
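
The source does not name the statistic behind figure 3.3. As a minimal sketch, assuming a Getis-Ord Gi* statistic computed on crash counts aggregated to a regular grid, with binary weights over each cell and its adjacent cells, hotspot z-scores and confidence bands could be derived roughly as follows. The function names, the grid, and the neighborhood are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np


def gi_star(counts):
    """Getis-Ord Gi* z-score per grid cell.

    Assumes binary weights over each cell's 3x3 neighborhood (the cell
    itself included). The grid must hold more than nine cells, otherwise
    a neighborhood can cover the whole grid and the statistic is undefined.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    if n <= 9:
        raise ValueError("grid is too small for a 3x3 neighborhood")
    mean = counts.mean()
    std = np.sqrt((counts ** 2).mean() - mean ** 2)
    if std == 0:
        raise ValueError("all cells have the same count; Gi* is undefined")
    rows, cols = counts.shape
    z = np.zeros_like(counts)
    for r in range(rows):
        for c in range(cols):
            window = counts[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            w = window.size  # binary weights: sum(w) equals sum(w^2)
            numerator = window.sum() - mean * w
            denominator = std * np.sqrt((n * w - w ** 2) / (n - 1))
            z[r, c] = numerator / denominator
    return z


def confidence(z):
    """Map a z-score to the confidence bands used in the figure legend."""
    size = abs(z)
    if size >= 2.576:
        level = 99
    elif size >= 1.960:
        level = 95
    elif size >= 1.645:
        level = 90
    else:
        return "not significant"
    return f"{'hot' if z > 0 else 'cold'} spot {level}%"


# Hypothetical 7x7 grid of crash counts with one dense 3x3 cluster.
grid = np.ones((7, 7))
grid[2:5, 2:5] = 10
z_scores = gi_star(grid)
center_label = confidence(z_scores[3, 3])
corner_label = confidence(z_scores[0, 0])
```

Changing the neighborhood size or the grid resolution changes which cells pass each threshold, which is one reason results differ between approaches.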

FIGURE 3.3:     Hotspot analysis of government and Waze crash data in Bogotá, July–December 2020
[Six map panels: Government (all crashes); Government (death or injury); Government (damage only); Waze (all crashes*); Waze (major); Waze (minor). Legend: cold spot confidence 99%, 95%, 90%; not significant; hot spot confidence 90%, 95%, 99%]
*Includes major and minor crashes, as well as those not categorized as either type.
SOURCE: Original figure for this publication, based on data from Datos Abiertos Secretaría Distrital de Movilidad and the Waze App. Learn more at waze.com. Basemap provided by Esri, HERE, Garmin, METI/NASA, USGS.




As with other alternative sources of data derived from mobile devices and apps, Waze crash reports
are influenced by the location of the users, which affects where and when crashes are reported.
While Waze data distinguishes major and minor incidents, it does not include the additional crash
details typically obtained from an official source, such as type, severity, class, and reason.
Users can validate reports (e.g., thumbs up) to provide a confidence and reliability rating and can
flag false reports, but Waze data may still contain duplicates. Deduplication was not conducted for
this analysis because the study was interested in relative crash patterns.

Temporal patterns emerge when major crashes are aggregated by day of the week and hour of the day
(figure 3.4). In Bogotá, major crash reports increased between 6 and 7 p.m., with the most reports
in this window on Fridays. Fewer incidents occurred on Sundays.
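
The aggregation behind a day-by-hour view such as figure 3.4 can be sketched in a few lines. The example below is a minimal illustration that assumes each report carries an ISO 8601 local timestamp; the sample timestamps are hypothetical and are not Waze data.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def crashes_by_day_and_hour(timestamps):
    """Count reports per (day of week, hour of day)."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp)
        counts[(DAYS[moment.weekday()], moment.hour)] += 1
    return counts


# Hypothetical report times, assumed to be in local time already.
reports = [
    "2020-07-03T18:15:00",  # Friday
    "2020-07-10T18:40:00",  # Friday
    "2020-07-05T09:05:00",  # Sunday
    "2020-07-06T18:30:00",  # Monday
]
counts = crashes_by_day_and_hour(reports)
peak = max(counts, key=counts.get)  # busiest (day, hour) cell
```

If timestamps are stored in UTC, they would need converting to local time first, otherwise the evening peak shifts to the wrong hours.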

FIGURE 3.4:   Major crashes reported on Waze in Bogotá, July–December 2020
[Heat grid: day of week (Mon–Sun) by hour of day (0–23). Color classes: ≤ 100; ≤ 200; ≤ 300; ≤ 400; ≤ 507]
SOURCE: Original figure for this publication, based on data provided by the Waze App. Learn more at waze.com.




Spatial and temporal analysis can be combined to identify areas that exhibit patterns over time and
merit closer inspection. This is valuable when human movement or behavior changes during the
examined period, for example because of a pandemic, road construction, or updated speed limits.
Emerging hotspot analysis reviews clusters of crashes that are consistent over time and ones that
are intensifying or diminishing (figure 3.5).73 In this example, each week was analyzed.
Intensifying hotspot areas were statistically significant hotspots in 90 percent of the weeks
analyzed, including the final week, with hotspot intensity increasing over time.
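
The "intensifying" rule described above can be sketched as a check on one location's weekly hotspot z-scores. This is a simplified illustration that assumes a Mann-Kendall test (without tie correction) for the increasing trend and a 90 percent confidence threshold for both the hotspot and the trend; the thresholds, function names, and sample series are assumptions, not the study's settings.

```python
import math


def mann_kendall_z(series):
    """Mann-Kendall trend z-score (no tie correction); positive means rising."""
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / math.sqrt(variance)
    if s < 0:
        return (s + 1) / math.sqrt(variance)
    return 0.0


def is_intensifying(weekly_z, hot=1.645, share=0.9, trend=1.645):
    """True if the location is hot in at least `share` of the weeks,
    hot in the final week, and its z-scores show a significant upward trend."""
    hot_weeks = [z >= hot for z in weekly_z]
    return (
        hot_weeks[-1]
        and sum(hot_weeks) / len(hot_weeks) >= share
        and mann_kendall_z(weekly_z) >= trend
    )


# Hypothetical weekly hotspot z-scores for two locations.
rising = [1.0, 1.7, 1.9, 2.1, 2.3, 2.6, 2.8, 3.0, 3.2, 3.5]
steady = [2.0] * 10
rising_flag = is_intensifying(rising)
steady_flag = is_intensifying(steady)
```

A location that is hot every week but shows no trend, like `steady` here, would fall into a different category (for example, a persistent hotspot) rather than an intensifying one.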

FIGURE 3.5:    Emerging hotspot analysis of Waze crashes in Bogotá, July–December 2020
SOURCE: Original figure for this publication, based on data provided by the Waze App. Learn more at waze.com. Basemap provided by Esri, HERE, Garmin, METI/NASA, USGS.




73 For a complete list of definitions, see “How Emerging Hot Spot Analysis Works”:
https://pro.arcgis.com/en/pro-app/latest/tool-reference/space-time-pattern-mining/learnmoreemerging.htm


If interventions or investments target a specific road, more geographically detailed information is
required to make decisions. Hotspot analysis applied to road segments visualizes statistically
significant crash frequencies along roads, as shown in figure 3.6.
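
Before a segment-level hotspot statistic can be computed, crash points have to be turned into crash frequencies per road segment. The sketch below shows one simple way to do this, assuming straight segments and coordinates in a projected system measured in meters; the 25 m snapping distance, the segment identifiers, and the sample points are hypothetical.

```python
import math
from collections import Counter


def point_segment_distance(point, start, end):
    """Shortest planar distance from a point to a straight segment."""
    (px, py), (ax, ay), (bx, by) = point, start, end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: both ends coincide
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def crashes_per_segment(crashes, segments, max_distance=25.0):
    """Assign each crash to its nearest segment and count crashes per segment.

    Crashes farther than `max_distance` from every segment are dropped.
    """
    counts = Counter({segment_id: 0 for segment_id in segments})
    for crash in crashes:
        segment_id, distance = min(
            ((sid, point_segment_distance(crash, a, b)) for sid, (a, b) in segments.items()),
            key=lambda pair: pair[1],
        )
        if distance <= max_distance:
            counts[segment_id] += 1
    return counts


# Hypothetical segments and crash locations in projected meters.
segments = {"A": ((0, 0), (100, 0)), "B": ((0, 50), (100, 50))}
crashes = [(10, 5), (60, -3), (50, 48), (50, 200)]
frequency = crashes_per_segment(crashes, segments)
```

Because longer segments collect more crashes, counts are often normalized by segment length before comparison.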

FIGURE 3.6:   Hotspot analysis using Waze crash frequencies in Bogotá, July–December 2020
[Map of road segments. Legend: hot spot confidence 99%, 95%, 90%; not significant]
SOURCE: Original figure for this publication, based on data provided by OSM and the Waze App. Learn more at waze.com




Padang, Indonesia

Heatmaps visualize the density of crashes. While Waze data was sparse in Padang, some spatial
patterns could be detected. A heatmap shows at least three distinct areas of high crash density that
could be further examined during a site inspection (figure 3.7).
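
A crash density surface of this kind can be approximated with a kernel density estimate. The sketch below assumes a Gaussian kernel, projected coordinates in meters, and a 100 m bandwidth; all three are illustrative choices, and the points are hypothetical rather than Waze reports.

```python
import numpy as np


def crash_density(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel density of crash points evaluated on a regular grid.

    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    pts = np.asarray(points, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)
    dx = gx[..., None] - pts[:, 0]
    dy = gy[..., None] - pts[:, 1]
    kernel = np.exp(-(dx ** 2 + dy ** 2) / (2 * bandwidth ** 2))
    return kernel.sum(axis=-1) / (2 * np.pi * bandwidth ** 2)


# Hypothetical crash locations: three close together, one isolated.
points = [(200, 200), (220, 210), (210, 190), (800, 800)]
axis = np.arange(0, 1001, 100)
density = crash_density(points, axis, axis, bandwidth=100.0)
row, col = (int(i) for i in np.unravel_index(density.argmax(), density.shape))
densest_cell = (float(axis[col]), float(axis[row]))  # (x, y) of the peak
```

With sparse data, a larger bandwidth smooths isolated reports into broader areas, while a smaller one highlights individual locations.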

FIGURE 3.7:   Heatmap of crashes reported using the Waze app in Padang, April 2019–July 2021
[Heatmap. Legend: crash density from low (black) to high (yellow)]
SOURCE: Original figure for this publication, based on data provided by the Waze App. Learn more at waze.com. Basemap provided by Esri, HERE, Garmin, METI/NASA, USGS.




Road safety assessments may require operating speeds of road segments. Mapbox collects this
data from mobile devices and provides typical speeds per road segment in 5-minute increments.
In Padang, Mapbox speeds were visualized for a Thursday from 5:00 p.m. to 6:00 p.m. (figure 3.8).
Because speed limits were sparsely noted in OSM, the OSM road type was used to group roads into
minor and major roads as a proxy for low or high speed limits; minor roads are drawn with thinner
lines than major roads. Average speeds were typically lower near intersections, shown in pink
(<25 km/h), than along major roads, shown in purple (25–50 km/h). High-speed road segments
exceeding 50 km/h are found heading north and south along Jalan By Pass. Identifying road segments
with high speeds using Mapbox supports road safety assessments and the implementation of speed
management or traffic calming measures.
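
A weekly profile of typical speeds in 5-minute increments holds 7 × 24 × 12 = 2,016 values per road segment. The sketch below shows how a one-hour window could be extracted from such a profile and grouped into the speed classes of figure 3.8. The Sunday-first ordering of the week is an assumption that should be checked against the provider's documentation, and the profile values are hypothetical.

```python
SLOTS_PER_HOUR = 12                       # five-minute periods per hour
SLOTS_PER_DAY = 24 * SLOTS_PER_HOUR       # 288
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY        # 2,016

# Assumption: the weekly profile starts on Sunday at 00:00 local time.
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def slot_index(day, hour, minute=0):
    """Position of a five-minute period within the weekly profile."""
    return DAYS.index(day) * SLOTS_PER_DAY + hour * SLOTS_PER_HOUR + minute // 5


def mean_speed(profile, day, start_hour, end_hour):
    """Average typical speed (km/h) for one day between two hours."""
    if len(profile) != SLOTS_PER_WEEK:
        raise ValueError("expected 2,016 typical speeds per segment")
    window = profile[slot_index(day, start_hour):slot_index(day, end_hour)]
    return sum(window) / len(window)


def speed_band(kmh):
    """Group a speed into the three classes shown in the map legend."""
    if kmh <= 25.0:
        return "<= 25 km/h"
    if kmh <= 50.0:
        return "25-50 km/h"
    return "> 50 km/h"


# Hypothetical segment: 30 km/h all week except Thursday 5-6 p.m.
profile = [30.0] * SLOTS_PER_WEEK
for slot in range(slot_index("Thu", 17), slot_index("Thu", 18)):
    profile[slot] = 55.0
thursday_peak = mean_speed(profile, "Thu", 17, 18)
band = speed_band(thursday_peak)
```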

FIGURE 3.8:   Mapbox typical speeds in Padang on Thursday, 5:00 p.m. to 6:00 p.m.
[Map of road segments. Legend: speed (km/h) 50.1 - 64.8; 25.1 - 50.0; 0.1 - 25.0; no data]
SOURCE: Original figure for this publication, based on data provided by Mapbox. Basemap provided by Esri, HERE, Garmin, METI/NASA, USGS.




Annex 4: Classes Detected Using Mapillary Vistas Dataset in RIC Model and Input Classes for the RRE Model

All classes listed were detected using the Mapillary Vistas Dataset. Classes in bold were the input for the RRE Model.

animal--bird
animal--ground-animal
construction--barrier--ambiguous
construction--barrier--concrete-block
construction--barrier--curb
construction--barrier--fence
construction--barrier--guard-rail
construction--barrier--other-barrier
construction--barrier--road-median
construction--barrier--road-side
construction--barrier--separator
construction--barrier--temporary
construction--barrier--wall
construction--flat--bike-lane
construction--flat--crosswalk-plain
construction--flat--curb-cut
construction--flat--driveway
construction--flat--parking
construction--flat--parking-aisle
construction--flat--pedestrian-area
construction--flat--rail-track
construction--flat--road
construction--flat--road-shoulder
construction--flat--service-lane
construction--flat--sidewalk
construction--flat--traffic-island
construction--structure--bridge
construction--structure--building
construction--structure--garage
construction--structure--tunnel
human--person--individual
human--person--person-group
human--rider--bicyclist
human--rider--motorcyclist
human--rider--other-rider
marking--continuous--dashed
marking--continuous--solid
marking--continuous--zigzag
marking--discrete--ambiguous
marking--discrete--arrow--left
marking--discrete--arrow--other
marking--discrete--arrow--right
marking--discrete--arrow--split-left-or-straight
marking--discrete--arrow--split-right-or-straight
marking--discrete--arrow--straight
marking--discrete--crosswalk-zebra
marking--discrete--give-way-row
marking--discrete--give-way-single
marking--discrete--hatched--chevron
marking--discrete--hatched--diagonal
marking--discrete--other-marking
marking--discrete--stop-line
marking--discrete--symbol--bicycle
marking--discrete--symbol--other
marking--discrete--text
marking-only--continuous--dashed
marking-only--discrete--crosswalk-zebra
marking-only--discrete--other-marking
marking-only--discrete--text
nature--mountain
nature--sand
nature--sky
nature--snow
nature--terrain
nature--vegetation
nature--water
object--banner
object--bench
object--bike-rack
object--catch-basin
object--cctv-camera
object--fire-hydrant
object--junction-box
object--mailbox
object--manhole
object--parking-meter
object--phone-booth
object--pothole
object--sign--advertisement
object--sign--ambiguous
object--sign--back
object--sign--information
object--sign--other
object--sign--store
object--street-light
object--support--pole
object--support--pole-group
object--support--traffic-sign-frame
object--support--utility-pole
object--traffic-cone
object--traffic-light--general-single
object--traffic-light--pedestrians
object--traffic-light--general-upright
object--traffic-light--general-horizontal
object--traffic-light--cyclists
object--traffic-light--other
object--traffic-sign--ambiguous
object--traffic-sign--back
object--traffic-sign--direction-back
object--traffic-sign--direction-front
object--traffic-sign--front
object--traffic-sign--information-parking
object--traffic-sign--temporary-back
object--traffic-sign--temporary-front
object--trash-can
object--vehicle--bicycle
object--vehicle--boat
object--vehicle--bus
object--vehicle--car
object--vehicle--caravan
object--vehicle--motorcycle
object--vehicle--on-rails
object--vehicle--other-vehicle
object--vehicle--trailer
object--vehicle--truck
object--vehicle--vehicle-group
object--vehicle--wheeled-slow
object--water-valve
void--car-mount
void--dynamic
void--ego-vehicle
void--ground
void--static
void--unlabeled




Annex 5: Average Precision of the Bounding Box Detection and Classification
An Average Precision (AP) score closer to 100 indicates better performance in correctly detecting and classifying an object. An AP
score of zero means that no data is available.
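
As a rough illustration of how an AP score on this 0 to 100 scale can be computed for one object class, the sketch below takes the area under the precision-recall curve of ranked detections. It assumes that each predicted bounding box has already been marked as a correct or incorrect match against the annotations (for example, using an intersection-over-union threshold, which is not specified here). The function name and the sample values are hypothetical, and the evaluation protocol behind the scores in this annex may differ.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class, scaled to 0-100.

    detections: list of (confidence, is_true_positive) pairs, one per predicted box.
    num_ground_truth: number of annotated boxes of this class in the evaluation set.
    """
    if num_ground_truth == 0:
        # Assumption: a class with no annotated examples is reported as zero.
        return 0.0

    # Rank predictions from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Interpolate: precision at each rank becomes the best precision at that rank or later.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum precision over each increase in recall.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return 100.0 * ap


# Hypothetical example: three predicted boxes for a class with two annotated objects.
example = [(0.9, True), (0.8, False), (0.7, True)]
print(round(average_precision(example, num_ground_truth=2), 1))
```

In this toy case the first and third predictions are correct, so the score is lowered by the false positive ranked between them.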




Glossary of Terms


Big Data                   Large data sets that require significant processing power and/or complex
                           computational techniques to reveal patterns, trends, and correlations.
Development Data           A partnership between international organizations and companies, created to
Partnership (DDP)          facilitate the use of third-party data in research and international development.
Deep Learning (DL)         A branch of artificial intelligence that involves creating algorithms for deep
                           artificial neural networks, inspired by the human brain, to learn complex patterns
                           from large quantities of high-dimensional data.
Fatalities and Serious     The number of people killed or seriously injured in traffic crashes, a metric used
Injuries (FSI)             to monitor traffic safety performance. Fatalities are defined as those who die
                           within 30 days of the crash.
Intelligent Transport      The collection, analysis, and transmission of transportation, vehicle, and
System (ITS)               infrastructure data that informs users with real-time updates and improves future
                           operations and predictions.
Internet of Things (IoT)   Devices that are connected to the internet to send and/or receive data.
Machine Learning (ML)      A method to systematically derive patterns, identify trends, and draw conclusions
                           from data with minimal human intervention.
Neural Network             A set of connected algorithms typically organized in three layers: input layer,
                           hidden layer(s), and an output layer.
Overall Project Traffic    The entire traffic and road safety risk of a project that evaluates the road
and Road Safety Risk       infrastructure, vehicle operating speeds, road user behavior, vehicle standards,
(OPTRSR)                   and post-crash trauma care.
Road Crash                 The collision of a vehicle with another entity, such as a car, bicycle, stationary
                           object, pedestrian, or animal, that causes injury or damage to one or more of the
                           entities on a road or road-related area.
Road Safety                System to reduce risks to road users, preventing death or injury.
Road Safety                Systematic review of the current road or traffic scheme to identify hazardous
Assessments                areas.
Road Safety Audit (RSA)    Independent, systematic evaluation of the modification or addition to the road or
                           traffic scheme to determine the crash potential and safety performance for all
                           road users.
Road Safety Impact         The safety performance ranking of planned road construction or modification
Assessment (RSIA)          design schemes and their effect on the surrounding road network.
Road Safety Observatory    A regional network of government representatives that facilitates the sharing and
(RSO)                      exchange of road safety data and expertise. The World Bank operates RSOs in
                           Latin America (OISEVI), Africa (ARSO), and Asia-Pacific (APRSO).
Safe System                An approach to road safety that integrates principles for safer vehicles, safer
                           roads, and safer users to eliminate death and serious injuries.
Supervised Learning        A machine learning task using labeled data to train the model with input-output
                           pairs.
Unsupervised Learning      A machine learning technique that extracts patterns from unlabeled data. For
                           example, grouping or clustering data with similar attributes.
Vulnerable Road Users      Individuals at higher risk when using the road because they do not have the
                           protection of an enclosed vehicle, such as pedestrians, motorcyclists, bicyclists,
                           and those on animals or animal-drawn carts.
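
The Supervised Learning and Unsupervised Learning entries can be contrasted with a toy example. The sketch below is illustrative only: the road-segment features, the values, the risk labels, and the function names are all invented, and the two algorithms (a nearest-centroid classifier and a basic k-means clustering) are simple stand-ins rather than the models used in this note.

```python
# Toy data: (share of images showing a sidewalk, average number of lanes) per road segment.
# All values and labels are invented for illustration.
segments = [(0.9, 2.0), (0.8, 2.0), (0.1, 4.0), (0.2, 6.0)]
labels = ["low", "low", "high", "high"]  # known risk labels, used only in the supervised case


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def fit_nearest_centroid(X, y):
    """Supervised: learn one centroid per label from input-output pairs."""
    return {lab: mean([x for x, l in zip(X, y) if l == lab]) for lab in set(y)}


def predict(centroids, x):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def kmeans(X, k=2, iterations=10):
    """Unsupervised: group points by similarity without using any labels."""
    centroids = list(X[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda i: sq_dist(centroids[i], x))
            groups[nearest].append(x)
        # Keep the old centroid if a group happens to be empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return [min(range(k), key=lambda i: sq_dist(centroids[i], x)) for x in X]


model = fit_nearest_centroid(segments, labels)
print(predict(model, (0.7, 3.0)))  # supervised: returns a named risk label
print(kmeans(segments))            # unsupervised: returns anonymous cluster indices
```

The supervised model returns a label it was taught, while the clustering returns only group indices whose meaning must be interpreted afterwards.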




References


Australian BITRE (Bureau of Infrastructure and Transport Research Economics). “Australian Road
Deaths Database (ARDD).” Australian BITRE. Updated May 13, 2021.
https://data.gov.au/data/dataset/australian-road-deaths-database
Bedoya Arguelles, Guadalupe, Svetoslava Petkova Milusheva, Arianna Legovini, and Sarah Elizabeth
Williams. “Smart and Safe Kenya Transport (SMARTTRANS).” Washington, DC: World Bank, 2019.
https://documents1.worldbank.org/curated/en/723411574361015073/pdf/Smart-and-Safe-Kenya-Transport-SMARTTRANS.pdf
Bliss, Tony and Jeanne Breen. “Meeting the Management Challenges of the Decade of Action for
Road Safety.” IATSS Res. 35 (2012): 48–55. https://doi.org/10.1016/j.iatssr.2011.12.001
Bostrom, Nick and Eliezer Yudkowsky. “The Ethics of Artificial Intelligence.” In The Cambridge
Handbook of Artificial Intelligence, edited by Keith Frankish and William M. Ramsey, 316-334. Cambridge:
Cambridge University Press, 2014.
Das, Subasish and Greg P. Griffin. “Investigating the Role of Big Data in Transportation Safety.”
Transportation Research Record 2674, no. 6 (2020): 244–52. https://doi.org/10.1177/0361198120918565
Diop, Makhtar. “All Road Deaths Are Preventable. We Can Make It Happen.” World Bank. Accessed
May 14, 2021.
https://blogs.worldbank.org/transport/all-road-deaths-are-preventable-we-can-make-it-happen
DT Global. “Indonesia: Establishment of Integrated Road Asset Management Systems.” Accessed
October 4, 2021. https://dt-global.com/projects/irams-dc
Google. “Google Maps, Google Earth, and Street View.” Accessed May 14, 2021.
https://about.google/brand-resource-center/products-and-services/geo-guidelines/
He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. “Mask R-CNN.” 2017 IEEE
International Conference on Computer Vision (2017): 2980-2988.
Hidalgo, Darío and Claudia Adriazola-Steil. “Bogota’s Vision Zero Road Safety Plan Is Saving Lives.”
TheCityFix. Last modified September 26, 2019.
https://thecityfix.com/blog/bogotas-vision-zero-road-safety-plan-saving-lives-dario-hidalgo-claudia-adriazola-steil/
Institute for Transportation and Development Policy. “Pune, India Wins 2020 Sustainable Transport
Award.” Last modified June 27, 2019.
https://www.itdp.org/2019/06/27/pune-india-wins-2020-sustainable-transport-award/
International Transport Forum. “Best Practice for Urban Road Safety: Case Studies.” International
Transport Forum Policy Papers, no. 76 (2020).
International Transport Forum. Zero Road Deaths and Serious Injuries: Leading a Paradigm Shift to a
Safe System. Paris: OECD Publishing, 2016. https://doi.org/10.1787/9789282108055-en
iRAP (International Road Assessment Programme). iRAP Star Rating and Investment Plan
Implementation Support Guide. London: iRAP, March 2017.




Krambeck, Holly, Magreth Kakoko, and Mireille Raad. Using Computer Vision to Automatically Detect
Road Features for Road Safety Audits and Assessments: Inception Report. Washington, DC: World Bank,
2019.
Lovón-Melgarejo, Jesús, Alonso Tenorio-Trigoso, Manuel Castillo-Cara, and Daniel Miranda.
“Identification of Risk Zones for Road Safety through Unsupervised Learning Algorithms.” In 16th LACCEI
International Multi-Conference for Engineering, Education, and Technology: Innovation in Education
and Inclusion, Lima, Peru, July 2018. http://www.laccei.org/LACCEI2018-Lima/full_papers/FP413.pdf
Milusheva, Sveta, Robert Marty, Guadalupe Bedoya, Sarah Williams, Elizabeth Resor, and Arianna
Legovini. “Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to
Develop a Resource for Urban Planning.” PLoS ONE 16, no. 2 (2021).
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244317
Neilson, Alex, Indratmo, Ben Daniel, and Stevanus Tjandra. “Systematic Review of the Literature on
Big Data in the Transportation Domain: Concepts and Applications.” Big Data Res. 17 (2019): 35-44.
https://doi.org/10.1016/j.bdr.2019.03.001
Neuhold, G., T. Ollmann, S. R. Bulò, and P. Kontschieder. “The Mapillary Vistas Dataset for
Semantic Understanding of Street Scenes.” 2017 IEEE International Conference on Computer Vision (ICCV)
(2017): 5000-5009. doi: 10.1109/ICCV.2017.534
ODI (Overseas Development Institute). “Bogotá.” ODI: Think Change. Accessed October 12, 2021.
https://odi.org/en/about/features/bogot%C3%A1/
ODPH (Open Data Philippines). “Open Data Philippines.” ODPH. Accessed June 3, 2021.
https://data.gov.ph/
OECD (Organisation for Economic Co-operation and Development)/ITF (International Transport
Forum). Big Data and Transport: Understanding and Assessing Options. Paris: OECD/ITF, 2015.
https://www.itf-oecd.org/sites/default/files/docs/15cpb_bigdata_0.pdf
Omdena. “Rating Road Safety Through Machine Learning to Prevent Road Accidents.” Accessed May
28, 2021. https://omdena.com/projects/ai-road-safety/
Ospina-Mateus, Holman, Leonardo Augusto Quintana Jiménez, Francisco José López-Valdés, Natalie
Morales-Londoño, and Katherinne Salas-Navarro. “Using Data-Mining Techniques for the Prediction
of the Severity of Road Crashes in Cartagena, Colombia.” In Applied Computer Sciences in Engineering.
Edited by J. Figueroa-García, M. Duarte-González, S. Jaramillo-Isaza, A. Orjuela-Cañon, and Y.
Díaz-Gutierrez, 309-20. Cham: Springer, 2019. https://doi.org/10.1007/978-3-030-31019-6_27
Silva, Philippe Barbosa, Michelle Andrade, and Sara Ferreira. “Machine Learning Applied to Road
Safety Modeling: A Systematic Literature Review.” Journal of Traffic and Transportation Engineering
7, no. 6 (2020): 775-790. https://doi.org/10.1016/j.jtte.2020.07.004
Suresh, Harini and John Guttag. “A Framework for Understanding Sources of Harm throughout the
Machine Learning Life Cycle.” In Proceedings of Equity and Access in Algorithms, Mechanisms, and
Optimization (EAAMO ’21), Association for Computing Machinery, New York, October 2021.
https://doi.org/10.1145/3465416.3483305




US NHTSA (United States National Highway Traffic Safety Administration). “Data.” US NHTSA.
Accessed May 28, 2021. https://www.nhtsa.gov/data
WHO (World Health Organization). Global Status Report on Road Safety 2018. Geneva: WHO, 2018.
World Bank. “Better Data for Safer Roads: The Powerful Mission of Road Safety Observatories.”
Last modified November 5, 2020.
https://www.worldbank.org/en/news/video/2020/11/05/better-data-for-safer-roads-the-powerful-mission-of-road-safety-observatories
World Bank. Colombia - Programmatic Productive and Sustainable Cities Development Policy Loans.
Washington, DC: World Bank, 2020.
http://documents.worldbank.org/curated/en/426591583968971309/Colombia-Programmatic-Productive-and-Sustainable-Cities-Development-Policy-Loans
World Bank. GRSF DRIVER Completion Report. Washington, DC: World Bank, 2019.
https://documents1.worldbank.org/curated/en/245151560919065747/pdf/Data-for-Road-Incident-Visualization-Evaluation-and-Reporting-Lowing-the-Barriers-to-Evidence-Based-Road-Safety-Management-in-Resource-Constrained-Countries.pdf
World Bank. Environmental and Social Framework for IPF Operations, ESS4: Community Health and
Safety. Washington, DC: World Bank, 2018.
World Bank. Good Practice Note: Road Safety. Washington, DC: World Bank, 2019.
https://pubdocs.worldbank.org/en/648681570135612401/Good-Practice-Note-Road-Safety.pdf
World Bank. Guide for Road Safety Opportunities and Challenges: Low and Middle Income Country
Profiles. Washington, DC: World Bank, 2020. https://openknowledge.worldbank.org/handle/10986/33363
World Bank. Indonesia Public Expenditure Review 2020: Spending for Better Results. Washington, DC:
World Bank, 2020. https://openknowledge.worldbank.org/handle/10986/33954
World Bank. Innovative Road Safety Risk Assessment Tool with Automated Image Analysis Technology.
Washington, DC: World Bank, 2019.
World Bank. Making Roads Safer. Washington, DC: World Bank, 2014.
World Bank. Mobile Metropolises: Urban Transport Matters: An IEG Evaluation of the World Bank
Group’s Support for Urban Transport. Washington, DC: World Bank, 2017.
World Bank. “Open Traffic Data to Revolutionize Transport.” Last modified December 19, 2016.
https://www.worldbank.org/en/news/feature/2016/12/19/open-traffic-data-to-revolutionize-transport
World Bank. Open Traffic: Easing Urban Congestion. Washington, DC: World Bank, n.d.
https://olc.worldbank.org/system/files/WBG_BD_CS_OpenTraffic_1.pdf
World Bank. Road Safety Indicators for Project Monitoring. Washington, DC: World Bank, 2021.
World Bank. The High Toll of Traffic Injuries: Unacceptable and Preventable. Washington, DC: World
Bank, 2017.
World Bank. Use of AI Technology to Support Data Collection for Project Preparation and
Implementation: A ‘Learning-by-doing’ Process. Washington, DC: World Bank, 2021.
World Bank. World Development Report 2021: Data for Better Lives. Washington, DC: World Bank,
2021. doi:10.1596/978-1-4648-1600-0




Zeng, Qiang, Helai Huang, Xin Pei, S.C. Wong, and Mingyun Gao. “Rule Extraction from an
Optimized Neural Network for Traffic Crash Frequency Modeling.” Accident Analysis & Prevention 97
(2016): 87-95. doi: 10.1016/j.aap.2016.08.017
Zhang, Min, Yang Liu, Shaohua Luo, and Siyan Gao. “Research on Baidu Street View Road Crack
Information Extraction Based on Deep Learning Method.” Journal of Physics: Conference Series no. 1616
(2020). https://iopscience.iop.org/article/10.1088/1742-6596/1616/1/012086/pdf
Ziakopoulos, Apostolos and George Yannis. “Using AI for Spatial Predictions of Driver Behavior.”
International Transport Forum (ITF) Roundtable on Artificial Intelligence in Road Traffic Crash
Prevention. Presentation, February 2021.
https://www.nrso.ntua.gr/geyannis/conf/cp450-using-ai-for-spatial-predictions-of-driver-behavior/




This guidance note offers a practical introduction to integrating big data and machine learning in
road safety evaluations. It outlines data requirements for several road safety assessments, provides
a convenient overview of relevant big data sources, and explains machine learning fundamentals for
the application of these advanced technologies, specifically for road safety. The note proposes an
Integrated Framework for Road Safety, which takes the reader step-by-step through a machine learning
workflow to evaluate road risk, using case studies in Bogotá, Colombia and Padang, Indonesia.

The Integrated Framework for Road Safety uses machine learning to identify road characteristics from
street view images and predict road segment risk based on those identifiable characteristics. As a
result, road segment risk was predicted with 72.5 percent accuracy in Bogotá.

While the preliminary results in Padang were encouraging, additional data is required to verify the
performance in a new context. However, the workflow illustrated through these case studies shows
potential for replicability. All code for the Integrated Framework for Road Safety is free and
publicly available for repurposing and refining to local context through a link provided in the note.

The framework exemplifies current capabilities to reduce the reliance on manual image annotations
and highlights the potential to conduct a road safety scan without years of historical crash data.
The increasing availability of big data and the growing use of machine learning models for road
safety point to rapidly evolving technological solutions that have immense capacity to improve the
quality and efficiency of road safety assessments in developing countries.



