MACHINE LEARNING FOR DISASTER RISK MANAGEMENT
A guidance note on how machine learning can be used for disaster risk management, including key definitions, case studies, and practical considerations for implementation

© 2018 International Bank for Reconstruction and Development/International Development Association or The World Bank
1818 H Street NW
Washington DC 20433
www.worldbank.org

ACKNOWLEDGMENTS
This guidance note was prepared by Vivien Deparday, Caroline Gevaert, Giuseppe Molinario, Robert Soden, and Simone Balog-Way. The team is grateful for discussion and feedback from Cristoph Aubrecht, Alanna Simpson, Trevor Monroe and his team, Keith Garrett, and Sabine Chandradewi Loos. This publication was supported by the contributors of the case studies included in it. We thank them for their time and effort.

PHOTO CREDITS
Photos have been sourced from the following locations with full rights: World Bank website

Disclaimer
This document is the product of work performed by the World Bank and GFDRR with external contributions. The findings, interpretations, and conclusions expressed in this document do not necessarily reflect the views of any individual partner organizations of the World Bank, the Global Facility for Disaster Reduction and Recovery (GFDRR), the Executive Directors of the World Bank, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown in any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.
Rights and Permissions
The World Bank supports the free online communication and exchange of knowledge as the most effective way of ensuring that the fruits of research, economic and sector work, and development practice are made widely available, read, and built upon. It is therefore committed to open access, which, for authors, enables the widest possible dissemination of their findings and, for researchers, readers, and users, increases their ability to discover pertinent information.

The material in this work is made available under a Creative Commons Attribution 3.0 IGO License. You are encouraged to share and adapt this content for any purpose, including commercial use, as long as full attribution to this work is given. More information about this license can be obtained at: http://creativecommons.org/licenses/by/3.0/igo/

Any queries on rights and licenses, including subsidiary rights, should be addressed to the Office of the Publisher, The World Bank, 1818 H Street NW, Washington, DC 20433, USA; fax: 202-522-2422; e-mail: pubrights@worldbank.org

Attributions
Please cite the work as follows: GFDRR. 2018. Machine Learning for Disaster Risk Management. Washington, DC: GFDRR. License: Creative Commons Attribution CC BY 3.0.

TABLE OF CONTENTS
1. INTRODUCTION
2. A MACHINE LEARNING PRIMER
   2.1 What Is Machine Learning?
   2.2 Machine Learning Terminology
   2.3 Supervised Machine Learning: Classification and Regression
   2.4 Unsupervised Machine Learning
   2.5 Deep Learning
3. APPLICATIONS AND OUTLINE OF A MACHINE LEARNING PROJECT
   3.1 DRM Applications of Machine Learning
   3.2 Outline of a Machine Learning Project
4. CONSIDERATIONS FOR IMPLEMENTING A MACHINE LEARNING PROJECT
   4.1 Selecting Suitable Input Data
   4.2 Evaluating Model Output
   4.3 Expertise, Time, Infrastructure and Costs
   4.4 Ethics: Privacy and Bias Considerations
5. MACHINE LEARNING IN THE COMMONS
   5.1 Open Data
   5.2 Open-source Software and Documented Methodology
   5.3 Crowdsourcing and Capacity Building
   5.4 Machine Learning for Sustainable Development: From Use Cases to Standardized Training Data
6. CASE STUDIES IN DISASTER RISK MANAGEMENT
   6.1 Physical Exposure and Vulnerability
   6.2 Social Exposure and Vulnerability
   6.3 Risk Mapping and Damage Prediction
   6.4 Post Disaster Event Mapping and Damage Assessment
7. GLOSSARY
8. REFERENCES AND RESOURCES
   8.1 Online Resources
   8.2 Videos and Talks
   8.3 Infographics and Interactive Resources
   8.4 Articles and Blogs
   8.5 Conferences and Meetings
   8.6 Challenges and Competitions
   8.7 Other References, Articles and Textbooks

1. INTRODUCTION

Evidence-driven disaster risk management (DRM) relies upon many different data types, information sources, and types of models to be effective. Tasks such as weather modelling, earthquake fault line rupture, or the development of dynamic urban exposure measures involve complex science and large amounts of data from a range of sources. Even experts can struggle to develop models that enable the understanding of the potential impacts of a hazard on the built environment and society. In this context, this guidance note explores how new approaches in machine learning can provide new ways of looking into these complex relationships and provide more accurate, efficient, and useful answers.

The goal of this document is to provide a concise, demystifying reference that readers, from project managers to data scientists, can use to better understand how machine learning can be applied in disaster risk management projects. There are many sources of information on this complex and evolving set of technologies.
Therefore, this guidance note aims to be as focused as possible, providing basic information and DRM-specific case studies and directing readers to additional resources, including online videos, infographics, courses, and articles, for further reference.

A machine learning (ML) algorithm is a type of computer program that learns to perform specific tasks based on various data inputs or rules provided by its designer. Machine learning is a subset of artificial intelligence (AI), but the two terms are often used interchangeably. For a thorough discussion of the differences and similarities of the terms ML and AI, see Section 2. As the name implies, an ML algorithm's purpose is to "learn" from previous data and output a result that adds information and insight that was not previously known. This approach enables actions to be taken on the information gathered from the data; sometimes in near real time, like suggested web search results, and sometimes with longer-term human input, like many of the DRM case studies presented in this document.

For the understanding of disaster risk, machine learning applies predominantly to methods used in the classification or categorization of remotely sensed satellite, aerial, drone, and even street-level imagery, capitalizing on a large body of work on image recognition and classification. But applications also span other types of data: from seismic sensor data networks and building inspection records to social media posts. All the advancements made in the applications of ML can be, and are being, used to solve bigger issues confronting humans, from making the most of our land to preparing for and recovering from crises.

Over the past few decades, there has been an enormous increase in computational capacity and speed and in available sensor data, exponentially increasing the volume of data available for analysis. This has allowed the capabilities of ML algorithms to advance to nearly ubiquitous impact on many aspects of society.

Machine learning and artificial intelligence have become household terms, crossing from academia and specialized industry applications into everyday interactions with technology—from image, sound, and voice recognition features of our smartphones to seamlessly recommending items in online shopping, from mail sorting to ranking results of a search engine. The same technology is being leveraged to answer bigger questions in society, including questions about sustainable development, humanitarian assistance, and disaster risk management.

When several ML algorithms work together, for example when fed by a large quantity of physical sensors, it is possible for a computer to interact with the physical world in such a way that the computer system, or robot, appears to be behaving intelligently. For example, self-driving cars, robotics that mimic and surpass human capacities, and supercomputers can now outperform humans on specialized tasks. The same expectation is, and should be, held for ML as it applies to improving our capacity to accurately, efficiently, and effectively answer pressing societal questions. The case studies in this guidance note range from the identification of hurricane- and cyclone-damage-prone buildings to mapping the informal settlements that house the most vulnerable urban populations.

2.
A MACHINE LEARNING PRIMER

MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE
Machine learning is a type of artificial intelligence. ML algorithms, some more simple and narrowly focused than others, have been a part of computer science since the late 1950s. Driven by computer vision, ML algorithms were pioneered in fields like satellite remote sensing and statistical data analysis. Now they power many different aspects of our everyday digital lives, from search engines to online shopping.

Artificial intelligence (AI) was founded as an academic discipline in 1956. Although AI is often used as a synonym for machine learning, there are some major differences that need disambiguation. AI has become a catch-all term that includes all machine learning software as well as artificial general intelligence (AGI), or strong AI. AGI refers to possible, future versions of AI computers that are generalized, self-aware, and indistinguishable from humans when tested. This is not the current state of AI and not the focus of this document.

2.1 WHAT IS MACHINE LEARNING?

To understand the nuts and bolts of ML, we need to understand the basic difference between the two approaches of supervision and how they can be leveraged to obtain the answers that we are looking for. A list of definitions of terms used throughout this document is found in section 2.2 below.

Machine learning algorithms that are trained by humans based on pre-existing data are called "supervised," whereas those that learn solely from data without human input are referred to as "unsupervised." This traditional dichotomous separation is becoming more and more blurred every day, as projects employing ML algorithms make use of both types. Sometimes these methods are easily categorizable, such as when a project employs an unsupervised ML algorithm in one step and a supervised one in another. Other times, the actual ML algorithm is hybridized. Some examples of these hybrid ML algorithms are reinforcement learning, transfer learning, generative adversarial networks (GANs), semi-supervised learning, and so forth (see the box below for more information about reinforcement learning).

In supervised ML, a user inputs a training dataset (sometimes called a "labelled training dataset") that identifies correct answers and incorrect answers to help the algorithm learn relevant patterns in the data. These patterns can be identified by categories. For example, the machine can learn that data A are images of cats and data B are images of chairs because the algorithm has been trained by a user that certain characteristics—whiskers and paws—indicate a cat and not a chair. Thus, instead of a dataset being comprised of anonymous data A and data B, thanks to ML we now know that data A and B are different (cats and chairs, respectively). In extending this concept to DRM, consider the identification of rooftops in a satellite image. The ML algorithm will need a training dataset that has both rooftops and non-rooftop areas, such as trees, identified and labelled. The ML algorithm will learn what characteristics are indicative of a rooftop from that training dataset and can then classify the rest of the image based on the training dataset.

In an unsupervised ML algorithm, the algorithm uses statistical methods, like clustering analysis or neural networks, to attempt to group data with similar characteristics together, such as roofs of the same color or texture in the DRM example above. It is then up to the user to add semantic information (labels) to the data-driven results. In unsupervised ML, many other separations in the data might be discovered; not just groups (A) and (B), but possibly (C), (D), (E), etc. Therefore, it is also understood as an exploratory tool, in which the user does not always know what can be learned from the ML algorithm. If we extend the analogy of a function here as well, if y = f(x), in unsupervised ML algorithms we only input (x), and the ML algorithm applies a series of statistical methods to identify the best-fitting function (f) that splits the data into a result (y). This learned function can sometimes be applied to completely different datasets.

REINFORCEMENT LEARNING
Reinforcement learning is a type of machine learning that takes a page from behavioral psychology. Simply put, the training dataset and rules in an ML algorithm are not binary (yes or no) decisions; rather, they attempt to achieve a balance between data exploration and accuracy. In other words, the model is allowed to make mistakes and explore the data within certain parameters. A famous example of a hybrid system is Google's DeepMind AI project, which relies on reinforcement learning—in this case, a hybridized artificial neural network combined with supervised learning methods. Bought by Google in 2014, DeepMind has been at the forefront of AI advancement and has even developed programs that can defeat humans at complex games like Go.
https://www.technologyreview.com/s/533741/best-of-2014-googles-secretive-deepmind-startup-unveils-a-neural-turing-machine/

There are three kinds of datasets required for ML algorithms: training, validation, and testing. Training datasets are used in the beginning stages to train the model to recognize features and patterns in the data. Validation datasets are used to determine the best model parameters and are used before the testing set. Testing datasets are kept separate from the model during its training so that they can be used after training to test the accuracy.

In order to identify patterns in data, the individual pixels that make up an image are analyzed in a type of analysis called "image analysis". Object-based image analysis (OBIA), which has also proven its usefulness over the years, organizes neighboring pixels into meaningful groups. In both types of analysis, a pixel can be described by color, texture, or other raster geographic information system (GIS) information such as elevation or temperature. In OBIA, samples can also be described by their area, shape, or orientation. Once you identify the characteristics/features that best explain the data (explanatory variables), you can then use both types of methods to identify relevant patterns.

On top of the different types of ML algorithms, there is also a fair number of ambiguous terms like artificial intelligence, machine learning, big data, and deep learning, among others. These have become somewhat interchangeable in the vernacular of development agencies, technology service vendors, and mainstream media alike. This document attempts to demystify these terms.

Although recent developments are delivering very powerful ML algorithms, it is important to remember that a model is only as good as the data used to train it. First, the categories of data should be distinguishable according to the features provided. Also, the training dataset should be representative of the variability in features of the specific group of data. That is to say, if the target class is a building, the training data should include examples of the variety of building appearances. It is important to note that training sets for ML algorithms can be geographically biased, and it is important to ensure geographic diversity for the training set.
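This sensitivity to geographic bias can be sketched in a few lines. The example below is illustrative only—synthetic two-band "pixel" values and invented region statistics, built with scikit-learn rather than taken from any case study in this note: a classifier trained on samples from one region performs well there but loses accuracy in a second region where the same class looks different.

```python
# Illustrative sketch (synthetic data): a supervised classifier trained on one
# region degrades when applied to a region where buildings appear differently.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_region(building_mean, vegetation_mean, n=200):
    # Synthetic two-band "pixels" for two classes in one region.
    X = np.vstack([rng.normal(building_mean, 0.05, size=(n, 2)),
                   rng.normal(vegetation_mean, 0.05, size=(n, 2))])
    y = np.array([1] * n + [0] * n)  # 1 = building, 0 = vegetation
    return X, y

# Region A: bright roofs. Region B: roofs made of different materials,
# so the "building" class has different spectral values (invented numbers).
X_a, y_a = make_region([0.8, 0.2], [0.3, 0.7])
X_b, y_b = make_region([0.45, 0.55], [0.3, 0.7])

clf = RandomForestClassifier(random_state=0).fit(X_a, y_a)
acc_same = accuracy_score(y_a, clf.predict(X_a))    # evaluated on region A
acc_other = accuracy_score(y_b, clf.predict(X_b))   # applied unchanged to region B
```

Running this, the classifier is near-perfect on the region it was trained on and markedly worse on the other; mixing training samples from both regions is the usual remedy.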
For example, buildings tend to appear differently in European cities than in African cities. If an ML algorithm is trained using examples from one region, it will likely perform worse on data from a different region where objects appear differently. Such diversity should be taken into account when putting together the training dataset. (See the terminology table below for more details.)

2.2 MACHINE LEARNING TERMINOLOGY

Despite the plethora of available ML algorithms, and the even greater number of methods in libraries that can be developed into customized models, the general process for an ML algorithm is the same. A thorough list of references on ML is available in section 8: References and Resources. The terms below (with alternative terms in parentheses) are used throughout this document.

Feature (attribute, variable, dimension): Characteristics used to describe the data samples and predict the output; not to be confused with a "feature" in GIS, which refers to a physical "object" with specific attributes.

Output variable (predicted variable, target variable): The phenomenon you want the model to predict; the desired output of the model.

Sample (reference data): A set of samples with known features and class labels. This set of labelled samples should be divided into training, validation, and testing sets.

Training dataset: Labelled samples used to train the model, i.e., to learn the patterns in the features which are relevant to predict the output variable.

Validation dataset (cross-validation set): Labelled samples used to validate the model before the testing set; used to help determine the best model parameters.

Testing dataset: A set of labelled "unseen" samples used to determine model accuracy; cannot be included in the model training.

Cluster: A group of samples which are grouped together based on similarities identified by an unsupervised algorithm.

Class (group, bin, category): A class, or group, is the result of the splitting of a dataset into two or more groups of data that share some common characteristic. The term "class" is most commonly used in supervised ML algorithms, for example in satellite remote sensing, where features like "rooftops" may be split into class A, features like "vegetation" into class B, and so forth.

2.3 SUPERVISED MACHINE LEARNING: CLASSIFICATION AND REGRESSION

Supervised learning can be divided into classification and regression problems. In classification, the intended output is a semantic label or class. For instance, a flood mapping classification problem would label each pixel in an image as "flooded" or "not flooded" (see case study 6.4.1 Flood Mapping). Similarly, cyclone damage assessments may classify buildings as suffering from "mild," "medium," or "severe" damage (see case study 6.4.3 Cyclone Damage Assessment). Regression problems aim to predict a continuous variable, such as predicting the poverty rate for each administrative unit based on characteristics such as type of buildings, amount of green space, population density, or other traits (see case study 6.2.1 Sri Lanka Poverty Mapping).

There are many different types of supervised ML algorithms, which sometimes have fundamentally different architectures. The most common classification algorithm is logistic regression, while the most common regression algorithm is linear regression. Some of the most well-known classification algorithms are random forests, gradient boosting, support vector machines (SVMs), and naive Bayes networks (see below). Random forests and SVMs can also be adapted to regression problems.

Random forest: A group of decision trees; each tree is a hierarchy of decisions which divide samples into two groups depending on the value of a single feature at a time.
  Advantages: Less susceptible to noise; can handle large numbers of training samples.
  Disadvantages: A single decision tree has high variance in its results; random forests solve this problem by averaging many trees, but averaging many trees can make the results hard to interpret. Slower than other methods in the testing phase.

Gradient boosting: Similar to random forests, but trains each tree sequentially; the samples which have the highest uncertainty according to the results of the previous iteration are prioritized.
  Advantages: Studies suggest it can be more accurate than random forests.
  Disadvantages: It is more challenging to train the algorithm.

Support vector machine: Uses kernel functions (a class of algorithms used for pattern analysis) to describe the non-linear differences between training samples.
  Advantages: More suitable for situations with limited reference points; can easily handle large numbers of input features; can learn non-linear relations between features.
  Disadvantages: Computational complexity when there is a large training set; sensitive to noisy data.

Naive Bayes: A graphical model describing the probabilistic relations between feature values and class labels.
  Advantages: Simple to implement; scales easily; feature importance is easy to interpret.
  Disadvantages: Assumes all features to be independent from each other, which is often not the case in real-world applications.

2.4 UNSUPERVISED MACHINE LEARNING

In unsupervised ML, the machine takes an input dataset and applies a series of mathematical and statistical models to identify patterns, without the user providing labelled training data. One of the most common applications is clustering, where samples are grouped based on similarity. Other applications include dimensionality reduction and anomaly detection, to reduce variance in a dataset and filter it for outliers.

Unsupervised methods are purely driven by the patterns in the data. The patterns are based on the statistical characteristics of the input samples. This means that the user doesn't need to provide labelled training sets (which can be costly and difficult to put together), but it also means that the patterns identified by the ML algorithm may or may not be useful for the user. Due to this uncertainty and the difficulty of understanding the performance of unsupervised ML algorithms, they are often used for data discovery and exploration.

Often, the results of unsupervised ML algorithms are fed into supervised ML algorithms, where human input and experience can help a dataset reach its targeted accuracy more quickly. Three common unsupervised machine learning techniques are described below: K-means clustering, principal component analysis, and t-SNE.

K-means clustering: A clustering technique which iteratively calculates the "average value" (e.g., centroid) of each cluster and assigns each sample to the nearest cluster.
  Advantages: Simple implementation; performs well; the distance metric can be defined by the user.
  Disadvantages: The user must define the number of classes.

Principal component analysis: Transforms the data to features which maximize the variance (differences) between samples.
  Advantages: Can be used to retain relevant information while decreasing data dimensionality.
  Disadvantages: The resulting features are difficult to interpret.

t-SNE: A non-linear dimensionality reduction technique suitable for visualization purposes.
  Advantages: Helps understand patterns by visualizing similar groups; captures complex similarities.
  Disadvantages: Sensitive to hyperparameters; computationally complex.

2.5 DEEP LEARNING

Artificial neural networks are also commonly referred to as deep learning. Neural nets, as they are called for short, work with several hidden layers that are nested between the inputs and outputs and are connected to each other through connections that resemble neurons in a brain. These neurons all have mathematical formulas that optimize the accuracy of the categorization, most notably using a method called backpropagation. Backpropagation, short for the backwards propagation of errors, is a method used to calculate the gradients of the values (weights) in the "neurons". The term "deep learning" comes from the fact that these hidden layers can be nested upon other hidden layers to some depth, but it has nothing to do with the actual "depth" of the content. In other words, deep learning methods can be just as shallow as other ML methods. Deep learning can be applied to supervised and unsupervised ML tasks.

Recently, deep learning has gained much popularity as it is capable of obtaining unprecedented accuracies for large ML problems. Convolutional neural networks were developed for image classification, making them useful for tasks involving remotely sensed imagery and/or other spatial data. The more recent fully convolutional neural networks (FCNs) are especially relevant for spatial applications, as they are more efficient for processing large scenes. New networks and models are continuously being developed for various applications, many of which are available as open-source libraries. However, deep learning methods require much more training data and have significantly higher computational requirements than the other methods. It is therefore important to consider the complexity of the classification problem and the available resources when choosing a suitable algorithm.

In fact, supervised decision tree algorithms can be visualized and explained in terms of a two-dimensional neural network. While adding nodes, or "decisions," to a decision tree makes it deeper (more hierarchical decisions), the power of deep learning is that it can apply a number of hidden layers of nodes (decisions) that make the neural network wider and more intricate, effectively adding more and more dimensions/layers.

3. APPLICATIONS AND OUTLINE OF A MACHINE LEARNING PROJECT

3.1 DRM APPLICATIONS OF MACHINE LEARNING

As ML approaches are proliferating in all fields of expertise, DRM is no exception—new applications are being developed every day. They are developed to improve the different components of risk modelling, such as exposure, vulnerability, hazard, and risk, but also for the prioritization of resources during disaster response and reconstruction.

A number of early applications have been looking at better understanding exposure to disasters from the physical side (see case study 6.1 Exposure and Physical Vulnerability) as well as from the socioeconomic side (see case study 6.2 Social Exposure and Vulnerability). These types of applications have relied mainly on the analysis of satellite imagery characteristics (in the visible wavelengths of the electromagnetic spectrum as well as radar and LiDAR), often coupled with the addition of georeferenced census data. Newer applications are also starting to leverage computer vision approaches to identify vulnerabilities from street view images (see case study 6.1.1 Guatemala City Building Earthquake Vulnerability). In the near future, by combining all these approaches and data sources, we can imagine having a detailed exposure database at scale that can be updated any time new imagery is available.

The traditional modelling of hazards, such as earthquakes, wildfires, and weather forecasts, is also being augmented by ML approaches. This application uses time-coded data from hundreds or thousands of sensors (whether physical, like weather stations or earthquake stations, or remote, such as satellites) and other geophysical characteristics to predict hazard output (see case study 6.4.3 Wildfire Prediction).

Another approach involves looking at the impact of the hazard on the exposure data—in other words, the risk. To do so, data is gathered on the exposure (see the section above), and the damage prediction algorithm is trained using the impact of past events. Next, it infers and identifies the key aspects of exposure that have an influence on the disaster impact (see case studies 6.3.1 and 6.3.2, Flood Damage Prediction and Machine Learning-Powered Seismic Resilience for San Francisco). Once trained, those algorithms can be used to predict damage in other cities or countries.

Post-disaster event mapping and damage assessment are also emerging as key applications. Although difficult using optical data from satellites, some approaches are using higher-resolution optical imagery from Unmanned Aerial Vehicles (UAVs) (see case study 6.4.2 Cyclone Damage Assessment), while others use more complex data that are difficult for humans to interpret but simple for machines to sift through to identify new relations, like radar data (see case studies 6.4.1 Flood Extent Mapping and 6.4.2 Cyclone Damage Assessment).

Other new applications involve prioritizing resources during the response or recovery phases: for instance, prioritization of building inspections using previous building inspection records and their outcomes; social media mining for response awareness and resource prioritization;1 monitoring of rebuilding and recovery activities using computer vision on on-site pictures to control quality; supporting insurance claims using computer vision to identify crop or building damage from pictures; and many others.

1 https://www.floodtags.com/

3.2 OUTLINE OF A MACHINE LEARNING PROJECT

This section provides a brief overview of the general steps which must be followed to set up an ML project. The next section describes in more detail how to prepare the inputs, evaluate the quality of the ML algorithm results, and estimate the required project resources.

1. Project goals are defined: What do we want the ML algorithm to predict or classify? The objective of the DRM project should be translated to the output variable that is targeted. For example, ML can support a poverty mapping project by estimating a poverty index (see case studies 6.2.1 and 6.2.3).
Building vulnerability can be translated into classifying the type of roof material used. More examples of how DRM objectives can be translated into ML projects are given in the case study section below.

2. Data/imagery sources: The choice of data obviously also depends on the objective. ML algorithms have been used for decades on satellite remote sensing imagery of many different kinds and resolutions. Currently, work in the DRM sector often involves using high-resolution (sub-meter to roughly 10 m spatial resolution) panchromatic and multispectral imagery from satellites, drones, and airplanes. However, as discussed above, ML algorithms can be applied to data of all kinds, so big data sources that are actively mined can come from household surveys (see case study 6.3.1 Flood Damage Prediction), census data (see case study 6.2.3 Stanford Poverty Study), social media (see case study 6.4.1 Flood Extent Mapping), tweets, and cell phone locations, to name a few.

3. Training/validation data collection: Labelled samples or reference data are required to train the model and validate the ML algorithm outputs. Projects using high-resolution satellite imagery often create this data manually. If the goal is to map roofs in satellite imagery, then a "training dataset" is manually drawn so that there is an input dataset for the ML algorithm that teaches it what roofs look like. Crowdsourcing can be used to speed up this process (see case study 6.4.2 Cyclone Damage Assessment). Field data are another important source of reference data, such as using household surveys to validate the poverty level (see case study 6.2.3 Stanford Poverty Study). The collection of these labelled samples is often the most expensive part of ML projects.

4. Exploration of dataset: Exploratory data analysis is an important step, as it helps determine which algorithm to use and the best data to include.
This analysis also clarifies which input variables are correlated with each other, which are most closely related to the output variable, how the variables are distributed, and whether input variables can be combined or transformed. This step also cleans the data of outliers, which could otherwise skew the results dramatically by altering the variance in the data in disproportionate ways.

5. Choice of algorithm: When choosing an algorithm, there is no silver bullet or one-size-fits-all solution. The best way to decide is to analyze which algorithms have been used to tackle similar problems in the past. The choice of algorithm may also depend on the size of the training set, the number of features, and the computational resources available. Sometimes multiple models are applied and the best-performing one is selected; however, it is important to compare models that have been tuned optimally, so that the comparison actually assesses model effectiveness and is not biased by the parametrization behind it. If the review or application of various models is too time consuming, the support of specialized experts should be sought in order to start on the right foot. For example, Task Team Leaders (project managers) who have no background in ML, statistics, or computer science should seek the advice of data science experts at the beginning of the project.

6. Developing the code and running the algorithm: Some ML algorithms are already implemented and available through user interfaces, such as the image classification algorithms in the ENVI remote sensing software, Google Earth Engine, or DigitalGlobe's online platform GBDX for cloud-based remote sensing image classification.
There are a number of readily available ML algorithms inside remote sensing and GIS software packages, some of which are free, like the GRASS GIS plug-in for QGIS. In addition, ML algorithms from open or proprietary libraries can be combined and customized to achieve a project's goals. In custom applications, ML algorithms can be programmed in a variety of languages and tools, ranging from R, Matlab, and ESRI arcPy to GDAL and GRASS. Open-source platforms like TensorFlow2 have matured, and remote sensing-specific ML tools like Mapbox's RoboSat are openly available on Github.3 On top of that, a number of customized ML services are available on cloud computing platforms like AWS, Azure, and Google Cloud; some of the World Bank projects showcased in this document have been run on these platforms.

7. Validation, reinforcement, and re-running: Any ML algorithm produces an output that needs to be validated for accuracy. This is usually achieved by comparing the output data to a validation dataset that is considered the "truth," or accurate within a range that is acceptable for the project's goals. For example, a map of all the roofs in an image drawn by human photo interpreters can be compared to the ML algorithm's output to assess its accuracy. Modifying the training dataset and the parameters used to run the algorithm might yield more accurate results, so intermediate results are used to rerun the ML algorithm with the goal of increasing accuracy. Section 4.2 discusses how to assess a model's accuracy in more detail.

8. Final data output: The final data output is achieved once the accuracy of the output dataset is deemed adequate for the goal of the project. The accuracy needed can differ greatly from project to project, and there is no quick rule of thumb. Note that for a balanced two-class problem, a final accuracy of 50% means that the model is no better than random chance at predicting the variable of interest, which makes the model useless.
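The validation step can be sketched in plain Python, assuming a classification output. The labels below are invented, and real projects would typically use library implementations (for example, in scikit-learn) of these metrics, including the precision and recall measures described in section 4.2.

```python
# Sketch of step 7: comparing an ML output to reference ("truth") labels.
# A real project would compare, e.g., a predicted roof map against roofs
# digitized by human photo interpreters.

def accuracy(predicted, reference):
    """Overall accuracy: correctly classified samples / all samples."""
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)

def precision_recall(predicted, reference, cls):
    """Per-class precision (user's accuracy) and recall (producer's accuracy)."""
    tp = sum(p == cls and r == cls for p, r in zip(predicted, reference))
    fp = sum(p == cls and r != cls for p, r in zip(predicted, reference))
    fn = sum(p != cls and r == cls for p, r in zip(predicted, reference))
    return tp / (tp + fp), tp / (tp + fn)

reference = ["roof", "roof", "roof", "other", "other", "other"]
predicted = ["roof", "roof", "other", "roof", "other", "other"]

print(accuracy(predicted, reference))                 # 4 of 6 samples correct
print(precision_recall(predicted, reference, "roof"))
```

If the accuracy is too low, the training data or parameters are modified and the algorithm is rerun, exactly as described in step 7.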
The concept of accuracy is different, and possibly ill posed, when talking about unsupervised ML algorithms, where the goal might be data discovery. However, in unsupervised cases, something can be learned from the output data even if the ML algorithm does not give an accurate classification or explanation of the final variable.

2 https://www.tensorflow.org
3 https://github.com/mapbox/robosat

4. CONSIDERATIONS FOR IMPLEMENTING A MACHINE LEARNING PROJECT

There are several issues that need to be considered when planning an ML project. We have divided these into the subsections below: selecting suitable input data; evaluating model output; expertise, time, infrastructure, and costs; and ethics: privacy and bias considerations.

4.1 SELECTING SUITABLE INPUT DATA

Once the project goal is defined, the first step is a quick data inventory. Which data do I have that might help me predict my output variable? If I need to find additional data, which characteristics should I take into account when selecting suitable input data? It is good to think about the relevance of each dataset for the proposed goal and how different datasets provide different pieces of information about the problem. A number of open data sources are available, such as NASA and ESA satellite imagery, and derived geospatial products, such as OpenStreetMap, OpenAerialMap, World Bank Open Data, UNdata, the GEOSS portal, and the Humanitarian Data Exchange.

In the age of "big data," more is not always better. ML can combine many different types of data, but adding irrelevant data may incur additional costs without improving the model's predictions. At the same time, ML cannot magically obtain good predictions if the input data do not adequately relate to the targeted output. In general, it is good to check which data similar studies or projects have used. To help select relevant data, the following sections provide an overview of data characteristics that can guide the selection of suitable input data.

Direct and indirect relations, and data best left out

The input data can be directly, indirectly, or not related to the output goal. One example of a direct relation is identifying buildings in submeter satellite imagery: roofs are visible in the imagery, and one can safely assume that there is generally a building below a roof.

An indirect relation means that the information captured by the input data is somehow related to the output goal. For example, one can try to identify informal areas or poverty from satellite imagery. Buildings in informal settlements or poorer neighborhoods generally look different (low-rise, corrugated iron roofs, very narrow footpaths) from buildings in planned areas (often larger, more regularly spaced, with gridded road networks). See case study 6.2.2 Informal Settlement Mapping for an overview of visual differences between formal and informal areas. However, it is important to remember that the physical appearance of the buildings in the imagery has no direct link to the income of the buildings' inhabitants. The buildings in the imagery (input data) are therefore only indirectly related to the poverty index (goal). Complex ML algorithms are capable of combining many sources of input data that are indirectly related to the objective to create reasonable predictions. However, these relations are not always causal; they may simply reflect a correlation. Especially when predicting socioeconomic variables, the relationships between the input data and the targeted output may vary strongly from one location to another.

Finally, some data are best left out of the analysis, even though they may be available. Data which are weakly related to the output or inaccurate (i.e., "noisy" data) should be left out or otherwise de-emphasized by the model. These issues can all be identified in the exploratory data analysis phase. Each additional type of input data makes the ML model more complex, so adding irrelevant or low-quality data introduces unnecessary data-processing costs and may even lower the quality of the output predictions.

Image characteristics

In more traditional Earth observation and remote sensing applications, a number of image characteristics are relevant when selecting datasets. Temporal resolution indicates the frequency with which images are captured over the same area. The timing of the data collection may also influence its suitability; for example, imagery collected during the winter or dry season may not be suitable for agricultural monitoring applications.

Spectral resolution concerns which wavelengths of the electromagnetic spectrum (which "colors," when talking about the visible wavelengths) are observed by the imagery. Many cameras capture the same part of the spectrum as the human eye, often referred to as RGB, or Red-Green-Blue. Other sensors can observe parts of the spectrum which the human eye cannot see, but which can be very relevant and useful for a wide range of applications. For example, multispectral imagery containing Near-Infrared (NIR) reflectance is useful for discerning vegetation, and hyperspectral imagery is often used to identify different types of minerals in geological applications.

Also important is the spatial resolution, which defines the real-world size of each pixel in the image on the ground. A higher resolution means that each pixel represents a smaller area on the ground, which allows smaller objects to be identified in the images. The resolution should be high enough to identify the objects of interest. However, higher-resolution imagery also means larger file sizes and more computational complexity. Depending on the application of the ML project and the unit of analysis, it may not always be necessary to select the data with the highest spatial resolution (or temporal or spectral resolution, for that matter).

IMAGE CHARACTERISTICS
Temporal resolution: How often is a dataset acquired of the same area?
Spectral resolution: Which parts of the electromagnetic spectrum (essentially, which "colors") are captured?
Spatial resolution: What is the actual size of each pixel on the ground?
Geographic coverage: What is the area over which the imagery is acquired?
Temporal coverage: What is the total timespan of the archive of the imagery/data available?
Context: Certain types of sensors are only adequate within certain physical contexts. For example, radar data that are valuable for building detection can be problematic in stony, hilly areas.

Other data

Sometimes, the output variable may be predicted more accurately with the support of additional data sources. Objects in urban settings often have many different colors and textures, so the addition of elevation or LiDAR information may be quite useful. Radar imagery can be useful for identifying changes in surfaces or for obtaining data despite cloudy conditions. Socioeconomic studies may include census surveys aggregated at the administrative unit level. Terrestrial or street-view imagery can provide information which cannot be seen from above, such as building wall material. Recently, social media information has also been included, such as tweets or crowdsourced geotagged images used to identify flooded areas (see case study 6.4.1 Flood Extent Mapping). Tabular data, such as the results of household surveys, can be used for assessing flood damage (see case study 6.3.1 Flood Damage Prediction). Again, a good starting point is to look at similar projects and find out which data they have used.

When including other data, it is important to link the unit of analysis. ML algorithms can be applied to pixels, vectors (such as building footprints), or samples. When integrating data from different sources, they should all be linked back to the same unit of analysis if we wish to use them in the same ML algorithm. For example, census data per administrative unit can be linked to a vector file showing the administrative boundaries, giving the census data spatial dimensions and enabling them to be combined with imagery.

When an ML algorithm involves many features, unexpected patterns can end up being the most important. Therefore, experimenting with combining multiple features can be one of the most crucial steps of feature engineering.

4.2 EVALUATING MODEL OUTPUT

Training, validation, and testing data

The division of the data into training, validation, and testing sets is key to evaluating the performance of an ML algorithm. The training set is used to teach the model to distinguish the classes we wish to predict. Each ML algorithm requires a number of model parameters to be set; by checking the accuracy of the trained model on the validation set, we can compare different model parameter settings and choose the best ones for our particular problem. The third group is the testing set. This set should not be touched during model development and is only used at the end to check the accuracy of the final model output. In some cases, the testing dataset is actually a new dataset, such as when you want to apply a previously developed model to a new region.

There is no specific rule regarding how to divide the reference data into training, validation, and testing datasets. One rule of thumb is to randomly allocate 50%, 25%, and 25% of the data, respectively; the exact ratios may differ. Benchmarks to compare algorithms often require users to submit their model results for a set of data for which they are not given the reference labels.

Not only the quantity but also the heterogeneity of the training samples is important for ML algorithms. However, a tipping point can be reached where too much data heterogeneity leads to unpredictable results. Likewise, a feature in one geographic region can resemble a completely different feature in another geographic context, so it is often necessary to have different models for different areas, even when the same output results are being targeted. Flood damage prediction models have been shown to obtain higher accuracies when trained using flood events of various magnitudes and geographical locations (see case study 6.3.1 Flood Damage Prediction). Deep learning models aiming to assess cyclone damage to buildings had a significantly lower accuracy when applied to images of a different geographical region (see case study 6.4.2 Cyclone Damage Assessment).

Ideally, similar quantities of samples should be available for the different classes. "Negative" examples are also important to include. For example, when training a classifier to recognize roofs, it can be essential to also collect a second dataset that contains "everything but roofs," so that the ML algorithm can learn with higher accuracy to separate roofs from everything else in the imagery.

Accuracy metrics

The accuracy of an ML algorithm can be described by a number of different quality metrics. For classification problems, a confusion or error matrix can be used to show the relationship between the number of samples per class in the reference data and how they are classified in the output data. In general, the overall accuracy of an algorithm is calculated by dividing the total number of correctly classified samples by the total number of samples.

An algorithm's precision (i.e., correctness, or user's accuracy) is the number of true positives divided by the sum of true positives and false positives per class. This describes the probability that a pixel assigned to a class actually belongs to that class. An algorithm's recall (i.e., completeness, or producer's accuracy) is the number of true positives divided by the sum of true positives and false negatives. This number tells us the probability that a pixel belonging to a class is correctly classified. Both are important because together they indicate whether a class is being overpredicted or underpredicted.

Regression problems often use the mean absolute error or root mean square error as accuracy metrics.

In some cases, it is not possible to obtain a quantitative error metric for the model. The "true" value may simply not exist, as for unsupervised clustering ML algorithms. Visual interpretation can be used to evaluate the output of clustering methods and decide whether the algorithm generates meaningful clusters. It is more common, however, that the true value is simply not known, in which case alternative data sources may be used to validate the model. For example, geotagged crowdsourced images can be used to validate a flood extent that an ML algorithm generated from satellite imagery (see case study 6.4.1 Flood Extent Mapping).

Interpreting model results

As a user, you can also get an idea of what is happening in the model by comparing the accuracies obtained for the training, validation, and testing datasets (see the table below). If the training error is high, consider getting additional data. This could mean obtaining more training samples or, perhaps, a different type of data which is more capable of distinguishing between the different classes. You can also try a different classification algorithm or use the validation set to select better model parameters.

If the training error is low but the validation error is high, you may be overfitting the model. Overfitting happens when the model is too complex for the input training data. If limited training data are available, a very complex model might actually be learning which class should be assigned to each individual sample rather than learning the underlying patterns which distinguish the different classes. To avoid overfitting, try getting more training data, either by collecting more reference data from external sources or by introducing slight variations into the training data you already have. Deep learning pipelines, for example, may rotate or flip input samples to easily increase the variation in the training data. Another option is to reduce the model complexity by changing the model parameters.

It is important to note that some ML algorithms, especially deep learning ones, do not give us an idea of which input variables are important, or which relationships between variables led to a specific outcome. By contrast, when using ordinary least squares linear regression or decision trees, for example, it is clear which features best explain a specific output of the model.

If the training and validation errors are low but the testing error is high, there may be a bias in the training samples; that is to say, the training samples are not representative of the testing dataset. This may be the case when applying a model developed for one project to a different project. For example, a building detector trained in the Netherlands may not function well for a city in Africa, because the buildings may look quite different. If this is the case, consider obtaining more representative training data, or start the process from scratch by dividing your new data into new training, validation, and testing sets.

A simplified overview of how to use training, validation, and testing errors to understand machine learning outputs (in the original table, each scenario is illustrated by a plot of error against training iterations):

Scenario 1: High training error.
Problem: The initial model isn't suitable.
Possible solutions: Get more input data (more samples or complementary data); change the model or model parameters (for deep learning, train longer); sometimes it is necessary to use an entirely different model.

Scenario 2: Low training error, but high validation error.
Problem: Overfitting.
Possible solutions: Obtain more training data; reduce model complexity.

Scenario 3: Low training and validation errors, but high testing error.
Problem: Training bias.
Possible solutions: Obtain more representative training data; retrain the model (if applying it to a new project).

It should be emphasized that the overview in the table above is a simplification of the process. Although it gives a general overview of the main issues, the possible problems and solutions are, of course, much more nuanced than the table demonstrates. However, as a nonexpert, it is important to remember that the testing dataset which is used to describe the model accuracy should not be used to train the model. Having insight into the accuracies of the training, validation, and testing sets can help you understand whether the model is accurate and which steps can be taken to improve it.

4.3 EXPERTISE, TIME, INFRASTRUCTURE AND COSTS

The hardware and software needs of projects using ML algorithms on big data vary widely.
A small project or prototype can be envisioned using free software, minimal coding, and World Bank information technology (IT) infrastructure, but larger and more ambitious projects require considerable expertise and IT infrastructure. Projects that require more expertise in coding, parameter tuning, and so on will inevitably incur larger costs and longer time frames. Several factors may impact the cost of an ML project.

Training dataset: Does the training data already exist, or does it have to be created manually? How much training data is needed for the algorithm to be trained? In most projects, particularly in areas where data may be scarce, the creation of a representative training dataset may be one of the main cost drivers, as it can involve intense manual labor. In other cases, it may be readily available. This can easily be the most important and expensive part of the project, as the model, and any result that comes from it, is inherently tied to the quality of the input data. An old adage goes: "garbage in, garbage out." In other words, a model is only as good as its input data. Creating a relational database of the input features with their labels will also take time; depending on several factors, those databases could be a simple file on a computer or a networked, cloud-based store.

Imagery: When using imagery, can you rely on openly available imagery, or do you need commercial higher-resolution imagery? The latter case may involve buying large, expensive swaths of imagery, or at least paying for on-demand access to the imagery to run the algorithm within platforms like GBDX, Google Earth Engine, Descartes Labs, Orbital Insight, or Airbus OneAtlas. Resources such as USGS/NASA's GloVis and the European Space Agency's EO Browser have been and remain instrumental in accessing earth observation data, but the data still need to be downloaded.
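To get a feel for why imagery resolution drives storage, download, and processing costs, a back-of-the-envelope calculation helps. The area of interest, band count, and bit depth below are illustrative assumptions, not figures from any project in this note.

```python
# Back-of-the-envelope estimate of uncompressed imagery volume for an
# area of interest (AOI). All numbers are illustrative assumptions.

def imagery_size_gb(aoi_km2, resolution_m, bands=4, bytes_per_pixel=2):
    """Uncompressed size, in gigabytes, of imagery covering the AOI."""
    pixels_per_km2 = (1000 / resolution_m) ** 2
    total_bytes = aoi_km2 * pixels_per_km2 * bands * bytes_per_pixel
    return total_bytes / 1e9

# A 100 km x 100 km AOI at 10 m (open, Sentinel-2-like) vs 0.5 m (commercial).
print(imagery_size_gb(10_000, 10))    # roughly 0.8 GB
print(imagery_size_gb(10_000, 0.5))   # roughly 320 GB
```

A 20-fold improvement in resolution multiplies the pixel count, and hence the raw data volume, by 400, which is why sub-meter projects often move to cloud platforms rather than local downloads.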
Algorithm: Do you need a new algorithm, or can an existing one be tweaked and trained to fit your goals? More and more satellite imagery segmentation and recognition is available out of the box, whether open source or for a fee, but newer and more advanced applications may require more extensive work to develop new algorithms or combine existing ones. The time it takes to tune an algorithm's parameters varies by case.

Processing resources: Depending on the amount of data, the size of the area of interest, the type of data, and the algorithm, the resources necessary to process a project can vary. Some projects can run on a laptop or desktop computer with a good graphics processing unit (GPU), while others require the storage and computing capacity of a server. Others benefit from deployment in large, pay-per-use cloud computing services. Resources such as Google Earth Engine have pioneered and fundamentally changed the way Earth imagery is processed by employing the power of the cloud, bypassing many time-consuming and expensive steps in data downloading, archiving, preprocessing, and processing, not to mention keeping archives of imagery updated for recurring tasks.

4.4 ETHICS: PRIVACY AND BIAS CONSIDERATIONS

PRIVACY

In terms of privacy, ML poses threats at different levels: first, because of the amount and detail of the data handled, which can be private or high resolution, and second, because of the predictive power of the ML algorithms applied to those large amounts of data. For instance, ML can identify individuals better than humans can, and can do it at scale. More unexpectedly, ML algorithms can also reveal things about people that they may not know themselves, or that their immediate circles may not know.

In everyday life, large amounts of data are potentially mined and used according to the contracts and conditions we accept when using personal devices such as smartphones, with their multitude of sensors, or when participating in online activities, be it simply e-mail or social networks. ML methods are used by companies and organizations to process our user data and provide additional services. For example, our social network feed learns from our activities what to show next and which ads we are most likely to click on, and our favorite online retailer learns from our tastes to offer us other items we may like. In this sense, privacy is a data concern and a sharing concern which simply extends to ML because ML uses data and is often applied online in cloud environments.

Specifically in DRM, ML can pose privacy risks as the volume of data and the spatial resolution of the imagery used, such as drone data, increase. It is easy to detect individuals in drone imagery. In street-level imagery, faces are discernible, as well as potential building attributes that may pose security risks. Again, the discussion here is more about the privacy concerns of acquiring, storing, and sharing personal data than about ML per se. Certainly, suitable privacy guidelines should be followed according to the type of input data utilized. For example, the ethical usage of drones for development applications is discussed in a separate World Bank Group guidance note.

For these reasons, as DRM projects employ ML to create and use data from remote sensing as well as other sources, it is important to note that these data can hold private information and, therefore, should be handled adequately. In addition, the concept of privacy varies widely across regions and social groups, so global best practices and standards should always be supplemented according to the specific project in question.

BIAS

All data have biases, and all models are incomplete, as they are approximate representations of the real world. Paying attention to these biases is necessary both for improving ML approaches and for using their results responsibly. Even significant societal biases like racism, sexism, or economic bias have been shown to affect algorithmic modelling. In the case of DRM especially, these biases can have important repercussions if they are the basis on which the vulnerability of populations is assessed.

It is important to understand that ML algorithms are not bias-free, particularly because in some cases, like deep learning, they obtain results without human interaction. It is crucial, therefore, to have diverse training datasets and to keep in mind whether the reality being modelled is a data-poor area with scarce geographic information, and what the connection is with underrepresented and vulnerable populations and development goals. Disasters impact vulnerable groups disproportionately, and any bias involving the information or characteristics of these groups can have a big impact.

To alleviate those issues, algorithmic accountability and algorithmic transparency are two principles that address the degree to which the results of an ML algorithm can be understood. Especially when ML algorithms result in concrete decisions (e.g., insurance rates on houses, the prioritization of investments, or protection measures), it is important that the public can understand why they do or do not qualify for a certain subsidy or policy. At the same time, if the driving factors behind an ML model are understood, making the results of the algorithm public may unintentionally publicize underlying factors which are more sensitive from a privacy point of view.

For a very thorough collection of resources on this topic, please see the article "Toward ethical, transparent and fair AI/ML: a critical reading list" at https://medium.com/@eirinimalliaraki/toward-ethical-transparent-and-fair-ai-ml-a-critical-reading-list-d950e70a70ea
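One simple, practical guard related to the training-data biases discussed above is to check how well each class or group is represented before modelling. The sketch below (labels, group names, and threshold are invented) only counts labels; a real bias audit would go much further, but underrepresented groups are a common and easily detected problem.

```python
from collections import Counter

def representation_report(labels, min_share=0.10):
    """Map each label to (share of training set, meets-minimum-share flag).

    A crude diversity check: it cannot detect subtler biases, but it
    flags classes that are too rare for a model to learn reliably.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n / total, n / total >= min_share)
            for label, n in counts.items()}

# Invented example: building labels in a hypothetical training set.
labels = ["formal"] * 180 + ["informal"] * 15 + ["industrial"] * 5
for label, (share, ok) in representation_report(labels).items():
    print(f"{label:10s} share={share:.2%} {'ok' if ok else 'UNDERREPRESENTED'}")
```

Here the classes most relevant to vulnerable populations are exactly the ones the check flags, which is the situation the text above warns about.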
5. MACHINE LEARNING IN THE COMMONS

As promoted by the Principles for Digital Development4 and the Open Data for Resilience principles,5 open innovation through the use of open standards, open data, and open-source software can greatly benefit sustainable development. Depending on the context and needs of a project, some data may have privacy issues, or the extent to which open source is used may differ. Overall, however, embracing open innovation can greatly increase the value of public resources by avoiding duplication, fostering education, creating new knowledge, and providing opportunities by empowering individuals worldwide with open data and tools. These principles also apply to the development and use of ML algorithms. Throughout an ML project, several aspects can benefit from being documented and shared: the training data, the algorithm, the methodology, and the output data, as described in the following sections. Together, they enable open ML approaches that support sustainable disaster risk management and development goals.

5.1 OPEN DATA

Open data are data that are technically and legally open: shared in a machine-readable format using open standards, with proper documentation on origin and quality, and with clear licenses that allow reuse of the information. This final point is often neglected, but it can be crucial when it comes to using data in an emergency situation with little time to figure out licensing issues. Open data provide several benefits: data can be created once and reused many times for multiple purposes, ensuring economy of scale, avoiding duplication, and maximizing the use of resources.

In the context of ML, training data should be shared using open standards so they can be reused by others to train other algorithms. Training data are often one of the most costly aspects of putting together an ML project, as they can be tedious and manual to assemble. Sharing training data can therefore catalyze potential ML applications. For satellite imagery and labelled data, several standards, such as Cloud Optimized GeoTIFF6 (COG) and the SpatioTemporal Asset Catalog7 (STAC), as well as repositories such as MLHub (https://www.mlhub.earth), are being developed to allow sharing and interoperability of tools and training datasets across projects and the industry.

On the other end of the process, the output data generated by the ML algorithm should also be shared, as they can be critical for many development decisions. For instance, base exposure datasets, such as building infrastructure and roads, can be used by many sectors for many different decisions. In this context, it is again important to use best practices to share the output data using open standards, documentation, and proper licensing.

5.2 OPEN-SOURCE SOFTWARE AND DOCUMENTED METHODOLOGY

Given the amount of data handled, the time needed to train algorithms, and the computing power required, there is a natural trend toward centralizing data and algorithms in the cloud under proprietary licenses, to be used as a platform or software as a service for a fee. Although this approach can be economical and more practical for end users, it can limit the potential of those tools for development, innovation, and education.

Even when deployed in production as software as a platform, some domains of ML, such as computer vision, have a tradition of open sourcing code in order to foster the sharing of knowledge and increase innovation. Similar to open data, open source allows for economy of scale by enabling many computer programmers and scientists to collaborate on and improve the same code. This collaboration also improves the quality of the code, as more people are checking and running it. It is key to develop open-source tools where possible and to invest in software as a public good,8 especially where economy of scale can be achieved across an institution or institutions. Contributions and support to open-source software can materialize in different ways: not only code, but also documentation, user and developer events, user design, and more, as shown by previous examples such as GeoNode.9 Some examples of open-source software for ML are TensorFlow and remote sensing-specific tools like Mapbox's RoboSat.

4 https://digitalprinciples.org/
5 https://opendri.org/resource/opendri-policy-note-principles/
6 https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF
7 https://github.com/radiantearth/stac-spec

5.3 CROWDSOURCING AND CAPACITY BUILDING

In the last decade or so, there has been huge growth in volunteer and networked communities of individuals mapping data together. In general, these are called collaborative or crowdsourced maps, and they have created everything from OpenStreetMap to satellite image feature recognition in humanitarian efforts. These networks of humans have been brought together and enabled to work collaboratively by ever-evolving software that allows seamless work in crowds formed by individuals all over the world connected by the Internet.

This is particularly important in the context of the enormous growth and ubiquity of ML methods in computer science, which are increasingly applied to disaster risk management. For the applications described in this note to be successful, ML algorithms will need more and more labelled data so they can be supervised and the accuracy of their results validated. This is an area where it will be key to hybridize the work of humans and computers so that their efforts can be optimized to achieve the maximum efficacy on a project. Crowdsourcing platforms like OSM have already provided over a decade of experience in leveraging large networks of people to manually add features and labels to maps where computers could not do so.

There is an obvious link to harnessing the power of the crowd to provide much-needed training data for ML algorithms. For example, Google has been using reCAPTCHA to train image recognition algorithms. Involving the crowd to provide annotated data can supply a large amount of information to help train accurate models. Especially when involving people from various parts of the globe, there is a possibility to add local knowledge and avoid biases such as those described above.

Beyond generating training data, there are also a number of projects looking at a hybrid approach, where the algorithm's output solely aims to aid the human. Examples relevant to development issues include AI-assisted road tracing by Facebook,10 where the ML algorithm predicts roads but humans ensure their accuracy and topology before they enter OpenStreetMap; this dataset can then be used for many types of accessibility studies. Similarly, DevSeed has set up a similar electric grid mapping system11 for the World Bank, claiming that it made the human mapper 33 times more efficient. Overall, this approach can ensure high data quality while making the human's tasks less tedious.
8https://digitalprinciples.org/resource/howto-calculate-total-cost-enterprise-software/ 9https://opendri.org/resource/opendri-geonode-a-case-study-for-institutional-investments-in-open-source/ 10https://wiki.openstreetmap.org/wiki/AI-Assisted_Road_Tracing 11https://devseed.com/ml-grid-docs/ 30 MACHINE LEARNING FOR DISASTER RISK MANAGEMENT However, it is important to avoid At the same time, this conversation a situation where all advanced AI revolving around training and testing knowledge, software, and data are data would allow local communities centralized with a few large Silicon to build their capacity and have a say in Valley companies. Education and cap- and ownership over their own data and acity building should be stimulated. results of an ML algorithm. Building on the Open Data for Resilience principles, when developing 5.4 MACHINE LEARNING FOR a DRM project, it is also important to SUSTAINABLE DEVELOPMENT: consider new ways of involving local FROM USE CASES TO universities and knowledge centers. STANDARDIZED TRAINING DATA Increased human capital and access Putting together all these components to computing resources will help pave in an open and interoperable way the way for new mapping techniques creates potential for networked global and significant advancements in the data systems using ML algorithms to disaster risk management area. That provide enormous societal benefits human capital, together with ML in disaster risk management as well algorithms, will certainly pave the way as, more broadly, the Sustainable for the future of mapping in the disaster Development Goals (SDGs). risk management arena. Concretely working toward the creation Of particular concern to the World of an open framework encompassing Bank GFDRR is data openness and the different use cases, the training transparency, capacity building, and data required, and the algorithm to be the role of crowdsourcing, such as trained, all following open standards, the OpenStreetMap community. 
will provide the structure to scale ML efforts across geography and sectors. In terms of crowdsourcing, there is It will also provide transparency and tremendous potential in using the opportunities for capacity building, networks and tools already established crowdsourcing, and knowledge sharing. to go from mapping to training and GFDRR is joining efforts such as MLHub testing ML algorithms. While this to create a network of distributed rep- feature has not been used widely to ositories that provide access points to date, we believe that it could provide openly share ML training data, models, a future avenue for generating large and standards. This also supports the ML algorithm training and testing key role that open data and software, datasets. The OSM map-filling, capacity collaborative networks, crowdsourcing, building, and networking events known and capacity building have to play as “mapathons” could be envisioned together in the future of ML algorithms as “trainathons”—the difference being to support DRM and SDGs alike. that the final output of the training and validation of an ML algorithm could be to fill the map of an area or label it with much higher speed and scale. The OSM ecosystem’s existing tools that allow nested validation by expert mappers, and also the easy tiling/prioritization of mapping areas like in HOT OSM’s ID editor, would already provide the most important data needed for a successful project using ML. 31 MACHINE LEARNING FOR DISASTER RISK MANAGEMENT 6. CASE STUDIES IN DISASTER RISK MANAGEMENT 32 MACHINE LEARNING FOR DISASTER RISK MANAGEMENT Saint Lucia building hurricane vulnerability: Windows being automatically detected (red), distinguished from garages (green) and doors (no detection), GOST, WB. 6. 
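The STAC standard mentioned in section 5.1 makes shared training data self-describing: each labelled image chip becomes a catalog "Item" with its footprint, timestamp, and links to the imagery and labels. The sketch below is a hand-written illustration of that structure using the standard Python library only; the identifiers and example.org URLs are invented, and a real catalog would be generated with a STAC library and validated against the specification.

```python
import json

# A minimal STAC-style Item describing one labelled training chip.
# IDs and hrefs are hypothetical; structure follows the STAC Item layout
# (a GeoJSON Feature with bbox, properties, assets, and links).
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "training-chip-001",
    "geometry": {"type": "Polygon", "coordinates": [[
        [39.2, -6.8], [39.3, -6.8], [39.3, -6.7], [39.2, -6.7], [39.2, -6.8],
    ]]},
    "bbox": [39.2, -6.8, 39.3, -6.7],
    "properties": {"datetime": "2017-08-01T00:00:00Z"},
    "assets": {
        "image": {"href": "https://example.org/chips/001.tif",
                  "type": "image/tiff; application=geotiff; profile=cloud-optimized"},
        "labels": {"href": "https://example.org/chips/001-labels.geojson",
                   "type": "application/geo+json"},
    },
    "links": [],
}

# Serialize for publication alongside the imagery and label files.
print(json.dumps(item, indent=2)[:80])
```

Because the Item is plain GeoJSON, downstream users can discover and reuse the chip with generic geospatial tooling, which is exactly the interoperability the open-data principles above aim for.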
6. CASE STUDIES IN DISASTER RISK MANAGEMENT

The following case studies fall into four categories: 6.1 physical exposure and vulnerability; 6.2 social exposure and vulnerability; 6.3 risk mapping and damage prediction; and 6.4 post-disaster event mapping and damage assessment. These case studies were selected because they provide an overview of how ML can support various aspects of DRM. They represent different geographical regions, various input datasets and units of analysis, and various ML algorithms. An overview is provided of the key characteristics of each case study: the objective, the input and reference data used, the scale of analysis, the algorithm used, who performed the analysis, the results and lessons learned, and where to find more information. This selection is not comprehensive and will be updated regularly, as this is a booming field with many new applications of ML being developed on a monthly basis. Upcoming applications include prioritizing building inspections, social media mining for response awareness, monitoring of rebuilding and recovery activities, support to insurance claims, and many others.

6.1 PHYSICAL EXPOSURE AND VULNERABILITY

6.1.1 Guatemala City building earthquake vulnerability

Detecting seismic vulnerability in urban areas is critical: identifying high-risk buildings can save lives and help prioritize retrofitting investments. However, sending large teams of surveyors into the field is time-consuming and expensive. Instead, this case study leverages imagery from satellites and drones, along with street-view images from 360° street cameras, to identify homes at high risk of collapse during an earthquake. Digital elevation models from satellite imagery helped identify buildings located on steep slopes, which are at higher risk from mudslides. A combination of satellite and drone imagery helped identify rooftop material, suggesting underlying construction techniques that are more vulnerable to seismic activity. The availability of street-view imagery is unique, as it can be used to identify soft-story constructions, which are vulnerable to seismic activity. This case study is a good example of how different physical factors of vulnerability can be extracted from various data sources, and of the unique capabilities of street-view imagery: the deep learning algorithm trained on the street-view imagery caught 85% of the buildings flagged by expert engineers as vulnerable.

Underlying DRM goal: Quickly identify seismically vulnerable "soft-story" homes
Input data: Drone imagery (eBee, RGB, 4 cm); point cloud elevation data; street-view imagery (Trimble MX, 30 megapixel)
Reference data: OpenStreetMap road layer
Unit of analysis: Pixel/object (building)
Scale of analysis: Neighborhood level (three neighborhoods of approximately 10 km² in Guatemala)
Algorithm: Deep learning
Analysis by: GOST/GSURR
Results and lessons learned:
• The method screens a neighborhood of 5,000 homes and is able to identify some 500 that need further inspection and possible retrofitting/strengthening.
• Of the "soft-story" buildings flagged by engineers (who viewed them from the outside), the method caught 85%.
• The resulting detailed databases are potential inputs into exposure databases and can help locate and prioritize retrofitting/housing upgrading projects.
• Automatic detection of large first-floor openings was done with data collected by the team; to scale up, Google Street View and/or Mapillary should be considered.
• Satellite imagery (50-30 cm) was also explored to see whether it could be used to measure the height of buildings. NTT was hired and delivered a layer that was good but tended to lump households together, especially in dense neighborhoods.

Map: possible soft-story buildings as determined by imagery, by experts, and by both. "Rapid Housing Quality Assessment" by Sarah Antos, Geospatial Operations Support Team (GOST), World Bank.

6.1.2 St. Lucia building hurricane vulnerability

What kind of damage would Saint Lucia experience if it were hit by a Category 5 storm? Using a recent detailed damage assessment conducted in neighboring Dominica, physical characteristics such as roof material and the shape and size of buildings were used to predict the vulnerability of individual structures in Saint Lucia.

Underlying DRM goal: Estimate hurricane rooftop vulnerability in small island states (Caribbean)
Input data: Drone imagery (eBee, RGB, 4 cm); point cloud elevation data; street-view imagery (Trimble MX, 30 megapixel)
Reference data: OpenStreetMap building footprints downloaded from the CHARIM GeoNode
Unit of analysis: Pixel/object (building)
Scale of analysis: City level (three cities of approximately 9 km² in Saint Lucia)
Algorithm: Conditional random field model; several Python libraries combined with MpGlue
Analysis by: GOST/GSURR
Results and lessons learned:
• Using the variables (and combinations of variables) that most powerfully predicted damage in Dominica, each structure in Saint Lucia was given an estimated level of destruction. For example, a structure may have an expected damage of 40% in general, but more than 40% if its roof is smaller and has only two panels (gables).
• Volume, roof shape, and roof type were all influential. Large, highly pitched roofs with PVF2-coated metal sheeting tended to do best.
• The algorithm predicted roof shape (hip vs. gable) more easily than material, owing to the three-band drone camera.
• The team is now working to add valuation information and a general "quality" index, such as "rustiness."

Map detail: building vulnerability to hurricane winds, classified by roof aspect (cardinal direction). Sarah Antos, Geospatial Operations Support Team (GOST), World Bank.

6.1.3 Monitoring urban growth through floor space index

Regular, cloud-free satellite images combined with ML algorithms can be used to monitor horizontal and vertical urban growth. This study uses crowdsourced building footprints and height information to train an ML model for urban monitoring in Dar es Salaam.

Underlying DRM goal: Urban growth monitoring, focusing on built-up area and building height
Input data: Satellite imagery (RGB, 3.7 m); digital surface model (DSM) extracted from stereoscopic satellite imagery (0.8 m)
Reference data: OpenStreetMap building footprints and height attributes collected during the Ramani Huria project
Unit of analysis: Pixel
Scale of analysis: City level (5,280 km² in Dar es Salaam, Tanzania)
Algorithm: Deep learning, convolutional neural networks
Analysis by: Planet
Results and lessons learned:
• The study shows how to combine OSM reference data and machine learning methods.
• Building footprints were extracted with an accuracy of 77%; the correct number of floors was predicted for 23% of the buildings.
• Difficulties were caused by densely built (informal) areas.
• Results would likely improve with higher-resolution imagery.
More information: Executive Summary: Monitoring Urban Change with Satellite Imagery and Analytics, pp. 36, 37, 40, 41, 43-46.
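Accuracy figures like the "caught 85% of the buildings flagged by engineers" in case study 6.1.1 are recall values: the share of expert-identified buildings that the model also found. The sketch below shows how such detection metrics are computed; the building IDs are invented for illustration.

```python
# Hypothetical sets of building IDs: flagged by expert engineers vs.
# detected by the model. Recall measures how many expert flags were
# caught; precision measures how many detections were correct.
expert_flagged = {"b01", "b02", "b03", "b04", "b05"}
model_detected = {"b02", "b03", "b04", "b05", "b09"}

true_positives = expert_flagged & model_detected
recall = len(true_positives) / len(expert_flagged)     # share of expert flags caught
precision = len(true_positives) / len(model_detected)  # share of detections that are right

print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.80 precision=0.80
```

Reporting both numbers matters for screening applications: a model can reach high recall simply by flagging many buildings, at the cost of precision and wasted inspection effort.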
Figure 12B in WB report: building to non-building ratio, inner-city wards of Dar es Salaam, August 2017.

6.1.4 Targeting high-risk buildings for building inspection

Building inspection is an important measure to mitigate the risk of fire. However, as cities grow, it becomes increasingly difficult to prioritize which buildings should be inspected. Emerging methods combine geospatial data and building attributes to determine which buildings present the greatest risk.

An example by Azavea focuses on the likelihood that a building which failed a past inspection will fail again. The model makes use of open data from the OpenDataPhilly portal, which provides information on more than 55,000 building inspections in more than 25,500 locations. Various features of the building inspections were considered, such as the duration between inspections, the type of violation, location variables, the total number of violations, building vacancy, and tax delinquency. Feature selection was performed to remove variables that were not relevant for predicting a repeat violation. The model results indicate that repeat building violations could be predicted with an accuracy above 74%.

Underlying DRM goal: Predict building regulation violations
Input data: Cases and descriptions of previous building violations; locations; building vacancy; tax delinquency
Reference data: Building inspection reports from the City of Philadelphia's Department of Licenses and Inspections
Unit of analysis: Building
Scale of analysis: City level (Philadelphia, USA)
Algorithm: Gradient boosting and random forests
Analysis by: Azavea
Results and lessons learned:
• The model was able to predict repeat building violations with an accuracy of 74%.
• The results can help building inspectors allocate resources effectively by targeting high-risk buildings.
More information: Predicting Building Inspections; Predicting Building Code Compliance with Machine Learning Models

Figure: follow-up building inspection results, showing relative densities of buildings that passed after initially failing an inspection (left) and those that failed again (right), for one-time violations, repeat violations, and total violations. Source: Azavea; data: Philadelphia Department of Licenses and Inspections.

6.2 SOCIAL EXPOSURE AND VULNERABILITY

6.2.1 Sri Lanka poverty mapping

Poverty data are in scarce supply and difficult to collect. This study investigates the suitability of features derived from very high-resolution satellite imagery for estimating poverty at a local level in Sri Lanka, allowing these estimates to be extrapolated to areas not covered by surveys. A unique partnership with OSM provided access to a large amount of labelled data to support the ML algorithm. A large number of object- and pixel-based features describing agricultural land, cars, building density and vegetation, shadows, road and transportation networks, roof types, and textural/spectral characteristics were extracted from the imagery. A linear regression was established to determine the relationship between these features and poverty levels taken from census data.

Underlying DRM goal: Estimate poverty levels
Input data: Satellite imagery (RGB, < 0.5 m), from which were derived:
• Object-based features: number of buildings; number of cars; fraction of roads paved; shadow pixels (building height); crop type/extent; roof type
• Pixel-based features: vegetation index; PanTex (settlement density); texture (HoG, LBP, line support region, Gabor filter, Fourier transform, SURF)
Reference data: Two poverty lines (10th and 40th percentiles of the national per capita consumption distribution) obtained from 2011 census data
Unit of analysis: Pixel/object (administrative unit)
Scale of analysis: Regional (3,500 km² covering 1,250 administrative units in Sri Lanka)
Algorithm: Deep learning (convolutional neural networks) to calculate the percentage of built-up area, number of cars, shadow pixels, and crop type for each administrative unit; support vector machines and visual identification to obtain information on roof type, paved and unpaved roads, and railroads
Analysis by: WB poverty team, working with Orbital Insight, LAND INFO Worldwide Mapping, LLC, and the George Washington University Department of Geography
Results and lessons learned:
• The analysis can explain 60-61% of the variation in a small area (compared with 15% when using night lights analysis).
• Building density, built-up area, and shadows were among the most influential features describing variations in poverty.
• Normalized error rates were 0.25-0.5 of poverty rates when applying the model to geographically adjacent areas.
• The project cost $90,000 in total.
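The final step of the Sri Lanka study is a linear regression relating image-derived features to census poverty rates. A one-feature version of that step can be sketched with closed-form ordinary least squares; the numbers below are made up for illustration and are not from the study.

```python
# Made-up data: one image-derived feature (built-up fraction) per
# administrative unit, and the census poverty rate for that unit.
x = [0.10, 0.25, 0.40, 0.55, 0.70]   # image-derived feature
y = [0.55, 0.45, 0.38, 0.30, 0.22]   # census poverty rate

# Ordinary least squares for a single predictor, in closed form.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

# Extrapolate to an unsurveyed unit using only its imagery features.
pred = intercept + slope * 0.50
print(round(slope, 3), round(intercept, 3), round(pred, 3))
```

The real study fits many features at once, but the logic is the same: calibrate against the units covered by the census, then predict for units that surveys never reached.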
Figure 6 in WB report: true poverty rate (40% line) and predicted poverty rate for administrative units around Seethawaka.

6.2.2 Informal settlement mapping

In 2012, a peer-reviewed paper by Graesser et al. mapped the informal settlements in four major cities, using an automated ML algorithm to classify satellite imagery. According to the authors, in remote sensing imagery informal settlements share unique spatial characteristics that distinguish them from other types of structures, such as industrial, commercial, and formal residential areas. After a thorough literature review of remote sensing methods that have been used for similar objectives, the authors used several low-level image features at multiple scales to characterize local neighborhoods, separated on the basis of a series of spatial, structural, and contextual features.

Graesser et al. outlined how formal and informal neighborhoods can be visibly separated given sufficient spatial resolution of the imagery used. Informal settlements often share unique spatial, structural, and contextual features that separate them from other types of neighborhoods. These characteristics can include:
• High heterogeneity in building orientation (most buildings are not "neatly" oriented along a planned space, e.g., a road)
• High variance in the building materials used and in the density of structures (as opposed to formal settlements, where these features are more homogeneous within a neighborhood)
• Small building size (as opposed to larger buildings with more stories in formal settlements)
• Irregular and narrow streets (as opposed to wider and straighter planned streets)
• Location closer to hazardous zones such as landfills, airports, railroads, and steeper slopes

Underlying DRM goal: Identification of informal settlements
Input data: Satellite imagery (RGB, < 0.5 m), from which were derived pixel-based features: vegetation indices; GLCM PanTex (settlement density); texture (HoG, lacunarity, linear feature distribution, line support region, SIFT, textons)
Reference data: Manual labelling of imagery
Unit of analysis: Pixel
Scale of analysis: City (74 km² of Kandahar, Afghanistan; 203 km² of La Paz, Bolivia; 220 km² of Kabul, Afghanistan; 348 km² of Caracas, Venezuela)
Algorithm: Decision trees
Analysis by: Graesser J B, Cheriyadat A M, Vatsavai R, Chandola V, and Bright E A of Oak Ridge National Laboratory
Results and lessons learned:
• Texture features in submeter satellite imagery were found to be suitable for distinguishing formal from informal areas in cities.
• The ML algorithm had an accuracy of 85-92% for the four cities.
• The authors suggest that methods which take multiple neighboring pixels into account may improve results.
• The study relates social vulnerability to the physical appearance and arrangement of buildings and roads; this will depend on local context, and one should take care when applying the models to other areas.
More information: Graesser J B, Cheriyadat A M, Vatsavai R, Chandola V, and Bright E A. 2012. "Image Based Characterisation of Formal and Informal Neighborhoods in an Urban Landscape." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5.

Figure: classification results for Kabul, smoothed using an 11 x 11 majority filter: (a) formal and informal (Type I) residential; (b) informal (Type II) residential built on slopes; (c) non-residential.

6.2.3 Stanford poverty study

Poverty mapping based on census data is often expensive, difficult to collect at a large spatial scale, and hard to update frequently. This study aims to use remote sensing to predict the ratio of households above the poverty line in Uganda. It shows an alternative strategy for using deep learning when limited training samples are available. First, a deep learning model that learned image features from an object detection challenge (ImageNet) is used. Then, due to the lack of survey data, the researchers use night light data (a proxy for economic development) to train the model to learn relevant features, which are then used in a logistic regression model predicting the surveyed poverty levels.

Underlying DRM goal: Poverty mapping
Input data: Deep learning model trained on ImageNet; NOAA nightlights imagery; Google Maps imagery
Reference data: Night-time lights; governmental household surveys
Unit of analysis: Pixel (1 km x 1 km grid), object (districts)
Scale of analysis: National (Uganda)
Algorithm: Deep learning (fully convolutional neural network) and logistic regression classifier
Analysis by: Stanford University
Results and lessons learned:
• The proposed method can predict poverty levels with 72% accuracy, comparable to the results of a logistic regression using survey-based features to predict the surveyed poverty levels.
• This shows how a proxy dataset can be used to develop a machine learning model when not enough reference data are available.
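The last stage of the Stanford approach fits a logistic regression on features learned from the proxy task. The fitting step itself can be sketched with plain batch gradient descent; the one-dimensional "deep feature" values and binary labels below are invented (in the study the features come from the CNN and the labels from household surveys).

```python
import math

# Made-up 1-D "deep feature" values and binary poverty labels.
feats = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]

# Fit logistic regression by batch gradient descent on the log-loss.
w, b = 0.0, 0.0
for _ in range(2000):
    gw = gb = 0.0
    for xf, yl in zip(feats, labels):
        p = 1.0 / (1.0 + math.exp(-(w * xf + b)))  # sigmoid
        gw += (p - yl) * xf                        # gradient w.r.t. weight
        gb += (p - yl)                             # gradient w.r.t. bias
    w -= 0.1 * gw
    b -= 0.1 * gb

def predict(xf):
    return 1.0 / (1.0 + math.exp(-(w * xf + b))) > 0.5

print([predict(xf) for xf in feats])  # → [False, False, False, True, True, True]
```

In practice one would use a library implementation with regularization, but the sketch shows why this stage is cheap: once the expensive feature learning is done on the proxy data, only a small linear model needs the scarce survey labels.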
More information Transfer Learning from Deep Features for Remote Sensing and Poverty Mapping Stanford researchers use dark of night and machine learning to shed light on global poverty Block poverty probabilities District poverty probabilities Uganda poverty rates (2005) High Medium Low Figure 3 from the Arxiv paper Left: Predicted poverty probabilities at a fine-grained 10 km x 10 km block level. Middle: Predicted poverty probabilities aggregated at the district level. Right: 2005 survey results for comparison (World Resources Institute 2009) 40 MACHINE LEARNING FOR DISASTER RISK MANAGEMENT 6.3 RISK MAPPING AND DAMAGE Underlying DRM goal Flood damage assessment PREDICTION 6.3.1 Flood damage prediction Which input data • Water depth Many flood damage assessment models were used • Building type utilize water depth to calculate damage • Building footprint area curves based on specific location and • Floor area for living • Building age flood conditions. Applying the same • Basement curves to different situations therefore • Household size often produces unreliable results. This • Flow velocity • Flood duration project researches how the inclusion • Return period of additional variables can be used to • Flood experience improve the transferability of flood • Precautionary measures damage prediction models. Reference data Relative building damage and relative content damage from field surveys Bayesian networks and regression random forests were constructed to Unit of analysis Tabular (survey data) relate the relative building damage or relative content damage reported by surveyed households to various Scale of analysis Regional (a flood event in the Netherlands in 1993 and six flood events in Germany between 2002 and 2013) input features. Results show that models which are trained using Which algorithm was Bayesian networks and random forests (regression) heterogeneous data (i.e., flood events used with various characteristics) have a higher performance. 
The authors Who completed the Deltares, GFZ German Research Centre for Geosciences emphasize the importance of acquiring analysis a heterogeneous training set for flood damage models, including a variety of Results and lessons • Updating an ML algorithm with data from a different country flood events, geographical locations, learned improves the model’s performance on flood events from that and asset characteristics. country. • The collection of training data from various flood events and regions may be more effective than a large amount of information from a single event. More information Regional and Temporal Transferability of Multivariable Flood Damage Models fe fd pre rp fa rcd/rbd bt wdt Structureof Structure ofthe GermanBayesian theGerman Network based Flood BayesianNetwork-based Flood Damage Estimation Model Model for Damage Estimation the private for the private BN-FLEMOps sectorBN-FLEMOps sector 41 MACHINE LEARNING FOR DISASTER RISK MANAGEMENT 6.3.2 Machine learning-powered Underlying DRM goal Earthquake structural damage modelling seismic resilience for San Francisco Modelling structural damage from Which input data • Seismic shaking data for the earthquake of interest earthquakes (as with other hazards) were used • Soil characteristics is challenging due to the number of • Seismic hazard parameters factors which influence the process. • Building characteristics like material, number of stories, area, etc. A proprietary algorithm developed by OneConcern models seismic Reference data Historical earthquake damage data from multiple events resilience by predicting the structural damage resulting from earthquakes. It Unit of analysis Tabular (survey data) leverages various data sources such as earthquake shaking parameters, soil Scale of analysis City block–level and seismic hazard characteristics, multiple building characteristics, and Which algorithm was Proprietary algorithm used real-time field input to estimate the impact of earthquakes. 
Data from Who completed the OneConcern, Inc. previous earthquakes are used to train analysis the ML models, which are optimized using a unique performance measure Results and lessons • Making use of data streams from multiple sources and at multiple to ensure a better estimation of higher learned resolutions can gain a higher training accuracy. damage to buildings. Techniques such • It is important to use diverse data sources to ensure as geographical hold-out, event hold- generalizability of the algorithms. • The inclusion of localized data captures effects which are generally out, and randomized hold-out are not identified through generic methods. used to further improve the model’s performance. OneConcern also focuses on using Evacuation team the developed model to provide real-time and on-demand situational awareness right before, during, and Spill containment immediately after a seismic event. The damage predictions are made at Emergency medical service a census block-level resolution, thus visualizing detailed localized data for seismic hazards throughout the city of San Francisco while maintaining the anonymity of the individual blocks within the city. This also enables the Seniors (>65 years old) risks and vulnerability data to be democratized by sharing it with local communities and volunteers. Low Income (<10,000 annual) Figure from the original study, available at: https://medium.com/@oneconcerninc/2018-the-dawn-of-benevolent-intelligence-263c6bd1a63 42 MACHINE LEARNING FOR DISASTER RISK MANAGEMENT 6.3.3 Real-time global landslide Underlying DRM goal Landslide hazard mapping hazard mapping The Landslide Hazard Assessment Which input data • Elevation for Situational Awareness (LHASA) were used • Faults and geologic regions provides landslide hazard data in real • Roads time. 
• Forest cover
• Rainfall
Reference data: Global Landslide Catalog
Unit of analysis: 0.1°
Scale of analysis: Global (between 50°N and 50°S)
Which algorithm was used: Decision tree
Who completed the analysis: NASA

An algorithm was trained which links landslide susceptibility factors (slope, geology, road networks, fault zones, and forest loss) to historical landslide events. This model is applied to precipitation data from the Global Precipitation Measurement (GPM) mission at three-hour intervals. When the rainfall for a given region is extremely high for that region, the landslide susceptibility map is consulted. If the region is also classified as being highly susceptible to a landslide, a nowcast warning is issued. Thus, LHASA provides near-real-time situational awareness of landslide risk on a global scale, presented in an open-source framework.

Results and lessons learned:
• The model would have issued a nowcast for historical landslide events with a false positive rate below 3% and a true positive rate of up to 60%.
• The lack of historical data and the limited locational accuracy of historical landslide events make it challenging to train a good model. The Cooperative Open Online Landslide Repository was launched to obtain additional reference data through citizen science.
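The two-stage decision described above — issue a nowcast only when rainfall is extreme relative to that cell's own history and the static susceptibility map also flags the cell — can be sketched as follows. The 95th-percentile threshold and the susceptibility class labels are illustrative assumptions, not the exact parameters of the NASA model.

```python
# Illustrative sketch of LHASA-style nowcast logic for one grid cell.
# A warning is issued only when recent rainfall is extreme *for that
# cell's own climate* AND the static susceptibility map rates it highly.
# The percentile threshold and class labels are assumptions.

def rainfall_is_extreme(recent_mm, history_mm, percentile=0.95):
    """True when recent rainfall exceeds the cell's historical percentile."""
    ranked = sorted(history_mm)
    cutoff = ranked[int(percentile * (len(ranked) - 1))]
    return recent_mm > cutoff

def nowcast(recent_mm, history_mm, susceptibility):
    """Decide whether to issue a landslide nowcast for one grid cell."""
    if not rainfall_is_extreme(recent_mm, history_mm):
        return None  # ordinary rain: susceptibility alone never triggers
    if susceptibility in ("high", "very high"):
        return "nowcast warning"
    return None  # extreme rain on low-susceptibility terrain: no warning

history = [2, 0, 5, 1, 0, 3, 8, 4, 0, 1, 6, 2]  # toy 3-hourly rainfall record (mm)
print(nowcast(12, history, "very high"))  # extreme rain + high susceptibility
print(nowcast(12, history, "low"))        # extreme rain alone is not enough
```

Making the rainfall threshold relative to each cell's own history is the key design choice: 12 mm in three hours may be routine in one climate and exceptional in another.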
More information:
• NASA landslide map estimates risk in real time
• Satellite-Based Assessment of Rainfall-Triggered Landslide Hazard for Situational Awareness

[Figure: (a) Global landslide susceptibility map (classed very low to very high) computed using slope, geology, fault zones, road networks, and forest loss (Stanley and Kirschbaum, 2017); (b) Global Landslide Catalog (2007–2016) showing the distribution of landslide fatalities (Kirschbaum et al., 2015)]

6.3.4 Wildfire prediction

Two high school students invented a device to predict the probability of a forest fire occurring. The device is placed in the forest and can take real-time photos, which are uploaded to SensorInsight to enable real-time visualization. Deep learning algorithms are used to analyze the images and predict the amount of dead fuel present in the sensor's area. This information is combined with local weather data to predict the possibility of a fire.

This study is based on a relatively small sample size and will likely require extensive validation with more substantial reference data.

Underlying DRM goal: Real-time wildfire prediction
Which input data were used:
• Weather data
  ◦◦ Humidity
  ◦◦ Temperature
  ◦◦ Gas
  ◦◦ Carbon monoxide/dioxide
  ◦◦ Wind
• Images
Reference data: Approximately 100 randomly sampled images of grass and shrubs from Google Images
Unit of analysis: Point (locations of sensors placed in forests in California)
Scale of analysis: Regional (selected forests in California)
Which algorithm was used: Deep learning
Despite these factors, it is a unique case study, as it showcases a grassroots solution and how ML algorithms can be combined to obtain real-time risk predictions. The Smart Wildfire Sensor they devised is being further developed and tested with Cal Fire in three counties in California.

Who completed the analysis: Cal Fire and Monta Vista High School

Results and lessons learned:
• Classifies images of grasses and shrubs into 14 classes indicating various forest fire risk levels with 89% accuracy.
• The model will likely require more extensive validation.
• Demonstrates a real-time, grassroots approach to using an ML algorithm for DRM.

More information: Fighting fire with machine learning: two students use TensorFlow to predict wildfires

[Image from the original study, available at: https://www.blog.google/technology/ai/fighting-fire-machine-learning-two-students-use-tensorflow-predict-wildfires/]

6.4 POST-DISASTER EVENT MAPPING AND DAMAGE ASSESSMENT

6.4.1 Flood extent mapping

Orbital Insight developed a project in 2017 in which they used Synthetic Aperture Radar (SAR) as an input for an image classification algorithm that allowed the categorization of at-risk areas for flooding in Houston, Texas, U.S.A. A combination of optical and SAR imagery (which is capable of "looking" through clouds) helped identify the flooding extent. Digital elevation models (DEMs) allowed natural watersheds to be delimited, and crowdsourced geotagged images were used to confirm the flood extents.

Underlying DRM goal: Flood extent mapping
Which input data were used:
• Optical satellite imagery
• SAR imagery (capable of imaging through clouds)
• Digital elevation models (DEMs)
Reference data: Crowdsourced, geotagged images
Unit of analysis: Pixel
Scale of analysis: Hurricane Harvey flood event
Which algorithm was used: Deep learning
Who completed the analysis: Orbital Insight

Results and lessons learned:
• Combining various types of large-scale spatial data helped estimate flood extent.
• Crowdsourced, geotagged imagery can help verify flooding in accuracy analysis.

More information:
• Understanding the Extent of Flooding in Houston from Hurricane Harvey
• How Orbital Insight Measured Hurricane Harvey's Flooding through the Clouds

[Figure: Actual flood maps after applying Orbital Insight's geospatial interpolation across observation points and DEM (Source: Orbital Insight, Google Street Map)]

6.4.2 Cyclone damage assessment

The World Bank and UAViators collected UAV images after Cyclone Pam hit Vanuatu in 2015. High detail and the ability to collect data under cloud cover were advantages of using imagery from UAVs rather than satellites. At the time, volunteers from the Humanitarian OpenStreetMap Team (HOT) and the Digital Humanitarian Network annotated the damage in the images.

Since then, the images and reference data have also been used to develop ML algorithms. Artificial Intelligence for Digital Response (AIDR) is an open platform combining crowdsourcing and ML to interpret social media data in disaster situations. A similar pipeline was developed for the Cyclone Pam data when MicroMappers organized volunteers to identify and demarcate various levels of damage to buildings.

Underlying DRM goal: Damage assessment
Which input data were used: UAV optical imagery
Reference data: Crowdsourced annotation of images
Unit of analysis: Pixel
Scale of analysis: Regional (Cyclone Pam, Vanuatu, 2015)
Which algorithm was used: Deep learning
Who completed the analysis: Artificial Intelligence for Digital Response (AIDR), Qatar Computing Research Institute, MicroMappers; the World Bank and UAViators acquired the imagery

Results and lessons learned:
• A pipeline was developed for combining crowdsourced damage annotation and deep learning, with 63% accuracy.
• Tests on a damage event in the Philippines were 41% accurate, demonstrating a need for more training data to improve model predictions.
These annotations were used to train a deep learning algorithm (Nazr-CNN) to first recognize buildings and then identify damage levels. The study indicates a need for additional training samples in order to improve the transferability of the model.

More information:
• Lessons from Mapping Geeks: How Aerial Technology Is Helping Pacific Island Countries Recover from Natural Disasters
• Nazr-CNN: Fine-Grained Classification of UAV Imagery for Damage Assessment

[Figure: UAV image, crowd annotation, semantic segmentation, and the proposed pipeline. From the original study, available at: https://arxiv.org/pdf/1611.06474.pdf]

7. GLOSSARY

AGI: Artificial General Intelligence. An artificial intelligence that does not yet exist, in which computers would have the ability to be self-aware and to tackle all different types of generalized problems in a way that is indistinguishable from human intelligence.

AI: Artificial intelligence. A term used to describe all types of computer machine learning.

GANs: Generative adversarial networks.
https://skymind.ai/images/wiki/GANdancers.png
https://skymind.ai/wiki/generative-adversarial-network-gan

Optical Imagery: Imagery that is obtained via an optical sensor, whether in the visible Red-Green-Blue bands or in other wavelengths of the electromagnetic spectrum.

OSM: OpenStreetMap, a global crowdsourced map of roads, buildings, and other physical features. OSM is an open, collaborative, crowdsourced version of other common maps, such as Google Maps or Bing Maps.
CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart. A tool ubiquitously used in web pages to discern humans from machines, in an attempt to protect online resources from malicious software.

Commons: Resources and information that are freely available to all members of a community, e.g., wikis and open-source software.

Crowdsourcing: A method of creating data that leverages the communal work of a team or community (crowd), often using software that allows the communal effort to be properly saved, validated, and analyzed so that it becomes a common asset.

Deep learning: A term that references the architecture of neural network algorithms, where there are hidden layers between the inputs and outputs that connect with each other in a way similar to neurons in the brain, albeit with many fewer connections.

DRM: Disaster risk management

ESA: European Space Agency

GDAL: Geospatial Data Abstraction Library

GEOSS: Global Earth Observation System of Systems

GFDRR: Global Facility for Disaster Reduction and Recovery of the World Bank

GOST: Geospatial Operations Support Team of the World Bank

GRASS: Geographic Resources Analysis Support System

HDX: Humanitarian Data Exchange

HOT: Humanitarian OpenStreetMap Team

ISET: Informal settlement

LiDAR: Light Imaging Detection and Ranging

MLA: Machine learning algorithm

OBIA: Object-Based Image Analysis

OpenAerialMap: Online platform for openly sharing satellite, aerial, and drone imagery

QGIS: Quantum GIS, an open-source GIS software

Radar/SAR: Synthetic-aperture radar, a type of sensor used in earth observation

RMSE: Root-mean-square error, a type of statistical analysis used to assess the accuracy of MLA results

Supervision: Human training of ML algorithms to learn to classify data according to set target parameters

UAV: Unmanned Aerial Vehicle
OpenStreetCam: An open data version of Street View, with street-level imagery collected from the ground.
https://openstreetcam.org/

Forests (of Decision Trees): A common supervised ML algorithm, where the term "forests" refers not to the biological ecosystem but to the fact that the algorithm uses many decision "trees": decision structures in which a yes/no decision is made at every fork.

8. REFERENCES AND RESOURCES

There are a number of online resources available, and for the user who wants to go more in depth there are courses as well as many academic papers and textbooks that can be referenced. The following is a curated list of these references and resources.

8.4 ARTICLES AND BLOGS
• A Tour of the Top 10 Algorithms for Machine Learning Newbies
  https://www.kdnuggets.com/2018/02/tour-top-10-algorithms-machine-learning-newbies.html
• Top 10 Machine Learning Algorithms for Beginners
  https://www.dataquest.io/blog/top-10-machine-learning-algorithms-for-beginners/
• Experts, Crowds, Machines—Who Will Build the Maps of the Future?
  https://blog.mapillary.com/update/2017/12/21/who-will-build-the-maps-of-the-future.html

8.1 ONLINE RESOURCES
• One of the most thorough and up-to-date courses on machine learning is from TechChange: Artificial Intelligence for International Development
  https://course.tc/301-1/c
• Educational resources on AI from Google
  https://ai.google/education/
• Crash course on Machine Learning from Google
  https://developers.google.com/machine-learning/crash-course/
• A Machine Learning online course on Coursera, from Stanford University
  https://www.coursera.org/learn/machine-learning
• Along with the above Coursera course, the following is specifically about Unsupervised Learning
  https://www.coursera.org/learn/machine-learning/lecture/olRZo/unsupervised-learning
• Another resource for learning statistical methods is Datacamp
  https://www.datacamp.com/

8.2 VIDEOS AND TALKS
• PBS Machine Learning and Artificial Intelligence: Crash Course Computer Science #34
  https://www.youtube.com/watch?time_continue=687&v=z-EtmaFJieY
• Deep learning in medical imaging
  https://www.youtube.com/watch?v=2_Jv11VpOF4&feature=youtu.be&t=4m7s

8.4 ARTICLES AND BLOGS (continued)
• Updating Google Maps with Deep Learning and Street View
  https://research.googleblog.com/2017/05/updating-google-maps-with-deep-learning.html
• Introduction to GBDX
  https://platform.digitalglobe.com/gbdx/
• GBDX Overview
  https://gbdxdocs.digitalglobe.com/docs/gbdx-overview-1
• Machine Learning and ethics—Toward ethical, transparent and fair AI/ML: a critical reading list
  https://medium.com/@eirinimalliaraki/toward-ethical-transparent-and-fair-ai-ml-a-critical-reading-list-d950e70a70ea
• The Building Blocks of Interpretability
  https://distill.pub/2018/building-blocks/

8.5 CONFERENCES AND MEETINGS
• Computer Vision conferences: ECCV, ICCV, CVPR, etc.
• GFDRR Understanding Risk
  https://understandrisk.org/
• AI for Good Global Summit 2018
  https://www.itu.int/en/ITU-T/AI/2018/Pages/default.aspx
• Mapbox—Locate
  https://www.mapbox.com/locate

8.3 INFOGRAPHICS AND INTERACTIVE RESOURCES
• Understanding Machine Learning
  https://futurism.com/images/understanding-machine-learning-infographic/
• Machine Learning 101
  http://usblogs.pwc.com/emerging-technology/machine-learning-101/
• A Beginner's Guide to Machine Learning Algorithms
  http://dataconomy.com/2017/03/beginners-guide-machine-learning/
• A Visual Introduction to Machine Learning
  http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
• The mostly complete chart of neural networks, explained
  https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464

8.6 CHALLENGES AND COMPETITIONS

Challenges are an effective approach to getting multiple people to tune models to the best of their ability in order to obtain the most accurate results.

• We Robotics Open AI Challenge
  https://blog.werobotics.org/2018/05/16/announcing-winners-open-ai-challenge/
• DeepGlobe
  http://deepglobe.org/
• SpaceNet
  http://explore.digitalglobe.com/spacenet
• DSTL Satellite Imagery Feature Detection
  https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/
• Functional Map of the World Challenge
  https://www.iarpa.gov/challenges/fmow.html
• DIUx xView 2018 Detection Challenge
  http://www.xviewdataset.org/

8.7 OTHER REFERENCES, ARTICLES, AND TEXTBOOKS
• Sethi I K. 1990. Entropy nets: from decision trees to neural networks. Proceedings of the IEEE 78: 1605–13. https://ieeexplore.ieee.org/document/58346/figures
• Shankar et al. 2017. No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World. https://arxiv.org/pdf/1711.08536.pdf
• Engstrom et al. 2016.
  http://pubdocs.worldbank.org/en/60741466181743796/Poverty-in-HD-draft-v2-75.pdf
• Geo-diversity for better, fairer machine learning. https://developmentseed.org/blog/2018/03/19/geo-diversity/
• Gevaert C M, Persello C, Sliuzas R, and Vosselman G. 2017. Informal settlement classification using point-cloud and image-based features from UAV data. ISPRS Journal of Photogrammetry and Remote Sensing: 225–36.
• Graesser J B, Cheriyadat A M, Vatsavai R, Chandola V, and Bright E A. 2012. Image Based Characterisation of Formal and Informal Neighborhoods in an Urban Landscape. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5. https://www.osti.gov/scitech/biblio/1050316
• James G, Witten D, Hastie T, and Tibshirani R. 2013. An Introduction to Statistical Learning. Vol. 103. New York, NY: Springer New York. http://link.springer.com/10.1007/978-1-4614-7138-7
• Kirschbaum D B, Stanley T, and Simmons J. 2015. A dynamic landslide hazard assessment system for Central America and Hispaniola. Nat. Hazards Earth Syst. Sci. 15(10): 2257–2272. doi:10.5194/nhess-15-2257-2015
• Kirschbaum D and Stanley T. 2018. Satellite-Based Assessment of Rainfall-Triggered Landslide Hazard for Situational Awareness. Earth's Future 6: 505–23.
• Machine Learning Applications for Earth Observation. https://link.springer.com/chapter/10.1007/978-3-319-65633-5_8
• Mather P M and Koch M. 2011. Computer Processing of Remotely-Sensed Images: An Introduction. John Wiley and Sons.
• Mathieu P-P and Aubrecht C. 2017. Earth Observation Open Science and Innovation. New York, NY: Springer Science+Business Media.

Photo Credit: World Bank