Policy Research Working Paper 9587 Meta-Analysis Assessing the Effects of Virtual Reality Training on Student Learning and Skills Development Diego F. Angel-Urdinola Catalina Castillo-Castro Angela Hoyos Education Global Practice March 2021 Policy Research Working Paper 9587 Abstract Training using virtual reality has been applied in many fields successfully develop students’ skills across different fields of of education, but primarily in the fields of health and safety, education and the size of the effects encountered. The anal- engineering and technical education, and general education. ysis presented here relies on 31 primary studies and more Numerous studies assessing the use of immersive training than 90 experiments. The results indicate that, on average, in education have yielded promising results in educational virtual reality training is more effective than traditional outcomes, but there is not yet in the literature a systematic training in developing technical, practical, and socio-emo- analysis of the effects of virtual reality training on student tional skills. The results are particularly promising in fields learning. This paper presents a meta-analysis of the results of related to health and safety, engineering, and technical available studies that assess virtual reality training’s impact education. The results also indicate that students who are on student learning and skills development, and which rely exposed to virtual reality training are more efficient in using on robust evaluation methods. The study’s primary purpose inputs and time and/or avoiding performance errors than is to identify the extent to which immersive training can students receiving traditional training. This paper is a product of the Education Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at dangelurdinola@worldbank.org. The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Produced by the Research Support Team Meta-Analysis Assessing the Effects of Virtual Reality Training on Student Learning and Skills Development Diego F. Angel-Urdinola, Catalina Castillo-Castro, Angela Hoyos 1 The World Bank JEL Classification: I20, I24 Keywords: Virtual reality, Education, Learning, Skills Development. Corresponding Author: Diego F. Angel-Urdinola (dangelurdinola@worldbank.org) 1 This paper is a product of the Education Global Practice. The paper was prepared as a background paper for the Pilot Program to Test Virtual Reality Training Programs for Technological and Technical courses in Higher Education (TF0A8313), supported by the Korea World Bank Partnership Facility (KWPF). The authors acknowledge useful comments and support from Christine H. Joo, Robert Hawkins, and Michael Trucano. I. Introduction Recent events, such as the health pandemic introduced by the COVID-19 virus, have contributed to speed up alternative mechanisms to offer digital instruction to substitute and complement in-class instruction. The expansion of digital and computer assisted learning is becoming a global trend, making it of extreme importance to identify technology tools that work, while being scalable and cost-effective (Escueta, Quan, Nickow, & Oreopoulos, 2017). Even before the pandemic, it had become particularly challenging for education systems to supply digital learning opportunities that provide students the hands-on pedagogical experiences necessary to develop practical skills, especially for programs that require the use of laboratories. Virtual reality (VR) training is often known as the process of learning in a simulated or artificial environment. VR training has existed in the realm of education for over half a century but has dramatically expanded over the past 15 years as VR simulators are becoming less expensive to develop and increasingly realistic. The term VR applies to computer-simulated environments that can imitate physical presence in places in the real world, as well as in imaginary worlds (Lorenzo, Pomares, & Lledó, 2013), and simulate the illusion of participation in a synthetic environment with an external observation of such surroundings (Gigante, 1993). VR simulations can be constructed employing 3D graphics using a desktop computer (non-immersive) or using a head-mounted display (immersive) (Makransky, Terkildsen, & Mayer, 2019). In non-immersive VR, the simulated environment is displayed on a conventional computer with sound and graphics coming through the computer’s speaker and monitor, and the interaction is controlled through a regular computer mouse. Immersive VR uses a head-mounted display in which a high graphical fidelity screen is mounted in front of the user’s eyes with separate lenses for each eye and with sound delivered through earphones. The interactions in the context of high-immersion VR are controlled through head-motion tracking in conjunction with a computer system that allows users to look around a simulated 360-degree environment. In some educational fields, the development of adequate cognitive, technical, and socio- emotional skills remains a challenge for trainees and their tutors, partly because of the limited availability of hands-on training or access to proper content and learning situations. As a response, educators are starting to rely on VR simulations to develop learning experiences that would otherwise not be easily accessible to students. VR simulations can provide students practical 2 training opportunities without pressure, danger, and allowing repeated interventions. Also, VR simulations can provide students access to situations and learning environments (such as traveling within a cell, simulated scenarios for public speaking, among others) that would otherwise be very difficult or impossible to access. Such opportunities can accelerate students' learning curve in a simulated environment, reproducing real-life conditions and situations without time or space limitations and much fewer risks than real environments. In addition, VR simulations offer the great advantage of providing students and teachers a standardized, reproducible environment for repeated and optimized training (Apostolellis, Bowman, & Chmiel, 2018; Cheung, Fong, Fong, & Wang, 2013; Ferracani, Pezzatini, & Del Bimbo, 2014; Huang, Rauch, & Liaw, 2010; Sharma, Agada, & Ruffin, 2013). Another advantage of using VR simulations is that gamification, performance metrics, and collaborative features (using avatars) can be embedded in the software, enabling continuous peer interaction, active learning, enjoyment, and performance feedback – all elements that enhance proficiency-based training. Indeed, constructivism is often cited as a theoretical framework that supports the implementation of learning in virtual environments. Constructivism suggests that students learn by constructing knowledge and incorporating it into their existing knowledge structure. Thus, constructivist learning environments can increase active learning, motivation, interactivity, and personalized learning (Madathil et al., 2017). Proponents of VR simulations claim that higher motivation and presence are the main two channels through which VR training simulations can influence student learning (Mikropoulos, & Natsis, 2011). As a result, VR simulations have been regarded as a pedagogical method with the potential to increase student learning, as they increase self-motivation to learn and allow embedding to the educational experience constructivist pedagogy, collaboration, and gamification (Kavanagh, Luxton-Reilly, Wuensche, & Plimmer, 2017). The impact of media on student learning outcomes has been highly debated among educational technologists where much of the prior literature has shown no significant difference between technology-based and traditionally delivered instruction and media. However, the counterargument contends that using the correct media could impact students’ cognitive skills and that the media itself is a critical component of instructional design (Madathil et al., 2017). 3 Given the rising importance of identifying digital education platforms that work, this paper conducts a meta-analysis of the results of available experiments that assess the impact of VR training on learning and skills development. The study's primary purpose is to identify the extent to which VR training is conducive to learning and skills development. A secondary objective is to assess, to the extent possible, if VR training is also an efficient mechanism to deliver training. The analysis presented here relies on a total of 31 primary studies and over 90 different experiments. There has not been a systematic assessment of the effects of VR training on learning, other than in the field of surgical education in the early 2000s (Haque, & Srinivasan, 2006). This study was conducted with a limited number of studies and focused on assessing the extent to which VR training could help students perform surgical procedures faster (i.e. improve their time-in- task). To bridge the knowledge gap, this study focuses on a more recent time period (2005-2020), during which VR technology has significantly evolved, and covers other fields such as engineering, science, technical education, and general education. Moreover, this study analyzes the effects of VR more holistically as a mechanism to develop cognitive, technical, and socio- emotional skills. As such, the findings of this paper represent an essential contribution to the literature and intend to guide education institutions and policy makers to have more information about the effects of VR training as they expand their offer of digital learning opportunities to students. Based on the information available, our research shows that VR training is, on average, more effective than traditional training as a mechanism to develop students' technical, practical, and socio-emotional skills. Results are particularly promising in fields related to health and safety, engineering, and technical education. Results reveal that for each additional hour (¼ hour) of VR training, students score 3 percent higher in technical (cognitive) learning assessments than students exposed to the same curricular content delivered through traditional training methods. Results also indicate that students exposed to VR instruction report on average 30 percent higher scores in socio-emotional skills assessments after completing their training than their peers receiving traditional instruction. Results also suggest that students exposed to VR training are up to 30 percent more efficient using inputs, time, and/or avoiding performance errors than students exposed to traditional training, per additional hour of instruction. 4 The paper is structured as follows. Section II provides a review of the literature on the use of VR for pedagogical purposes. Section III presents the data and methodology used to conduct the meta-analysis. Section IV presents the results of the meta-analysis and intends to quantify the observed effects of VR training on students’ learning outcomes and skills development. The conclusion follows in Section V. II. Literature review Recent studies assessing the effect of the use of VR simulations in education show promising findings, in different areas, from increased time-on-task, enjoyment, motivation, and learning. Nonetheless, there is not a recent systematic analysis of the effects of VR training on student learning and skills development. A recent review of the literature (Kavanagh, Luxton- Reilly, Wuensche, & Plimmer, 2017) shows that VR simulations are used in many education fields, but primarily in health and surgical education, engineering and technical education, and general education (mainly in STEM related fields). Since each of these fields uses VR with slightly different pedagogical purposes, our literature review discusses each of these fields independently. 2.1. VR training for health and safety The use of VR training has shown great potential within the field of health, especially in the area of surgery, as it offers trainees the opportunity to practice several surgical procedures in a safe environment and at a comparatively low cost. Simulators provide excellent benefits to surgical trainees by allowing for repeated practice of a specific skill set in a controlled and safe environment, before ever entering the operating room. VR training allows developing surgical training experience that can enable junior trainees to undertake self-directed training while practicing and learning the fundamentals of surgery procedures without putting patients at risk and without needing supervision from an attending surgeon. Also, VR training can provide junior trainees relevant experience at an early stage in their surgical training while giving them an exposure to otherwise scarce educational resources, such as cadaveric parts (Zhao, Kennedy, Yukawa, Pyman, & O'Leary, 2011). A meta-analysis on the effects of VR training for surgery training was first made in 2006 (Haque & Srinivasan, 2006). While the study was limited in scope (only assessed the impacts of VR simulators on task completion time), it concluded that VR simulators did lessen the time trainees take to complete a given surgical task. 5 Other studies have shown that simulation-based training of surgical skills can improve medical personnel performance in the operating room and diminish complication rates related to inexperience (Gallagher et al., 2005). For instance, VR laparoscopic simulators and robotic surgery have been extensively used in health practice (Gurusamy, Aggarwal, Palanivelu, & Davidson, 2008; Valdis, Chu, Schlachta, & Kiaii, 2016). Laparoscopic and robotic surgery have become a standard approach for many surgical specialties, as they reduce patient’s surgical trauma, faster postoperative recovery, shorter hospital stays, and are associated with better cosmetic results. By using virtual reality simulator training, surgeons are expected to improve their proficiency and speed up their learning curve to master these procedures (Larsen et al., 2009). Similarly, the use of VR training for eye surgery as well as for other uncomfortable procedures for both the patient and the examiner, such as transvaginal examinations and infant sedation, has been promoted extensively in the medical practice (Chao, Chalouhi, Bouhanna, Ville, & Dommergues, 2015; Zaveri et al., 2016). Finally, the use of VR training for some procedures, such as bone surgery and total hip arthroplasty, has shown to be effective addressing limited access to resources that are necessary for making practical training possible, namely real human bones, as well as decreasing surgical errors (such as the incorrect alignment of the hip). 2 Safety and risk prevention are fields where VR training has also shown significant potential. As disasters and accidents are recurrent in all areas, training on safety and risk prevention is essential to mitigate their incidence and provide a rapid response and minimize casualties. VR training allows participants to emulate situations that may otherwise not be accessible with traditional learning methods. Immersive VR simulators have the potential to expose individuals to situations where high-level performance is critical but difficult to rehearse, such as mass disasters, evacuation drills, firefighting, and other hazardous or toxic conditions (Farra et al., 2018). Training emergency response personnel for catastrophes, for instance, is difficult due to the inability to replicate a given disaster environment comprehensively. In addition, there is an ethical concern about exposing trainees to the emotional and physical stresses encountered in real casualty situations (Andreatta et al., 2010). Available disaster drills often rely on mock patients, and they 2 Although cadaver temporal bones remain the gold standard of simulated training for temporal bone surgery, their increasing scarcity worldwide has meant that additional training tools are currently being explored. In addition to a shortage of cadaver bones, the increasing workload of attending surgeons has meant that the time that can be devoted to teaching and education has decreased. 6 can be very costly. Also, available disaster drills do not provide opportunities for on-demand repetitive training. In such contexts, VR training could represent a more cost-effective and accessible alternative than large-scale real-life exercises. 2.2. Engineering, science, and technical education Applications of VR training in engineering, science, and technical education have been most common in the fields of aviation, design, mechanics, industrial safety, and robotics (Buiu, & Gansari, 2014; Wei, Dongsheng, & Chun, 2013). In these fields, VR training provides students similar to real-life environments and access to state-of-the-art technology and equipment without the need to make significant capital investments in laboratories. When teaching engineering, science, and technical education, laboratory sessions constitute an essential part of the training. They provide hands-on experiences that allow students to learn the necessary skills required to manage, configure, troubleshoot, repair equipment, specialized instruments, and machinery. Laboratories enable students to practice and acquire skills before performing tasks in real professional situations. Nonetheless, many technical programs fall short in providing practical experience to students, as the set-up and functioning of laboratories require important capital investments in equipment, as well as in maintenance and updates. Also, the necessary equipment needed to perform hands-on labs is not always available or accessible, especially in developing countries and in rural areas. A proposed solution to address the difficulties to set-up laboratories is to substitute them with virtual laboratories. A virtual lab is an interactive simulation of a real lab. Virtual labs are essentially synthetic environments with attributes that include interactivity and real-time feedback (Lampi, 2013). The purpose of virtual labs is to develop student proficiency in the execution of practical skills. Virtual labs have traditionally been used in fields that require a skills proficiency that guarantees learners’ safety before they can operate real equipment. Pilot training, military equipment training, and nuclear power plant training have a long-documented history in utilizing virtual labs (Lampi, 2013). Moreover, virtual labs offer additional advantages such as remote access for distance education, low cost, reliability, security, flexibility, and convenience to the student. The authenticity of the learning experience in a virtual lab depends on the extent to which 7 the simulation causes learners to engage in cognitive processes comparable to those provided by a real laboratory. VR has been used to improve the fidelity of virtual labs as 3D environments, allowing the possibility of recreating real environments and even developing psychomotor skills when using virtual labs (Stone, Watts, & Zhong, 2011). 2.3. VR training for general education VR training has also been used to impart general education courses in several areas, such as STEM education, astronomy, anatomy, nurse education, and the arts (Kavanagh et al., 2017). Given its pedagogic potential and increasing market availability, it is crucial to examine the effectiveness of emerging VR technologies to deliver content to students in general education settings. Some advocates for using VR technologies for general instruction claim that immersive learning using VR brings motivational benefits that can lead to improved student learning (Makransky, et al., 2019). This is so because VR simulations can replace or amplify real-world learning environments by allowing students to interact and manipulate objects and parameters, thus promoting constructivist learning. VR simulations can also enable students to observe otherwise unobservable phenomena and provide students a higher sense of physical, environmental, and social presence. Students work harder when they are more interested in the material, either intrinsically (individual interest) or as elicited by the situation (situational interest) (Parong, & Mayer, 2018). Nonetheless, creating educational applications for VR could be a laborious and costly endeavor, so it is crucial to investigate whether these applications are useful for learning or not (Allcoat, & von Mühlenen, 2018). Unnecessary features introduced by VR simulations may hinder learning compared with traditional methods or compared to less sophisticated multimedia channels, such as videos and well-designed slideshows (Parong et al., 2018). Some authors claim that VR simulations may not adhere to the coherence principle of multimedia, which states that people learn better when extraneous words, sounds, and pictures are excluded from rather than included in the student learning environment. This occurs because VR simulations, especially those that are fully immersive, often add material and features (visual effects, sounds, detailed environments) which could divert attention from the important material. In other words, VR simulations may include content that is not relevant to the instructional goal. Given that learners have a limited amount of cognitive processing capacity, if VR simulations entail unnecessary 8 detail, learners may not engage adequately with the essential materials that trigger cognitive processing and learning (Mayer, 2009; Mayer, 2014). Finally, other authors claim that the usefulness of VR for general education might also depend on the type of subject of learning. Indeed, VR simulations may not necessarily be equally suitable for all subject areas. For example, it might be less beneficial for learning to play a musical instrument that requires tactile feedback, such as arts education. Still, it may be particularly helpful for teaching subjects where it is important to visualize the learning materials in 3D (e.g., biology or geometry) (Allcoat, & von Mühlenen, 2018). Other authors argue that technologies themselves, such as VR, do not directly cause learning but can afford specific tasks that themselves may result in learning (Dalgarno, & Lee, 2010). III. Data and Methodology A first step for conducting an informative meta-analysis is to gather relevant studies and accurately extract and report information from these primary studies (Uttl, White, & Gonzalez, 2017). Data were collected through a review of available studies assessing the impacts of VR training on learning and skills development. Studies included in the meta-analysis follow some predetermined criteria. First, they need to be published in a peer-reviewed journal or as a doctoral thesis, as a proxy for research quality. Second, to account for significant technological developments in access and quality of VR simulators (hardware and software), the sample only includes studies conducted within the last 15 years (2005-2020 period). Third, studies included in the meta-analysis assess the impact of VR training on student learning through value-added experiments or experimental evaluations that use randomized control trials (RTC). 3 Fourth, studies included in the sample assess skills development using objective and clearly measurable metrics, such as learning assessments or performance evaluations (pre- and post-test). The review of studies relied on web and academic databases, such as ACM Digital Library, IEEE Xplore, Web of Science, ERIC, and Scopus. Data obtained from these primary studies were compiled systematically, including the following information: 3 RCT experiments are defined as those where individuals are allocated at random (by chance alone) to receive one or several interventions. One of these interventions is the standard of comparison or control. The control may be a standard practice or no intervention at all. Value-added studies quantify changes in desired outcomes (for instance, skills development of student learning) by quantifying these outcomes before and after individuals benefit from the intervention. 9 • Year of implementation; • Field of study, using three main categories: (i) Health and safety, (ii) Virtual laboratories for engineering, science, and technical education; and (iii) General Education • Type of VR training used: (i) immersive; (ii) non-immersive • Beneficiary grade level: (i) Basic Education (K to 12); (ii) Technical-Vocational Education and Training (TVET), (iii) Higher Education, (iv) On-the-job training 4 • Number of individuals who participated in the study • Type of evaluation conducted: (i) RCT; (ii) Value-added • Type of skills assessed: (i) Cognitive Skills, understood as the acquired knowledge to understand and retain complex ideas, adapt effectively to the environment, learn from experience, and reason; (ii) Technical Skills, understood as the expertise and ability needed to perform a specific job, including the mastery of the materials, tools, or technologies, and time on task 5; and (iii) Socio-emotional skills, understood as the ability to navigate interpersonal and social situations effectively included leadership, teamwork, cooperation, self-control, self-confidence, self-efficacy, and grit • A description of the instrument and scale used to assess student’s skills • Evaluation results (e.g. results of pre and posttest and their statistical significance) • VR exposure time, in hours. The meta-analysis assesses three main outcomes of VR training courses: learning performance (L), value-added (VA), and learning efficiency (LE). While not all papers report all three outcomes, papers included in the analysis report at least one of them. Learning performance is quantified as the average percentage gain in test-scores obtained after the training is completed (i.e. % difference in posttest scores), between students who receive the VR training, or treatment group (T) and students who receive traditional training, or control group (C). Learning value added is defined as the net gains accrued as a result of the training, measured by differences in posttest and pretest scores that assess similar competences. Most studies report this information for the treatment group only, but a few allow to assess differences in learning value added between 4 TVET includes technical basic education, technical higher education, and vocational training programs. Higher Education includes academic undergraduate and graduate programs. 5 Time on task refers to the time spent to successfully complete a task or procedure. 10 students in the treatment and control groups. Learning efficiency is measured as the % difference in inputs or time utilization (such as training time, time-on-task, materials used) between students exposed to VR training vs. students exposed to traditional training methods. These outcomes are defined as follows: − = (1) − = , with k = [T, C] (2) − = (3) Since not all test-scores in (1) and (2) reported in the studies use a similar metric, a first step to assure comparability of outcomes across studies is to conduct a monotonic transformation to normalize all test-scores (r) between zero and one and into a comparable metric (s), as follows: − = − , with k = [T, C] (4) Where min and max are the minimum and maximum allowed values for the test-score of the original metric used. Since each study (i) included in the meta-analysis display important differences in sample size (N), intervention exposure times in hours (E), and type of skills assessed (j) (cognitive, technical, and socio-emotional); we use weights in our estimations to take into account that larger sample size studies and higher intervention exposure are associated with less sampling error than studies with smaller size and less intervention exposure. As such, to assess average effects by skill type, we compute for each of the outcomes (O) assessed, a sample weighted average and a sample weighted variance, as follows: � = ∑ ,×, ×, (5) ∑ ∑ , , � , )2 (, ×, )×(, − 2 = ∑ ∑ , ∑ , , with O = [L, VA, LE] (6) Since the main objective of the study is to identify the extent to which VR contributed to learning and skills development, the instruments used by the primary studies to assess learning and 11 skills are of prominent importance. The assessment of technical skills often relies on direct observation of the trainee’s performance on a predetermined task or procedure. Expert practitioners conduct observations and evaluate student performance based on a series of pre- determined metrics. Observations tend to be anonymous to prevent the observer’s bias. The protocols for the observation and the metrics to assess performance are often available and previously applied by the industry to certify professional skills, especially in the fields of medicine and engineering. The assessment of cognitive skills is generally measured using standardized tests, developed by professors and practitioners based on the curricula imparted in the training course. Finally, the assessment of socio-emotional skills is mainly conducted through students’ self- reported perceptions of self-efficacy and attitudes towards learning. Tables A2, A3 and A4 in the Annex provide a detailed description of the instruments used by different experiments to assess skills proficiency. 3.1. Descriptive statistics After conducting a thorough review of the literature, a total of 31 primary studies met the criteria specified above. Most studies (29) were conducted in OECD countries, notably in the United States, the United Kingdom, and Canada (18 of 31). While many studies attempt to assess the effects of VR on learning and skills development, not all conduct a credible evaluation of the impacts and, among those that do, many do not report complete information on the impacts of the VR training and their statistical significance. Nonetheless, the studies that met the criteria (31) include 92 different experiments that assess the effects of VR training on students’ skills development. Detailed information about each experiment is provided in Tables A1 to A12 in the Annex section. Figures 1 and 2 present descriptive statistics of the 92 experiments included in the meta- analysis. A total of 78 experiments assess the impacts of VR training on learning outcomes using RCTs. Most experiments (79 of 92) assess immersive VR training. A total of 50 experiments were conducted in higher education settings, while 37 experiments studied the effects of VR training on cognitive skills and 29 studied the effect on technical skills. Only 13 experiments were conducted in basic education settings (k to 12). In terms of the educational field, the experiments were evenly distributed in health and safety (35) and virtual labs for engineering, science, and technical education (35). A total of 22 experiments focused on general education (Figure 1). While 12 categorizing experiments across education fields was straightforward for those experiments related to health and safety, there were some topics that overlapped between experiments in the fields of engineering, science, and technical education with those pertaining to general education. The determining factor to sort these studies in one of these two field, was the use or not of a virtual laboratory. If the training imparted aimed to emulate and/or substitute for a real laboratory, the experiment was included in the field of engineering, science, and technical education – independently of the subject (see Table A1). Figure 1: Descriptive statistics of the primary experiments in the meta-analysis. Total number of experiments (N=92) Impacts assessed using RCTs 78 Immersive VR 79 Basic education (K to 12) 13 Grade Level TVET / OJT 29 Higher education 50 Efficiency 12 Socio-emotional 14 Type of Technical 29 skill Cognitive 37 assessed General education 22 Health and safety 35 Field of Engineering, science, and TVET 35 study 0 10 20 30 40 50 60 70 80 90 Source: Author’s elaboration Developing comparable and fair metrics constitutes a critical aspect to accurately assess the impacts of VR training on learning and skills development. As mentioned above, in order to provide a fair assessment of the available literature, the meta-analysis weights results based on two aspects, notably, intensity of the treatment (i.e. exposure time to VR training) and experiment sample size (Figure 2). These two variables display important variations in the primary studies included in the analysis. While half of the experiments expose students to more than one hour of VR training, 20 experiments report very short VR exposure (less than 15 minutes), while 22 report an exposure that surpasses 5 hours. Moreover, while most experiments included in the analysis are 13 medium size (benefiting between 21 and 100 students), some experiments have very limited sample sizes, of fewer than 20 beneficiaries (16 experiments in total), while others (9) include larger scale experiments that reach more than 100 beneficiaries. Figure 2: Exposure time and sample size of the primary studies in the meta-analysis Total number of experiments (N=92) 45 41 40 Number of experiments Number of experiments according according to the experiment to the VR exposure time 35 sample size 30 26 25 25 22 20 20 16 15 12 13 9 10 5 0 0 to 20 21 to 50 51 to 100 > 100 0 to 15 16 to 30 31 to 59 1 to 5 > 5 hours minutes minutes minutes hours Source: Author’s elaboration Figure 3 provides a list of all studies included in the meta-analysis, as well as information about their relative weight (i.e. sample size multiplied by exposure time, normalized to 100%), education field, and training topic. Results in the figure show studies in the field of engineering, science, and technical education tend to provide students with longer VR exposure and be conducted at a larger scale. As such, these studies are given a higher relative weight in the analysis. Finally, studies included in the meta-analysis cover a diverse range of topics, from surgical education to welding. General education topics are also studied, such as from frog dissection and job interview training. In the field of health and safety, most studies included in the analysis pertained to surgical education (laparoscopic surgery, bone dissection, robotic cardiac surgery, robotic suturing, cataract surgery, and hip arthroplasty), although studies in other areas were also included, such as studies pertained to safety and risk prevention (2), one study on medical procedures in gynecology, and one study on nursing education. 14 Figure 3: Studies in the meta-analysis according to their relative weight (in %) Stone et al.(2011) Welding Oser (2013) Genetics Yang & Heh (2007) Physics Tatli and Ayas (2013) Chemistry McLaurin & Stone (2012) Welding Tschannen et al.(2012) Nursing education Smith (2015) Job interviewing Hwang & Hu (2013) Geometry Lampi (2013) PC network configuration Finkelstein et al.(2005) Circuits design Valdis et al.(2015) Robotic surgery Tanyildizi & Orhan (2007) Synchronous motors Kiely et al.(2015) Robotic suturing Farra et al.(2018) Emergency evacuation Kockro et al. (2015) Anatomy of the heart Logishetty (2018) Hip surgery Zhao et al.(2011) Bone dissection Rupasinghe et al.(2011) Corrosion prevention Webster et al. (2015) Corrossion prevention Skou-Thomsen et al.(2017) Cataract surgery Makransky et al.(2019) Biology - mammalian proteins Chao et al.(2015) Gynecologic ultrasound Larsen et al.(2009) Laparoscopic surgery Alhalabi (2016) General Science Allcoat et al. (forth) Science - solar panels Allcoat & von Mühlenen (2018) Biology - plant cells Akpan & Strayer (2010) Frog dissection Parong & Mayer (2018) Biology - human cells Buttussi et al.(2018) Aviation safety Zaveri et al.(2016) Pediatric sedation Vincent et al. (2008) Emergency triage -15.0 -10.0 -5.0 0.0 5.0 10.0 15.0 20.0 25.0 30.0 35.0 Note: studies in bold are in the field of engineering, science, and technical education and underlined studies are in the field of health and safely. Other studies are in the field of general education. Most studies included in the field of health and safety focused on assessing the impact of VR training on students’ technical skills. In the field of engineering, science, and technical education, all studies are focused on assessing the effectiveness of virtual labs (e.g. for welding, physics, chemistry) to develop students’ practical and cognitive skills. Most studies pertaining 15 general education were in STEM related fields and focused on assessing the effects of VR training on students’ cognitive skills vis a vis otherwise similar content imparted through more traditional mechanisms, such as slideshows and videos (see Table A5 in the annex). IV. Main results This section presents the main results of the meta-analysis. Table 1 presents the observed average effects of VR training on student learning performance, as proxied by the % difference in posttests between students exposed to VR training vs. students exposed to traditional training. The effects are calculated based on information of a total of 42 available experiments, and account for the experiments’ sample size and VR exposure time. To foster comparability, we report the effects of VR training on students' technical skills per one hour of training and the effects of VR training on students' cognitive skills per quarter-hour of training. This reporting choice reflects that many experiments that aim to improve students' cognitive skills are of minimal duration (in some cases, they entail an exposure of fewer than five minutes), which would be difficult and misleading to extrapolate more extendedly. Table 1. Average effects of VR training on student learning performance % Difference in posttests results (VR vs traditional training) Technical skills Cognitive skills Socio-emotional skills (effect per hour of training) (effect per ¼ hour of (effect per training training) course) Average impact 2.95 2.51 29.8 Standard Deviation 10.1 8.07 - Number of experiments 15.0 24.0 3.0 Registering a positive effect 7.0 17.0 2.0 Registering negative effect 0.0 1.0 0.0 Registering no effect 19 8.0 6.0 1.0 Average VR exposure time (hours) 8.8 1.93 22.0 Average experiment size 57.3 89.6 80.0 Source: Author’s elaboration. A total of (15) 26 out of the 42 experiments included in the meta-analysis show a (neutral) positive effect of VR training on learning performance. Only one experiment indicates that VR training is associated with lower learning performance when compared to traditional training. Findings emanating from these experiments indicate that, on average VR training is indeed more effective than traditional training as a mechanism to develop technical, practical, and socio- emotional skills. Results are particularly promising in fields related to health and safety, 16 engineering, and technical education. Results reveal that for each additional hour (¼ hour) of training, students exposed to VR training score 3 percent higher in technical (cognitive) learning assessments, when compared to students exposed to the same curricular content delivered through traditional training methods. Results also indicate that students who complete a VR training course report, on average, 30 percent higher scores in socio-emotional skills assessments. Table 2 presents the observed average effects of VR training on learning value added, as proxied by the % difference between posttests and pretests of students exposed to VR training. These results are indicative of the capacity of VR training to positively improve students’ skills. The effects are calculated based on information of a total of 27 available experiments, and account for the experiments’ sample size and VR exposure time. Results indicate that, on average, VR training contributes to gains in technical (practical) skills that average 17 (8) percent per hour (¼ hour) of training. Moreover, students who completed a VR training course, report on average, gains averaging 20.5 percent in self-reported socio-emotional skills. Table 2. Average effects of VR training on student learning value added % Difference between posttests and pretests (VR only) Technical skills Cognitive skills Socio-emotional skills (effect per hour of effect per ¼ hour of (effect per training training) training) course) Average impact 17.3 8.0 20.5 Standard Deviation 20.9 31.3 26.1 Number of experiments 10.0 7.0 10.0 Registering a positive effect 9.0 7.0 7.0 Registering negative effect 0.0 0.0 0.0 Registering no effect 1.0 0.0 3.0 Average VR exposure time in hours 3.5 2.42 0.24 Average experiment size 23.4 89.0 39.4 Source: Author’s elaboration. Average results hide important patterns that arise when assessing the effects of VR training by education type of skill and education field. Results in the next subsections are organized by the type of skill assessed by the available experiments (technical, cognitive, and socio-emotional) and, within each skill, if there are any observed patterns, there is brief discussion of the impacts of the training by education field. 4.1. Impacts of VR training on technical skills Technical skills are often measured using ability tests, whereby students are required to perform specific task and are graded based on how their performance fares against predetermined 17 standards. The meta-analysis includes a total of 14 experiments that assess student learning of technical skills, proxied as the % difference observed in posttests per hour of training received between students who participated in VR training, or treatment group, and students who participated in traditional training, or control group. The results of these experiments are presented in Figure 4. Results in the chart provide information about the authors who conducted the experiment and the field and topics of the training. Figure 4: % Difference in posttests between students exposed to VR vs non-VR training per hour of training [Technical Skills] Chao et al. (2015) Gynecologic ultrasound Larsen et al. (2009) Laparoscopic surgery Logishetty (2018) Hip arthroplasty surgery Farra et al. (2018) Emergency evacuation Valdis et al. (2015) Robotic heart surgery Tschann et al. (2012) Nursing procedures Osner (2013) Genetics McLaurin & Stone (2012) Vertical groove weld (3G) McLaurin & Stone (2012) Vertical filet weld (3F) McLaurin & Stone (2012) Flat groove weld (1G) McLaurin & Stone (2012) Horizontal filet weld (2F) Lampi (2013) PC Network configuration Lampi (2013) PC Network troubleshooting Zaveri et al. (2016) Pediatric sedation -60.0 -40.0 -20.0 0.0 20.0 40.0 60.0 80.0 100.0 120.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. Not surprisingly, experiments that assess technical skills pertain mainly to the fields of health and safety (notably surgery performance) and engineering, science, and technical education (i.e. they relate to virtual labs for training students to perform technical tasks). Results indicate that for experiments in the field of health and safety, VR training is generally associated with posttest scores in trainees’ ability assessments that are 20 to 60 percent higher per every extra hour of instruction than those observed in students exposed to traditional training. However, this is not the case for studies in engineering, science, and technical education where students who are exposed to VR laboratories fare just as well as students who access traditional laboratories. 18 At the aggregate level, the average difference in post-test scores between students in the treatment and control groups is 3 percent per additional hour of training, with a standard deviation of 10 percent. This average effect accounts for the fact that experiments in the field of health and safety, where impacts tend to be positive and large, are generally limited in terms of sample size and exposure time to VR training (see Table A1). The fact that students exposed to VR training in the fields of engineering, science, and technical education do not display higher posttest scores than students exposed to traditional training does not indicate that VR training in these fields is not effective. What the result suggests is that VR training is as effective as traditional training methods. As will be discussed below, when the effects of VR training on learning efficiency are assessed, this result is quite relevant because VR training for technical education could be more cost-effective that traditional training in cases where it is cheaper and safer to use simulators than traditional laboratories that are expensive to set up, maintain, and update. The meta-analysis also includes 10 experiments, most of them in the field of health and safety, that assess the effects of VR training on student learning value added, as proxied by the % difference in post-test minus pre-tests for students who participate for VR training (Figure 5). Figure 5: Value added (% difference between posttests and pretests) per hour of training for students in the Treatment group [Technical Skills] Vincent et al. (2008) Accuracy in emergency triage Vincent et al. (2008) Emergency triage Kiely et al.(2015) Robotic suturing - Satisfactory knots Skou-Thomsen et al.(2017) Cataract surgery - intermediate MDs Skou-Thomsen et al.(2017) Cataract surgery - Novice MDs Kiely et al.(2015) Robotic suturing - Total knots Kiely et al.(2015) Robotic suturing - GEARS test Kiely et al.(2015) Robotic suturing - GOALS + test Smith (2015) Job Interview performance Skou-Thomsen et al.(2017) Cataract surgery - Experienced MDs -80.0 -30.0 20.0 70.0 120.0 170.0 220.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. 19 Results systematically confirm the effectiveness of VR training to develop technical skills in the field of health and safety. Available experiments indicate that one hour of VR training increases students net learning outcomes by 17 percent on average (with a standard deviation of 18 percent). Some results even indicate that VR training can be conducive to double student learning gains, especially in topics such as emergency response, where it is otherwise hard to provide students with access to real emergency situations. Results also show that the effects of VR training on value added can vary depending on the seniority of the trainees. For instance, results in Skou-Thomsen et al. (2017) indicate that VR training is conducive to higher learning gains for novice and intermediate surgeons. However, such training shows no statistically significant learning gains for more experienced surgeons. Finally, the meta-analysis results include 4 experiments that assess differences learning value added between students exposed to VR training vs. students exposed to traditional training. In the field of robotics surgery, these experiments suggest that students exposed to VR training display 28 percent higher learning gains per hour of training on average than students exposed to traditional training (Figure 6). Figure 6: % Difference in value added per hour of training (VR vs. Non-VR) [Technical Skills] Kiely et al.(2015) Robotic suturing - GOALS + test Kiely et al.(2015) Robotic suturing - GEARS test Kiely et al.(2015) Robotic suturing - Satisfactory knots Kiely et al.(2015) Robotic suturing - Total knots -50.0 -30.0 -10.0 10.0 30.0 50.0 70.0 90.0 110.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. 4.2. Impacts of VR training on cognitive skills Cognitive skills are generally measured using standardized tests, whereby students are required to answer a set of questions based on the curricula imparted. The meta-analysis includes a total of 24 experiments that assess the impact of VR training on students’ cognitive skills, proxied as the % difference observed in posttests per every ¼ hour of training received between students 20 who participated in VR training, or treatment group, and students who participated in traditional training, or control group. Contrary to VR training experiments aimed to develop technical skills, available experiments seeking to improve cognitive skills tend to have shorter exposure times to the technology (See Table A1 in the annex). As a result, as mentioned before, while we assess the impacts of VR training per hour of instruction when courses aim to develop technical skills, when developing cognitive skills, it seems more adequate to assess the effects per ¼ hour of instruction. Once factoring out the relative weight of each experiment, results of available experiments reveal that the VR training is associated with a 3 percent higher learning than traditional training per every ¼ hour of instruction, with a standard deviation of 8 percent. Most experiments assessing cognitive skills pertain to the fields of general education (in topics related to STEM), as well as virtual laboratories. The results of these experiments are presented in Figure 7. While results from available experiments display some dispersion, most experiments (18 out of 24) indicate that VR training has positive impacts in student learning. In some experiments (8 out of 24), these impacts are quite high and show that students exposed to VR training have results in cognitive assessments that are 20 to 80 percent higher than those of students exposed to traditional training methods per every additional ¼ hour of instruction. Nonetheless, a total of 6 experiments show no significant effects of VR training on learning vis a vis traditional instruction. Only one study, in the topic of biology, finds that the effect of VR training on learning is negative (Parong & Mayer, 2018). When assessing by field of study, the results presented in Figure 7 indicate that that students trained in virtual laboratories generally display higher cognitive learning (proxied by test scores) than students exposed to traditional laboratories. The intuition behind this result is that virtual labs allow for illimited repetition of experiments, are self-paced, and generally provide direct feedback to students. Such features are particularly useful for student learning in topics that require understanding of abstract concepts, such as physics (Yang, & Heh, 2007). Results do not reveal a clear pattern when it comes to general education. Most related experiments aim to assess the impacts VR training has on learning compared to other more traditional instruction methods such as lecture and/or classes that use other type multimedia aid such as a video, a textbook, or a slideshow (Table A5 in the annex provides more details). Some studies indicate that students who receive VR training perform better in cognitive assessments than 21 students exposed to a traditional lecture or videos (Alhalabi, 2016; Allcoat, & von Mühlenen, 2018). Figure 7: % Difference in posttests between students exposed to VR vs non-VR training [Cognitive Skills] Allcoat & von Mühlenen (2018) Plant cells (VR vs Video) Alhalabi (2016) Famous inventors Allcoat & von Mühlenen (2018) Biology concepts (VR vs Video) Alhalabi (2016) Networking Allcoat & von Mühlenen (2018) Biology concepts (VR vs Text) Alhalabi (2016) Transportation Akpan & Strayer (2010) Anatomy Alhalabi (2016) Anatomy Rupasinghe et al.(2011) Use of boroscope (oral test) Finkelstein et al.(2005) Circuits (essay) Finkelstein et al.(2005) Circuits (test) Tatli and Ayas (2013) Chemistry Rupasinghe et al.(2011) Use of Eddy labs (oral) Tanyildizi & Orhan (2007) Synchronous motors Farra et al. (2018) Emergency Preparedness Hwang & Hu (2013) Science Yang & Heh (2007) Synchronous motors Yang & Heh (2007) Synchronous motors Kockro et al. (2015) Anatomy Allcoat & von Mühlenen (2018) Plant cells (VR vs Text) Rupasinghe et al.(2011) Use of Eddy labs (written) Rupasinghe et al.(2011) Use of boroscope (written) Tatli and Ayas (2013) Chemistry Parong & Mayer (2018) Biology -80.0 -60.0 -40.0 -20.0 0.0 20.0 40.0 60.0 80.0 100.0 120.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. Other studies (Allcoat, & von Mühlenen, 2018; Hwang, & Hu, 2013) indicate that students who receive VR training perform just as well as students that use textbooks to complement traditional lectures. Allcoat and von Mühlenen (2018) and Hwang and Hu (2013) find that students exposed to VR training memorize better the parts of a plant cell and display better skills to calculate area and volume of 3D figures than students exposed to similar content but whose lessons were imparted using more traditional learning methods, such as videos and lectures. Parong & Mayer (2018) find that students exposed to slideshows perform better than students exposed to VR training on subjects such as biology. Finally, Kockro et al., 2015 found no significant difference 22 in students’ knowledge of the human ventricular system’s anatomy after providing them with a training course using VR content versus a traditional slideshow. The meta-analysis also includes 7 experiments that assess the effects of VR training on learning value added, as proxied by the % different in post-test minus pre-tests for students who participate for VR training (Figure 8). Results in all available experiments are indicative that VR training is conducive to positive learning gains for the development of cognitive skills. Such gains in some experiments can be as high as 2.5 times the baseline cognitive knowledge, as proxied by standardized tests. Nonetheless, experiments with such acute increases in learning gains are often very small in size and limited in terms in VR exposure (i.e. VR training of less than 5 minutes) (Allcoat, 2021; Buttussi, & Chittaro, 2018). Once accounting for experiment size and exposure time, results indicate that VR training contributes to student learning gains averaging 8 percent per ¼ hour of training, with a rather large standard deviation of 31 percent. Figure 8: Learning value added (% difference between posttests and pretests) per ¼ hour of training for students in the Treatment group [Cognitive Skills] Allcoat et al. (forth) Science - solar panels Buttussi et al.(2018) Aviation safety - VR wide view Buttussi et al.(2018) Aviation safety - VR narrow view Akpan & Strayer (2010) Science - anatomy Webster et al. (2015) Science - corrossion prevention Tatli and Ayas (2013) Chemistry - lab instruments Tatli and Ayas (2013) Chemistry - chemical changes -120.0 -70.0 -20.0 30.0 80.0 130.0 180.0 230.0 280.0 330.0 380.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. Finally, results in the meta-analysis include 5 experiments that assess differences in learning value added between students exposed to VR training vs. students exposed to traditional training. Results are mixed. Some experiments indicate that students exposed to VR training experiment lower learning gains than students exposed to traditional training (Zaveri et al., 2016; Makrasky et al., 2019). Other experiments indicate that VR training contributes to similar or higher 23 learning gains than traditional training (Allcoat, 2021; Farra et al., 2018; Tanyildizi & Orhan, 2007). Nonetheless, when accounting for experiment size and exposure time, the average observed differences in learning gains per ¼ hour of training between students exposed to VR vis a vis those exposed to traditional learning is close to zero (Figure 9), indicating that VR training is, on average, as effective a mechanism to enhance student’s cognitive skills when compared to traditional training. Nonetheless, due to the limited number of experiments that assess learning value added across VR and non-VR recipients, results need to be used with care and may not allow to adequately generalize. Figure 9: % difference in value added per ¼ hour of training (VR vs. Non-VR) [Cognitive Skills] Tanyildizi & Orhan (2007) Synchronous motors Farra et al.(2018) Emergency evacuation Allcoat et al. (forth) Science - solar panels Makransky et al.(2019) Biology - mammalian proteins Zaveri et al. (2016) Pediatric sedation -50.0 -40.0 -30.0 -20.0 -10.0 0.0 10.0 20.0 30.0 40.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. 4.3. Impacts of VR training on socio-emotional skills The assessment of socio-emotional skills is often conducted through students’ self-reported perceptions of self-efficacy and attitudes towards learning. Such perceptions are often quantified using a Likert scale type of questionnaire that students complete before and/or after they complete their training (see Tables A2 to A4 in the annex). Such perceptions are often indicative of the main channels that explain why VR training can be more effective and more conducive to learning than traditional training. The meta-analysis includes a total of 13 experiments that assess the effects of VR training on the effects of students’ socio-emotional skills. 24 Figure 10: % Difference in posttests per course (VR vs non-VR training) [Socio-emotional Skills] Stone et al.(2011) Collaborative learning Stone et al.(2011) Open communication Stone et al.(2011) Improvement seeking -60.0 -40.0 -20.0 0.0 20.0 40.0 60.0 80.0 100.0 120.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. Results from Stone et al. (2011) indicate that students who receive welding training in virtual labs report higher self-reported peer communication and collaborative learning in post-test than students who receive traditional training. This may occur because virtual laboratories can allow for digital peer interactions, which can be repeated multiple times, and allow for real-time / automated feedback to students. However, the authors do not find significant differences in student self-reported improvement seeking of students who received their training in a virtual lab (Figure 10). The remaining 10 experiments assess the effects of VR training on student learning self- reported skills value added, as proxied by the % difference in post-test minus pre-tests for students who participate for VR training (Figure 11). Available results indicate that students who complete VR training tend to report 20% higher levels of confidence and self-efficacy towards learning after they complete their courses. For instance, results from Vincent et al. (2008) indicate that after being exposed to VR training, students report higher levels of confidence in several dimensions (e.g. being effective, making decisions, and using resources) when addressing a health emergency. Interestingly, results for Buttussi et al. (2018) indicate that such self-reported increases in confidence can vary using different types of VR technology. In particular, the authors find that students exposed to narrow view VR technology (which allows them to focus more on features of the simulation) tend to report higher increases in self-reported self-efficacy addressing a plane emergency than students exposed to wide VR technology (which may expose students to more visual distractions). 25 Figure 11: Learning value added (% difference between posttests and pretests) per completed VR training course [Socio-emotional Skills] Buttussi et al.(2018) Self-efficacy assessing an emergency (VR narrow view) Confidence on ability to prioritize Vincent et al. (2008) resources Self-efficacy addressing an emergency Buttussi et al.(2018) (VR Wide view) Vincent et al. (2008) Confidence on ability to identify risks Confidence on ability to prioritize Vincent et al. (2008) treatment Confidence that others will consider Vincent et al. (2008) me effective Vincent et al. (2008) Confidence on being effective Akpan & Strayer (2010) Attitude towards use of PC Akpan & Strayer (2010) Attitude towards science Akpan & Strayer (2010) Attitude towards frog dissection -30.0 -10.0 10.0 30.0 50.0 70.0 90.0 110.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. Finally, results from Akpan and Strayer (2010), which use VR training to simulate a frog dissection, do not find a significant effect of the use of VR training in students change in attitude towards learning anatomy and/or using PC assisted instruction. 4.4. Impacts of VR training on learning efficiency In the context of this study, learning efficiency is defined as any savings in the form of inputs, time, or performance errors that VR training could contribute to. One of the promises of using simulators vis-a-vis traditional instruction is their potential to save training costs and minimize the risks and errors faced when novice students intend to master some skills they will use in real life. Ideally, VR training should contribute to more efficient use of inputs, more expedited completion of tasks (or time-in-task), and fewer performance errors. The meta-analysis includes a total of 12 experiments that study the effects of VR training on the utilization of inputs (such as materials and time to complete a task) and performance errors. A total of 11 out of 12 experiments included in the meta-analysis find that VR training is associated with higher learning efficiency levels when compared to traditional training. In fact, experiment results indicate that, on average, students who are exposed to VR training are, on average, up to 30 percent more efficient (using inputs, time, and/or avoiding performance errors) than students exposed to traditional training per additional hour of instruction. Results from available experiments indicate 26 that VR training can help welding students to be more efficient using materials, such as plates and electrodes (Stone et al. 2011) and can expedite the time students take to perform surgical procedures such as laparoscopy surgery (Larsen et al. 2009), robotic heart surgery (Valdis et al., 2015) and hip arthroplasty (Logishetty, Rudran, & Cobb, 2018). Results from Logishetty, Rudran, & Cobb (2018) also indicate that students who are exposed to VR training are less likely to make mistakes when performing real surgery procedures (see Table A12 in the annex). Of course, due to the limited number of experiments used to draw these conclusions, these results need to be interpreted with care and should not be generalized. Nonetheless, these findings are indicative of the potential of VR simulators to be not only an effective, but also an efficient learning mechanism, especially in the fields of health and safety and technical education. Figure 12: % decrease in inputs / performance errors per 1 hour of VR training compared to traditional training [Learning Efficiency] Lampi (2013) Configuration time - min Lampi (2013) Troubleshooting time - min Stone et al. (2011) Number of groove plates used Finkelstein et al. (2005) Time on task - min Stone et al. (2011) Number of plates used Stone et al. (2011) Pounds of electrodes used Valdis et al. (2015) Time on task - min Logishetty (2018) Operation time - min Larsen et al. (2009) Time on task - min Valdis et al. (2015) Time on task - min Logishetty (2018) Anteversion error - degrees Logishetty (2018) Inclination error - degrees -120.0 -100.0 -80.0 -60.0 -40.0 -20.0 0.0 20.0 40.0 60.0 Note: studies in bold are in the field of engineering, science, and technical education. Underlined studies are in the field of health and safely. Other studies are in the field of general education. The bold vertical dotted line represents the average observed effects. Gray vertical dotted lines represent the standard deviation of the observed effects. V. Conclusions The development of students’ skills remains a challenge for education systems worldwide. To address this challenge, educators are beginning to explore the possibility of using information technology to create learning experiences that would otherwise not be accessible to students. Simulations that rely on VR technology can provide students access to learning environments that 27 would otherwise be very difficult, expensive, or impossible to access. VR simulations can provide students practical training opportunities without pressure, danger, and allowing repeated practice. Such opportunities have the potential of accelerating students’ learning curve in a simulated environment, reproducing real-life conditions and situations without time or space limitations, and with much fewer risks. This study constitutes an attempt to assess the effects of VR instruction holistically, as a mechanism to develop students’ skills. Given its pedagogic potential and its increasing market availability, it is crucial to examine the effectiveness of emerging VR technologies for pedagogical instruction. Creating educational applications for VR could be a laborious and costly endeavor, so it is essential to investigate whether these applications are useful for learning or not and, to the extent possible, to assess their cost-effectiveness. Results capitalize from a thorough review of 31 primary studies and over 90 experiments that intend to assess the effects of VR instruction on student learning. Our findings reveal that VR instruction is, on average, more effective than traditional training as a mechanism to develop students’ skills. Results indicate that for each additional hour (¼ hour) of training, students exposed to VR training score 3 percent higher in technical (cognitive) learning assessments, when compared to students exposed to the same curricular content delivered through traditional training methods. Results also indicate that students exposed to VR instruction report, on average, 30 percent higher scores in socio-emotional skills assessments after completing their training. Results are particularly promising in fields related to health and safety, engineering, and technical education. Results from available experiments confirm systematically that VR instruction yields positive results as a mechanism to train surgeons and medical personnel. It offers trainees the opportunity to practice medical procedures safely and at a comparatively low cost. Available experiments confirm VR simulators' effectiveness to improve surgeons’ proficiency to perform procedures such as laparoscopic surgery, robotic surgery, eye surgery, transvaginal examinations, infant sedation, and bone surgery, to name a few. Some results even indicate that VR training can be conducive to much higher student learning gains, especially in topics such as emergency response, where it is otherwise hard to provide students access to real emergencies. VR training can provide students similar-to real-life laboratories and equipment without making significant capital investments. Available experiments show that virtual laboratories can 28 be as effective as real laboratories to develop students’ skills, but they can be a more efficient, safe, and cost-effective mechanism of instruction. Our findings indicate that students exposed to VR training are up to 30 percent more efficient using inputs, time, and/or avoiding performance errors than students exposed to traditional training, per additional hour of instruction. The intuition behind this result is that virtual labs allow for illimited repetition of experiments, are self-paced, and generally provide direct feedback to students. Nonetheless, results do not reveal a clear pattern when it comes to the use of VR instruction for general education. Some studies indicate that students who receive VR training perform better in cognitive assessments than students exposed to a traditional lecture or videos. Other studies indicate that students exposed to other less expensive multimedia platforms, such as slideshows and videos, learn more than students exposed to VR training. As such, VR instruction may not be adequate as a mechanism for instruction in all educational fields. Indeed, VR training may provide students too much information, which may deviate their attention from the aspects of the curricula that matter most. VR training can also help students develop their socio-emotional skills. Simulations can develop and promote collaborative features that enable peer interaction, active learning, and performance feedback. In turn, these features can promote student motivation and presence, which are two channels that can positively influence student learning. Available experiments show that students exposed to VR training report higher peer communication and collaborative learning in standardized assessments, than students who receive traditional training. Available experiments also show that students who complete VR training report higher confidence and self-efficacy towards learning after completing their courses. It will be essential to continue to assess the cost-effectiveness of VR training, which is something beyond the scope of this study. While VR training's cost-effectiveness is likely to vary depending on many parameters such as course duration, cost of the actual equipment that VR intends to simulate, educational field, risks of making mistakes in a non-simulated environment, and type of technology used, it is not always assured. Indeed, this type of instruction could be cost- effective only if it provides savings or reduces potentially expensive risks compared to other alternative multimedia or traditional laboratories. Imparting VR courses entails software development and equipment, maintenance, support, and updates, which require sustained 29 investments. To date, not many studies assessing the effects of VR training have focused on conducting a cost-benefit or cost-effectiveness analysis of VR instruction compared to traditional training methods. Having more such information will be crucial to assess the scalability potential of VR training across education systems. Finally, this study's results primarily draw their conclusions based on experiments in developed countries. As such, these results may not necessarily hold in all educational settings because several factors necessary for VR training to succeed (e.g., connectivity, availability of equipment and IT support, students' and teachers' dominium of necessary digital skills, among others) are not necessarily assured in many education institutions in developing countries. In summary, this study finds that VR training tends to be an effective mechanism of instruction to develop students’ technical, practical, and socio-emotional skills. Nonetheless, these results cannot be generalized, and it is important to continue to assess the pros and cons of using VR for pedagogical instruction for different subjects as well as its cost-effectiveness and scalability. 30 References Akpan, J., & Strayer, J. (2010). Which comes first the use of computer simulation of frog dissection or conventional dissection as academic exercise?. Journal of Computers in Mathematics and Science Teaching, 29(2), 113-138. Alhalabi, W. (2016). Virtual reality systems enhance students’ achievements in engineering education. Behaviour & Information Technology, 35(11), 919-925. Allcoat, D. (2021). Effects of Virtual Reality on solar-power panel efficiency learning. Department of Psychology. University of Warwick, Allcoat, D., & von Mühlenen, A. (2018). Learning in virtual reality: Effects on performance, emotion and engagement. Research in Learning Technology, 26. Andreatta, P. B., Maslowski, E., Petty, S., Shim, W., Marsh, M., Hall, T., ... & Frankel, J. (2010). Virtual reality triage training provides a viable solution for disaster‐ preparedness. Academic emergency medicine, 17(8), 870-876. Apostolellis, P., Bowman, D. A., & Chmiel, M. (2018). Supporting social engagement for young audiences with serious games and virtual environments in museums. In Museum Experience Design (pp. 19-43). Springer, Cham. Barbe, W. B., Milone, M. N., & Swassing, R. H. (1988). Teaching through modality strengths: Concepts and practices. Zaner-Bloser. Buiu, C., & Gânsari, M. (2014, April). Designing robotic avatars in Second Life-A tool to complement robotics education. In 2014 IEEE Global Engineering Education Conference (EDUCON) (pp. 1016-1018). IEEE. Buttussi, F., & Chittaro, L. (2018). Effects of different types of virtual reality display on presence and learning in a safety training scenario. IEEE transactions on visualization and computer graphics, 24(2), 1063-1076. Chao, C., Chalouhi, G. E., Bouhanna, P., Ville, Y., & Dommergues, M. (2015). Randomized clinical trial of virtual reality simulation training for transvaginal gynecologic ultrasound skills. Journal of Ultrasound in Medicine, 34(9), 1663-1667. Cheung, S. K., Fong, J., Fong, W., & Wang, F. L. (Eds.). (2013). Hybrid Learning and Continuing Education: 6th International conference, ICHL 2013, Toronto, ON, Canada, August 12- 14, 2013, Proceedings (Vol. 8038). Springer. Dalgarno, B., & Lee, M. J. (2010). What are the learning affordances of 3‐D virtual environments?. British Journal of Educational Technology, 41(1), 10-32. Escueta, M., Quan, V., Nickow, A. J., & Oreopoulos, P. (2017). Education technology: An evidence-based review. 31 Farra, S., Hodgson, E., Miller, E. T., Timm, N., Brady, W., Gneuhs, M., ... & Bottomley, M. (2019). Effects of virtual reality simulation on worker emergency evacuation of neonates. Disaster medicine and public health preparedness, 13(2), 301. Ferracani, A., Pezzatini, D., & Del Bimbo, A. (2014, November). A natural and immersive virtual interface for the surgical safety checklist training. In Proceedings of the 2014 ACM international workshop on serious games (pp. 27-32). Finkelstein, N. D., Adams, W. K., Keller, C. J., Kohl, P. B., Perkins, K. K., Podolefsky, N. S., & LeMaster, R. (2005). When learning about the real world is better done virtually: A study of substituting computer simulations for laboratory equipment. Physical review special topics-physics education research, 1(1), 010103. Gallagher, A. G., Ritter, E. M., Champion, H., Higgins, G., Fried, M. P., Moses, G., ... & Satava, R. M. (2005). Virtual reality simulation for the operating room: proficiency-based training as a paradigm shift in surgical skills training. Annals of surgery, 241(2), 364. Gigante, M. A. (1993). Virtual reality: definitions, history and applications. In Virtual reality systems (pp. 3-14). Academic Press. Gurusamy, K., Aggarwal, R., Palanivelu, L., & Davidson, B. R. (2008). Systematic review of randomized controlled trials on the effectiveness of virtual reality training for laparoscopic surgery. British Journal of Surgery, 95(9), 1088-1097. Haque, S., & Srinivasan, S. (2006). A meta-analysis of the training effectiveness of virtual reality surgical simulators. IEEE Transactions on Information Technology in Biomedicine, 10(1), 51-58. Huang, H. M., Rauch, U., & Liaw, S. S. (2010). Investigating learners’ attitudes toward virtual reality learning environments: Based on a constructivist approach. Computers & Education, 55(3), 1171-1182. Hwang, W. Y., & Hu, S. S. (2013). Analysis of peer learning behaviors using multiple representations in virtual reality and their impacts on geometry problem solving. Computers & Education, 62, 308-319. Kavanagh, S., Luxton-Reilly, A., Wuensche, B., & Plimmer, B. (2017). A systematic review of Virtual Reality in education. Themes in Science and Technology Education, 10(2), 85-119. Kiely, D. J., Gotlieb, W. H., Lau, S., Zeng, X., Samouelian, V., Ramanakumar, A. V., ... & Press, J. Z. (2015). Virtual reality robotic surgery simulation curriculum to teach robotic suturing: a randomized controlled trial. Journal of robotic surgery, 9(3), 179-186. Kockro, R. A., Amaxopoulou, C., Killeen, T., Wagner, W., Reisch, R., Schwandt, E., ... & Stadie, A. T. (2015). Stereoscopic neuroanatomy lectures using a three-dimensional virtual reality environment. Annals of Anatomy-Anatomischer Anzeiger, 201, 91-98. 32 Lampi, E. (2013). The effectiveness of using virtual laboratories to teach computer networking skills in Zambia (Doctoral dissertation, Virginia Tech). Larsen, C. R., Soerensen, J. L., Grantcharov, T. P., Dalsgaard, T., Schouenborg, L., Ottosen, C., & Ottesen, B. S. (2009). Effect of virtual reality training on laparoscopic surgery: randomised controlled trial. Bmj, 338. Logishetty, K., Rudran, B., & Cobb, J. P. (2019). Virtual reality training improves trainee performance in total hip arthroplasty: a randomized controlled trial. The bone & joint journal, 101(12), 1585-1592. Lorenzo, G., Pomares, J., & Lledó, A. (2013). Inclusion of immersive virtual learning environments and visual control systems to support the learning of students with Asperger syndrome. Computers & Education, 62, 88-101. Madathil, K. C., Frady, K., Hartley, R., Bertrand, J., Alfred, M., & Gramopadhye, A. (2017). An empirical study investigating the effectiveness of integrating virtual reality-based case studies into an online asynchronous learning environment. Computers in Education Journal, 8(3), 1-10. Makransky, G., Terkildsen, T. S., & Mayer, R. E. (2019). Adding immersive virtual reality to a science lab simulation causes more presence but less learning. Learning and Instruction, 60, 225-236. Mayer, R. (2009). Multimedia Learning. Cambridge University Press. doi: 10.1017/cbo9780511811678 Mayer, R. (2014). Cognitive Theory of Multimedia Learning. In R. Mayer (Ed.), The Cambridge Handbook of Multimedia Learning (Cambridge Handbooks in Psychology, pp. 43-71). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139547369.005 Mikropoulos, T. A., & Natsis, A. (2011). Educational virtual environments: A ten-year review of empirical research (1999–2009). Computers & Education, 56(3), 769-780. McLaurin, E. J., & Stone, R. T. (2012, September). Comparison of virtual reality training vs. integrated training in the development of physical skills. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 56, No. 1, pp. 2532-2536). Sage CA: Los Angeles, CA: SAGE Publications. Oser, R. R. (2013). Effectiveness of virtual laboratories in terms of achievement, attitudes, and learning environment among high school science students (Doctoral dissertation, Curtin University). Parong, J., & Mayer, R. E. (2018). Learning science in immersive virtual reality. Journal of Educational Psychology, 110(6), 785. Rupasinghe, T. D., Kurz, M. E., Washburn, C., & Gramopadhye, A. K. (2011). Virtual reality training integrated curriculum: An aircraft maintenance technology (AMT) education perspective. International Journal of Engineering Education, 27(4), 778. 33 Sharma, S., Agada, R., & Ruffin, J. (2013, April). Virtual reality classroom as an constructivist approach. In 2013 Proceedings of IEEE Southeastcon (pp. 1-5). IEEE. Skou -Thomsen, A., Bach-Holm, D., Kjærbo, H., Højgaard-Olsen, K., Subhi, Y., Saleh, G. M., ... & Konge, L. (2017). Operating room performance improves after proficiency-based virtual reality cataract surgery training. Ophthalmology, 124(4), 524-531. Smith, M. J., Ginger, E. J., Wright, K., Wright, M. A., Taylor, J. L., Humm, L. B., ... & Fleming, M. F. (2014). Virtual reality job interview training in adults with autism spectrum disorder. Journal of autism and developmental disorders, 44(10), 2450-2463. Stone, R. T., Watts, K. P., & Zhong, P. (2011). Virtual reality integrated welder training. Welding Journal, 90(7), 136s. Tanyildizi, E., & Orhan, A. (2007). A virtual electric machine laboratory for synchronous machine application. Computer Applications in Engineering Education, 17(2), 187-195. Tatli, Z., & Ayas, A. (2013). Effect of a virtual chemistry laboratory on students' achievement. Journal of Educational Technology & Society, 16(1), 159-170. Tschannen, D., Aebersold, M., Mclaughlin, E., Bowen, J., & Fairchild, J. (2012). Use of virtual simulations for improving knowledge transfer among baccalaureate nursing students. Journal of Nursing Education and Practice, 2(3), 15. Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42. Valdis, M., Chu, M. W., Schlachta, C., & Kiaii, B. (2016). Evaluation of robotic cardiac surgery simulation training: a randomized controlled trial. The Journal of thoracic and cardiovascular surgery, 151(6), 1498-1505. Vincent, D. S., Sherstyuk, A., Burgess, L., & Connolly, K. K. (2008). Teaching mass casualty triage skills using immersive three‐dimensional virtual reality. Academic Emergency Medicine, 15(11), 1160-1165. Webster, R. (2016). Declarative knowledge acquisition in immersive virtual learning environments. Interactive Learning Environments, 24(6), 1319-1333. Wei, W., Dongsheng, L., & Chun, L. (2013, September). Fixed-wing aircraft interactive flight simulation and training system based on XNA. In 2013 International Conference on Virtual Reality and Visualization (pp. 191-198). IEEE. Yang, K. Y., & Heh, J. S. (2007). The impact of internet virtual physics laboratory instruction on the achievement in physics, science process skills and computer attitudes of 10th-grade students. Journal of Science Education and Technology, 16(5), 451-461. 34 Zaveri, P. P., Davis, A. B., O'Connell, K. J., Willner, E., Schinasi, D. A. A., & Ottolini, M. (2016). Virtual reality for pediatric sedation: a randomized controlled trial using simulation. Cureus, 8(2). Zhao, Y. C., Kennedy, G., Yukawa, K., Pyman, B., & O'Leary, S. (2011). Can virtual reality simulator be used as a training aid to improve cadaver temporal bone dissection? Results of a randomized blinded control trial. The Laryngoscope, 121(4), 831-837. 35 ANNEX Table A1: Studies included in the meta-analysis Type of skill assessed Grade Level VR STUDY TVET Immersive Sample Exposure C T SE K-12 H.E. /OJT VR RCT size in hours Health and safety Farra et al.(2018) Yes Yes Yes* No No Yes Yes Yes 93 0.66 Logishetty (2018) No Yes No No Yes No Yes Yes 24 2.00 Skou-Thomsen et al.(2017) No Yes No No Yes No Yes No 18 1.50 Zaveri et al.(2016) Yes Yes No No Yes No Yes Yes 14 0.50 Chao et al.(2015) No Yes No No Yes No Yes Yes 34 0.66 Kiely et al.(2015) No Yes No No Yes No Yes Yes 27 5.00 Valdis et al.(2015) No Yes No No Yes No Yes Yes 20 9.30 Tschannen et al.(2012) No Yes No No Yes No Yes Yes 115 3.00 Zhao et al.(2011) No Yes No No Yes No Yes Yes 20 2.00 Larsen et al.(2009) No Yes No No Yes No Yes Yes 21 1.00 Vincent et al. (2008) No Yes Yes No Yes No Yes No 20 0.25 Virtual labs for engineering, science, and technical education Buttussi et al.(2018) Yes No Yes No No Yes Yes Yes 96 0.08 Lampi (2013) No Yes No No No Yes Yes Yes 56 4.00 Osner (2013) Yes No No Yes No No No Yes 322 5.00 Tatli and Ayas (2013) Yes No No Yes No No Yes Yes 90 8.00 McLaurin & Stone (2012) No Yes No No No Yes Yes Yes 21 25.00 Rupasinghe et al.(2011) Yes No No No No Yes Yes Yes 39 1.00 Stone et al.(2011) No Yes Yes No No Yes Yes Yes 22 80.00 Tanyildizi & Orhan (2007) Yes No No No Yes No No Yes 73 2.00** Yang & Heh (2007) Yes No No Yes No No No Yes 150 7.50 Finkelstein et al.(2005) Yes No No No Yes No No Yes 222 1.00 General education Allcoat et al. (forthcoming) Yes No No No Yes No Yes Yes 75 0.17 Makransky et al.(2019) Yes No No No Yes No Yes Yes 52 0.50 Allcoat & von Mühlenen (2018) Yes No No No Yes No Yes Yes 99 0.12 Parong & Mayer (2018) Yes No No No Yes No Yes Yes 55 0.20 Alhalabi (2016) Yes No No No Yes No Yes Yes 48 0.33 Kockro et al. (2015) Yes No No No Yes No Yes Yes 169 0.30 Smith (2015) No Yes No No No Yes Yes Yes 32 10.00 Webster et al. (2015) Yes No No No No Yes Yes Yes 140 0.25 Hwang & Hu (2013) Yes No No Yes No No Yes Yes 58 4.00 Akpan & Strayer (2010) Yes No Yes Yes No No No Yes 34 0.33*** Notes: RCT: Randomized Control Trial. Type of skill assessed: C: cognitive, T: technical, SE: socioemotional. Grade Level: K-12: Basic education; H.E: Higher education, TVET/OJT: Technical education and on-the-job training. * While this study includes experiments aiming to assess socio-emotional skills, it does not clearly present the skills assessment instruments. As such, the experiments were not included in the meta-analysis. ** This study does not provide the time of exposure. Since the instruction on synchronous motors within a syllabus of electrical machines takes on average 2.5 hours (including introduction and theory), we estimated the exposure to the virtual environment to be 2 hours approximately. *** This study does not provide the time of exposure. Since participants were enrolled in six and eight-period life science course, we assumed one of these periods to be dedicated to the topic of dissection and estimate exposure to the simulator in one of these periods to be of 20 minutes approximately. 36 Table A2: Instruments used to assess learning for health and surgical education Article Main Topic Instrument used for skills assessment Farra et al. Emergency Technical skills were assessed by having students participate in live evacuation (2018) Evacuation of exercises using mannequins of newborns. The research team developed a tool to Neonates asses students’ performance through direct observation. Psychomotor skills were assessed using a rubric developed in collaboration with disaster experts using the Cincinnati Children’s Emergency Preparedness and Response Program. Cognitive skills were measured using knowledge assessment developed by the researchers, based upon the course objectives and modules. The assessment included multiple-choice questions to assess students’ comprehension of the topic and knowledge of its practical application. Logishetty Total hip Technical skills were assessed by measuring: (i) the total correct tasks necessary to (2018) arthroplasty conduct a successful total hip arthroplasty (out of a total of 30); (ii) the errors in the component orientation in degrees (i.e. inclination with respect of the pelvis) (the higher the degrees, the higher the error); and operation time (in minutes). The task check list was developed by a pool of expert surgeons. Operation time (in minutes) was also registered. 6 Skou- Cataract Technical skills were assessed through performance in the operating room, using Thomsen et al. surgery the Objective Structured Assessment of Cataract Surgical Skill (OSACSS) rating (2017) scale, a tool previously validated by the practice. Participants performed 3 consecutive phacoemulsification surgeries immediately before and after the training intervention. Procedures were recorded in video. Three raters evaluated all anonymized videos independently. Zaveri et al Procedural Technical skills: After completing the intervention or control module, all residents (2016) sedation then immediately participated in a simulated pediatric procedural sedation scenario. All simulations occurred in the Simulation Center with an infant patient simulator. Simulations were video and audio recorded. Each performance video was reviewed by one or two team members blinded to the group allocation. Performance on preparation and management of a complication was assessed using a 32-point checklist, adapted for this sedation scenario from a previously published checklist. The initial checklist was determined by a consensus from a panel of experts in pediatric emergency medicine and pediatric procedural sedation. Chao et al. Transvaginal Technical skills were assessed by asking participants to produce 4 images (2015). Gynecologic (longitudinal and axial sections of the uterus and the ovaries) and measure the Ultrasound uterus and each ovary. Participants were given 5 minutes of scanning time. Images with measurement calipers were stored in a database. Two blinded reviewers (M.D. and G.E.C) assessed the images in a random order two months after the trials were completed. Kiely et al. Robotic Technical skills were assessed by three blinded raters (two gynecologic oncologists (2015) suturing and one gynecologic oncology fellow, all experienced in robotic surgery) using the GOALS+ score. This score is composed of the five domains (each ranging from 1 to 5 points) developed for assessing skill in laparoscopy which include autonomy, efficiency, tissue handling, depth perception, and bimanual dexterity plus two additional metrics developed specifically for robotics, precision and awareness of camera and instruments. The GOALS+ includes 7 domains, 6 of which form the GEARS score. Data allowed also to calculate the Global Evaluative Assessment of Robotic Skill (GEARS) scoring tool, a tool previously validated by the medical 6 The study also included a procedure-based assessment with a global summary score ranging from an ability to only assist (Level 1a) to advanced competence (Level 4b). However, since the assessment did not include a numeric score, these results are not included in the meta-analysis. 37 practice. Other secondary outcomes were the number total knots and satisfactory knots performed during the inanimate model suturing task. 7 Valdis et al. Robotic Technical skills were assessed by asking participants to complete a standardized (2015). cardiac robotic internal thoracic artery harvest and mitral valve annuloplasty performed in surgery porcine models. The de-identified recordings of the procedures were assessed by a single investigator (to control for interobserver variability) using the Global Evaluative Assessment of Robotic Skill (GEARS) scoring tool, a tool previously validated by the practice. Time on task was also assessed. Tschannen et Nursing Students participated in a mannequin-based simulation. Their performance was al. (2012) education evaluated by expert practitioners using an adapted version of the Capacity to Rescue Instrument (CRI), a tool previously validated by the practice. The instrument is designed to capture key elements (assessments, interventions) that are needed to ensure a good outcome for the patient for a specific simulation scenario. For the purpose of this study, the modified CRI consisted of 17 items measuring key concepts: communication (9 items), problem solving (4 items), and priority setting (4 items). Zhao et al. Bone Technical skills were assessed by asking participants to complete a cortical (2011) dissection mastoidectomy on a cadaveric temporal bone. The participants had 1 hour to complete the procedure. Their dissections were captured using a video camera. The videos contained only the hands of the participants. These videos were then presented to 3 otologists who were blinded to whether the participant received traditional or immersive training. The otologists assessed the participants’ performance using a standardized assessment tools previously validated by the practice. Larsen et al. Laparoscopic Technical skills were assessed by asking participants to complete a laparoscopic (2009) surgery surgery. Two independent / blinded observers assessed their performance using the Objective Structured Assessment of Technical Skills (OSATS), an instrument previously validated by the practice. A secondary outcome assessed was operation time in minutes (time on task). Vincent et al. Mass casualty Technical and soft skills were measures using the following outcome variables: (2008) triage Triage score: A point was given for each correct answer that was selected by the learner in the VR environment: 1) was the main problem correctly identified, 2) was the required intervention correctly identified, and 3) was the triage category correctly identified? Thus, each learner could receive a maximum of 15 points per scenario (5 victims x 3 questions per victim). Intervention score: A point was awarded for each intervention that was performed correctly in the VR environment. Thus, each learner could receive a maximum of 5 points per scenario. Self-efficacy: Subjects completed a 5 question self-efficacy questionnaire before and after the VR experience. Each question was scored on a 5-point Likert scale with points labeled ‘‘never’’ (1) to ‘‘always’’ (5). Note: Authors own elaboration. 7 For scoring of total knots, if a knot was partially completed at the 10 min stop time, it was scored as follows: 0.4, if the first double throw was completed and cinched down and then, an additional 0.2 for each single throw cinched down. Satisfactory knots were defined as knots that the rater would not cut out and re-suture during live surgery 38 Table A3: Instruments used to assess learning for engineering, science, and technical education Article Main Topic Instrument used for skills assessment Buttussi et al Aviation To measure cognitive skills about cabin safety, the researchers used a test with nine (2018) safety questions related to: 1) what to do in case of turbulence; 2) what to do in preparation for procedures impact; 3)which exit should be the first choice for evacuation; 4)when it is not possible to use an exit; 5) what to do if the chosen exit cannot be used; 6) what to do if there is smoke in the cabin during evacuation; 7) what to do after using a wing exit; 8) what to do after leaving the aircraft; 9) what to do with luggage. Participants were asked to answer the questions orally to avoid suggesting possible answers. Answers were audio recorded and later rated by the experimenter as correct or wrong, following a codebook that listed the possible answers and their rating (right/wrong). Knowledge was measured as the number of correctly answered questions, and thus ranged between 0 and 9. Self-efficacy was assessed using a questionnaire with six items: 1) I feel able to deal with an emergency evacuation of an aircraft; 2) I would be able to deal with an emergency evacuation even if the aircraft is on fire; 3) I would be able to deal with an emergency evacuation even if one or more exits are blocked; 4) I would be able to deal with an emergency evacuation even if most of the passengers scream or cry; 5) I feel confident of my ability to exit from the aircraft in time; 6) I would be able to help passengers in need. Each item was rated by participants on a 7-point scale (1=not at all, 7=very). Lampi (2013) Computer Technical skills were assessed by giving students the opportunity to configure and Networks troubleshoot local area networks in a physical lab. Proficiency was measured by how quickly and accurately students configured and troubleshot a computer network. The instruments were developed by the researcher based on industrial certification skills objectives of the Cisco Certified Network Associate (CCNA) program (Cisco, 2012), as follow: Configuration time measured the amount of time a student took to configure a network. This was measured by recording the time it took to complete a network design as specified in a lab test. Time was measured in minutes. Configuration accuracy (0-22 points) was measured by the score a student obtained in a lab test on configuring a network design. The instrument was based on the objective performance measure (Lewis, 1993). It consisted of a check list of items of tasks that had to be completed to determine the eventual score. Correct configuration of an item scored a value of 1 and an incorrect scored a value of 0. Troubleshooting accuracy (0-6 points) was measured by the score a student obtained in lab test on troubleshooting a network that had a number of faults. The instrument was an objective performance measure (Lewis, 1993), with a check list of tasks that had to be completed to determine the eventual score. Troubleshooting time measured the amount of time a student took to troubleshooting a network that had a number of faults. This was measured by recording the time it took to complete troubleshooting a network specified in a lab test based on a CCNA® certification test. Time was measured in minutes. Osner (2013) Genetics Achievement test The Laboratory Assessment in Genetics (LAG) was used in the study. The LAG includes a scale for assessing students’ achievement in Genetics. Specifically, it measures the extent to which students understand various concepts, including Mendelian inheritance, the structure of DNA, mutations, cloning, and genetic engineering. All achievement items utilized a multiple-choice answer format with four possible responses from which to choose. Scoring was based on the number of items correctly answered and ranged from zero (0) for no correct answers to ten (10) for all correct answers. The score was then 39 divided in half for meaningful comparison with scores from other sections of the LAG, which ranged from zero (0) to five (5). Tatli and Ayas Chemistry Cognitive skills were measured using two posttest examinations: (2013) lab Chemical changes unit achievement test (CCUA): the test included 25 items to measure the learning outcomes of the course. Laboratory equipment test (LET): This test assessed students’ ability to recognize laboratory equipment: Items in the exam, devised by the group of researchers, were prepared in order to cover all laboratory materials and equipment used in primary school science and ninth-grade chemistry courses. The test included 28 items, endorsed by five academics from departments of instructional natural science and chemistry. In addition to these 28 items, a module was added, asking students to enter the names of laboratory materials and equipment into blank spaces beneath color pictures of the related material and equipment. McLaurin and Welding Technical skills were assessed in two ways: (i) students who completed the training took Stone (2012) the welding certification exam imparted the American Welding Society. Certification rates of students could oscillate between (0 and100%); (ii) students submitted their welds to a welding master expert, who would assess the quality of the weld (0-100 points). Rupasinghe et Corrosion Cognitive skills were measured using two posttest examinations: (i) a written examination al. (2011) prevention (0 to 100 points) consisting of two-tiered multiple-choice questions, fill in the blanks, and and control essay questions where the students had to describe and apply the concepts and procedures (aircraft learned. (ii) An oral examination (0 to 100 points) where students were given several Maintenance) inspection scenarios and they had to describe how they would resolve the issues using the most appropriate Non-Destructive Inspection (NDI) tool. These questions were aimed at testing deeper knowledge (higher levels of knowledge using Bloom’s taxonomy) on each inspection device / simulator. Stone et al. Welding Technical skills were assessed using the amount of material used by trainees: amount of (2011) overall flat plates (both virtual and real-world plates) used by participants in both groups, the number of groove plates, and the number of electrodes (less usage of materials for a similar task is more desirable). 8 Socio-emotional skills were measured using the Team Learning Questionnaire (TLQ) which tracked three key dimensions of team learning and interaction: (1) Continuous Improvement Seeking (the degree to which a team can learn from previous experiences); (2) Dialogue Promotion and Open Communication (the degree to which open and honest communication is encouraged and takes place within a team); and (3) Collaborative Learning (the degree to which team members are seen and used as sources of knowledge by the rest of the team). Each dimension consists of a series of questions, which the participant answers on a five-point scale. Tanyildizi and Synchronous Cognitive skills were measured by pre and posttest assessing cognitive skills of the Orhan (2007) Motors operation of synchronous motor. Yang and Heh Physics lab Cognitive skills were measured using two posttest examinations: (2007) Physics Achievement Test (0-89 points). A 40 items test was developed from a senior high school physics textbook (items in mechanics, optics, and electricity). The content of the test was validated by two senior high school physics teachers and one physics professor. Science Process Skills Tests (0-36 points). Assesses the performance of the basic and integrated science process skills of the students. A point is given for every correct item. The highest score is 36 points 8 The article also assesses the effects of VR training on pure “welding” technical skills but does not report the statistical significance of results due to the limited sample size. As such, these results are excluded from the meta-analysis. 40 Finkelstein et Circuits Technical skills: At the end of each laboratory section, all students completed the al. (2005) design challenge worksheet in which they were asked to build a circuit using real equipment with their groups. Teaching Assistants reported the average time used to complete the task. Cognitive skills (write-up): Each student in the circuit challenge completed a writeup answering the following question: “Describe what happens and WHY the bulbs change brightness as they do. You may use words and formulas". The answers were evaluated by the authors as to their overall correctness using a standardized rubric with a scale from 0 to 3. Zero represented no demonstrated knowledge of the domain, while 3 represented correct and complete reasoning. The research team came to consensus on the grading metric, grading not only for overall correctness, but also for use of concepts, such as current, voltage, power, series or parallel resistance; and mathematics. Cognitive skills (test): Three questions on circuits were included in the final examination. Q1: rank the currents through each of the bulbs; Q2: rank the voltage drops across the bulbs in the same circuit; Q3: predict whether the current through the first bulb increased, decreased, or remained the same when the switch was opened. For each student, the share of correct answers in these 3 questions was recorded. Note: Authors own elaboration. 41 Table A4: Instruments used to assess learning for general education Article Main Topic Instrument used for skills assessment Allcoat et al. Science Cognitive skills were measured using a battery of 8 questions assessing students’ (forthcoming) (efficiency of knowledge, comprehension, and knowledge application. The questions were a mix of solar panels) formats and followed Bloom’s Taxonomy. Questions in the knowledge test were marked as correct or incorrect to give a total score of 0 to 8. Participants in the treatment (VR), treatment two (Mixed Reality), and control groups (traditional learning) completed the test before and after the training. Makransky, Biology Cognitive skills were assessed through two multiple-choice tests: a knowledge test Terkildsen, (mammalian (pretest) and a transfer test (post-test). A group of subject matter experts, including two and Mayer proteins) scientists, two psychologists, and a psychometrician, developed the test questions. The (2019) knowledge test consisted of 10 multiple-choice questions designed to assess conceptual and procedural knowledge of essential material presented in the simulation. The questions required that students had a deep knowledge of the content and that they could apply that knowledge to a realistic context. Students received one point for each correct answer and 0 points for selecting an incorrect answer. Allcoat and Biology Cognitive skills were assessed using a test that contained 17 biology questions sourced von Mühlenen (plant cells) directly from a British AQA Biology From this, 12 questions were related to the (2018) remembering of information (memorization), whereas 5 questions were more concerned with the understanding of information. Parong and Biology (cells To assess cognitive skills, participants completed a posttest on the material they viewed Mayer (2018) in human during the lesson. The posttest consisted of 20 questions based on the lesson, including bloodstream) 16 factual questions in multiple-choice format and 4 conceptual questions in short-answer format. The posttest was scored out of 20 points, with a point given for each correct multiple-choice and short-answer question; half-points were given for partially correct answers on short answer question. The short answer questions were scored based on a rubric that indicated the words and phrases required for 1 point or 1/2 point. Alhalabi Science for Cognitive skills were assessed using a 10 question (100 points) knowledge quiz on each (2016) engineers of the 4 topics taught: (i) Astronomy, (ii) Transportation; (iii) Networking; and (iv) Inventors. Questions assessed mostly general facts. Kockro et al., Anatomy of Cognitive skills were assessed using a test immediately following each teaching session. (2015) the heart Participants were asked to complete a short examination consisting of 10 multiple-choice questions related to the content given (i.e. the topographical anatomy of the third ventricle). These questions were developed and agreed on by an expert committee of four neurosurgeons and anatomists. Each correct answer in the examination was awarded one point, with a maximum of 10 points achievable. Smith (2015) Job Interview Technical skills were assessed using a role-play performance of a Job interview. Participants performed two pre-test and two post-test video recording role-play interviews. Each interview was scored 0-100 using and algorithm that assessed the appropriateness of responses based on eight domains: negotiation skills, conveying that you're a hard worker, sounding easy to work with, sharing things in a positive way, sounding honest, sounding interested in the position, behaving professionally, and establishing interviewer rapport. Participants self-reported self-confidence in a pre and post-test. Participants rated their self-confidence at interviewing using a 7-point Likert scale to answer nine questions, with higher scores reflecting more positive views (e.g., “How comfortable are you going on a job interview?”). Total scores at pre-test and post- test had strong internal consistencies (α = 0.95 and α = 0.92, respectively). Webster Science Cognitive skills were assessed using a test consisting of 22 questions. Questions 1 to 16 (2015) (corrosion had 4 possible answer choices, while questions 17 to 22 had 6. The test served as pre-test and post-test. However, the post-exam had 5 questions that were different from the pre- 42 prevention exam (i.e. 17 common exam questions). Corrosion subject matter experts and instructors and control) validated the content of the exams. The test evaluated five topics: (1) the importance of corrosion prevention and control, (2) corrosion basics, (3) corrosion influences, (4) corrosion types, and (5) basic corrosion prevention. The topics were aligned with six learning objectives: (i) demonstrate knowledge of why CPC is important by identifying and selecting the outcomes of past lack thereof; (ii) demonstrate knowledge of corrosion by identifying and selecting characteristics of the definition; (iii) demonstrate knowledge of the mechanics of corrosion by identifying and selecting the individual components of corrosion and possible influences; (iv) demonstrate knowledge by identifying different types of corrosion by selecting each type; (v) demonstrate knowledge of different types of corrosion by identifying and selecting characteristics of each type; and (vi) demonstrate knowledge of basic CPC techniques, theories, and principles. Hwang and Hu Basic Cognitive skills were assessed using test Scores (20-100 points in 5 levels): a pretest and (2013) geometry a post-test were delivered. The test evaluated four dimensions including the mathematics context, cognitive processes, types of representations, and specific tasks. The test contains rubric at five score levels for examining subjects’ understanding of the geometric problems. Level 1: Solutions not related to the problem and no explanation is provided (20 points); Level 2:The process leading to the solution is reasonable but the final answer is incorrect (40 points); Level 3: The answer or equation is correct but without textual or graphical explanation (60 points); Level 4: The answer is correct, textual or graphical explanation of the process leading to the solution is correct but partially provided (80 points); Level 4: The answer is correct, textual or graphical explanation of the process leading to the solution is correct and thoroughly provided (100 points Akpan and Frog Cognitive skills were assessed using a Dissection Achievement Test. The test was Strayer (2010) dissection designed by a life science classroom instructor in cooperation with three science experts. Questions were designed to meet the objectives of dissection as contained in the Modern Biology textbook and the national curricula. Example questions include: “The organ responsible for filtering toxins from the blood is the? (a) spleen (b) kidneys (c) heart (d) liver.” The test was used as a pretest and posttest and had 25 multiple choice items (ten focused on identification of organs and fifteen related to the functional knowledge of anatomy and morphology) and a short answer section. Socio-emotional skills were measured using an attitude self-assessment. The assessment measured student’s attitudes towards: (i) dissection (9 items), (ii) school/science (4 items); and (iii) computers (10 items). Twenty-three of the items included in the test were adopted from previous research/available instrument. Two items were developed by the researchers. Note: Authors own elaboration. 43 Table A5: Impact of VR training on learning performance (cognitive skills) as proxied by results in posttest Article Skills Developed Performance Metric Posttest Results (1) (2) (3) (4) Health and Safety Farra et al. (2018) Knowledge on emergency Cognitive Assessment. Treatment: 74.2; preparedness Score range: 0-100 points Control: 70.7 (N.S) Engineering, Science, and Technical Education Osner (2013) Knowledge about genetics Cognitive Assessment Treatment: 2.78; Score range: 0-5 Control: 2.90 (N.S.) Tatli and Ayas (2013) Knowledge about chemical Cognitive Assessment Treatment: 59.33; changes Score range: 0-100 Control: 55.33 (N.S.) Tatli and Ayas (2013) Knowledge about laboratory Cognitive Assessment Treatment: 67.41; equipment Score range: 0-100 Control: 35.43 (***) Rupasinghe et al. Usage of a borescope to assess Cognitive Assessment (written) Treatment: 58; (2011) aircraft corrosion. Score range: 0-100 Control: 60 (N.S.) Rupasinghe et al. Usage of a borescope to assess Cognitive Assessment (oral) Treatment: 83; (2011) aircraft corrosion. Score range: 0-100 Control: 60 (***) Rupasinghe et al. Knowledge of Eddy current Cognitive Assessment (written) Treatment: 71; (2011) inspection to assess corrosion Score range: 0-100 Control: 75 (N.S.) Rupasinghe et al. Knowledge of Eddy current Cognitive Assessment (oral) Treatment: 85; (2011) inspection to assess corrosion Score range: 0-100 Control: 78 (*) Tanyildizi and Orhan Knowledge of synchronous Cognitive Assessment Treatment: 24.28; (2007) motors Score range: 0-30 Control: 21.00 (***) Yang and Heh (2007) Knowledge of physics Cognitive Assessment Treatment: 61.01; Score range: 0-89 Control: 53.89 (***) Yang and Heh (2007) Knowledge of science Cognitive Assessment Treatment: 26.43; processes Score range: 0-36 Control: 23.69 (***) Finkelstein et al. Knowledge of circuits Cognitive Assessment (written) Treatment: 1.86; (2005) operation Score range: 1 to 3 Control: 1.64 (**) Finkelstein et al. Knowledge of circuits Cognitive Assessment Treatment: 59.3; (2005) operation Score range: 0-100 Control: 47.6 (***) General Education Allcoat and von Knowledge of the parts of a Cognitive Assessment. VR vs video Mühlenen (2018) plant cell Score range: 0-100% (% Treatment (VR): 56.5; questions that are correct). Control: 43.9 (***) Allcoat and von Knowledge of the parts of a Cognitive Assessment. VR vs textbook Mühlenen (2018) plant cell Score range: 0-100% (% Treatment: 56.5; questions that are correct). Control:50.2 (N.S) Allcoat and von Memorizing the parts of a plant Cognitive Assessment. VR vs video Mühlenen (2018) cell Score range: 0-100% (% Treatment: 55.1; questions that are correct). Control:40.6 (***) Allcoat and von Memorizing the parts of a plant Cognitive Assessment. VR vs textbook Mühlenen (2018) cell Score range: 0-100% (% Treatment: 55.1; questions that are correct). Control = 43.6 (**) Parong and Mayer Knowledge of how human cells Cognitive Assessment. VR vs slideshow (2018) work Score range: 0-20 points. Treatment: 10.17; Control:13.54 (***) Alhalabi (2016) Knowledge of anatomy Cognitive Assessment. VR vs lecture Score range: 0-100 points. Treatment: 93.0; 44 Control: 69.0 (**) Alhalabi (2016) Knowledge of transportation Cognitive Assessment. VR vs lecture Score range: 0-100 points. Treatment: 90.0; Control: 60.0 (**) Alhalabi (2016) Knowledge of science Cognitive Assessment. VR vs lecture networking Score range: 0-100 points. Treatment: 38.0; Control: 22.0 (**) Alhalabi (2016) Knowledge of famous Cognitive Assessment. VR vs lecture inventors Score range: 0-100 points. Treatment: 15.0; Control: 8.0 (**) Kockro et al., (2015) Knowledge about the anatomy Cognitive Assessment. VR vs slideshow of the human ventricular Score range: 0-10 points Treatment: 5.45; system Control: 5.19 (N.S.) Hwang and Hu (2013) Calculation of the volume and Cognitive Assessment. VR vs textbook area of 3D objects Score range: 20-100 points. Treatment: 70.24; Control: 59.17 (**) Akpan and Strayer Knowledge of frog anatomy Cognitive Assessment. VR vs dissection (2010) Score range: 0-25 points. Treatment: 23.33; Control: 16.94 (***) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level. Table A6: Impact of VR training on learning performance (technical skills) as proxied by results in posttest. Article Skills Developed Performance Metric Posttest results (1) (2) (3) (4) Health and Safety Farra et al. (2018) Performance of emergency Ability assessment Treatment: 86.5; evacuation Score range: 0-100 points Control: 71.1 (***) Logishetty (2018) Performance of total hip Ability Assessment. Treatment: 22.0; arthroplasty Score range: 0-30 points Control: 12.0 (***) Zaveri et al (2016) Residents learn how to conduct Ability Assessment Treatment: 24.0; pediatric procedural sedation. Score range: 0-32 points Control: 22.5 (N.S) Chao et al. (2015). Performance of transvaginal Ability Assessment Treatment: 12.0; gynecologic ultrasound Score range: 0-19 Control: 9.0 (**) (experienced surgeons) Valdis et al. (2015). Performance of robotic internal Ability Assessment. Treatment: 22.8; thoracic artery harvest and Score range: 6-30 points Control: 11.0 (***) mitral valve annuloplasty Tschannen et al. Nurses show improvement in Ability Assessment. Treatment: 21.9; (2012) the following skills: (i) priority Score: 0 to 22 points Control: 20.1 (**) siting (focus on the patient), (ii) communications (with patient, second nurse, and physician), and (iii) problem solving (request assistance when needed). Zhao et al. (2011) Performance of cortical Ability Assessment. Treatment: 67.0; mastoidectomy on a cadaveric Score range: 0-100 points Control: 29.0 (***) temporal bone 45 Larsen et al. (2009) Performance of laparoscopic Ability Assessment. Treatment: 33.0; surgery Score range: 0-100 points Control: 23.0 (***) Engineering, Science, and Technical Education Lampi (2013) Computer network Ability Assessment. Treatment: 19.36; configuration accuracy Score range: 0-22 points Control: 18.36 (N.S.) Lampi (2013) Computer network Ability Assessment. Treatment:4.6; troubleshooting accuracy Score range: 0-6 points Control: 4.4; (N.S.) McLaurin and Stone Performance of horizontal filet Ability Assessment. Treatment: 90; (2012) weld (2F) Score range: 0-100 points Control: 92 (N.S.) McLaurin and Stone Performance of flat groove Ability Assessment. Treatment: 90; (2012) weld (1G) Score range: 0-100 points Control: 88 (N.S.) McLaurin and Stone Performance of vertical filet Ability Assessment. Treatment:72; (2012) weld (3F) Score range: 0-100 points Control:81 (N.S.) McLaurin and Stone Performance of vertical groove Ability Assessment. Treatment: 10; (2012) weld (3G) Certification rate: 0-100% Control:45 (**) McLaurin and Stone Performance of vertical groove Ability Assessment. Treatment:53; (2012) weld (3G) Score range: 0-100 points Control: 61 (N.S.) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level Table A7: Impact of VR training on learning performance (socio-emotional skills) as proxied by results in posttest Article Skills Developed Performance Metric Posttest results (1) (2) (3) (4) Engineering, Science, and Technical Education Stone et al. (2011) Continuous improvement Socio-emotional Assessment Treatment: 4.47; seeking Score range: 1-5 points Control: 4.14 (N.S.) Stone et al. (2011) Dialogue promotion and open Socio-emotional Assessment Treatment: 4.63; communication Score range: 1-5 points Control: 3.85 (***) Stone et al. (2011) Collaborative learning Socio-emotional Assessment Treatment: 4.73; Score range: 1-5 points Control: 3.30 (***) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level. 46 Table A8: Impact of VR training on learning gains (cognitive skills) as proxied by results in posttest minus pretest for treatment group only. Article Skills Developed Performance Metric Value added (1) (2) (3) (4) Engineering, Science, and Technical Education Buttussi et al (2018) Knowledge of airplane cabin Cognitive assessment. Pretest: 4.7; safety (VR Narrow view) Score range: 0-9 points Posttest: 7.5 (***) Buttussi et al (2018) Knowledge of airplane cabin Cognitive assessment. Pretest: 4.5; safety (VR Wide view) Score range: 0-9 points Posttest: 7.7 (***) Tatli and Ayas (2013) Knowledge about chemical Cognitive Assessment Pretest: 39.66; changes Score range: 0-100 Posttest: 59.33 (***) Tatli and Ayas (2013) Knowledge about laboratory Cognitive Assessment Pretest: 29.66; equipment Score range: 0-100 Posttest: 67.41 (***) General Education Allcoat et al. Knowledge of solar-power Cognitive Assessment Pretest: 1.96; (forthcoming) panel efficiency. Score range: 0-8 Posttest: 5.30 (***) Webster (2015) Knowledge of basic corrosion Cognitive Assessment Pretest: 66.9; prevention and control. Score range: 0-100 points Posttest: 79.3 (***) Akpan and Strayer Knowledge of frog anatomy Cognitive Assessment. Pretest: 10.18; (2010) Score range: 0-25 points. Posttest: 23.33 (***) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level. Table A9: Impact of VR training on learning gains (technical skills) as proxied by results in posttest minus pretest for treatment group only. Article Skills Developed Performance Metric Value-added (1) (2) (3) (4) Health and Safety Skou-Thomsen et al. Performance of eye cataract Ability Assessment. Pretest: 15.33; (2017) removal for novice surgeons Score range: 0-53 points Posttest: 20.31 (***) (those with less than 75 procedures completed) Skou-Thomsen et al. Performance of eye cataract Ability Assessment. Pretest: 25.81; (2017) removal for intermediate Score range: 0-53 points Posttest: 35.58 (**) surgeons (75-999 procedures completed) Skou-Thomsen et al. Performance of eye cataract Ability Assessment. Pretest: 43; (2017) removal for experienced Score range: 0-53 points Posttest: 43 (N.S.) surgeons (1000+ procedures completed) Kiely et al. (2015 Performance of robotic Ability Assessment (GOALS+) Pretest:15.1; suturing (vaginal cuff model) Score range: 0-35 points Posttest:21.4 (***) Kiely et al. (2015 Performance of robotic Ability Assessment (GEARS) Pretest:12.7; suturing (vaginal cuff model) Score range: 6-30 points Posttest:18.4 (***) Kiely et al. (2015 Performance of robotic Ability Assessment. Pretest:1.65; suturing (vaginal cuff model) Total knots completed Posttest:3.38 (***) Kiely et al. (2015 Performance of robotic Ability Assessment. Pretest:1.04; suturing (vaginal cuff model) Satisfactory knots completed Posttest:2.73 (***) Vincent et al. (2008) Performance of mass casualty Ability assessment (triage) Pretest: 9.7; triage Score range: 0-15 points Posttest: 13.4 (***) 47 Vincent et al. (2008) Performance of mass casualty Ability assessment (accuracy) Pretest: 3.4; triage Score range 0-5 points Posttest: 4.7 (***) General Education Smith (2015) Job Interview performance Ability assessment Pretest: 33.8; Score range: 0-100 points Posttest: 36.5 (***) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level. Table A10: Impact of VR training on learning gains (socio-emotional skills) as proxied by results in posttest minus pretest for treatment group only. Article Skills Developed Performance Metric Value added (1) (2) (3) (4) Health and Safety Vincent et al. (2008) Self-efficacy: I am confident in Socio-emotional Assessment Pretest: 3.8; my ability to prioritize the Score range: 1(never)-5(always) Posttest: 4.1 (***) treatment of patients in a mass casualty situation Vincent et al. (2008) Self-efficacy: I am confident in Socio-emotional Assessment Pretest: 3.1; my ability to prioritize the use Score range: 1(never)-5(always) Posttest: 4.2 (***) of resources in a mass casualty situation Vincent et al. (2008) Self-efficacy: I am confident in Socio-emotional Assessment Pretest: 3.4; my ability to identify high risk Score range: 1(never)-5(always) Posttest: 4.2 (***) patients for immediate treatment in a mass casualty situation. Vincent et al. (2008) Self-efficacy: I am confident Socio-emotional Assessment Pretest: 4.0; that I will learn to be an Score range: 1(never)-5(always) Posttest: 4.2 (**) effective first responder Vincent et al. (2008) Self-efficacy: I am confident Socio-emotional Assessment Pretest: 3.8; that patients will consider me Score range: 1(never)-5(always) Posttest: 4.1 (***) an effective first responder. Engineering, Science, and Technical Education Buttussi et al (2018) Self-efficacy addressing an Socio-emotional assessment Pretest: 2.9; emergency (VR narrow view) Scope range: 1(not at all), Posttest: 4.3 (***) 7(very) Buttussi et al (2018) Self-efficacy addressing an Socio-emotional assessment Pretest: 3.0; emergency (VR Wide view) Scope range: 1(not at all), Posttest: 4.0 (***) 7(very) General Education Smith (2015) Self-confidence interviewing Socio-emotional assessment Pretest: 42.5; Score range: 1-7 points. Posttest: 50.2 (N.S.) Akpan and Strayer Attitude towards frog Socio-emotional assessment Pretest: 2.60 (2010) dissection Score range: 1-5 points. Posttest: 2.66 (N.S) Akpan and Strayer Attitude towards science Socio-emotional assessment Pretest: 2.64 (2010) Score range: 1-5 points. Posttest: 2.73 (N.S.) Akpan and Strayer Attitude towards use of PC Socio-emotional assessment Pretest: 2.63 (2010) Score range: 1-5 points. Posttest: 2.71 (N.S.) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level. 48 Table A11: Differences in learning gains (cognitive, technical and socio-emotional skills) between VR (treatment) vs. traditional training (control) Article Skills Developed Performance Metric Value added (1) (2) (3) (4) Farra et al. (2018) Knowledge on emergency Cognitive Assessment. Treatment :18.6; preparedness Score range: 0-100 points Control: 12.0 (N.S) Zaveri et al (2016) Knowledge about pediatric Cognitive Assessment. Treatment :1.0; procedural sedation Score range: 0-20 points Control: 3.0 (***) Tanyildizi and Orhan Knowledge of synchronous Cognitive assessment (written) Treatment:9.08; (2007) motors Score range: 0-30 Control: 6.18 (**) Allcoat et al. Knowledge of solar-power Cognitive Assessment VR vs Slideshow (forthcoming) panel efficiency. Score range: 0-8 Treatment:3.24; Control: 2.68 (N.S.) Makransky, Knowledge of mammalian Cognitive Assessment. VR vs Desktop Terkildsen, and Mayer transient protein expression Score range: 0-10 points Treatment:1.54; (2019) Control: 2.69 (***) Kiely et al. (2015) Performance of robotic Ability Assessment (GOALS+) Treatment: 6.4; suturing (vaginal cuff model) Score range: 0-35 points Control: 2.2 (**) Kiely et al. (2015) Performance of robotic Ability Assessment (GEARS) Treatment: 5.7; suturing (vaginal cuff model) Score range: 6-30 points Control: 2.0 (**) Kiely et al. (2015) Performance of robotic Ability Assessment. Treatment: 1.73; suturing (vaginal cuff model) Total knots completed Control: 0.97 (**) Kiely et al. (2015) Performance of robotic Ability Assessment. Treatment: 1.69; suturing (vaginal cuff model) Satisfactory knots completed Control: 0.85 (**) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level. Table A12: Impact of VR training on learning efficiency Article Skills Developed Performance Metric Learning efficiency (1) (2) (3) (4) Health and Surgical Education Logishetty Performance of total hip arthroplasty Operation time (in minutes) Treatment: 12; (2018) Control: 24 (***) Logishetty Performance of total hip arthroplasty Inclination error (in degrees) Treatment: 3; (2018) Control: 15 (***) Logishetty Performance of total hip arthroplasty Anteversion error (in degrees) Treatment: 4; (2018) Control: 16 (***) Valdis et al. Performance of robotic internal Time on task (in minutes) Treatment: 342.7; (2015). thoracic artery Control: 856.2 (***) Valdis et al. Performance of robotic internal Time on task (in minutes) Treatment: 139.6; (2015). mitral valve annuloplasty Control: 256.2 (***) Larsen et al. Performance of laparoscopic surgery Time on task (in minutes) Treatment: 12; (2009) Control: 24 (***) Engineering, Science, and Technical Education Lampi (2013) Computer network configuration Configuration time (minutes) Treatment:43.5; time Control :50.0; (N.S.) Lampi (2013) Computer network troubleshooting Troubleshooting time (minutes) Treatment:8.21; time Control:9.87 (**) Stone et al. Optimal usage of flat plates Number of plates used Treatment:210; (2011) Control: 288 (***) 49 Stone et al. Optimal usage of welding materials Number of groove plates used Treatment :50; (2011) (groove plates) Control :63 (**) Stone et al. Technical – usage of welding Number of electrodes used (in Treatment :111; (2011) materials (electrodes) pounds) Control: 188 (***) Finkelstein et Building a circuit Time in task (in minutes) Treatment :14.0; al. (2005) Control: 17.7 (***) Note: N.S: Not statistically significant at a 10 percent confidence level. *10% significance level; ** 5% significance level; *** 1% significance level. 50