Policy Research Working Paper 9896

What Explains Boys’ Educational Underachievement in the Kingdom of Saudi Arabia?

Mahmoud A. A. Elsayed
Aidan Clerkin
Vasiliki Pitsia
Nayyaf Aljabri
Khaleel Al-Harbi

Education Global Practice
January 2022

Abstract

This paper examines the factors that are associated with boys’ underachievement in mathematics and science in Saudi Arabia, where students attend gender-segregated schools from grade 1 onward, as well as student achievement in these two subjects in grades 4 and 8 more generally. The paper employs data from two recent large-scale assessments of education: Trends in International Mathematics and Science Study 2019 and Saudi Arabia’s National Assessment of Learning Outcomes 2018. The results suggest that in grade 4, school climate was more strongly associated with boys’ compared with girls’ achievement in both mathematics and science, with boys attending schools of poorer school climate having a considerably lower performance compared with girls attending such schools. The findings also indicate that although greater literacy and numeracy readiness was linked with higher science achievement among boys and girls, grade 4 boys tended to benefit more from this readiness than girls. In addition, the results show that student absenteeism in grade 4 is particularly strongly associated with decreases in mathematics achievement among boys. In grade 8, interactions between student gender and students’ confidence in science, the degree of schools’ emphasis on academic success, and teachers’ age are observed. The paper concludes by discussing some of the implications of these findings for educators and policy makers in Saudi Arabia.

This paper is a product of the Education Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world.
Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at melsayed@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

What Explains Boys’ Educational Underachievement in the Kingdom of Saudi Arabia?*

Mahmoud A. A. Elsayed (a), Aidan Clerkin (b), Vasiliki Pitsia (b), Nayyaf Aljabri (c), Khaleel Al-Harbi (c)

(a) The World Bank, Washington DC, USA
(b) Educational Research Centre, Dublin, Ireland
(c) Education and Training Evaluation Commission, Riyadh, Saudi Arabia

Keywords: Student Achievement, Gender Gap, Boys’ Underperformance

JEL classification: I20; I21; I24; I28

* This research has been conducted as part of the Technical Cooperation Program between the World Bank and the Education and Training Evaluation Commission, funded by the Ministry of Finance of the Kingdom of Saudi Arabia under the Reimbursable Advisory Services (RAS) framework.

1. Introduction

There is a large and consistent performance gap observed between boys and girls in the Middle East and North Africa (MENA) region. On top of having the second largest proportion of children in learning poverty (i.e., lacking basic skills in reading toward the end of primary school¹), MENA has the largest gender gap in learning poverty among regions with available data (Gregory et al. 
2021; World Bank 2021).² This gender gap in student performance, however, varies across MENA countries. The Kingdom of Saudi Arabia (KSA) is among the countries with the largest gender gaps in student achievement in the world. Boys in Saudi Arabia consistently and significantly underperform compared to girls across different grades and subjects. For example, in the Trends in International Mathematics and Science Study (TIMSS) 2019, grade 4 boys in Saudi Arabia scored below girls by approximately 26 points in mathematics and 60 points in science (Mullis et al. 2020). A similar gender gap exists in grade 8, where boys underperform girls by approximately 17 points in mathematics and 47 points in science. Furthermore, in grade 4, 53 percent of boys failed to achieve minimum proficiency in mathematics compared to 45 percent of girls. Similarly, in grade 8, 58 percent of boys did not achieve the lowest international benchmark (400 points) to be considered as minimally proficient in mathematics, compared to 49 percent of girls. Data from the Programme for International Student Assessment (PISA) 2018 also show that almost twice as many 15-year-old boys (65 percent) failed to achieve minimum proficiency in reading, compared to 38 percent of girls (OECD 2019). Although the magnitude of the gender gap in Saudi Arabia is among the largest in the world, significant differences in achievement between boys and girls are also observed in many other countries (Mullis et al. 2017; 2020). A large body of research has explored the factors contributing to the gender gap in student performance internationally (Autor et al. 2016; 2019; Bertrand and Pan 2013; Buchmann and DiPrete 2006; DiPrete and Buchmann 2013; DiPrete and Jennings 2012; Fortin, Oreopoulos, and Phipps 2015; Jha and Pouezevara 2016; Legewie and DiPrete 2012; OECD 2021). 
Evidence from this research suggests that social norms, school characteristics, students’ social and behavioral skills, and family background are the main factors associated with the achievement gap between boys and girls. Prior research indicates that, compared to boys, girls tend to have better noncognitive skills, such as self-regulation and persistence, and spend more time doing assignments and homework (Buchmann and DiPrete 2006; Cornwell, Mustard, and Van Parys 2013; DiPrete and Jennings 2012; Downey and Vogt Yuan 2005; OECD 2021). Other studies found that school characteristics such as school quality, disciplinary practices, and school climate (i.e., the institutional norms, practices, structures, values, and relationships underpinning a student’s experience of school) can affect boys and girls differently. An important study combined birth records with school administrative data from the US state of Florida to identify the effects of school quality (with school-level gains in mathematics and reading scores used to indicate quality) on the gender achievement gap between opposite-gender siblings who attend the same sets of schools (Autor et al. 2016). This study found that boys benefitted more than girls from studying in higher quality schools. Similarly, another recent study (OECD 2021) based on data from two large-scale international assessments showed that school discipline problems affect boys more negatively than girls. Combining data from the Teaching and Learning International Survey (TALIS) 2018 and PISA 2018, the study found that increases in teachers’ perceptions of classroom discipline problems are associated with an increase in the achievement gap between girls and boys (OECD 2021). The results also showed that other school organizational issues, such as poor learning conditions and organizational problems, exacerbate the gender gap between girls and boys. That is, boys’ achievement tends to be negatively impacted by challenging learning conditions to a greater extent compared to girls’ achievement. These findings are consistent with previous work on gender gaps in attitudes, showing that girls, in general, tend to report more positive attitudes toward learning (DiPrete and Buchmann 2013).

¹ More formally, learning poverty is defined as “being unable to read and understand a simple text by age 10”, whether children are enrolled in school or not (World Bank 2019, 6).

² The learning poverty rate is calculated using the results of international student assessments, adjusted for the share of out-of-school children. Overall, 59 percent of children in MENA are not able to read and understand an age-appropriate text by age 10. About two-thirds of boys in MENA (66 percent) lack basic skills in reading compared to 56 percent of girls, resulting in the largest gender gap in learning poverty among all regions.

Previous research has also shown that social norms, gender stereotypes, and teacher/school expectations contribute to the performance gap between girls and boys (Stromquist 2007; Page and Jha 2009; Younger and Cobbett 2014; Jha and Pouezevara 2016). In other words, teachers and schools contribute to developing and reinforcing different expectations of appropriate behavior for both boys and girls, which can have the effect of hindering boys’ performance.

Building on the existing research, this paper examines the factors that contribute to the gender gap in student performance in Saudi Arabia. In particular, the paper investigates whether a range of student, family, class, and school variables can explain the achievement gap in mathematics and science between boys and girls at grades 4 and 8. The contribution of these variables in predicting overall Saudi student achievement in these two subjects is also examined. Data from both TIMSS 2019 and Saudi Arabia’s National Assessment of Learning Outcomes (NALO) 2018 were employed. 
Both TIMSS and NALO assess grade 4 and grade 8 students’ mathematics and science achievement and collect a wide range of contextual information about students, their families, teachers, and schools.

This paper contributes to the current literature in two main ways. First, it addresses a need for more evidence about the factors associated with the achievement gap between boys and girls in the MENA region, and especially in Saudi Arabia. The paper explores the surprisingly large and consistent performance gap observed between boys and girls in the region. As shown in figure 1, boys in the region tend to achieve noticeably lower scores than girls in reading, mathematics, and science, with this difference being larger than those in most of the other countries participating in PISA 2018. A further feature of the paper is its examination of two large-scale assessments that share a focus on the same domains of study and the same grade levels, with data from NALO used to complement findings arising from the analysis of TIMSS data. This paper, therefore, provides important insights for policy makers, both in Saudi Arabia and in other countries in the region, regarding the factors that may contribute to the observed gender gaps in achievement across a range of subjects.

Second, Saudi Arabia offers a unique setting in which boys and girls attend separate schools on a universal basis from grade 1 and are taught only by male and female teachers, respectively, in effect inhabiting parallel education systems. Figure 2 presents the distribution of grade 4 and grade 8 students in single-gender or mixed education among countries participating in TIMSS 2019.³ Although gender-segregated schools are not uncommon in the MENA region, students do not usually attend single-gender schools until the end of primary education. 
The unique structure of the Saudi education system provides an opportunity to examine, in a multilevel framework, how variance in system-level factors applying only to boys or to girls contributes to the observed individual differences in achievement. This analysis exploits the existence of parallel gender-segregated school environments that operate within a shared overarching cultural context, where expectations and practices outside school also vary significantly between boys and girls. While this paper exploits this feature of the Saudi education system in its analysis, it also acknowledges potential difficulties in interpreting findings due to this extreme degree of separation as teacher and school characteristics are confounded with gender differences in learning outcomes. For example, any differences between girls’ and boys’ educational environments seen in these data are inseparable from the fact that the (male) teachers of boys have been trained and work in an environment that is completely separate to the training and work environment of the (female) teachers of girls.

This paper is organized as follows. Section 2 describes the education system in Saudi Arabia; section 3 discusses the data and methodology; section 4 presents the results from the models; and section 5 includes a discussion of the main findings and their policy implications.

³ In 2019, Saudi Arabia announced that boys would start to be educated by female teachers in grades 1 through 3. These boys’ classes are kept separated from the girls’ classes. Currently, there are few girls’ schools offering boys’ classes with female teachers, though the intention is to increase the number.

2. Education System in Saudi Arabia

Preuniversity education in Saudi Arabia is divided into four levels: preprimary, elementary, intermediate, and secondary education. 
Preprimary education includes three years starting at age 3; elementary education starts normally at age 6 and includes grades 1 through 6; intermediate education comprises grades 7 to 9; and secondary education consists of grades 10 through 12. Students in Saudi Arabia attend single-gender schools, except in preprimary. Boys and girls are separated from grade 1 and taught by teachers of the same gender. A recent reform, though still on a limited scale, has allowed boys in grades 1 through 3 to enroll in girls’ elementary schools, and, as a result, boys are taught by female teachers but in separate classes. Saudi Arabia’s K–12 education system includes more than 5.5 million students and more than 450,000 teachers and is administered through 47 education directorates and 383 education offices within directorates. The Saudi Ministry of Education (MOE) plays a central role in setting the policies and regulations for schools across the country including curriculum, teacher hiring and promotion, and student assessment. Directorates and offices are responsible for implementing the directives of the MOE and tend to have a similar structure across the country (OECD 2020). Over the last few decades, Saudi Arabia has achieved substantial progress in improving access to education. For example, the gross enrollment ratio (GER) in primary education increased from 58 percent in 1979 to 101 percent in 2019. During the same period, the GER in secondary education increased from 27 percent to 112 percent. Although increased access to education is a positive development, these large gains in access have not been accompanied by similar improvements in students’ learning outcomes. Overall, learning outcomes remain below expectations in Saudi Arabia. Data from TIMSS 2019 show that, in mathematics, Saudi Arabia ranks 53rd of 58 countries (i.e., below 52 countries) in grade 4 and 37th of 39 (outperformed by 36 countries) in grade 8. 
In PISA 2018, less than half (48 percent) of 15-year-old students in Saudi Arabia achieved minimum proficiency in reading and almost no student was a top performer (i.e., achieving levels 5 or 6; OECD 2019). Additionally, only 27 percent of Saudi students in the same age group achieved at least minimum proficiency in mathematics, compared to an Organisation for Economic Co-operation and Development (OECD) average of 75 percent. In particular, learning outcomes in Saudi Arabia are low relative to the country’s level of wealth, and studies (Patrinos and Angrist 2019) have identified Saudi Arabia as an outlier when examining Harmonized Learning Outcomes⁴ relative to gross domestic product (GDP) worldwide. Although, as noted above, significant gender differences in achievement can be found in other countries, sometimes in favor of boys and sometimes in favor of girls (Mullis et al. 2017; 2020), the gender differences observed in Saudi Arabia are consistently among the largest in the world, with girls showing a consistent advantage across grade levels and subject areas.

⁴ A composite indicator of learning outcomes at the country level, based on data from large-scale assessments such as TIMSS, PISA, Progress in International Reading Literacy Study (PIRLS), and early-grade reading or mathematics assessments (Angrist et al. 2021).

Figure 1. Achievement gap between girls and boys in countries participating in PISA 2018
Source: OECD PISA 2018 data.

Figure 2. Distribution of students in single-gender or mixed education among countries participating in TIMSS 2019
[Bar charts, one panel each for grade 4 and grade 8; horizontal axis: percent of students (0 to 100); categories: mixed, boys-only, girls-only; one bar per participating country.]
Source: TIMSS 2019 data.

3. Methods

3.1 Data

The analyses described in this paper are drawn from two recent large-scale assessments of education: TIMSS 2019 and Saudi Arabia’s NALO 2018. TIMSS provides a robust nationally representative sample, while the NALO study was conducted to provide regionally representative information (requiring a much larger sample size) as well as national-level data. Although the paper focuses on the national level in the analyses described below, it should be noted that subsequent analyses at the regional level would be possible using NALO data.⁵ In addition, there is a substantial degree of overlap between the content covered by the TIMSS and NALO contextual questionnaires, although some variables appear in one study but not the other, or they are presented in slightly different formats. The primary analysis, reported in depth in the next section, is conducted using TIMSS 2019 data. 
Given the high degree of overlap between the two studies’ focus on mathematics and science, at the same two grade levels (grades 4 and 8), data from NALO 2018 are used to supplement this primary analysis by drawing on variables of particular interest to Saudi Arabia that have no equivalents in TIMSS. In other words, NALO 2018 data are used as supplementary information to shed additional light on questions arising from the multilevel analysis of TIMSS data. For this reason, NALO data are described in text, with full tabulation of the NALO analyses presented in appendix A1. 3.1.1 TIMSS 2019 TIMSS is a study of the International Association for the Evaluation of Educational Achievement (IEA). It assesses mathematics and science achievement at two grade levels, grades 4 and 8. TIMSS has been carried out every four years since 1995. In 2019, 64 countries participated in TIMSS, including Saudi Arabia. In addition to providing countries with robust data on mathematics and science achievement, TIMSS collects a wealth of contextual data from students, parents, teachers, and school principals. In Saudi Arabia, 5,453 grade 4 students (mean age 9.9 years; 49.6 percent male) across 220 public and private schools and 5,680 grade 8 students (mean age 13.9 years; 49.2 percent male) across 209 schools took part in TIMSS 2019. The TIMSS 2019 data were collected using a stratified two-stage cluster sample design, with a sample of schools selected randomly at the first stage and one or more classes of students selected per each of the sampled schools at the second stage (LaRoche, Joncas, and Foy 2020). The Saudi sample of schools was drawn systematically in order for the sampled schools to represent the populations of grade 4 and grade 8 students nationally, with representation from 13 regions and a balance between male and female schools. Implicit stratification methods were used also to ensure representation of the various school types (such as public versus private schools). 
The IEA requires high participation rates and adherence to standardized administration procedures for participating countries to be included in the international results. The IEA calculates and provides sampling weights (to ensure that the final sample of participating students can be generalized to the national populations of grade 4 and grade 8 students) and plausible values⁶ for mathematics and science scores (to ensure accurate population-level estimates of achievement) in order to facilitate appropriate analyses, taking the complex nature of the data into account.

⁵ Representative regional-level data from NALO could be exploited to examine the varying availability of resources and variability in practices across the different regions of Saudi Arabia, and how region-level differences are related to differences in achievement and the gender gap.

⁶ Plausible values are generated by imputing a set of values (five values in the case of TIMSS) representing ‘plausible’ estimates of student achievement based on their responses to the assessment and background variables. Plausible values are not suitable for reporting individual-level results, but at the population level the use of plausible values facilitates the calculation of appropriate standard errors for complex survey designs such as those used by TIMSS where each student is administered only a small subset of the items in the assessment.

3.1.2 NALO 2018

Saudi Arabia’s NALO is administered by the Education and Training Evaluation Commission (ETEC), an independent government agency responsible for school evaluation, accreditation, and assessment, among other responsibilities. In 2018, the domains assessed by NALO were mathematics and science in grades 4 and 8, which means that the data from NALO 2018 are closely aligned to TIMSS 2019 both in terms of the target domains and grade levels. 
In NALO 2018, 27,985 grade 4 students (50.2 percent male) across 964 government, private, and Quran schools completed tests of mathematics and science, as did 30,157 grade 8 students (49.6 percent male) across 939 schools. The schools that took part in NALO were sampled using procedures similar to those used in TIMSS. 3.2 Statistical analysis Prior to the main analysis, descriptive statistics were computed, and relevant statistical tests were conducted to provide a comprehensive overview of the gender differences across the contextual variables of interest to this study. The levels of statistical significance along with the relevant effect sizes for each of these differences are reported. The phi (φ) and the Cramer’s V (φc) effect size measures were used for the contextual categorical variables for 2x2 contingency tables and for contingency tables larger than 2x2, respectively. The Cohen’s d effect size measure was used for the contextual continuous variables (Fritz, Morris, and Richler 2012). Cohen (1988) guidelines were used in conjunction with Hattie (2009) guidelines for the interpretation of the effect sizes. The IEA International Database Analyzer (IDB Analyzer) and the Statistical Package for the Social Sciences (SPSS) were used to compute descriptive statistics for TIMSS and NALO, respectively. Next, multilevel linear regression analysis was used to explore the different factors that are associated with boys’ underperformance in Saudi Arabia as well as all students’ achievement in mathematics and science in grades 4 and 8 more generally. The multilevel linear regression models draw on the TIMSS 2019 data, with NALO 2018 data used to supplement the analysis by describing data that are not available in TIMSS. Four separate families of models were constructed: grade 4 mathematics, grade 4 science, grade 8 mathematics, and grade 8 science. 
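The three effect size measures named in section 3.2 can be illustrated with a minimal sketch. It works on unweighted data and is not the authors' procedure: the paper computed its descriptive statistics with the IDB Analyzer and SPSS, and TIMSS analyses also apply sampling weights. The function names and the example counts and scores are hypothetical.

```python
import math
from statistics import mean, variance


def phi_coefficient(table):
    """Phi for a 2x2 contingency table [[a, b], [c, d]] of counts."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


def cramers_v(table):
    """Cramer's V for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # smaller dimension minus one
    return math.sqrt(chi2 / (n * k))


def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation of two groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical data: rows are boys and girls, columns are response categories.
owns_phone = [[10, 20], [30, 40]]
absenteeism = [[12, 8, 10], [6, 14, 10]]
boys_scale = [9.1, 10.4, 8.7, 9.9, 10.2]
girls_scale = [10.3, 11.0, 9.8, 10.9, 10.6]

print(phi_coefficient(owns_phone))
print(cramers_v(absenteeism))
print(cohens_d(boys_scale, girls_scale))
```

For a 2x2 table, Cramér's V equals the absolute value of phi, which is why the paper reports the two measures in a single φ/φc column.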
All plausible values of achievement and sampling weights were used in all the analyses as per the relevant guidelines by von Davier et al. (2009) and Rutkowski et al. (2010), respectively. Assumptions necessary for conducting the multilevel linear regression analysis (i.e., linearity, homogeneity of variance, normality of errors) were checked and met, and parameters for the models were estimated using the maximum likelihood estimation with robust standard errors (SEs) (Muthén and Muthén 2017). Multilevel linear regression analysis was performed using Mplus 8 software. Along with mathematics and science achievement scores, variables included in the multilevel models are drawn from the student and home questionnaires (level 1 of the analysis) as well as the teacher and school principal questionnaires (level 2 of the analysis). Variables were selected for inclusion in the models a priori, based on the previous literature on student achievement and the gender gap in education in particular, as discussed in section 1 of this paper, in addition to the expected theoretical or policy relevance of these variables to the question of gender differences in performance in the Saudi context. As far as possible, each model was constructed using the same set of variables, notwithstanding some slight differences arising from the selection of variables related specifically to mathematics or science instruction and some differences between the grade 4 and 8 questionnaires. A hierarchical approach was followed, whereby conceptually similar variables were entered into each step of each model in blocks, as shown in table 1. The first step of each model included student gender only. By including the student gender variable into the model alone, the difference in achievement between boys and girls, after controlling for the clustering of the data within classes/schools, could be observed. 
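Using all plausible values means that each analysis is run once per plausible value and the results are then combined. The sketch below illustrates that combination step under the standard multiple-imputation rules: the pooled estimate is the mean of the per-value estimates, and the total variance is the average sampling variance plus (1 + 1/M) times the variance between the M estimates. The function name and the five example estimates are hypothetical and are not TIMSS results. The paper's models were estimated in Mplus, so this only illustrates the principle.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Combine one estimate per plausible value into a pooled estimate and SE.

    estimates: the statistic (e.g., a gender gap) computed with each plausible value.
    sampling_variances: the squared standard error of each of those estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    pooled = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical gender gaps estimated with each of five plausible values.
gaps = [-26.0, -24.0, -28.0, -25.0, -27.0]
squared_ses = [9.0, 9.0, 9.0, 9.0, 9.0]

estimate, se = pool_plausible_values(gaps, squared_ses)
print(estimate, se)
```

Averaging the plausible values for each student before analysis would understate the standard error, because the between-value component would be lost.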
Next, different blocks of variables were entered into the model one by one to allow both for the examination of their contribution in explaining the difference in achievement between boys and girls as well as their contribution in predicting overall achievement. In the final step of each model, the statistical significance of the interactions between student gender and each of the predictor variables in predicting achievement was explored. Each interaction term was entered into the model individually and all the statistically significant interaction terms were entered into the final model. To facilitate interpretation of the statistically significant interaction terms, those were plotted using the predicted values based on the last step (step 7) of each model. Hence, the interaction plots in the next section present the predicted, rather than the raw, gender differences in each variable in terms of mathematics and science achievement, after accounting for a range of student- and class/school-level predictor variables.

Table 1. Steps in building the hierarchical two-level linear regression models

Block of variables                      Step 1  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7
Student gender                            ✓       ✓       ✓       ✓       ✓       ✓       ✓
Demographics and home background                  ✓       ✓       ✓       ✓       ✓       ✓
Student engagement and attitudes                          ✓       ✓       ✓       ✓       ✓
School climate                                                    ✓       ✓       ✓       ✓
Teacher qualifications and practices                                      ✓       ✓       ✓
School leadership and resources                                                   ✓       ✓
Interaction terms                                                                         ✓

Note: ✓ indicates variables included in each step.

Equation 1 represents the null multilevel models (i.e., models with no predictor variables), which were applied to the TIMSS data, controlling for their clustering and allowing for the estimation of the proportions of the total variance in the outcome variable that is attributable within and between clusters (i.e., intraclass correlation (ICC) coefficients). 
$$Y_{ij} = \beta_0 + u_{0j} + e_{ij} \tag{1}$$

where $Y_{ij}$ is the outcome variable (e.g., mathematics achievement) of student $i$ in class/school $j$, $\beta_0$ is the mean intercept, $u_{0j}$ is the variation of class/school $j$ from the mean intercept, and $e_{ij}$ is the student-level residual error term.

Equation 2 represents the multilevel models, which were applied to the TIMSS data, including $\nu$ number of predictor variables while controlling for the clustering of the data.

$$Y_{ij} = \beta_0 + \beta_1 X_{1ij} + \dots + \beta_\nu X_{\nu ij} + u_{0j} + e_{ij} \tag{2}$$

where $Y_{ij}$ is the outcome variable (e.g., mathematics achievement) of student $i$ in class/school $j$, $\beta_0$ is the mean intercept, $\beta_1$ is the regression slope for the predictor variable $X_1$ of student $i$ in class/school $j$, $\beta_\nu$ is the regression slope for the predictor variable $X_\nu$ of student $i$ in class/school $j$, $u_{0j}$ is the variation of class/school $j$ from the mean intercept, and $e_{ij}$ is the student-level residual error term.

Reported statistics for each of the models include: proportions of variance (R²; expressed as a percentage of the total variance) in achievement explained at each level and step; intercepts with their SEs; unstandardized coefficients (Bs) and standardized coefficients (βs) each accompanied by their SEs for each predictor variable; fit statistics (Loglikelihood (H0), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC)) for each step, including the null model; and ICC coefficients for the null model. Bs are expressed in the original unit of each of the predictor variables, while βs can be used to compare the relative strength of each predictor variable in predicting achievement (i.e., to find the most robust predictors of achievement) in each model. 
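The null-model ICC and the fit statistics listed above follow from simple formulas, sketched below. The ICC is the between-cluster variance divided by the total variance, AIC is −2 log L + 2k, and BIC is −2 log L + k ln(n), where k is the number of estimated parameters and n the number of observations. The variance components, log-likelihoods, parameter counts, and sample size in the example are hypothetical and are not values from the paper. Software packages can differ in which n they use for BIC, so this is an illustration rather than a reproduction of the Mplus output.

```python
import math


def intraclass_correlation(between_var, within_var):
    """Share of total outcome variance that lies between classes/schools."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


def aic(loglik, n_params):
    """Akaike Information Criterion; smaller values indicate better fit."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; smaller values indicate better fit."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical null-model variance components (between and within classes/schools).
icc = intraclass_correlation(between_var=3000.0, within_var=7000.0)
print(icc)

# Hypothetical comparison of two nested steps fitted to the same 5,000 students.
step_1 = {"loglik": -30500.0, "n_params": 4}
step_2 = {"loglik": -30350.0, "n_params": 9}
for name, step in (("step 1", step_1), ("step 2", step_2)):
    print(name, aic(step["loglik"], step["n_params"]),
          bic(step["loglik"], step["n_params"], 5000))
```

In this made-up comparison the second step has lower AIC and BIC, so the added block would be read as improving fit despite the extra parameters.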
Also, although fit indices are not intrinsically interpretable (i.e., their values cannot be interpreted as being large or small in themselves), they can be compared across different steps of each model to check whether changes in the model lead to better fit; for all three indices presented in the tables, smaller values indicate better model fit regardless of the absolute number.

4. Results

4.1 Descriptive statistics

4.1.1 Descriptive statistics of outcome variables

Table 2 shows the mean scores in mathematics and science by student gender in the TIMSS 2019 and NALO 2018 assessments. Results from both data sets show that boys consistently underperform compared to girls in both subject domains and at both grade levels in Saudi Arabia. The differences are larger for science than for mathematics at both grade levels, in both TIMSS and NALO.

Table 2. Mean mathematics and science achievement in Saudi Arabia, by grade level and student gender, TIMSS 2019 and NALO 2018

                       TIMSS 2019                          NALO 2018
                       Male        Female      M-F         Male        Female      M-F
                       Mean (SD)   Mean (SD)               Mean (SD)   Mean (SD)
Grade 4  Mathematics   385 (108)   412 (91)    -26         496 (103)   504 (97)     -8
Grade 4  Science       373 (116)   434 (97)    -60         483 (101)   518 (96)    -35
Grade 8  Mathematics   385 (80)    403 (74)    -17         493 (101)   507 (99)    -14
Grade 8  Science       408 (91)    455 (79)    -47         480 (100)   521 (96)    -42

Source: TIMSS 2019 data reported by Mullis et al. (2020). NALO data are authors’ own calculations. All differences between genders have been rounded.

4.1.2 Descriptive statistics of predictor variables (TIMSS 2019)

Table 3 shows the percentages of boys and girls in grades 4 and 8 in Saudi Arabia across the TIMSS 2019 contextual categorical variables of interest to this paper (i.e., categorical variables that were included in the models). 
Table 4 shows the means (M), standard deviations (SD), minimum (min), and maximum scores (max) for boys and girls in grades 4 and 8 in Saudi Arabia across the TIMSS contextual continuous variables of interest to this study (i.e., continuous variables that were included in the models). The tables also include the effect sizes (φ/φc and d) for the gender differences in each of the contextual variables (see section 3.2 for further information). Given that some of the estimates presented in table 4 are not intrinsically interpretable (i.e., their values cannot be interpreted as being large or small per se), cut-scores and their corresponding interpretation along the continuum of each of these variables as set by TIMSS are provided in table A2.1 in appendix A2. Across most of the TIMSS 2019 contextual variables, the differences between boys and girls are statistically significant; however, most of the differences yielded small to moderate effect sizes. Among these differences the most noticeable at grade 4 are observed in teachers’ absenteeism, with girls being more likely to attend schools where teacher absenteeism is a minor problem compared to boys who are more likely to attend schools where teacher absenteeism is either not a problem or a moderate to serious problem. Another considerable difference can be found in teachers’ major area of study, with more teachers in boys’ schools having education and mathematics or science as their major area of study, compared to teachers in girls’ schools who tend to primarily have mathematics or science but not education as their major area of study. Differences in teachers’ professional development are also noticeable, with teachers in boys’ schools reporting having attended fewer hours of professional development on mathematics and science compared to teachers in girls’ schools. 
Also, in comparison to grade 4 girls, grade 4 boys are less likely to have positive attitudes (liking/confidence) toward mathematics and science, report a lower sense of school belonging, and experience more frequent bullying. Additionally, boys’ schools tend to be less safe and orderly, and teachers in boys’ schools tend to also report lower levels of job satisfaction.

At grade 8, both mathematics and science teachers are younger, on average, in boys’ schools compared to teachers in girls’ schools, while, in line with findings at grade 4, more grade 8 teachers in boys’ schools have education and mathematics or science as their major area of study, compared to teachers in girls’ schools, who tend to primarily have mathematics or science but not education as their major area of study. Grade 8 teachers in boys’ schools also report having attended fewer hours of professional development on mathematics and science compared to teachers in girls’ schools. Additionally, grade 8 boys’ schools tend to be less safe and orderly, with lower levels of discipline and more frequent bullying among students compared to girls’ schools, while teachers in boys’ schools tend to also report lower levels of job satisfaction. 7

Table 3.
Contextual categorical variables by student gender, TIMSS 2019

| Variable / category | Grade 4 Male (%) | Grade 4 Female (%) | Grade 4 φ/φc | Grade 8 Male (%) | Grade 8 Female (%) | Grade 8 φ/φc |
|---|---|---|---|---|---|---|
| Student-level variables | | | | | | |
| Student immigration status | | | 0.029 | | | 0.051** |
| native | 64.8 | 66.9 | | 76.7 | 81.4 | |
| second-generation immigrant students | 28.8 | 27.6 | | 15.4 | 11.7 | |
| first-generation immigrant students | 6.4 | 5.5 | | 7.9 | 6.8 | |
| Student owns mobile phone | | | 0.036** | | | 0.000 |
| yes | 62.7 | 58.3 | | 81.4 | 82.3 | |
| no | 37.3 | 41.7 | | 18.6 | 17.7 | |
| Preschool attendance and duration | | | 0.018 | | | |
| did not attend | 30.8 | 30.0 | | | | |
| 1 year or less | 39.7 | 39.0 | | | | |
| 2 years | 18.3 | 18.3 | | | | |
| 3 years or more | 11.2 | 12.7 | | | | |
| Student absenteeism | | | 0.131*** | | | 0.182*** |
| never or almost never | 36.6 | 47.8 | | 34.3 | 20.0 | |
| once every two months | 10.3 | 9.9 | | 15.8 | 11.7 | |
| once a month | 12.3 | 10.5 | | 17.7 | 21.7 | |
| once every two weeks | 10.0 | 7.3 | | 13.7 | 21.5 | |
| once a week | 30.8 | 24.5 | | 18.6 | 25.1 | |
| Time spent on mathematics homework | | | | | | 0.051** |
| 15 minutes or less | | | | 75.8 | 73.3 | |
| 16 minutes or more | | | | 24.2 | 26.7 | |
| Time spent on science homework | | | | | | 0.045** |
| 15 minutes or less | | | | 71.0 | 77.4 | |
| 16 minutes or more | | | | 29.0 | 22.6 | |
| Class/school-level variables | | | | | | |
| School location | | | 0.068*** | | | 0.247*** |
| urban | 61.5 | 58.7 | | 66.4 | 47.9 | |
| suburban/medium size city or large town | 27.5 | 27.4 | | 15.0 | 33.7 | |
| small town or village/remote rural | 11.0 | 13.9 | | 18.6 | 18.4 | |
| Teacher age (mathematics teacher) | | | 0.073*** | | | 0.243*** |
| 29 years or younger | 9.7 | 4.8 | | 28.3 | 3.9 | |
| 30–39 years | 39.6 | 47.0 | | 48.2 | 62.1 | |
| 40 years or older | 50.8 | 48.2 | | 23.5 | 34.0 | |
| Teacher age (science teacher) | | | 0.047** | | | 0.126*** |
| 29 years or younger | 12.7 | 6.5 | | 10.2 | 2.1 | |
| 30–39 years | 38.9 | 46.8 | | 56.2 | 50.7 | |
| 40 years or older | 48.4 | 46.7 | | 33.6 | 47.2 | |
| Time assigned to mathematics homework | | | 0.147*** | | | 0.119*** |
| 15 minutes or less | 52.7 | 67.2 | | 58.4 | 62.8 | |
| 16 minutes or more | 47.3 | 32.8 | | 41.6 | 37.2 | |
| Time assigned to science homework | | | 0.054** | | | 0.180*** |
| 15 minutes or less | 77.3 | 69.8 | | 66.8 | 81.4 | |
| 16 minutes or more | 22.7 | 30.2 | | 33.2 | 18.6 | |
| Poor teacher timekeeping | | | 0.157*** | | | 0.165*** |
| not a problem | 42.0 | 52.1 | | 39.3 | 56.6 | |
| minor problem | 35.6 | 34.0 | | 42.2 | 26.5 | |
| moderate or serious problem | 22.4 | 13.9 | | 18.5 | 16.9 | |
| Teacher absenteeism | | | 0.211*** | | | 0.066*** |
| not a problem | 45.0 | 33.1 | | 47.8 | 42.8 | |
| minor problem | 28.6 | 48.9 | | 31.8 | 32.0 | |
| moderate or serious problem | 26.4 | 18.0 | | 20.4 | 25.2 | |
| Teacher highest level of education (mathematics teacher) | | | | | | 0.174*** |
| Up to ISCED level 6: bachelor’s or equivalent level (grade 8) | | | | 97.7 | 100.0 | |
| ISCED levels 7 & 8: master’s or doctorate degree (grade 8) | | | | 2.3 | 0.0 | |
| Teacher highest level of education (science teacher) | | | | | | 0.038** |
| Up to ISCED level 6: bachelor’s or equivalent level (grade 8) | | | | 91.3 | 98.0 | |
| ISCED levels 7 & 8: master’s or doctorate degree (grade 8) | | | | 8.7 | 2.0 | |
| Teacher major area of study (mathematics teacher) | | | 0.222*** | | | 0.209*** |
| education and mathematics (grade 4) | 40.7 | 21.0 | | | | |
| mathematics but not education (grade 4) | 47.4 | 59.6 | | | | |
| mathematics and mathematics education (grade 8) | | | | 24.9 | 11.6 | |
| mathematics but not mathematics education (grade 8) | | | | 68.3 | 70.6 | |
| all other majors | 11.9 | 19.4 | | 6.8 | 17.8 | |
| Teacher major area of study (science teacher) | | | 0.245*** | | | 0.167*** |
| education and science (grade 4) | 40.9 | 15.8 | | | | |
| science but not education (grade 4) | 49.1 | 66.9 | | | | |
| science and science education (grade 8) | | | | 27.2 | 10.3 | |
| science but not science education (grade 8) | | | | 65.6 | 85.0 | |
| all other majors | 10.0 | 17.3 | | 7.2 | 4.7 | |
| Professional development hours on mathematics | | | 0.326*** | | | 0.196*** |
| none | 10.5 | 4.6 | | 11.9 | 3.8 | |
| less than 6 hours | 20.1 | 5.4 | | 14.2 | 12.2 | |
| 6–15 hours | 34.0 | 28.0 | | 34.4 | 26.1 | |
| 16–35 hours | 19.2 | 27.1 | | 20.9 | 31.3 | |
| more than 35 hours | 16.2 | 34.9 | | 18.5 | 26.7 | |
| Professional development hours on science | | | 0.238*** | | | 0.324*** |
| none | 18.6 | 8.7 | | 13.5 | 0.2 | |
| less than 6 hours | 12.4 | 13.0 | | 12.8 | 8.9 | |
| 6–15 hours | 32.9 | 28.1 | | 35.2 | 26.0 | |
| 16–35 hours | 21.1 | 16.9 | | 16.9 | 34.4 | |
| more than 35 hours | 15.0 | 33.3 | | 21.5 | 30.6 | |
| Professional development on mathematics content | | | 0.007 | | | 0.098*** |
| yes | 55.3 | 55.0 | | 42.0 | 50.0 | |
| no | 44.7 | 45.0 | | 58.0 | 50.0 | |
| Professional development on science content | | | 0.131*** | | | 0.182*** |
| yes | 43.5 | 56.1 | | 43.6 | 62.3 | |
| no | 56.5 | 43.9 | | 56.4 | 37.7 | |
| Professional development on mathematics pedagogy | | | 0.075*** | | | 0.081*** |
| yes | 56.8 | 61.8 | | 60.5 | 67.1 | |
| no | 43.2 | 38.2 | | 39.5 | 32.9 | |
| Professional development on science pedagogy | | | 0.053** | | | 0.172*** |
| yes | 46.3 | 47.1 | | 57.2 | 76.8 | |
| no | 53.7 | 52.9 | | 42.8 | 23.2 | |
| Principal highest level of education | | | 0.112*** | | | 0.027* |
| ISCED Level 6: bachelor’s or equivalent level | 82.4 | 92.9 | | 90.9 | 93.7 | |
| ISCED levels 7 & 8: master’s or doctorate degree | 17.6 | 7.1 | | 9.1 | 6.3 | |
| Principal qualification in educational leadership | | | 0.168*** | | | 0.233*** |
| yes | 25.6 | 14.8 | | 28.5 | 8.5 | |
| no | 74.4 | 85.2 | | 71.5 | 91.5 | |
| School library | | | 0.187*** | | | 0.026 |
| yes | 69.5 | 49.4 | | 74.7 | 72.4 | |
| no | 30.5 | 50.6 | | 25.3 | 27.6 | |

Note: *p < .05; **p < .01; ***p < .001.

7 The TIMSS variable describing teachers’ qualifications for grade 4 shows that a majority of teachers in Saudi Arabia have secondary education. This is not consistent with official data from the Ministry of Education or other available data sources, which indicate that most teachers in the country have at least a bachelor’s degree. Due to this inconsistency, the authors of the study decided to omit this variable from the analysis.

Table 4.
Contextual continuous variables by student gender, TIMSS 2019

Cells for male and female students show m (SD), min–max; d is the effect size for the gender difference.

| Variable | Grade 4 Male | Grade 4 Female | Grade 4 d | Grade 8 Male | Grade 8 Female | Grade 8 d |
|---|---|---|---|---|---|---|
| Student-level variables | | | | | | |
| Home resources for learning (grade 4) | 9.3 (1.41), 3.8–14.9 | 9.5 (1.34), 3.8–14.9 | 0.109 | | | |
| Home educational resources (grade 8) | | | | 9.4 (1.59), 4.6–13.5 | 9.5 (1.51), 4.6–13.5 | 0.058 |
| Literacy and numeracy readiness for school | 10.1 (2.01), 3.1–14.6 | 10.5 (1.91), 3.1–14.6 | 0.235*** | | | |
| Student likes learning mathematics | 10.3 (2.03), 3.9–13.1 | 11.3 (1.88), 3.9–13.1 | 0.501*** | 10.3 (1.97), 5.1–13.9 | 10.0 (2.08), 5.1–13.9 | 0.143** |
| Student likes learning science | 10.0 (2.33), 2.7–13.2 | 11.3 (2.15), 2.7–13.2 | 0.575*** | 10.4 (2.09), 3.9–13.5 | 10.7 (2.15), 3.9–13.5 | 0.146* |
| Student confident in mathematics | 10.1 (2.08), 2.8–14.4 | 11.1 (2.17), 2.8–14.4 | 0.470*** | 10.5 (1.88), 3.3–15.9 | 10.5 (2.01), 3.3–15.9 | 0.010 |
| Student confident in science | 10.0 (2.01), 3.4–13.3 | 11.0 (2.05), 3.4–13.3 | 0.517*** | 10.5 (1.92), 3.3–14.8 | 10.9 (2.02), 3.5–14.8 | 0.223*** |
| Student sense of school belonging | 10.0 (2.33), 3.1–12.8 | 11.1 (2.01), 3.1–12.8 | 0.533*** | 10.2 (2.09), 3.9–13.3 | 10.3 (1.92), 3.9–13.3 | 0.075 |
| Bullying | 8.9 (2.25), 2.9–12.7 | 10.2 (2.14), 2.9–12.7 | 0.619*** | 9.8 (2.35), 2.0–12.9 | 10.6 (1.97), 3.6–12.9 | 0.360*** |
| Class/school-level variables | | | | | | |
| School mean of home resources for learning (grade 4) | 9.2 (0.83), 6.3–11.3 | 9.5 (0.68), 7.2–11.3 | 0.409** | | | |
| School mean of home educational resources (grade 8) | | | | 9.4 (0.69), 7.9–11.4 | 9.5 (0.67), 8.3–11.6 | 0.132 |
| Safe and orderly schools (mathematics teacher) | 10.8 (1.83), 5.8–13.4 | 11.9 (1.71), 7.5–13.4 | 0.644*** | 11.2 (1.96), 7.2–13.9 | 11.9 (1.92), 6.3–13.9 | 0.376* |
| Safe and orderly schools (science teacher) | 10.9 (2.10), 5.1–13.4 | 11.7 (1.95), 3.9–13.4 | 0.390* | 10.9 (1.88), 6.3–13.9 | 11.7 (2.11), 4.9–13.9 | 0.410** |
| School emphasis on academic success | 10.8 (2.12), 4.2–16.4 | 11.8 (2.14), 7.2–16.4 | 0.446** | 10.8 (1.89), 6.7–16.4 | 11.5 (2.03), 5.8–16.4 | 0.337* |
| School discipline | 9.5 (2.40), 3.7–12.8 | 10.6 (2.15), 3.7–12.8 | 0.443* | 10.0 (2.68), 4.1–14.0 | 11.6 (2.37), 4.3–14.0 | 0.616** |
| Teacher years of experience (mathematics teacher) | 15.9 (8.43), 0.0–31.0 | 14.4 (7.70), 0.0–33.0 | 0.183 | 10.6 (7.29), 0.0–30.0 | 12.2 (7.27), 1.0–30.0 | 0.216 |
| Teacher years of experience (science teacher) | 15.3 (9.36), 0.0–36.0 | 12.8 (7.86), 0.0–31.0 | 0.295 | 13.1 (7.43), 0.0–36.0 | 12.6 (7.53), 0.0–30.0 | 0.066 |
| Teacher job satisfaction (mathematics teacher) | 10.6 (1.36), 6.4–11.7 | 11.2 (0.87), 8.3–11.7 | 0.569*** | 10.7 (1.50), 5.3–11.8 | 11.3 (0.74), 7.2–11.8 | 0.465** |
| Teacher job satisfaction (science teacher) | 10.6 (1.50), 5.8–11.7 | 11.3 (0.88), 6.9–11.7 | 0.553*** | 10.5 (1.65), 5.6–11.8 | 11.5 (0.64), 8.5–11.8 | 0.783*** |
| Principal years of experience | 9.9 (6.65), 0.0–30.0 | 8.6 (7.09), 0.0–33.0 | 0.185 | 9.4 (7.56), 0.0–32.0 | 8.6 (8.30), 0.0–39.0 | 0.102 |

Note: *p < .05; **p < .01; ***p < .001.

4.1.3 Additional contextual information for grades 4 and 8 (NALO 2018)

All of the differences between boys and girls in grades 4 and 8 noted in this section are statistically significant at the p < .001 level. However, the effect sizes associated with these differences were generally small (appendix A1).

NALO data indicate that 14.5 percent of grade 4 boys had repeated a year at school because of poor academic performance, compared to 6.6 percent of grade 4 girls (table A1.1). In addition, grade 4 boys reported engaging in a lower level of reading than grade 4 girls. For example, 22.6 percent of boys reported never having read a book, compared to 12.0 percent of girls. Conversely, 33.8 percent of boys have read more than 10 books, compared to 40.4 percent of girls. However, when asked whether mathematics is important in life, grade 4 boys and girls provided broadly similar levels of agreement (table A1.1).

Among grade 4 immigrant students (those reported by their parents to have been born outside Saudi Arabia), boys were more likely to attend a private school (13.9 percent) than girls (2.6 percent) (table A1.2). However, the vast majority of both boys (85.0 percent) and girls (95.2 percent) attended government schools.

NALO data also show the differences in perceptions among teachers, school principals, and parents in boys’ and girls’ schools.
Teachers of grade 4 boys are substantially less likely to agree that parents have a good understanding of their child’s current academic level, suggesting greater misalignment between student performance and parental understanding in boys’ schools compared to girls’ schools. This difference was associated with the largest effect size observed among all the selected NALO variables (φ = .302; table A1.3). Teachers in boys’ schools are also less likely to report that high-achieving students were respected among their peers compared to teachers in girls’ schools (table A1.3). School principals’ reports correspond with those of their teachers in relation to parental support for learning. Principals of girls’ schools report a higher degree of parental support for learning than those in boys’ schools, and also a higher degree of satisfaction among parents with their child’s educational progress (table A1.4). More grade 4 boys (67.3 percent) than grade 4 girls (48.5 percent) have access to a school library from which they can borrow books, despite girls reporting reading more books, as noted above.

Similar to grade 4, boys in grade 8 are more likely (6.9 percent) than girls (4.5 percent) to report having repeated a year at school because of poor academic performance (table A1.1). However, both in absolute and in relative terms, the differences are smaller at grade 8 than at grade 4. In terms of reading behavior, grade 8 boys report a more nuanced pattern than that seen at grade 4. As at the lower grade, more boys (30.6 percent) than girls (19.3 percent) report never having read a book. However, boys and girls are equally likely to report having read more than six or more than 10 books. Another difference from students’ responses at grade 4 is that, in grade 8, boys agree more strongly that mathematics is important in life relative to their female peers (47.6 percent of boys agreeing a lot compared to 36.3 percent of girls).
Among immigrant students, grade 8 boys were much more likely to attend a private school (11.1 percent) than grade 8 girls (2.7 percent) (table A1.2). Nonetheless, as at grade 4, most boys (87.7 percent) and girls (96.3 percent) attended government schools.

Again, similar to grade 4, NALO data for grade 8 show the differences in perceptions among teachers, school principals, and parents in boys’ and girls’ schools. A substantially lower proportion of teachers of grade 8 boys agree that parents have a good understanding of their child’s current academic level (associated with the second-largest effect size observed: φ = .236; table A1.3). Similar to grade 4, this suggests greater misalignment between student performance and parental understanding among parents in boys’ schools. A lower level of respect for students who achieve at a high academic level is reported in boys’ schools, although teachers in boys’ schools are slightly more likely to view their grade 8 students as always being keen to excel academically. School principals report weaker parental support for learning for grade 8 boys than for grade 8 girls (table A1.4), and that boys’ parents’ expectations are being met to a lesser extent. There is little difference in grade 8 boys’ and girls’ access to a school library from which they can borrow books.

4.2 Overall findings from multilevel models

In this section, the main findings on boys’ underperformance in Saudi Arabia and factors associated with student achievement, in general, are first presented. Then, results from the multilevel models are shown in detail for grades 4 and 8 separately. Table 5 summarizes the main findings from the analysis. As discussed in the methods section, a series of hierarchical two-level linear regression models were run, starting with a simple model that includes student gender but no other predictor variables.
Then, with each step, the changes in the achievement gap between girls and boys are explored when adding additional information on student demographics and home background, student engagement and attitudes, school climate, teacher qualifications, and school leadership and resources. Table 5 shows the coefficient on the gender gap. Regression estimates for the other predictor variables are presented in appendix A3. As shown in table 5, boys underperform compared to girls in Saudi Arabia across both grades and subjects (step 1). The achievement gap between boys and girls is greater in science than in mathematics across both grades. For example, boys in grade 4 underperform girls by 53 points in science compared to 20 points in mathematics. The results also show that, in grade 4, controlling for student, teacher, and school characteristics accounts for the entire gender gap in mathematics and more than half of the gap in science. Specifically, when student-level predictor variables such as student demographics, home resources for learning, and literacy and numeracy readiness 8 are taken into account, the gap in mathematics between girls and boys drops from 20 points to 8 points and it is no longer statistically significant (step 2). However, in grade 8, controlling for a wide range of observable characteristics from the student, parent, teacher, and principal questionnaires explains a relatively small portion of the achievement gap between boys and girls in both mathematics and science. As shown in table 5, the gender gap between grade 8 boys and girls declines by 4 points in mathematics and 11 points in science once all the predictor variables are included (step 6). However, a significant unexplained gender gap favoring girls still exists in both subjects in grade 8. Estimates from table 5 show that boys underperform girls by 16 points in mathematics and 40 points in science, even after controlling for all observable characteristics. 
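The share of the step 1 gender gap that is accounted for once all predictor blocks are included follows directly from the step 1 and step 6 coefficients in table 5. The sketch below is illustrative arithmetic only; the helper name `share_explained` and the dictionary layout are assumptions and do not appear in the paper.

```python
# Student gender coefficients (female = 1) from table 5: (step 1, step 6).
GENDER_GAP = {
    "grade 4 mathematics": (19.73, -6.71),
    "grade 4 science": (53.14, 21.52),
    "grade 8 mathematics": (20.78, 16.29),
    "grade 8 science": (50.40, 39.89),
}


def share_explained(step1, step6):
    """Proportion of the step 1 gap that disappears by step 6."""
    return (step1 - step6) / step1


shares = {model: share_explained(*gaps) for model, gaps in GENDER_GAP.items()}
for model, share in shares.items():
    print(f"{model}: {share:.0%} of the step 1 gap accounted for")
```

A share above 100 percent, as for grade 4 mathematics, means that the sign of the coefficient reverses by step 6, although that estimate is not statistically significant.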
Some potential explanations for this remaining gender gap are discussed in section 5.

Table 5. Summary of main findings of the hierarchical two-level linear regression models

Coefficient of student gender (female = 1), B (SE), by model and step

| | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 |
|---|---|---|---|---|---|---|
| Grade 4 mathematics | 19.73 (8.67)* | 7.68 (7.95) | -7.42 (7.98) | -17.55 (8.99) | -18.68 (12.51) | -6.71 (11.92) |
| Grade 4 science | 53.14 (8.29)*** | 39.36 (7.04)*** | 21.61 (6.57)** | 17.44 (7.38)* | 15.04 (8.57) | 21.52 (8.79)* |
| Grade 8 mathematics | 20.78 (5.29)*** | 15.79 (4.78)** | 18.14 (4.52)*** | 15.47 (4.92)** | 9.88 (6.06) | 16.29 (6.98)* |
| Grade 8 science | 50.40 (5.80)*** | 45.83 (5.26)*** | 42.28 (5.00)*** | 35.42 (5.03)*** | 34.88 (7.10)*** | 39.89 (7.52)*** |
| Demographics and home background | | ✓ | ✓ | ✓ | ✓ | ✓ |
| Student engagement and attitudes | | | ✓ | ✓ | ✓ | ✓ |
| School climate | | | | ✓ | ✓ | ✓ |
| Teacher qualifications and practices | | | | | ✓ | ✓ |
| School leadership and resources | | | | | | ✓ |

Note: ✓ indicates variables included in each step.

8 ‘Literacy and numeracy readiness’ is a TIMSS composite scale based on parental reports of the extent to which their child could demonstrate specified literacy and numeracy skills when they started school (e.g., read some words, write their own name, count by themselves, recognize written numbers). The scale is available only for grade 4 students, as the parents of grade 8 students were not asked to complete a questionnaire.

To explore the extent to which predictors in the models have different effects on boys’ and girls’ performance, the statistical significance of the interactions between student gender and each of the predictor variables is examined in step 7 of each model. The results from these interactions are presented in the final step (step 7) of each model, in tables 6 and 7. The final models explained a substantial proportion of observed achievement variance across both grades and subjects.
For example, the final model for grade 4 mathematics explains 71 percent of variance at the class/school level and 26 percent of variance at the student level, or 40 percent of the total observed variance.

Overall, results for the examined interaction terms show that, in grade 4, school climate, student absenteeism, and early numeracy and literacy skills contribute to the achievement gap between girls and boys in Saudi Arabia. A safe and orderly school climate is more strongly associated with improvements in boys’ mathematics and science achievement than with improvements in girls’ achievement in the two subjects. The findings also indicate that boys’ mathematics achievement decreases to a greater degree than girls’ achievement with more frequent student absenteeism. In addition, the results suggest that, even though greater literacy and numeracy readiness was linked with improvements in science achievement of both boys and girls, boys tended to benefit more from this readiness than girls.

For grade 8, boys’ mathematics achievement increases to a greater degree than girls’ achievement in schools with a stronger emphasis on academic success. Higher levels of confidence in science were also associated with greater achievement gains in the subject among boys compared to girls.

4.3 Findings for grade 4

In this section, the results from the final two steps (steps 6 and 7) of each model are presented for grade 4 mathematics and science. Results for all steps are shown in appendix A3.

4.3.1 Mathematics

Table 6 provides the coefficients and model statistics for grade 4 mathematics from steps 6 and 7. Step 7, the final model, explains 71 percent of the class/school-level variance and 26 percent of the student-level variance, or 40 percent of the total observed variance. As shown in step 6, the coefficient of student gender is negative and not statistically significant.
This indicates that when controlling for student, home, teacher, and school characteristics, the achievement gap between grade 4 boys and girls in mathematics is no longer statistically significant. Findings from step 6 also suggest that students’ home resources for learning, early literacy and numeracy skills, absenteeism, bullying, attitudes toward mathematics, and sense of school belonging 9 are significantly associated with student achievement. For instance, students who are absent once a week tend to underperform students who are never or almost never absent by 24 points, which is equivalent to 24 percent of a standard deviation. Also, students with stronger early literacy and numeracy skills achieved higher mathematics scores (B = 7.6, p < .001) relative to other students. Students who reported being bullied less frequently achieved higher mathematics scores (B = 4.9, p < .001).

At the class/school level, school location, poor teacher timekeeping, teacher experience, and professional development are significantly associated with student performance. Surprisingly, after holding other variables constant, students in schools located in small towns or remote areas perform better in mathematics than students in urban areas (B = 54.9, p < .05), while with each additional year of teacher experience, students score 1.8 points higher in mathematics (p < .01).

In step 7, the interactions between student gender and each of the predictors are examined. However, only the results for interaction terms that are statistically significant are reported. Overall, poor school climate tends to affect boys more negatively than girls. As shown in figure 3, the achievement gap between boys and girls is greater in schools with poor school climate relative to other schools. In schools with a safe and orderly school climate, boys and girls tend to perform similarly.

9 Surprisingly, the coefficient of sense of school belonging is negative, suggesting a negative correlation between student achievement and sense of school belonging. One potential explanation is that high-achieving students in Saudi Arabia may feel alienated within schools or not appreciated/respected by their peers, which is consistent with some of the previous literature discussed in the introduction (e.g., Jha and Pouezevara 2016).

Figure 3. Interaction between student gender and safe and orderly school climate on mathematics achievement, grade 4
[Plot: predicted mathematics score (y-axis, 200 to 700) by safe and orderly school climate (x-axis, 0 to 14), shown separately for male and female students.]
Note: The plot presents the predicted values from step 7 of the mathematics model for grade 4 (table 6).

As shown in figures 4 and 5, a number of factors are associated with the underperformance of grade 4 boys in comparison to girls in school. One factor is being absent from school once a week, which is more strongly associated with decreases in boys’ mathematics achievement compared to girls’. This means that boys’ mathematics achievement appears to suffer more from frequent absences from school compared to girls’ achievement (figure 4). Teacher age also affects achievement of grade 4 boys and girls differently in Saudi Arabia. Having a younger teacher (29 years or younger), rather than an older teacher (40 years or older), is more strongly associated with lower mathematics achievement among boys compared to girls (figure 5).

Figure 4. Interaction between student gender and frequency of student absenteeism on mathematics achievement, grade 4
[Plot: predicted mathematics score (y-axis, 200 to 700) by student absenteeism (never or almost never; once a week), shown separately for male and female students.]
Note: The plot presents the predicted values from step 7 of the mathematics model for grade 4 (table 6). Only the reference category and the category for which a statistically significant interaction with student gender was found are presented.

Figure 5. Interaction between student gender and teacher age on mathematics achievement, grade 4
[Plot: predicted mathematics score (y-axis, 200 to 700) by teacher age (29 years or younger; 40 years or older), shown separately for male and female students.]
Note: The plot presents the predicted values from step 7 of the mathematics model for grade 4 (table 6). Only the reference category and the category for which a statistically significant interaction with student gender was found are presented.

4.3.2 Science

Table 6 also presents the coefficients and model statistics for grade 4 science. Similar to grade 4 mathematics, the results from both steps 6 and 7 are presented. The final model explains 65 percent of the class/school-level variance and 27 percent of the student-level variance, or 39 percent of the total observed variance. As shown in step 6, the coefficient of student gender is positive and statistically significant (B = 21.5, p < .05). This coefficient is much smaller in magnitude compared to the basic model (step 1 in table 5), which suggests that controlling for student, home, teacher, and school characteristics reduces the science achievement gap between grade 4 boys and girls by more than half. There is still, however, a significant unexplained gap between boys and girls in grade 4 science. Some potential explanations for this gap are discussed in section 5.

Students’ immigration status, early literacy and numeracy skills, absenteeism, bullying, attitudes toward science, and sense of school belonging are significantly associated with student achievement in science. For example, students who are absent once a week tend to underperform students who are never or almost never absent by 21 points.
Also, students with stronger early literacy and numeracy skills tend to perform better in grade 4 science compared to other students (B = 7.6, p < .001). School location, school mean of home resources, teacher experience, age, and professional development are the class/school-level variables that were significantly associated with student performance in science. Surprisingly, after holding other variables constant, students in schools located in small towns or remote areas perform better in science than students in urban areas (B = 60.8, p < .01). Students of younger teachers scored lower in science than students of older teachers. Students whose teachers are between 30 and 39 years old scored 29 points lower than students of teachers who are 40 years or older and students whose teachers are 29 years old or younger scored 44 points lower than students of teachers who are 40 years or older. Also, students whose teachers have not participated in any professional development training on science (i.e., completed zero hours in professional development) tend to score much lower than students whose teachers have completed more than 35 hours in professional development in science (B = -35.3, p < .05). Consistent with grade 4 mathematics, results in science from step 7 show that poor school climate affects boys more negatively than girls (figure 6). The achievement gap between boys and girls is greater in schools with poor school climate relative to other schools. In addition, boys tend to benefit more from literacy and numeracy readiness than girls (figure 7), and girls in urban and suburban areas outperform boys (figure 8). Figure 6. 
Interaction between student gender and safe and orderly school climate on science achievement, grade 4
[Plot: predicted science score (y-axis, 200 to 700) by safe and orderly school climate (x-axis, 0 to 14), shown separately for male and female students.]
Note: The plot presents the predicted values from step 7 of the science model for grade 4 (table 6).

Figure 7. Interaction between student gender and literacy and numeracy readiness for school on science achievement, grade 4
[Plot: predicted science score (y-axis, 200 to 700) by literacy and numeracy readiness for school (x-axis, 0 to 14), shown separately for male and female students.]
Note: The plot presents the predicted values from step 7 of the science model for grade 4 (table 6).

Figure 8. Interaction between student gender and school location on science achievement, grade 4
[Plot: predicted science score (y-axis, 200 to 700) by school location (urban; suburban/medium size city or large town), shown separately for male and female students.]
Note: The plot presents the predicted values from step 7 of the science model for grade 4 (table 6). Only the reference category and the category for which a statistically significant interaction with student gender was found are presented.

Table 6.
Steps 6 and 7 from the hierarchical two-level linear regression models for mathematics and science achievement, grade 4, TIMSS 2019

Cell entries are B (SE) unless otherwise indicated.

| Variable (reference category) | Mathematics, step 6 | Mathematics, step 7 | Science, step 6 | Science, step 7 |
| --- | --- | --- | --- | --- |
| R2, student-level (%) | 21.0 | 26.2 | 22.5 | 27.4 |
| R2, class/school-level (%) | 54.9 | 71.4 | 46.0 | 64.8 |
| Intercept (SE) | 396.29 (21.00) | 296.27 (37.86) | 433.35 (21.20) | 320.48 (35.66) |
| Student-level variables (reference category) | | | | |
| Student gender (male) | -6.71 (11.92) | 164.98 (54.64)** | 21.52 (8.79)* | 249.50 (59.64)*** |
| Student immigration status (native): second-generation immigrant students | 15.40 (11.36) | 14.58 (11.44) | 14.12 (11.98) | 13.25 (12.08) |
| Student immigration status (native): first-generation immigrant students | 26.74 (17.41) | 26.53 (17.24) | 32.10 (14.51)* | 37.67 (13.70)** |
| Home resources for learning | 4.35 (2.11)* | 4.64 (2.11)* | 2.85 (2.79) | 2.83 (2.79) |
| Student owns mobile phone (no) | -0.10 (4.70) | 0.10 (4.51) | -1.58 (1.96) | -1.45 (1.96) |
| Literacy and numeracy readiness for school | 7.57 (1.44)*** | 7.57 (1.43)*** | 7.62 (1.41)*** | 10.35 (1.91)*** |
| Preschool attendance and duration (did not attend): 1 year or less | 1.51 (6.97) | 0.38 (6.81) | 4.88 (5.19) | 3.79 (5.14) |
| Preschool attendance and duration (did not attend): 2 years | -6.20 (8.84) | -8.60 (8.84) | 8.61 (6.40) | 6.55 (6.36) |
| Preschool attendance and duration (did not attend): 3 years or more | 15.10 (9.06) | 14.00 (8.81) | 8.71 (8.62) | 8.66 (8.49) |
| Student absenteeism (never or almost never): once every two months | 5.36 (9.85) | 4.74 (9.69) | 7.90 (7.20) | 8.14 (7.23) |
| Student absenteeism (never or almost never): once a month | 7.54 (8.60) | 7.21 (8.49) | -2.91 (7.42) | -2.24 (7.39) |
| Student absenteeism (never or almost never): once every two weeks | -20.34 (8.88)* | -21.87 (8.85)* | -15.47 (9.37) | -15.07 (9.20) |
| Student absenteeism (never or almost never): once a week | -23.53 (7.35)** | -34.61 (8.67)*** | -21.05 (5.67)*** | -20.78 (5.50)*** |
| Student likes learning mathematics/science | 4.67 (1.60)** | 4.82 (1.55)** | 5.33 (1.76)** | 5.16 (1.75)** |
| Student confident in mathematics/science | 7.91 (1.50)*** | 8.24 (1.47)*** | 7.89 (1.62)*** | 8.09 (1.61)*** |
| Student sense of school belonging | -3.10 (1.16)** | -3.10 (1.17)** | -3.68 (1.32)** | -3.55 (1.31)** |
| Bullying [a] | 4.91 (1.28)*** | 5.20 (1.26)*** | 5.78 (1.44)*** | 5.91 (1.44)*** |
| Class/school-level variables (reference category) | | | | |
| School mean of home resources for learning | 15.70 (9.03) | 10.65 (8.07) | 22.16 (6.74)** | 17.82 (5.67)** |
| School location (urban): suburban/medium size city or large town | -8.40 (11.43) | -3.81 (9.56) | -10.96 (9.66) | -36.03 (10.25)*** |
| School location (urban): small town or village/remote rural | 54.91 (26.49)* | 42.47 (24.77) | 60.77 (20.36)** | 60.10 (16.90)*** |
| Teacher age (40 years or older): 29 years or younger | 15.59 (19.55) | -4.73 (18.68) | -43.78 (17.99)* | -17.31 (16.16) |
| Teacher age (40 years or older): 30–39 years | 16.37 (14.17) | 18.35 (12.26) | -28.52 (10.26)** | -20.29 (9.24)* |
| Safe and orderly schools | 0.32 (3.21) | 6.63 (4.00) | 5.29 (3.07) | 9.92 (3.52)** |
| School emphasis on academic success | -0.26 (2.76) | 2.93 (2.58) | -1.91 (1.83) | -0.75 (1.71) |
| School discipline | -3.91 (3.80) | -1.60 (3.74) | -1.84 (2.68) | -0.85 (2.51) |
| Teacher years of experience | 1.76 (0.60)** | 2.23 (0.59)*** | -1.51 (0.60)* | -0.95 (0.59) |
| Teacher job satisfaction | 1.10 (4.34) | -0.03 (4.03) | -3.32 (3.34) | -4.52 (3.17) |
| Time assigned to mathematics/science homework (15 minutes or less) [b] | 4.63 (8.85) | 9.68 (8.52) | -7.30 (11.31) | -5.47 (9.81) |
| Poor teacher timekeeping (not a problem): minor problem | -24.39 (11.49)* | -14.89 (8.67) | 2.50 (11.75) | 4.86 (10.50) |
| Poor teacher timekeeping (not a problem): moderate or serious problem | -2.18 (17.35) | 13.44 (16.06) | -1.66 (17.19) | -0.39 (15.45) |
| Teacher absenteeism (not a problem): minor problem | 8.04 (13.45) | 5.03 (11.02) | 1.01 (13.17) | -2.16 (10.94) |
| Teacher absenteeism (not a problem): moderate or serious problem | -7.64 (23.90) | -8.53 (21.01) | 1.87 (16.55) | 5.55 (13.59) |
| Teacher major area of study (education and mathematics/science): mathematics/science but not education | -8.20 (9.26) | -6.76 (8.18) | -2.69 (9.78) | -7.35 (8.84) |
| Teacher major area of study (education and mathematics/science): all other majors | -9.88 (12.49) | -18.21 (10.74) | -23.80 (18.47) | -34.13 (17.72) |
| Professional development hours on mathematics/science (more than 35 hours): 16–35 hours | -29.46 (11.69)* | -22.62 (10.50)* | 3.50 (14.09) | 5.25 (11.73) |
| Professional development hours on mathematics/science (more than 35 hours): 6–15 hours | -5.35 (13.24) | 4.99 (12.26) | -5.01 (12.01) | -4.52 (10.92) |
| Professional development hours on mathematics/science (more than 35 hours): less than 6 hours | -3.30 (14.44) | 4.14 (12.28) | 18.81 (15.70) | 12.82 (13.31) |
| Professional development hours on mathematics/science (more than 35 hours): none | -1.71 (21.24) | -2.60 (16.21) | -35.32 (15.06)* | -37.77 (15.23)* |
| Professional development on mathematics/science content (no) | 19.53 (10.85) | 15.17 (10.60) | 9.96 (12.76) | 11.64 (11.14) |
| Professional development on mathematics/science pedagogy (no) | 13.29 (11.26) | 14.85 (9.46) | -2.53 (10.00) | -1.60 (9.00) |
| Principal years of experience | -0.13 (0.79) | -0.29 (0.71) | -0.20 (0.66) | 0.15 (0.60) |
| Principal highest level of education (ISCED levels 7 & 8: master’s or doctorate degree) [c] | -3.76 (15.01) | 4.78 (12.47) | -11.91 (10.93) | -13.19 (9.41) |
| Principal qualification in educational leadership (no) | -4.76 (13.02) | -6.06 (10.83) | 5.72 (9.43) | 7.28 (8.66) |
| School library (no) | 19.09 (10.82) | 15.10 (10.29) | 4.69 (8.59) | 7.65 (7.90) |
| Interaction terms (reference category) | | | | |
| Student gender*Safe and orderly schools | | -16.25 (5.13)** | | -15.61 (4.93)** |
| Student gender*Student absenteeism (never or almost never): once a week | | 22.88 (10.44)* | | |
| Student gender*Teacher age (40 years or older): 29 years or younger | | 89.46 (30.62)** | | |
| Student gender*Literacy and numeracy readiness for school | | | | -6.29 (2.57)* |
| Student gender*School location (urban): suburban/medium size city or large town | | | | 75.76 (15.57)*** |
| Fit statistics | | | | |
| Loglikelihood (H0) | -8549.92 | -8533.11 | -12381.76 | -12361.03 |
| AIC | 17193.84 | 17166.23 | 24857.51 | 24822.05 |
| BIC | 17442.83 | 17431.12 | 25123.27 | 25104.77 |

Note: Null mathematics model: Intercept (SE): 402.02 (4.44), ICC = 0.31, H0 = -32210.89, AIC = 64427.78, BIC = 64447.59. Null science model: Intercept (SE): 407.42 (4.88), ICC = 0.30, H0 = -32704.43, AIC = 65414.86, BIC = 65434.67. *p < .05; **p < .01; ***p < .001. [a] Higher scores indicate less frequent bullying; [b] Other category: 16 minutes or more; [c] Other category: ISCED Level 6, bachelor’s or equivalent level.

4.4 Findings for grade 8

In this section, the results from the final two steps (steps 6 and 7) of each model are presented for grade 8 mathematics and science. Results for all steps are shown in appendix A3.
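The step 7 models add interaction terms with student gender, so a main-effect coefficient and its interaction have to be read together. The following is a minimal sketch of that reading using three step 7 coefficients from table 7. It assumes that student gender enters the models as 0 for boys (the reference category) and 1 for girls, which is not stated explicitly here; the function and variable names are illustrative only.

```python
# Step 7 coefficients copied from table 7 (grade 8): (main effect, interaction).
# Assumption: student gender is coded 0 = male (reference) and 1 = female, so the
# main effect is the slope for boys and main effect + interaction is the slope
# for girls.
STEP7 = {
    "math: school emphasis on academic success": (5.76, -6.43),
    "science: student confident in science": (14.09, -4.88),
    "science: teacher aged 30-39 (vs 40 or older)": (-20.34, 32.15),
}


def slopes_by_gender(main_effect, interaction):
    """Return (slope for boys, slope for girls) in score points."""
    return main_effect, round(main_effect + interaction, 2)


for label, (main, inter) in STEP7.items():
    boys, girls = slopes_by_gender(main, inter)
    print(f"{label}: boys {boys:+.2f}, girls {girls:+.2f}")
```

Under that coding assumption, the implied slope for school emphasis on academic success is 5.76 points for boys and about -0.67 for girls, and being taught by a teacher aged 30–39 rather than 40 or older corresponds to about -20.3 points for boys and +11.8 points for girls.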
4.4.1 Mathematics

Table 7 presents the coefficients and model statistics for grade 8 mathematics. The gender difference in mathematics achievement remains statistically significant and only slightly smaller in magnitude (B = 16.3, p < .05) than the gender difference recorded in step 1 (B = 20.8, table 5), even after the addition of the selected conceptual blocks of predictor variables. Despite the relatively small effect of the predictor variables on the extent of the gender difference, the final model, including interactions, explains a substantial proportion of the observed variance in grade 8 mathematics achievement: 27 percent at level 1 and 74 percent at level 2, or 39 percent of the total observed variance.

Student-level factors that were significantly associated with higher mathematics achievement among both boys and girls were: first-generation or second-generation immigrant status, greater access to home learning resources, infrequent absence from school (no more than once every two months), greater confidence in mathematics, lower liking of mathematics, and taking less time to complete mathematics homework (15 minutes or less). School-level factors that were significantly associated with higher mathematics achievement among both boys and girls were: a higher school-average level of home resources for learning across the student body, and mathematics teachers having a postgraduate qualification (master’s or doctorate) rather than a lower qualification.

One significant interaction with student gender was observed for grade 8 mathematics. This interaction, involving schools’ emphasis on academic success, is illustrated in figure 9. The interaction term indicates that boys’ mathematics achievement increases to a greater degree than girls’ achievement in schools with stronger emphasis on academic success, relative to schools with a weaker emphasis on academic success.
However, given that this difference was not substantial, this finding should be interpreted with caution.

Figure 9. Interaction between student gender and school emphasis on academic success on mathematics achievement, grade 8
[Figure: predicted mathematics score (vertical axis, 200 to 700) against school emphasis on academic success (horizontal axis, 0 to 14), shown separately for male and female students (legend variable ITSEX_R).]
Note: The plot presents the predicted values from step 7 of the mathematics model for grade 8 (table 7).

4.4.2 Science

Table 7 also presents the coefficients and model statistics for grade 8 science. The gender difference in science achievement remains statistically significant and substantial (B = 39.9, p < .001) after the addition of the selected conceptual blocks of predictor variables. Nonetheless, the final model, including interactions, explains a substantial proportion of the observed variance in grade 8 science achievement: 27 percent at level 1 and 79 percent at level 2, or 41 percent of the total observed variance.

Student-level factors that were significantly associated with higher science achievement among both boys and girls were: first-generation or second-generation immigrant status, greater access to home learning resources, infrequent absence from school (no more than once every two months), greater confidence in science, and taking less time to complete science homework (15 minutes or less). School-level factors that were significantly associated with higher science achievement among both boys and girls were: a higher school-average level of home resources for learning across the student body, infrequent teacher absenteeism (regarded by principals as not a problem), and science teachers whose qualification was in an area other than science or science education.

Two significant interactions with student gender were observed. These interactions, involving students’ confidence in science and teachers’ age, are illustrated in figures 10 and 11.
The first interaction term indicates that higher levels of confidence in science are linked with greater gains in science achievement among grade 8 boys relative to grade 8 girls. The second interaction term indicates that boys’ achievement in grade 8 science is higher when taught by older teachers, whereas girls’ achievement is higher in classes taught by younger teachers (ages 30–39 years old) than in classes taught by older teachers.

Figure 10. Interaction between student gender and student confident in science on science achievement, grade 8
[Figure: predicted science score (vertical axis, 200 to 700) against student confident in science (horizontal axis, 0 to 14), shown separately for male and female students (legend variable ITSEX_R).]
Note: The plot presents the predicted values from step 7 of the science model for grade 8 (table 7).

Figure 11. Interaction between student gender and teacher age on science achievement, grade 8
[Figure: predicted science score (vertical axis, 200 to 700) by teacher age (horizontal axis: 30–39 years; 40 years or older), shown separately for male and female students (legend variable ITSEX_R).]
Note: The plot presents the predicted values from step 7 of the science model for grade 8 (table 7). Only the reference category and the category for which a statistically significant interaction with student gender was found are presented.

Table 7.
Steps 6 and 7 from the hierarchical two-level linear regression models for mathematics and science achievement, grade 8, TIMSS 2019

Cell entries are B (SE) unless otherwise indicated.

| Variable (reference category) | Mathematics, step 6 | Mathematics, step 7 | Science, step 6 | Science, step 7 |
| --- | --- | --- | --- | --- |
| R2, student-level (%) | 26.7 | 27.3 | 25.5 | 27.2 |
| R2, class/school-level (%) | 64.0 | 74.4 | 69.8 | 79.0 |
| Intercept (SE) | 367.09 (23.42) | 310.38 (36.78) | 441.38 (19.49) | 426.88 (24.75) |
| Student-level variables (reference category) | | | | |
| Student gender (male) | 16.29 (6.98)* | 90.11 (35.49)* | 39.89 (7.52)*** | 74.88 (25.07)** |
| Student immigration status (native): second-generation immigrant students | 17.59 (7.92)* | 17.51 (7.97)* | 31.81 (8.36)*** | 31.84 (8.36)*** |
| Student immigration status (native): first-generation immigrant students | 31.83 (9.37)** | 31.92 (9.33)** | 43.75 (9.67)*** | 43.76 (9.57)*** |
| Home educational resources | 4.09 (1.03)*** | 4.10 (1.03)*** | 5.39 (1.40)*** | 5.44 (1.40)*** |
| Student owns mobile phone (no) | 4.20 (5.13) | 3.80 (5.07) | -0.32 (6.32) | -1.32 (6.53) |
| Student absenteeism (never or almost never): once every two months | -8.73 (5.81) | -9.10 (5.78) | -5.55 (6.26) | -5.16 (6.26) |
| Student absenteeism (never or almost never): once a month | -19.47 (5.06)*** | -20.02 (5.20)*** | -13.34 (5.34)* | -12.49 (5.24)* |
| Student absenteeism (never or almost never): once every two weeks | -29.92 (4.84)*** | -30.31 (4.83)*** | -10.40 (7.47) | -10.19 (7.49) |
| Student absenteeism (never or almost never): once a week | -44.23 (6.40)*** | -44.52 (6.53)*** | -46.77 (6.44)*** | -47.10 (6.46)*** |
| Student likes learning mathematics/science | -4.30 (1.57)** | -4.35 (1.57)** | -0.85 (1.17) | -0.79 (1.18) |
| Student confident in mathematics/science | 15.24 (1.27)*** | 15.26 (1.27)*** | 11.35 (1.54)*** | 14.09 (2.02)*** |
| Student sense of school belonging | -1.03 (1.12) | -1.00 (1.12) | -1.20 (1.45) | -1.15 (1.45) |
| Bullying [a] | -0.42 (0.91) | -0.43 (0.91) | 1.49 (1.21) | 1.32 (1.21) |
| Time spent on mathematics/science homework (15 minutes or less) [b] | -11.35 (3.98)** | -11.40 (4.01)** | -12.37 (3.99)** | -11.94 (3.99)** |
| Class/school-level variables (reference category) | | | | |
| School mean of home educational resources | 25.97 (5.32)*** | 26.91 (5.29)*** | 20.23 (4.66)*** | 18.59 (4.37)*** |
| School location (urban): suburban/medium size city or large town | 6.64 (6.19) | 9.63 (6.22) | -6.53 (5.96) | -7.84 (5.70) |
| School location (urban): small town or village/remote rural | 2.05 (11.06) | 4.58 (11.13) | 3.12 (8.44) | -3.79 (8.53) |
| Teacher age (40 years or older): 29 years or younger | -20.08 (19.13) | -23.36 (19.05) | 9.78 (13.38) | 3.56 (12.88) |
| Teacher age (40 years or older): 30–39 years | -5.54 (11.15) | -6.51 (11.35) | -2.14 (6.47) | -20.34 (7.79)** |
| Safe and orderly schools | 0.87 (1.68) | 0.87 (1.62) | 2.29 (1.40) | 1.96 (1.44) |
| School emphasis on academic success | 1.80 (1.70) | 5.76 (2.48)* | 0.07 (1.24) | 0.46 (1.29) |
| School discipline | -2.60 (1.92) | -3.11 (1.93) | 0.64 (1.14) | 0.13 (1.13) |
| Teacher years of experience | -0.49 (0.88) | -0.60 (0.88) | 0.54 (0.48) | 0.54 (0.51) |
| Teacher job satisfaction | -1.07 (2.18) | -1.70 (2.15) | -3.19 (2.47) | -3.35 (2.36) |
| Time assigned to mathematics/science homework (15 minutes or less) [b] | 10.39 (6.03) | 13.77 (6.21)* | -4.98 (6.32) | -7.03 (5.91) |
| Poor teacher timekeeping (not a problem): minor problem | 7.38 (8.34) | 7.72 (7.64) | 2.15 (8.32) | 0.82 (7.67) |
| Poor teacher timekeeping (not a problem): moderate or serious problem | 1.67 (15.21) | 10.83 (14.07) | 13.22 (11.46) | 12.78 (10.64) |
| Teacher absenteeism (not a problem): minor problem | -3.46 (7.99) | -1.30 (7.43) | 1.19 (6.79) | 2.17 (6.24) |
| Teacher absenteeism (not a problem): moderate or serious problem | -20.00 (13.91) | -28.83 (14.24)* | -17.19 (8.26)* | -16.62 (7.64)* |
| Teacher highest level of education (up to ISCED level 6: bachelor’s or equivalent level) [c] | 54.32 (15.71)** | 67.81 (17.26)*** | -3.91 (13.17) | -8.79 (12.29) |
| Teacher major area of study (mathematics/science and mathematics/science education): mathematics/science but not mathematics/science education | -5.55 (7.31) | -5.02 (7.25) | -11.49 (6.62) | -11.88 (6.43) |
| Teacher major area of study (mathematics/science and mathematics/science education): all other majors | -0.44 (9.16) | -5.21 (9.65) | 22.96 (10.73)* | 20.62 (9.38)* |
| Professional development hours on mathematics/science (more than 35 hours): 16–35 hours | -14.62 (8.62) | -18.01 (8.67)* | -3.53 (7.63) | -3.66 (7.06) |
| Professional development hours on mathematics/science (more than 35 hours): 6–15 hours | -0.91 (7.97) | -2.98 (7.67) | -0.71 (6.72) | -4.61 (6.57) |
| Professional development hours on mathematics/science (more than 35 hours): less than 6 hours | -9.99 (8.94) | -11.23 (8.69) | -0.04 (8.80) | 3.27 (8.49) |
| Professional development hours on mathematics/science (more than 35 hours): none | -19.37 (11.70) | -24.04 (12.16)* | 3.65 (15.01) | 6.03 (14.52) |
| Professional development on mathematics/science content (no) | 4.33 (5.31) | 5.13 (5.06) | 8.62 (7.78) | 11.07 (7.34) |
| Professional development on mathematics/science pedagogy (no) | -0.02 (7.05) | 1.28 (6.95) | -0.02 (8.96) | 0.27 (7.98) |
| Principal years of experience | -0.24 (0.38) | -0.34 (0.39) | -0.09 (0.35) | -0.19 (0.35) |
| Principal highest level of education (ISCED levels 7 & 8: master’s or doctorate degree) [d] | -4.32 (11.75) | 1.36 (11.86) | 4.73 (13.27) | 6.23 (13.36) |
| Principal qualification in educational leadership (no) | 12.85 (8.06) | 11.58 (7.68) | 3.02 (6.33) | -0.05 (6.46) |
| School library (no) | 9.32 (6.98) | 9.76 (6.95) | 6.66 (6.20) | 9.22 (5.81) |
| Interaction terms (reference category) | | | | |
| Student gender*School emphasis on academic success | | -6.43 (3.04)* | | |
| Student gender*Student confident in science | | | | -4.88 (1.91)* |
| Student gender*Teacher age (40 years or older): 30–39 years | | | | 32.15 (10.20)** |
| Fit statistics | | | | |
| Loglikelihood (H0) | -12113.38 | -12109.54 | -13016.61 | -13004.99 |
| AIC | 24316.77 | 24311.07 | 26123.23 | 26103.99 |
| BIC | 24572.77 | 24572.76 | 26381.44 | 26373.68 |

Note: Null mathematics model: Intercept (SE): 394.83 (2.88), ICC = 0.25, H0 = -32297.78, AIC = 64601.56, BIC = 64621.49. Null science model: Intercept (SE): 431.76 (3.69), ICC = 0.27, H0 = -33003.98, AIC = 66013.96, BIC = 66033.89. *p < .05; **p < .01; ***p < .001. [a] Higher scores indicate less frequent bullying; [b] Other category: 16 minutes or more; [c] Other category: ISCED levels 7 & 8, master’s or doctorate degree; [d] Other category: ISCED Level 6, bachelor’s or equivalent level.

5. Discussion

The results presented in the previous section shed some light on the factors that are associated with mathematics and science achievement in Saudi Arabia, and on the factors that contribute to the large observed differences in achievement between boys and girls. Although there was variation across the four sets of multilevel models in terms of which variables were associated with student achievement when considered simultaneously, some consistency was also evident.
Such consistency should help identify key factors for educators and policy makers in Saudi Arabia to consider as part of broader efforts to raise levels of achievement in elementary and intermediate schools. Summarized below are the most important findings of the analyses presented in the results section and some of the broader issues arising from these findings. In particular, attention is drawn to the most robust findings with the clearest implications for educators and policy makers in Saudi Arabia.

5.1 Summary of main findings

The results of the analysis described in this paper suggest that, at the elementary level, early literacy and numeracy skills, student absenteeism, and school climate contribute to the observed gender gap in student performance in Saudi Arabia. A significant interaction was noted between student gender and students’ early literacy and numeracy readiness for school in predicting grade 4 science achievement. Specifically, boys’ science achievement in grade 4 was particularly low, relative to girls’, among students with poorly developed preschool literacy and numeracy skills (as reported by their parents). In addition, the findings showed that boys in grade 4 appear to be particularly disadvantaged by disorderly or unsafe school environments. Substantial gender differences in favor of girls among students attending disorderly schools, in both mathematics and science, are reduced to minor differences among students attending highly safe and orderly schools. Similarly, while all four sets of models showed a negative association between frequent student absenteeism and achievement, the association was particularly strong among grade 4 boys.

Overall, several of the variables examined were found to be significantly associated with both mathematics and science achievement in grade 4, for both boys and girls.
Higher scores in mathematics and science were associated with several factors, including students: (a) having stronger early literacy and numeracy skills upon starting primary school, (b) liking mathematics or science, (c) being more confident in mathematics or science, (d) being present at school more regularly, and (e) having a lower frequency of bullying. The set of factors most consistently associated with achievement at grade 4 are predominantly at the student level and drawn mostly from the first two conceptual blocks entered into the models: the home background and student engagement and attitudes.

Similarly, several variables were found to be significantly associated with both mathematics and science achievement in grade 8. However, the pattern of common variables is somewhat different between the two grade levels. Among grade 8 students, higher scores in mathematics and science were associated with students’: (a) immigration status, (b) access to more learning resources at home, (c) higher confidence in mathematics or science, (d) regular presence in school, and (e) enrollment in a school where students have a higher average level of home learning resources. Similar to the findings at grade 4, each of these variables was part of the first two conceptual blocks in the models (the home background and student engagement and attitudes), with 4 of the 5 being student-level factors.

5.2 Accounting for observed gender differences in achievement in Saudi Arabia

Student gender remained a significant predictor of science achievement in grade 4, and both mathematics and science achievement in grade 8, even after accounting for the effects of the other predictors.
Although the gender difference in achievement is partially accounted for by the modeled variables (leading to a reduction in the ‘remaining’ or residual gender difference in all models), grade 4 mathematics was the only one of the four sets of models where the final gender difference was no longer statistically significant. This implies that other factors, not examined in the models, contribute to the substantial residual gender difference in grade 4 science and grade 8 mathematics and science. One possibility is that selection effects could be driving the observed differences: if only high-achieving girls attend school or sit for assessments, but most boys do so, there is a possibility of bias, such that girls’ average achievement would appear inflated. However, as enrollment in primary and intermediate education in Saudi Arabia is almost universal among both genders, selection effects are unlikely to be playing a role in this analysis. Given evidence from other studies, it is likely that differences in reading proficiency play a role in explaining at least part of the remaining gender differences. This is particularly so in relation to science achievement, where test items are by necessity embedded in a context that often requires a greater degree of reading comprehension. Differences in reading proficiency between boys and girls might also contribute to explaining the remaining gender differences observed here in mathematics achievement in grade 8, given that grade 8 mathematics assessments require more reading skills compared to mathematics in early grades. For example, test items assessing applied reasoning or problem-solving skills in grade 8 are more likely to be embedded in a short scenario requiring some level of reading. International assessments in recent years have consistently shown that reading achievement is, on average, substantially lower in Saudi Arabia than in many other countries, both among grade 4 students (Mullis et al.
2012; 2017) and among 15-year-old students (OECD 2019). Moreover, in Saudi Arabia, the reading achievement gap in which girls outperform boys is among the largest gender differences in the world. Differences between boys and girls in reading proficiency have been found to exceed half a standard deviation in both the PIRLS and PISA studies (Mullis et al. 2017; OECD 2019). Previous research on TIMSS mathematics and science items has shown that items with a higher reading load (those requiring a higher volume of reading or more complex reading skills) tend to be more difficult for students to answer correctly than items with a lower reading load, and also that weaker readers tend to be disproportionately disadvantaged by a higher reading load (Mullis, Martin, and Foy 2013). For this reason, the magnitude and consistency of Saudi boys’ relative disadvantage in reading, seen across various studies, seems likely to play a role in contributing to their poorer results found here in mathematics and science achievement even after accounting for a range of contextual variables. The NALO 2018 results provide further support for this view, with boys at both grade levels being more likely than girls to report not having read a book (although it should be noted that this was the case even for a substantial minority of girls). The proposed importance of reading skills in underpinning mathematics and scientific achievement is consistent with the pattern of residual (unexplained) variance reported in the models, which indicates a role for other factors operating largely at the student level. After accounting for a range of other student- and class/school-level factors, the majority (approximately three-quarters) of class/school-level variance was explained in the models, whereas a majority of student-level variance remained unexplained. 
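The relationship between the level-specific R2 values and the share of total variance explained can be illustrated with a short calculation. The sketch below assumes that the total is the level-specific R2 values weighted by each level's share of variance in the null model (the ICC reported in the note to table 7). The paper does not state the formula, so this is an illustration rather than the authors' procedure, and the function name is hypothetical.

```python
def total_r2(icc, r2_student, r2_school):
    """Weight level-specific R2 values (as proportions) by each level's variance share.

    Assumption: the null-model ICC is the class/school-level share of total
    variance, and 1 - ICC is the student-level share.
    """
    return (1 - icc) * r2_student + icc * r2_school


# Grade 8, step 7 values from table 7 and its note.
math_total = total_r2(icc=0.25, r2_student=0.273, r2_school=0.744)
science_total = total_r2(icc=0.27, r2_student=0.272, r2_school=0.790)
print(f"mathematics: {math_total:.1%}, science: {science_total:.1%}")
```

With the rounded inputs from table 7, this gives roughly 39 percent for grade 8 mathematics and 41 percent for grade 8 science, in line with the totals reported in section 4.4.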
This suggests that the residual gender differences in achievement are likely to be associated more strongly with student-level factors, such as reading skills, social and behavioral skills, or aspects of the home background, than with additional school-level factors. The finding from NALO 2018 that more boys than girls have repeated a grade at school because of poor academic performance is worth noting in this regard. Similar data on the extent of grade repetition are available from PISA 2018 (OECD 2019), where 13.0 percent of 15-year-old boys in Saudi Arabia reported repeating at least one grade, compared to 9.8 percent of girls. These findings hint at the likelihood that early disadvantages and difficulties with learning in the early grades may compound over time, and that there is a need for stronger learning supports for students with special educational needs to enable progression through the education system. Although this issue affects both boys and girls in Saudi Arabia, the figures from NALO and PISA indicate that such compounding educational disadvantage is more clearly apparent among boys.

Teaching quality is another factor that appears to be associated with gender differences in achievement in Saudi Arabia. Female entrants to the teaching profession in Saudi Arabia tend to score higher than their male counterparts on the teacher licensure examination. This is consistent with evidence from other countries, which shows that the teaching profession attracts more high-ability female teachers than male teachers (Carroll, Parasnis, and Tani 2021; Corcoran, Evans, and Schwab 2004). The differences in abilities between female and male teachers could be explained, in part, by the gender differences in returns to education across occupations (World Bank 2012; Cortes and Pan 2018).
Research on teacher labor markets has shown that the opportunity cost of becoming a teacher is lower for women than men, due primarily to the limited occupational opportunities for women outside the field of education (Carroll, Parasnis, and Tani 2021). Additionally, teaching is traditionally a preferable profession for women in Saudi Arabia. The competition among female graduates for teaching jobs in Saudi Arabia is much higher than the competition among males. From the demand side, this implies a higher probability of selecting cognitively talented teachers from female graduates than from male graduates. The analyses presented in this paper have accounted for a high proportion of the observed variance in mathematics and science achievement in Saudi Arabia (ranging from 39 percent to 41 percent across subject domains and grade levels). Notably, these models largely account for the portion of variance in achievement that can be attributed to the class/school level. This suggests that policy makers may reasonably hope that focusing their attention on improving the class/school-level issues identified here (e.g., safe and orderly school climate, support for academic achievement, teacher attendance and timekeeping) would contribute to creating an education system that promotes higher levels of student achievement. However, the fact that the majority of variance in achievement (70–75 percent) is attributable to student-level factors means that policy makers will also have to look at the home environment and broader society, as well as the school environment, in order to raise levels of achievement and close the (currently very wide) gaps in achievement between boys and girls in Saudi Arabia. 
This student-level variance has been partly explained by variables examined here (e.g., students’ early literacy and numeracy skills, attitudes toward mathematics and science, experiences of bullying, and level of access to home resources for learning), but significant variance remains unexplained.

5.3 Limitations

The conclusions that can be drawn from these analyses are limited to being correlational in nature, as TIMSS and NALO are both cross-sectional studies. This means that it would be incorrect to claim on the basis of these results alone that changes in any of the included variables will lead to corresponding changes in student achievement. In some cases, the findings presented here are clearly consistent with theoretical expectations and evidence from other settings, for example, that promoting more regular attendance at a school with a learning-supportive climate may reasonably be expected to have positive implications for student learning. Nonetheless, readers should be aware that the model results need to be interpreted cautiously and with due regard to the wider theoretical and empirical literature. Informed decisions should be based on a broad reading of the literature and the evidence base, including the new results presented in this paper, rather than on any single study.

The results of the models hint at the importance of teachers and teaching quality as contributing factors to student outcomes. However, the strength of any conclusions drawn in this paper related to teaching is constrained by limitations in the available data. For example, the TIMSS variable describing teachers’ qualifications at grade 4 was omitted from analysis due to an error identified in the Saudi Arabia data set for TIMSS 2019. TIMSS collects some data related to teachers and classroom practices, but more detailed analysis of teacher quality would be possible with research studies and, thus, data sets more focused on this topic.
Finally, although the highly gender-segregated structure of the education system in Saudi Arabia presents an opportunity to examine the educational environments experienced by boys and girls in relative isolation, this same feature also imposes analytic constraints. As there are no cases in the available data of boys and girls taught in the same classes, boys taught by female teachers, or girls taught by male teachers, it is impossible to disentangle gendered differences in learning outcomes from other factors that covary completely with students’ gender. For example, in these data sets boys in Saudi Arabia are universally taught by male teachers, who, in turn, received their education and teaching qualifications from all-male institutions, which may differ in important ways from the institutions attended by girls and female teachers. The ongoing rollout of a scheme to assign female teachers to boys in the early grades, as described earlier, will provide opportunities in future to reexamine outcomes among boys and girls while controlling for teacher characteristics to a greater degree.

6. Conclusions and Implications

The findings of this study point to the relevance of the school climate in understanding the current gender differences in achievement observed in Saudi Arabia. Although previous research points to the value to students of a stable and supportive school climate in general (Nilsen et al. 2016; Reynolds et al. 2014), the results of this study indicate that boys in Saudi Arabia may be especially impacted by a negative or unstable school environment. Most notably, the presence of a safe and orderly school climate for grade 4 students, and a supportive climate for academic success for grade 8 students, are particularly associated with higher achievement for boys relative to attending less orderly or less supportive schools.
School principals, teachers, and other educators should be cognizant of the importance of these factors and should take active steps to build and maintain positive school and classroom environments where students feel safe, connected, and positively challenged to learn and think. Where these conditions are not present, student learning is likely to be impeded. This is especially the case for boys, who may require a greater degree of behavioral support and guidance from adults to engage fully with schoolwork in a structured classroom setting in a single-gender school environment. Where such support and guidance are lacking, boys appear to fall behind in their learning and are at risk of being held back for a year to a greater degree than girls who similarly lack a positive school climate. This may be related to gendered differences in societal expectations (Ridge and Jeon 2020) and, as indicated by the NALO data, greater support for learning for girls at home (Ridge and Jeon 2020). Teachers can help to create positive learning environments and encourage active student participation in their learning by, for example, integrating students’ interests into the lesson material where possible and remaining alert to the effects of stereotypes, such as boys being more suited than girls to science and mathematics, and girls being more suited to reading, that may negatively affect how students engage with lessons and how teachers communicate with their students (Brozo et al. 2014; OECD 2015). It is also important that lessons are challenging but at a level that students can realistically engage with and understand; where basic prerequisite learning has not been solidified, teachers are likely to find themselves covering more advanced topics with limited student engagement or understanding (Niemiec and Ryan 2009). 
Other practices that teachers can integrate into their teaching in order to create a positive learning environment include offering students choices, providing rationales for decisions made or where choices cannot be offered, encouraging students to ask questions and to offer their perspectives, listening to and acknowledging students’ contributions, and offering constructive feedback on how students can improve (Teixeira et al. 2020).

A supportive school environment is important for boys’ learning, but support for learning in the home is also crucial. By the time students begin attending school, they have been growing, developing, and learning at home and in the community for several years already. The TIMSS data show that boys in Saudi Arabia tend to begin school with weaker early literacy and numeracy skills than girls. More than that, the models indicate that boys’ science achievement is more strongly associated with their early literacy and numeracy skills compared to girls’. Boys who begin school with weak early literacy and numeracy skills tend to have considerably lower science achievement than their female counterparts with equivalent early literacy and numeracy skills by grade 4, while science achievement of boys and girls with stronger early literacy and numeracy skills tends to be similar. In other words, boys who begin school at an early learning disadvantage to their peers are further disadvantaged as they progress through the education system and appear to be at more risk of falling behind than girls who begin school with weaker early skills. This can also be seen in students’ reports, in NALO, that boys are more likely to repeat a year in school because of poor academic performance. It is important that parents are aware that early childhood development lays a foundation for future education, health, well-being, and economic success.
Public health and education agencies should promote awareness among parents and provide guidance and resources to encourage greater engagement in early learning in the family. For example, simple activities that can contribute to a child’s early literacy and numeracy development could include reading together, describing a scene in everyday life, counting everyday objects or singing counting songs, and using mathematical and spatial language while playing with shapes or other objects (e.g., “behind”, “above”, “beside”, “straight”, “curved”, “double”). Data from TIMSS 2019 show that parents in Saudi Arabia report engaging in activities of these types less frequently with young boys than young girls. Taking steps to increase the level of support for early childhood learning at home for boys, in particular, would likely lead to a stronger foundation in the future for boys starting school and to greater progress in learning among boys. Cultural and social barriers that contribute to low enrollment of young children in kindergarten—for example, social expectations relating to motherhood and childrearing at home—also need to be considered in this respect. It is noteworthy that, despite the differences seen across the four sets of models, two variables were found to be significantly associated with both mathematics and science achievement at both grade levels. These were students’ confidence in mathematics or science (positively associated with achievement in all cases) and students’ reported levels of absenteeism (more frequent absences being negatively associated with achievement in all cases). The consistency of these findings demands attention from Saudi Arabia’s education community. In particular, student absenteeism, as an issue that is likely more responsive to policy making compared to others (e.g., students’ socio-economic status), should be considered carefully. 
The analyses presented in this paper have shown that student absenteeism in Saudi Arabia is widespread, frequent, and consistently associated with achievement in at least two key areas of study (mathematics and science), at both elementary and intermediate school levels. In many countries, student absenteeism is relatively rare and structures are in place to monitor and promote regular attendance at school. These structures can encompass both informal channels (e.g., between the schoolteacher or principal and the child’s parents) and formal channels (e.g., formal communication between the school and the home or, in more extreme cases, a state agency tasked to follow up to ensure minimal levels of attendance at school). The frequency of absenteeism for many students in Saudi Arabia, coupled with the likely negative implications of regular absenteeism for achievement, suggests that Saudi Arabia’s policy makers should study efforts in other countries to combat absenteeism (e.g., Knoster 2016; Rogers and Vegas 2009) and consider how similar approaches could be usefully adapted to the local context.

A similar problem is apparent with the teaching workforce in Saudi Arabia’s schools. A substantial proportion of school principals, at both elementary and intermediate levels, indicated that teacher absenteeism and poor teacher timekeeping (teachers arriving late to school or leaving early) are problems in their schools. This is consistent with previous research indicating that teachers in Saudi Arabia’s schools often lack enthusiasm for the profession and are poorly motivated (OECD 2020). Without taking steps to ensure that teachers are both highly skilled and present and engaged in teaching during scheduled working hours, students will continue to be at risk of failing to reach their full potential as a result of failures in school management practices.
Other initiatives, such as efforts to build supportive school climates, are likely to have limited effect as long as they are undermined by poor teacher attendance at school and a lack of teacher enthusiasm (itself a contributory factor to a school environment that is not conducive to student learning).

Teacher training represents another area for improvement. Results from this paper show that despite male teachers’ greater exposure to education during initial training and their higher qualifications compared to female teachers,¹⁰ boys in Saudi schools achieve much poorer outcomes than girls in both mathematics and science. Although holding higher qualifications does not necessarily suggest a higher standard of teaching or improved student performance (Harris and Sass 2011), especially when the focus of the qualification is unknown, these patterns may signal the poor quality of teacher education and training. Further study of these dynamics as they relate to student outcomes would be useful.

In general, efforts to raise educational achievement in Saudi Arabia require taking a broader view beyond the necessary focus on schools and teachers. As noted above, early child development (physical, cognitive, social) and early learning provide foundations for achievement in elementary and intermediate school, and beyond. Ongoing support for learning at home throughout childhood is also crucial, including modeling of positive behaviors (e.g., reading) and involvement in children’s education by their parents.

6.1 Suggestions for further research

Future efforts to explain the observed differences between boys’ and girls’ achievement in Saudi Arabia should, if possible, seek to include a broader range of out-of-school factors in the analysis than was possible with the TIMSS 2019 data set.
For example, some variables that are not available in TIMSS 2019 or NALO 2018 but that could be usefully considered in a future analysis of gender differences in achievement include students’ reading proficiency; students’ engagement in reading for leisure; and the (gendered) nature of parents’ expectations and aspirations for their child’s education, qualifications, and future career. In particular, considering the importance of literacy as a foundational skill (see also Gregory et al. 2021), the inclusion of an indicator of reading achievement would help control for gender differences relating to literacy levels and would allow more fine-grained examination of mathematical and scientific proficiency. Among international assessments, data from PISA at intermediate level or from a joint TIMSS and PIRLS assessment (such as TIMSS/PIRLS 2011) at primary level could be used for this purpose. At the national level, an administration of NALO that assessed reading as well as mathematics or science from the same students could also be used. Given that the majority of unexplained variance in the models presented in this paper was at the student level, extending future analyses in this way should provide further useful insights. As noted above (section 5.3), a focused examination of teaching quality in Saudi Arabia — incorporating teacher characteristics, quality of teacher education, professional development, availability and use of resources, classroom management, professional collaboration, and pedagogy — would shed further light on some of the points raised in this paper. In particular, differences between the classroom environments of boys and girls, given the gender-segregated structure of the education system, merit closer inspection. Finally, it would be useful to extend the work presented in this paper by drawing on data from other countries. 
In the first instance, subsequent research could focus on countries with similar cultural contexts such as other countries with comparable international data within the Gulf or MENA regions. Such research could examine (a) the extent of similarity between observed gender differences in Saudi Arabia compared to other countries, and (b) similarities and differences in the factors associated with student outcomes in each national context. Further work could also usefully examine factors associated with gender differences in single-gender compared to mixed-gender schools (see figure 2), particularly in, but not limited to, the MENA region.

¹⁰ Male teachers are more likely than female teachers to report holding a master’s or doctorate-level qualification, as are principals of boys’ schools compared to principals of girls’ schools.

Acknowledgments

The authors would like to thank Harry Patrinos, Georgios Sideridis, and Tarek Mostafa for their valuable comments on a draft version of this paper. The team is also grateful to Laura Gregory, Andreas Blom, and Yisgedullish Amde for their helpful feedback and support. Jee Yoon Lee edited the paper.

References

Angrist, Noam, Simeon Djankov, Pinelopi K. Goldberg, and Harry A. Patrinos. 2021. “Measuring Human Capital Using Global Learning Data.” Nature 592: 403–8. https://doi.org/10.1038/s41586-021-03323-7.
Autor, David, David Figlio, Krzysztof Karbownik, Jeffrey Roth, and Melanie Wasserman. 2016. “School Quality and the Gender Gap in Educational Achievement.” American Economic Review 106 (5): 289–95. https://doi.org/10.1257/aer.p20161074.
———. 2019. “Family Disadvantage and the Gender Gap in Behavioral and Educational Outcomes.” American Economic Journal: Applied Economics 11 (3): 338–81. https://doi.org/10.1257/app.20170571.
Bertrand, Marianne, and Jessica Pan. 2013. “The Trouble with Boys: Social Influences and the Gender Gap in Disruptive Behavior.” American Economic Journal: Applied Economics 5 (1): 32–64. https://doi.org/10.1257/app.5.1.32.
Brozo, William G., Sari Sulkunen, Gerry Shiel, Christine Garbe, Ambigapathy Pandian, and Renate Valtin. 2014. “Reading, Gender, and Engagement.” Journal of Adolescent & Adult Literacy 57 (7): 584–93. https://doi.org/10.1002/jaal.291.
Buchmann, Claudia, and Thomas A. DiPrete. 2006. “The Growing Female Advantage in College Completion: The Role of Family Background and Academic Achievement.” American Sociological Review 71 (4): 515–41. https://doi.org/10.1177/000312240607100401.
Carroll, David, Jaai Parasnis, and Massimiliano Tani. 2021. “Why Do Women Become Teachers While Men Don’t?” The B.E. Journal of Economic Analysis & Policy 21 (2): 793–823. https://doi.org/10.1515/bejeap-2020-0236.
Cohen, Jacob. 1988. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
Corcoran, Sean P., William N. Evans, and Robert M. Schwab. 2004. “Women, the Labor Market, and the Declining Relative Quality of Teachers.” Journal of Policy Analysis and Management 23 (3): 449–70. https://www.jstor.org/stable/3326261.
Cornwell, Christopher, David B. Mustard, and Jessica Van Parys. 2013. “Noncognitive Skills and the Gender Disparities in Test Scores and Teacher Assessments: Evidence from Primary School.” Journal of Human Resources 48 (1): 236–64. https://doi.org/10.3368/jhr.48.1.236.
Cortes, Patricia, and Jessica Pan. 2018. “Occupation and Gender.” In The Oxford Handbook of Women and the Economy, edited by Susan L. Averett, Laura M. Argys, and Saul D. Hoffman, 424–52. New York: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190628963.013.12.
Davier, Matthias von, Eugenio Gonzalez, and Robert J. Mislevy. 2009. “What Are Plausible Values and Why Are They Useful?” IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments 2: 9–36. https://www.ierinstitute.org/fileadmin/Documents/IERI_Monograph/IERI_Monograph_Volume_02_Chapter_01.pdf.
DiPrete, Thomas A., and Claudia Buchmann. 2013. The Rise of Women: The Growing Gender Gap in Education and What It Means for American Schools. New York: Russell Sage Foundation.
DiPrete, Thomas A., and Jennifer L. Jennings. 2012. “Social and Behavioral Skills and the Gender Gap in Early Educational Achievement.” Social Science Research 41 (1): 1–15. https://doi.org/10.1016/j.ssresearch.2011.09.001.
Downey, Douglas B., and Anastasia S. Vogt Yuan. 2005. “Sex Differences in School Performance during High School: Puzzling Patterns and Possible Explanations.” The Sociological Quarterly 46 (2): 299–321. https://doi.org/10.1111/j.1533-8525.2005.00014.x.
Fortin, Nicole M., Philip Oreopoulos, and Shelley Phipps. 2015. “Leaving Boys Behind: Gender Disparities in High Academic Achievement.” Journal of Human Resources 50 (3): 549–79. https://doi.org/10.3368/jhr.50.3.549.
Fritz, Catherine O., Peter E. Morris, and Jennifer J. Richler. 2012. “Effect Size Estimates: Current Use, Calculations, and Interpretation.” Journal of Experimental Psychology: General 141 (1): 2–18. https://doi.org/10.1037/a0024338.
Gregory, Laura, Hanada Taha Thomure, Amira Kazem, Anna Boni, Mahmoud A. A. Elsayed, and Nadia Taibah. 2021. Advancing Arabic Language Teaching and Learning: A Path to Reducing Learning Poverty in the Middle East and North Africa. Washington, DC: World Bank. https://documents1.worldbank.org/curated/en/909741624654308046/pdf/Advancing-Arabic-Language-Teaching-and-Learning-A-Path-to-Reducing-Learning-Poverty-in-the-Middle-East-and-North-Africa.pdf.
Harris, Douglas N., and Tim R. Sass. 2011. “Teacher Training, Teacher Quality and Student Achievement.” Journal of Public Economics 95 (7–8): 798–812. https://doi.org/10.1016/j.jpubeco.2010.11.009.
Hattie, John A. C. 2009. Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement. London, New York: Routledge.
Jha, Jyotsna, and Sarah Pouezevara. 2016. Measurement and Research Support to Education, Strategy Goal I: Boys’ Underachievement in Education: A Review of the Literature with a Focus on Reading in the Early Years. Washington, DC: United States Agency for International Development (USAID).
Knoster, Kevin C. 2016. Strategies for Addressing Student and Teacher Absenteeism: A Literature Review. Washington, DC: U.S. Department of Education, North Central Comprehensive Center.
LaRoche, Sylvie, Marc Joncas, and Pierre Foy. 2020. “Sample Design in TIMSS 2019.” In Methods and Procedures: TIMSS 2019 Technical Report, edited by Michael O. Martin, Matthias von Davier, and Ina V. S. Mullis, 3.1–3.33. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College, and International Association for the Evaluation of Educational Achievement (IEA). https://timssandpirls.bc.edu/timss2019/methods/chapter-3.html.
Legewie, Joscha, and Thomas A. DiPrete. 2012. “School Context and the Gender Gap in Educational Achievement.” American Sociological Review 77 (3): 463–85. https://doi.org/10.1177/0003122412440802.
Mullis, Ina V. S., Michael O. Martin, and Pierre Foy. 2013. “The Impact of Reading Ability on TIMSS Mathematics and Science Achievement at the Fourth Grade: An Analysis by Item Reading Demands.” In TIMSS and PIRLS 2011: Relationships among Reading, Mathematics, and Science Achievement at the Fourth Grade: Implications for Early Learning, edited by Michael O. Martin and Ina V. S. Mullis, 67–108. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College, and International Association for the Evaluation of Educational Achievement (IEA). https://timssandpirls.bc.edu/timsspirls2011/downloads/TP11_Chapter_2.pdf.
Mullis, Ina V. S., Michael O. Martin, Pierre Foy, and Kathleen T. Drucker. 2012. PIRLS 2011 International Results in Reading. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College, and International Association for the Evaluation of Educational Achievement (IEA). https://doi.org/10.1097/01.tp.0000399132.51747.71.
Mullis, Ina V. S., Michael O. Martin, Pierre Foy, and Martin Hooper. 2017. PIRLS 2016 International Results in Reading. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College, and International Association for the Evaluation of Educational Achievement (IEA).
Mullis, Ina V. S., Michael O. Martin, Pierre Foy, Dana L. Kelly, and Bethany Fishbein. 2020. TIMSS 2019 International Results in Mathematics and Science. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College, and International Association for the Evaluation of Educational Achievement (IEA).
Muthén, Linda K., and Bengt O. Muthén. 2017. “Mplus User’s Guide.” 8th ed. Los Angeles, CA: Muthén and Muthén.
Niemiec, Christopher P., and Richard M. Ryan. 2009. “Autonomy, Competence, and Relatedness in the Classroom: Applying Self-Determination Theory to Educational Practice.” Theory and Research in Education 7 (2): 133–44. https://doi.org/10.1177/1477878509104318.
Nilsen, Trude, Sigrid Blömeke, Kajsa Yang Hansen, and J. E. Gustafsson. 2016. “Are School Characteristics Related to Equity? The Answer May Depend on a Country’s Developmental Level.” Policy Brief 10, International Association for the Evaluation of Educational Achievement (IEA), Amsterdam, Netherlands.
OECD (Organisation for Economic Co-operation and Development). 2015. The ABC of Gender Equality in Education: Aptitude, Behaviour, Confidence. PISA. Paris: PISA, OECD Publishing. https://doi.org/10.1787/9789264229945-en.
———. 2019. PISA 2018 Results (Volume I): What Students Know and Can Do. Paris: PISA, OECD Publishing. https://doi.org/10.1787/5f07c754-en.
———. 2020. Reviews of National Policies for Education: Education in Saudi Arabia. Paris: OECD Publishing. https://doi.org/10.1787/76df15a2-en.
———. 2021. TALIS: Positive, High-Achieving Students? What Schools and Teachers Can Do. Paris: OECD Publishing. https://doi.org/10.1787/3b9551db-en.
Page, Elspeth, and Jyotsna Jha, eds. 2009. Exploring the Bias: Gender and Stereotyping in Secondary Schools. London: Commonwealth Secretariat.
Patrinos, Harry A., and Noam Angrist. 2019. “Harmonized Learning Outcomes: Transforming Learning Assessment Data into National Education Policy Reforms.” World Bank Blogs. August 12. https://blogs.worldbank.org/opendata/harmonized-learning-outcomes-transforming-learning-assessment-data-national-education.
Reynolds, David, Pam Sammons, Bieke De Fraine, Jan Van Damme, Tony Townsend, Charles Teddlie, and Sam Stringfield. 2014. “Educational Effectiveness Research (EER): A State-of-the-Art Review.” School Effectiveness and School Improvement 25 (2): 197–230. https://doi.org/10.1080/09243453.2014.885450.
Ridge, Natasha, and Soohyun Jeon. 2020. “Father Involvement and Education in the Middle East: Geography, Gender, and Generations.” Comparative Education Review 64 (4): 725–48. https://doi.org/10.1086/710768.
Rogers, Halsey F., and Emiliana Vegas. 2009. “No More Cutting Class? Reducing Teacher Absence and Providing Incentives for Performance.” Policy Research Working Paper 4847, World Bank, Washington, DC. https://doi.org/10.1596/1813-9450-4847.
Rutkowski, Leslie, Eugenio Gonzalez, Marc Joncas, and Matthias von Davier. 2010. “International Large-Scale Assessment Data: Issues in Secondary Analysis and Reporting.” Educational Researcher 39 (2): 142–51. https://doi.org/10.3102/0013189X10363170.
Stromquist, Nelly. 2007. The Gender Socialization Process in Schools: A Cross-National Comparison (2008/ED/EFA/MRT/PI/71). Background paper prepared for the Education for All Global Monitoring Report 2008, Education for All by 2015: Will we make it?
Teixeira, Pedro J., Marta M. Marques, Marlene N. Silva, Jennifer Brunet, Joan L.
Duda, Leen Haerens, Jennifer La Guardia, et al. 2020. “A Classification of Motivation and Behavior Change Techniques Used in Self-Determination Theory-Based Interventions in Health Contexts.” Motivation Science 6 (4): 438–55. https://doi.org/10.1037/mot0000172.
World Bank. 2012. World Development Report 2012: Gender Equality and Development. Washington, DC: World Bank. https://openknowledge.worldbank.org/handle/10986/4391.
———. 2019. Ending Learning Poverty: What Will It Take? Washington, DC: World Bank. https://openknowledge.worldbank.org/handle/10986/32553.
———. 2021. EdStats: Education Statistics (database). Washington, DC: World Bank. https://datatopics.worldbank.org/education/.
Younger, Mike, and Mary Cobbett. 2014. “Gendered Perceptions of Schooling: Classroom Dynamics and Inequalities within Four Caribbean Secondary Schools.” Educational Review 66 (1): 1–21. https://doi.org/10.1080/00131911.2012.749218.

Appendix A1 – Supplementary data from NALO 2018

Table A1.1 Student reports of selected variables, by school gender

Grade | Variable | Response | Male (%) | Female (%) | φ/φc
Grade 4 | Repeated a year at school because of poor results | | 14.5 | 6.6 | .129***
Grade 4 | How many books have you read? | 0 | 22.6 | 12.0 | .148***
Grade 4 | How many books have you read? | 1-5 | 27.8 | 33.3 |
Grade 4 | How many books have you read? | 6-10 | 15.7 | 14.4 |
Grade 4 | How many books have you read? | >10 | 33.8 | 40.4 |
Grade 4 | Math is important in life | A lot | 81.4 | 84.1 | .051***
Grade 4 | Math is important in life | Somewhat | 13.3 | 12.6 |
Grade 4 | Math is important in life | Never | 5.3 | 3.3 |
Grade 8 | Repeated a year at school because of poor results | | 6.9 | 4.5 | .051***
Grade 8 | How many books have you read? | 0 | 30.6 | 19.3 | .137***
Grade 8 | How many books have you read? | 1-5 | 42.6 | 53.1 |
Grade 8 | How many books have you read? | 6-10 | 10.6 | 11.6 |
Grade 8 | How many books have you read? | >10 | 16.2 | 16.1 |
Grade 8 | Math is important in life | A lot | 47.6 | 36.3 | .115***
Grade 8 | Math is important in life | Somewhat | 38.6 | 48.0 |
Grade 8 | Math is important in life | Never | 13.8 | 15.7 |

Note: *p < .05; **p < .01; ***p < .001.

Table A1.2. Percentage of students not born in Saudi Arabia attending schools of various types (parent reports), by school gender

Grade | School type | Male (%) | Female (%) | φ/φc
Grade 4 | Government school | 85.0 | 95.2 | .206***
Grade 4 | Quran school | 1.1 | 2.2 |
Grade 4 | Private school | 13.9 | 2.6 |
Grade 8 | Government school | 87.7 | 96.3 | .168***
Grade 8 | Quran school | 1.3 | 1.0 |
Grade 8 | Private school | 11.1 | 2.7 |

Note: *p < .05; **p < .01; ***p < .001.

Table A1.3. Percentage of teachers ‘always’ agreeing with selected statements, by school gender

Grade | Statement | Male (%) | Female (%) | φ/φc
Grade 4 | Parents know the [academic] level of the student | 32.8 | 62.8 | .302***
Grade 4 | Students respect their [academically] excellent classmates | 38.2 | 47.0 | .094***
Grade 4 | Students are keen to [academically] excel in school | 26.5 | 31.6 | .071***
Grade 8 | Parents know the [academic] level of the student | 22.6 | 40.3 | .236***
Grade 8 | Students respect their [academically] excellent classmates | 34.8 | 44.0 | .115***
Grade 8 | Students are keen to [academically] excel in school | 21.5 | 16.5 | .065***

Note: *p < .05; **p < .01; ***p < .001.

Table A1.4 School leaders’ reports of selected variables, by school gender

Grade | Variable | Male (%) | Female (%) | φ/φc
Grade 4 | There is a school library from which students can borrow | 67.3 | 48.5 | .224***
Grade 4 | Parents’ expectations of students’ performance have been achieved (agree or strongly agree) | 63.6 | 74.3 | .178***
Grade 4 | Parental support to improve students’ performance (high or very high) | 44.1 | 51.5 | .155***
Grade 8 | There is a school library from which students can borrow | 58.5 | 62.1 | .059***
Grade 8 | Parents’ expectations of students’ performance have been achieved (agree or strongly agree) | 49.1 | 59.8 | .183***
Grade 8 | Parental support to improve students’ performance (high or very high) | 40.2 | 50.7 | .158***

Note: *p < .05; **p < .01; ***p < .001.

Appendix A2

Table A2.1.
TIMSS cut-scores for categories of continuous indices, TIMSS 2019

Grade | Index | Lowest category | Cut-score | Middle category | Cut-score | Highest category
Grade 4 | Home resources for learning | few resources | 7.4 | some resources | 11.8 | many resources
Grade 4 | Literacy and numeracy readiness for school | not ready | 8.6 | moderately ready | 11.2 | very ready
Grade 4 | Student likes learning mathematics | do not like | 8.4 | somewhat like | 10.2 | very much like
Grade 4 | Student likes learning science | do not like | 7.6 | somewhat like | 9.7 | very much like
Grade 4 | Student confident in mathematics | not confident | 8.5 | somewhat confident | 10.7 | very confident
Grade 4 | Student confident in science | not confident | 8.2 | somewhat confident | 10.2 | very confident
Grade 4 | Student sense of school belonging | little sense of school belonging | 7.2 | some sense of school belonging | 9.6 | high sense of school belonging
Grade 4 | Bullying | about weekly | 7.4 | about monthly | 9.2 | never or almost never
Grade 4 | Safe and orderly schools | less than safe and orderly | 6.8 | somewhat safe and orderly | 9.9 | very safe and orderly
Grade 4 | School emphasis on academic success | medium emphasis | 9.2 | high emphasis | 13.0 | very high emphasis
Grade 4 | School discipline | moderate to severe problems | 7.6 | minor problems | 9.7 | hardly any problems
Grade 4 | Teacher job satisfaction | less than satisfied | 6.5 | somewhat satisfied | 10.1 | very satisfied
Grade 8 | Home educational resources | few resources | 8.4 | some resources | 12.2 | many resources
Grade 8 | Student likes learning mathematics | do not like | 9.4 | somewhat like | 11.4 | very much like
Grade 8 | Student likes learning science | do not like | 8.3 | somewhat like | 10.6 | very much like
Grade 8 | Student confident in mathematics | not confident | 9.5 | somewhat confident | 12.1 | very confident
Grade 8 | Student confident in science | not confident | 9.2 | somewhat confident | 11.3 | very confident
Grade 8 | Student sense of school belonging | little sense of school belonging | 7.8 | some sense of school belonging | 10.7 | high sense of school belonging
Grade 8 | Bullying | about weekly | 7.2 | about monthly | 8.8 | never or almost never
Grade 8 | Safe and orderly schools | less than safe and orderly | 7.3 | somewhat safe and orderly | 10.5 | very safe and orderly
Grade 8 | School emphasis on academic success | medium emphasis | 9.6 | high emphasis | 13.1 | very high emphasis
Grade 8 | School discipline | moderate to severe problems | 8.0 | minor problems | 10.8 | hardly any problems
Grade 8 | Teacher job satisfaction | less than satisfied | 6.8 | somewhat satisfied | 10.2 | very satisfied

Appendix A3

Table A3.1. Hierarchical two-level linear regression model for mathematics achievement, grade 4, TIMSS 2019

Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 student-level (%) 1.3 6.7 19.9 20.3 20.7 21.0 26.2 R2 class/school-level (%) 12.9 9.7 13.7 37.7 54.9 71.4 Intercept (SE) 392.48 (6.83) 406.84 (7.59) 424.25 (7.58) 430.25 (8.21) 425.93 (17.05) 396.29 (21.00) 296.27 (37.86) Student-level variables B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) (reference category) 19.73 0.23 7.68 0.09 -7.42 -0.09 -17.55 -0.21 -16.21 -0.19 -6.71 -0.08 164.98 1.87 Student gender (male) (8.67)* (0.10)* (7.95) (0.09) (7.98) (0.10) (8.99) (0.11) (10.50) (0.12) (11.92) (0.14) (54.64)** (0.59)** Student immigration status (native) second-generation 3.42 0.04 4.91 0.06 6.19 0.07 11.95 0.14 15.40 0.18 14.58 0.17 immigrant students (8.36) (0.10) (8.28) (0.10) (8.55) (0.10) (10.66) (0.13) (11.36) (0.13) (11.44) (0.13) first-generation 48.05 0.57 30.19 0.36 28.08 0.33 30.49 0.36 26.74 0.31 26.53 0.30 immigrant students (12.55)*** (0.15)*** (13.04)* (0.16)* (13.44)* (0.16)* (15.55) (0.18) (17.41) (0.20) (17.24) (0.20) Home resources for 3.67 0.06 2.19 0.04 2.53 0.04 4.09 0.07 4.35 0.07 4.64 0.08 learning (1.59)* (0.03)* (1.68) (0.03) (1.90) (0.03) (1.95)* (0.03)* (2.11)* (0.04)* (2.11)* (0.03)* Student owns mobile -6.36 -0.08 -5.79 -0.07 -3.27 -0.04 -1.55 -0.02 -0.10 0.00 0.10 0.00 phone (no) (3.65) (0.04) (3.71) (0.04) (4.02) (0.05) (4.33) (0.05) (4.70) (0.06) (4.51) (0.05) Literacy and numeracy 8.54 0.20 6.44 0.15 6.37 0.15 7.60 0.18 7.57 0.17 7.57 0.17 readiness for school (1.15)*** (0.03)*** (1.13)*** (0.03)*** (1.10)*** (0.03)*** (1.26)*** (0.03)*** (1.44)*** (0.03)*** (1.43)*** (0.03)*** Preschool attendance and duration (did not attend) 0.63 0.01 1.55 0.02 3.22 0.04 -1.45 -0.02 1.51 0.02 0.38 0.00 1 year or less (4.41) (0.05) (4.31) (0.05)
(4.84) (0.06) (6.21) (0.07) (6.97) (0.08) (6.81) (0.08) 0.98 0.01 0.94 0.01 -0.52 -0.01 -7.03 -0.08 -6.20 -0.07 -8.60 -0.10 2 years (5.92) (0.07) (5.61) (0.07) (6.26) (0.07) (8.65) (0.10) (8.84) (0.10) (8.84) (0.10) 7.94 0.09 12.75 0.15 17.52 0.21 11.22 0.13 15.10 0.18 14.00 0.16 3 years or more (7.40) (0.09) (7.10) (0.08) (7.39)* (0.09)* (8.52) (0.10) (9.06) (0.11) (8.81) (0.10) Student absenteeism (never or almost never) once every two 6.95 0.08 5.58 0.07 -1.03 -0.01 5.36 0.06 4.74 0.05 months (6.27) (0.08) (6.57) (0.08) (9.02) (0.11) (9.85) (0.12) (9.69) (0.11) -1.82 -0.02 0.45 0.01 7.61 0.09 7.54 0.09 7.21 0.08 once a month (5.32) (0.06) (5.91) (0.07) (7.90) (0.09) (8.60) (0.10) (8.49) (0.10) -16.69 -0.20 -18.18 -0.22 -18.31 -0.22 -20.34 -0.24 -21.87 -0.25 once every two weeks (7.09)* (0.08)* (7.34)* (0.09)* (8.51)* (0.10)* (8.88)* (0.10)* (8.85)* (0.10)* -28.14 -0.33 -26.40 -0.31 -23.86 -0.28 -23.53 -0.28 -34.61 -0.39 once a week (5.28)*** (0.06)*** (5.41)*** (0.06)*** (6.20)*** (0.07)*** (7.35)** (0.08)** (8.67)*** (0.10)*** 42 Student-level variables B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) (reference category) Student likes learning 4.24 0.10 4.29 0.10 5.05 0.12 4.67 0.11 4.82 0.11 mathematics (1.19)*** (0.03)*** (1.24)** (0.03)** (1.53)** (0.04)** (1.60)** (0.04)** (1.55)** (0.04)** Student confident in 9.93 0.26 9.26 0.24 7.83 0.20 7.91 0.20 8.24 0.21 mathematics (0.99)*** (0.03)*** (1.10)*** (0.03)*** (1.38)*** (0.04)*** (1.50)*** (0.04)*** (1.47)*** (0.04)*** Student sense of school -2.81 -0.08 -3.36 -0.09 -3.10 -0.08 -3.10 -0.08 belonging (0.93)** (0.03)** (1.11)** (0.03)** (1.16)** (0.03)** (1.17)** (0.03)** 4.59 0.12 4.94 0.13 4.91 0.13 5.20 0.14 Bullyinga (0.97)*** (0.03)*** (1.21)*** (0.03)*** (1.28)*** (0.04)*** (1.26)*** (0.03)*** Class/school-level variables (reference category) School mean of home 17.17 0.28 15.00 0.26 13.99 0.24 11.57 0.22 15.70 0.30 10.65 0.20 resources for learning 
(7.57)* (0.13)* (7.12)* (0.13)* (7.41) (0.13) (7.23) (0.14) (9.03) (0.17) (8.07) (0.15) School location (urban) suburban/medium size -15.48 -0.31 -9.66 -0.21 -10.95 -0.24 -7.71 -0.18 -8.40 -0.19 -3.81 -0.09 city or large town (9.42) (0.18) (9.34) (0.20) (9.30) (0.20) (9.82) (0.22) (11.43) (0.26) (9.56) (0.21) small town or 30.03 0.60 24.71 0.53 18.96 0.41 24.20 0.55 54.91 1.23 42.47 0.94 village/remote rural (16.86) (0.32) (15.79) (0.32) (16.94) (0.35) (18.82) (0.41) (26.49)* (0.53)* (24.77) (0.53) Teacher age (40 years or older) -23.61 -0.47 -20.04 -0.43 -21.58 -0.47 9.73 0.22 15.59 0.35 -4.73 -0.11 29 years or younger (13.60) (0.27) (14.40) (0.31) (15.25) (0.33) (20.67) (0.47) (19.55) (0.44) (18.68) (0.42) 0.05 0.00 0.32 0.01 0.74 0.02 13.18 0.30 16.37 0.37 18.35 0.41 30–39 years (8.43) (0.17) (8.02) (0.17) (8.31) (0.18) (10.96) (0.25) (14.17) (0.32) (12.26) (0.27) 3.04 0.12 2.43 0.11 0.32 0.01 6.63 0.28 Safe and orderly schools (2.22) (0.09) (3.32) (0.15) (3.21) (0.14) (4.00) (0.16) School emphasis on 1.53 0.07 -0.18 -0.01 -0.26 -0.01 2.93 0.15 academic success (2.11) (0.10) (2.58) (0.13) (2.76) (0.14) (2.58) (0.13) 0.73 0.04 -3.26 -0.18 -3.91 -0.22 -1.60 -0.09 School discipline (1.86) (0.09) (3.58) (0.20) (3.80) (0.21) (3.74) (0.20) Teacher years of 1.25 0.23 1.76 0.33 2.23 0.41 experience (0.59)* (0.11)* (0.60)** (0.11)** (0.59)*** (0.11)*** 1.53 0.04 1.10 0.03 -0.03 0.00 Teacher job satisfaction (4.16) (0.12) (4.34) (0.12) (4.03) (0.11) Time assigned to 12.58 0.29 4.63 0.10 9.68 0.21 mathematics homework (8.62) (0.20) (8.85) (0.20) (8.52) (0.19) (15 minutes or less)b 43 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) category) Poor teacher timekeeping (not a problem) -25.79 -0.59 -24.39 -0.55 -14.89 -0.33 minor problem (11.57)* (0.27)* (11.49)* (0.27)* (8.67) (0.19) moderate or serious -9.48 -0.22 -2.18 -0.05 13.44 0.30 problem (14.95) (0.34) (17.35) (0.39) (16.06) 
(0.36) Teacher absenteeism (not a problem) 2.67 0.06 8.04 0.18 5.03 0.11 minor problem (13.69) (0.31) (13.45) (0.31) (11.02) (0.24) moderate or serious -6.17 -0.14 -7.64 -0.17 -8.53 -0.19 problem (23.06) (0.53) (23.90) (0.54) (21.01) (0.47) Teacher major area of study (education and mathematics) mathematics but not -8.33 -0.19 -8.20 -0.18 -6.76 -0.15 education (9.88) (0.23) (9.26) (0.21) (8.18) (0.18) -23.35 -0.54 -9.88 -0.22 -18.21 -0.40 all other majors (13.66) (0.31) (12.49) (0.28) (10.74) (0.23) Professional development hours on mathematics (more than 35 hours) -24.59 -0.56 -29.46 -0.66 -22.62 -0.50 16–35 hours (11.01)* (0.26)* (11.69)* (0.26)* (10.50)* (0.23)* 3.28 0.08 -5.35 -0.12 4.99 0.11 6–15 hours (12.46) (0.28) (13.24) (0.30) (12.26) (0.27) -13.12 -0.30 -3.30 -0.08 4.14 0.09 less than 6 hours (17.06) (0.39) (14.44) (0.33) (12.28) (0.27) 6.89 0.16 -1.71 -0.04 -2.60 -0.06 none (16.99) (0.39) (21.24) (0.48) (16.21) (0.36) Professional development 14.92 0.34 19.53 0.44 15.17 0.33 on mathematics content (11.55) (0.25) (10.85) (0.23) (10.60) (0.23) (no) Professional development 7.51 0.17 13.29 0.30 14.85 0.33 on mathematics pedagogy (11.18) (0.26) (11.26) (0.25) (9.46) (0.22) (no) 44 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) category) Principal years of -0.13 -0.02 -0.29 -0.04 experience (0.79) (0.12) (0.71) (0.10) Principal highest level of -3.76 -0.08 4.78 0.11 education (ISCED levels 7 (15.01) (0.34) (12.47) (0.28) & 8 — master’s or doctorate degree)c Principal qualification in -4.76 -0.11 -6.06 -0.13 educational leadership (13.02) (0.29) (10.83) (0.24) (no) 19.09 0.43 15.10 0.33 School library (no) (10.82) (0.23) (10.29) (0.22) Interaction terms (reference category) Student gender*Safe and -16.25 -1.13 orderly schools (5.13)** (0.34)** Student gender*Student absenteeism (never or 22.88 0.26 almost never) — once a (10.44)* (0.12)* week Student gender*Teacher 
89.46 1.01 age (40 years or older) — (30.62)** (0.34)** 29 years or younger Loglikelihood -32207.65 -18375.32 -17089.57 -14821.53 -9654.01 -8549.92 -8533.11 Fit (H0) statistics AIC 64423.29 36784.64 34225.13 29699.05 19394.02 17193.84 17166.23 BIC 64449.71 36887.46 34362.86 29862.73 19627.00 17442.83 17431.12 Note: Null model: Intercept (SE): 402.02 (4.44), H0 = -32210.89, AIC = 64427.78, BIC = 64447.59. *p < .05; **p < .01; ***p < .001. aHigher scores indicate less frequent bullying; b Other category: 16 minutes or more; cOther category: ISCED Level 6 — bachelor’s or equivalent level. 45 Table A3.2. Hierarchical two-level linear regression model for science achievement, grade 4, TIMSS 2019 Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 student-level (%) 7.6 12.2 21.0 21.4 22.7 22.5 27.4 R2 class/school-level (%) 16.3 17.4 20.3 36.6 46.0 64.8 Intercept (SE) 381.71 (6.89) 397.44 (8.37) 415.31 (7.92) 426.93 (8.46) 427.69 (18.15) 433.35 (21.20) 320.48 (35.66) Student-level variables B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) (reference category) 53.14 0.55 39.36 0.42 21.61 0.23 17.44 0.19 16.31 0.18 21.52 0.23 249.50 2.59 Student gender (male) (8.29)*** (0.08)*** (7.04)*** (0.07)*** (6.57)** (0.07)** (7.38)* (0.08)* (8.29)* (0.09)* (8.79)* (0.09)* (59.64)*** (0.59)*** Student immigration status (native) second-generation 12.11 0.13 14.70 0.16 16.13 0.18 11.93 0.13 14.12 0.15 13.25 0.14 immigrant students (8.52) (0.09) (9.16) (0.10) (10.18) (0.11) (11.74) (0.13) (11.98) (0.13) (12.08) (0.13) first-generation 44.37 0.47 36.26 0.39 30.06 0.33 37.85 0.41 32.10 0.34 37.67 0.39 immigrant students (9.32)*** (0.10)*** (10.55)** (0.11)** (12.10)* (0.13)* (13.30)** (0.14)** (14.51)* (0.16)* (13.70)** (0.14)** Home resources for 4.19 0.06 3.11 0.05 3.31 0.05 2.58 0.04 2.85 0.04 2.83 0.04 learning (1.57)** (0.02)** (1.78) (0.03) (1.88) (0.03) (2.43) (0.04) (2.79) (0.04) (2.79) (0.04) Student owns mobile -3.96 -0.04 -3.16 
-0.03 -2.30 -0.03 -1.99 -0.02 -1.58 -0.02 -1.45 -0.02 phone (no) (1.11)*** (0.01)*** (1.24)* (0.01)* (1.69) (0.02) (1.94) (0.02) (1.96) (0.02) (1.96) (0.02) Literacy and numeracy 9.51 0.20 7.29 0.16 7.11 0.15 7.33 0.16 7.62 0.16 10.35 0.21 readiness for school (1.17)*** (0.02)*** (1.07)*** (0.02)*** (1.10)*** (0.02)*** (1.32)*** (0.03)*** (1.41)*** (0.03)*** (1.91)*** (0.04)*** Preschool attendance and duration (did not attend) -0.81 -0.01 1.64 0.02 1.83 0.02 5.25 0.06 4.88 0.05 3.79 0.04 1 year or less (5.24) (0.06) (4.92) (0.05) (4.87) (0.05) (5.08) (0.05) (5.19) (0.06) (5.14) (0.05) 10.23 0.11 10.44 0.11 8.92 0.10 11.80 0.13 8.61 0.09 6.55 0.07 2 years (6.02) (0.06) (6.19) (0.07) (6.66) (0.07) (6.88) (0.07) (6.40) (0.07) (6.36) (0.07) 2.03 0.02 7.78 0.08 6.36 0.07 10.07 0.11 8.71 0.09 8.66 0.09 3 years or more (7.38) (0.08) (7.02) (0.07) (7.75) (0.08) (8.27) (0.09) (8.62) (0.09) (8.49) (0.09) Student absenteeism (never or almost never) 8.03 0.09 4.09 0.04 4.43 0.05 7.90 0.09 8.14 0.08 once every two months (5.90) (0.06) (6.50) (0.07) (7.50) (0.08) (7.20) (0.08) (7.23) (0.08) 2.63 0.03 -1.46 -0.02 -2.88 -0.03 -2.91 -0.03 -2.24 -0.02 once a month (5.09) (0.05) (6.42) (0.07) (7.24) (0.08) (7.42) (0.08) (7.39) (0.08) -16.74 -0.18 -18.50 -0.20 -16.35 -0.18 -15.47 -0.17 -15.07 -0.16 once every two weeks (7.30)* (0.08)* (7.85)* (0.09)* (8.85) (0.10) (9.37) (0.10) (9.20) (0.10) -22.77 -0.24 -22.91 -0.25 -22.69 -0.24 -21.05 -0.23 -20.78 -0.22 once a week (4.75)*** (0.05)*** (4.79)*** (0.05)*** (5.41)*** (0.06)*** (5.67)*** (0.06)*** (5.50)*** (0.06)*** Student likes learning 4.95 0.12 4.86 0.12 5.43 0.14 5.33 0.13 5.16 0.13 science (1.27)*** (0.03)*** (1.34)*** (0.03)*** (1.66)** (0.04)** (1.76)** (0.04)** (1.75)** (0.04)** 46 Student-level variables B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) (reference category) Student confident in 8.98 0.20 7.85 0.18 8.16 0.18 7.89 0.17 8.09 0.17 science (1.23)*** (0.03)*** 
(1.24)*** (0.03)*** (1.52)*** (0.03)*** (1.62)*** (0.04)*** (1.61)*** (0.04)*** Student sense of school -3.58 -0.09 -3.88 -0.09 -3.68 -0.09 -3.55 -0.08 belonging (1.10)** (0.03)** (1.27)** (0.03)** (1.32)** (0.03)** (1.31)** (0.03)** 5.71 0.14 6.17 0.15 5.78 0.14 5.91 0.14 Bullyinga (1.24)*** (0.03)*** (1.36)*** (0.03)*** (1.44)*** (0.03)*** (1.44)*** (0.03)*** Class/school-level variables (reference category) School mean of home 21.38 0.36 22.12 0.41 18.81 0.35 20.20 0.38 22.16 0.43 17.82 0.33 resources for learning (6.67)** (0.11)** (6.12)*** (0.11)*** (6.50)** (0.12)** (6.97)** (0.13)** (6.74)** (0.13)** (5.67)** (0.11)** School location (urban) suburban/medium size -8.85 -0.18 -5.86 -0.13 -8.89 -0.20 -15.52 -0.34 -10.96 -0.25 -36.03 -0.79 city or large town (8.22) (0.17) (7.26) (0.16) (8.17) (0.18) (9.10) (0.20) (9.66) (0.21) (10.25)*** (0.20)*** small town or 37.58 0.77 31.37 0.71 26.80 0.61 35.84 0.80 60.77 1.37 60.10 1.32 village/remote rural (14.65)* (0.28)* (13.32)* (0.28)* (14.68) (0.32) (17.51)* (0.37)* (20.36)** (0.44)** (16.90)*** (0.36)*** Teacher age (40 years or older) -29.23 -0.60 -25.94 -0.59 -30.06 -0.68 -48.67 -1.08 -43.78 -0.98 -17.31 -0.38 29 years or younger (13.04)* (0.26)* (12.86)* (0.28)* (14.19)* (0.31)* (19.04)* (0.40)* (17.99)* (0.39)* (16.16) (0.36) -8.08 0.17 -9.41 -0.21 -13.84 -0.31 -26.42 -0.59 -28.52 -0.64 -20.29 -0.45 30–39 years (7.94) (0.16) (7.37) (0.16) (8.02) (0.17) (11.01)* (0.24)* (10.26)** (0.23)** (9.24)* (0.20)* 1.61 0.07 4.41 0.20 5.29 0.24 9.92 0.44 Safe and orderly schools (2.34) (0.11) (2.74) (0.12) (3.07) (0.13) (3.52)** (0.14)** School emphasis on 1.38 0.07 -1.17 -0.05 -1.91 -0.09 -0.75 -0.04 academic success (1.83) (0.09) (1.95) (0.09) (1.83) (0.09) (1.71) (0.08) 0.22 0.01 -1.67 -0.09 -1.84 -0.10 -0.85 -0.04 School discipline (1.60) (0.08) (2.55) (0.13) (2.68) (0.14) (2.51) (0.13) Teacher years of -1.18 -0.22 -1.51 -0.28 -0.95 -0.17 experience (0.67) (0.12) (0.60)* (0.11)* (0.59) (0.11) -2.64 -0.08 -3.32 -0.10 
-4.52 -0.13 Teacher job satisfaction (3.25) (0.09) (3.34) (0.10) (3.17) (0.09) Time assigned to science -3.54 -0.08 -7.30 -0.17 -5.47 -0.12 homework (15 minutes or (10.40) (0.23) (11.31) (0.25) (9.81) (0.22) less)b 47 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) category) Poor teacher timekeeping (not a problem) 3.63 0.08 2.50 0.06 4.86 0.11 minor problem (10.27) (0.23) (11.75) (0.26) (10.50) (0.23) moderate or serious 1.00 0.02 -1.66 -0.04 -0.39 -0.01 problem (16.37) (0.36) (17.19) (0.39) (15.45) (0.34) Teacher absenteeism (not a problem) -7.15 -0.16 1.01 0.02 -2.16 -0.05 minor problem (11.19) (0.25) (13.17) (0.30) (10.94) (0.24) moderate or serious -16.14 -0.36 1.87 0.04 5.55 0.12 problem (14.69) (0.32) (16.55) (0.37) (13.59) (0.30) Teacher major area of study (education and science) science but not -2.38 -0.05 -2.69 -0.06 -7.35 -0.16 education (9.82) (0.22) (9.78) (0.22) (8.84) (0.19) -16.82 -0.37 -23.80 -0.53 -34.13 -0.75 all other majors (17.36) (0.38) (18.47) (0.41) (17.72) (0.38) Professional development hours on science (more than 35 hours) 5.55 0.12 3.50 0.08 5.25 0.11 16–35 hours (14.21) (0.31) (14.09) (0.32) (11.73) (0.26) 2.88 0.06 -5.01 -0.11 -4.52 -0.10 6–15 hours (11.07) (0.24) (12.01) (0.27) (10.92) (0.24) 13.39 0.30 18.81 0.42 12.82 0.28 less than 6 hours (15.77) (0.34) (15.70) (0.35) (13.31) (0.29) -13.09 -0.29 -35.32 -0.79 -37.77 -0.83 none (17.23) (0.39) (15.06)* (0.35)* (15.23)* (0.35)* Professional development 11.97 0.27 9.96 0.22 11.64 0.26 on science content (no) (11.22) (0.24) (12.76) (0.28) (11.14) (0.24) Professional development 6.66 0.15 -2.53 -0.06 -1.60 -0.04 on science pedagogy (no) (9.66) (0.21) (10.00) (0.22) (9.00) (0.20) Principal years of -0.20 -0.03 0.15 0.02 experience (0.66) (0.10) (0.60) (0.09) 48 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) 
category)

| Class/school-level variables (reference category) | Step 6 B (SE) | Step 6 β (SE) | Step 7 B (SE) | Step 7 β (SE) |
|---|---|---|---|---|
| Principal highest level of education (ISCED levels 7 & 8, master’s or doctorate degree)c | -11.91 (10.93) | -0.27 (0.25) | -13.19 (9.41) | -0.29 (0.21) |
| Principal qualification in educational leadership (no) | 5.72 (9.43) | 0.13 (0.21) | 7.28 (8.66) | 0.16 (0.19) |
| School library (no) | 4.69 (8.59) | 0.11 (0.19) | 7.65 (7.90) | 0.17 (0.17) |

| Interaction terms (reference category) | Step 7 B (SE) | Step 7 β (SE) |
|---|---|---|
| Student gender*Safe and orderly schools | -15.61 (4.93)** | -0.94 (0.29)** |
| Student gender*Literacy and numeracy readiness for school | -6.29 (2.57)* | -0.35 (0.15)* |
| Student gender*School location (urban): suburban/medium size city or large town | 75.76 (15.57)*** | 0.79 (0.16)*** |

| Fit statistics | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 |
|---|---|---|---|---|---|---|---|
| Loglikelihood (H0) | -32681.95 | -24789.30 | -22223.42 | -17688.83 | -13820.91 | -12381.76 | -12361.03 |
| AIC | 65371.91 | 49612.60 | 44492.84 | 35433.66 | 27727.81 | 24857.51 | 24822.05 |
| BIC | 65398.32 | 49720.30 | 44636.27 | 35601.96 | 27975.68 | 25123.27 | 25104.77 |

Note: Null model: Intercept (SE): 407.42 (4.88), H0 = -32704.43, AIC = 65414.86, BIC = 65434.67. *p < .05; **p < .01; ***p < .001. a Higher scores indicate less frequent bullying; b Other category: 16 minutes or more; c Other category: ISCED Level 6 (bachelor’s or equivalent level).

Table A3.3.
Hierarchical two-level regression model for mathematics achievement, grade 8, TIMSS 2019 Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 student-level (%) 2.3 6.2 25.3 25.3 25.2 26.7 27.3 R2 class/school-level (%) 39.5 40.7 45.4 63.4 64.0 74.4 Intercept (SE) 384.54 (4.27) 393.90 (6.09) 408.61 (5.96) 410.32 (6.24) 406.71 (16.40) 367.09 (23.42) 310.38 (36.78) Student-level variables B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) (reference category) 20.78 0.30 15.79 0.23 18.14 0.26 15.47 0.22 9.88 0.14 16.29 0.23 90.11 1.27 Student gender (male) (5.29)*** (0.08)*** (4.78)** (0.07)** (4.52)*** (0.07)*** (4.92)** (0.07)** (6.06) (0.09) (6.98)* (0.10)* (35.49)* (0.49)* Student immigration status (native) second-generation 27.44 0.40 22.17 0.32 22.45 0.32 20.24 0.29 17.59 0.25 17.51 0.25 immigrant students (6.04)*** (0.09)*** (5.17)*** (0.07)*** (5.52)*** (0.08)*** (7.56)** (0.11)** (7.92)* (0.11)* (7.97)* (0.11)* first-generation 55.40 0.80 42.56 0.61 43.43 0.63 34.47 0.49 31.83 0.45 31.92 0.45 immigrant students (6.87)*** (0.10)*** (6.82)*** (0.10)*** (7.11)*** (0.10)*** (8.79)*** (0.12)*** (9.37)** (0.13)** (9.33)** (0.13)** Home educational 6.68 0.15 3.85 0.08 3.65 0.08 4.17 0.09 4.09 0.09 4.10 0.09 resources (0.92)*** (0.02)*** (0.83)*** (0.02)*** (0.93)*** (0.02)*** (1.12)*** (0.02)*** (1.03)*** (0.02)*** (1.03)*** (0.02)*** Student owns mobile -3.47 -0.05 0.03 0.00 0.10 0.00 -1.32 -0.02 4.20 0.06 3.80 0.05 phone (no) (3.52) (0.05) (3.74) (0.05) (3.77) (0.05) (4.82) (0.07) (5.13) (0.07) (5.07) (0.07) Student absenteeism (never or almost never) -5.25 -0.08 -6.73 -0.10 -7.08 -0.10 -8.73 -0.12 -9.10 -0.13 once every two months (4.70) (0.07) (4.44) (0.06) (5.85) (0.08) (5.81) (0.08) (5.78) (0.08) -16.71 -0.24 -17.72 -0.26 -17.28 -0.25 -19.47 -0.28 -20.02 -0.28 once a month (4.18)*** (0.06)*** (3.96)*** (0.06)*** (5.18)** (0.07)** (5.06)*** (0.07)*** (5.20)*** (0.07)*** -20.82 -0.30 -22.48 -0.33 -27.00 -0.38 -29.92 
-0.42 -30.31 -0.43 once every two weeks (3.74)*** (0.05)*** (3.86)*** (0.06)*** (4.68)*** (0.07)*** (4.84)*** (0.07)*** (4.83)*** (0.07)*** -39.72 -0.57 -41.01 -0.59 -38.91 -0.55 -44.23 -0.63 -44.52 -0.63 once a week (4.98)*** (0.07)*** (5.15)*** (0.08)*** (6.36)*** (0.09)*** (6.40)*** (0.09)*** (6.53)*** (0.09)*** Student likes learning -4.76 -0.14 -4.25 -0.13 -4.15 -0.12 -4.30 -0.13 -4.35 -0.13 mathematics (1.07)*** (0.03)*** (1.12)*** (0.03)*** (1.49)** (0.04)** (1.57)** (0.05)** (1.57)** (0.05)** Student confident in 15.27 0.44 15.04 0.43 15.27 0.44 15.24 0.44 15.26 0.44 mathematics (0.94)*** (0.03)*** (1.02)*** (0.03)*** (1.13)*** (0.03)*** (1.27)*** (0.03)*** (1.27)*** (0.03)*** Student sense of school -1.62 -0.05 -1.22 -0.03 -1.03 -0.03 -1.00 -0.03 belonging (0.79)* (0.02)* (1.04) (0.03) (1.12) (0.03) (1.12) (0.03) 0.86 0.03 -0.37 -0.01 -0.42 -0.01 -0.43 -0.01 Bullyinga (0.57) (0.02) (0.85) (0.03) (0.91) (0.03) (0.91) (0.03) Time spent on -11.19 -0.16 -11.35 -0.16 -11.40 -0.16 mathematics homework (3.70)** (0.05)** (3.98)** (0.06)** (4.01)** (0.06)** (15 minutes or less)b 50 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) category) School mean of home 27.21 0.61 24.39 0.60 22.21 0.52 23.74 0.57 25.97 0.62 26.91 0.57 educational resources (3.70)*** (0.08)*** (3.40)*** (0.08)*** (3.67)*** (0.08)*** (4.24)*** (0.10)*** (5.32)*** (0.11)*** (5.29)*** (0.11)*** School location (urban) suburban/medium size -6.63 -0.21 -8.79 -0.30 -10.80 -0.35 3.47 0.11 6.64 0.22 9.63 0.28 city or large town (5.37) (0.17) (4.86) (0.16) (5.03)* (0.16)* (5.28) (0.17) (6.19) (0.20) (6.22) (0.17) small town or 0.83 0.03 -2.18 -0.07 -4.79 -0.16 -0.98 -0.03 2.05 0.06 4.58 0.13 village/remote rural (7.39) (0.23) (6.84) (0.23) (7.36) (0.24) (7.13) (0.24) (11.06) (0.36) (11.13) (0.32) Teacher age (40 years or older) -9.81 -0.30 -11.77 -0.39 -12.72 -0.41 -22.96 -0.76 -20.08 -0.66 -23.36 -0.68 29 years or 
younger (8.36) (0.25) (7.88) (0.26) (8.30) (0.27) (15.40) (0.49) (19.13) (0.62) (19.05) (0.54) -3.95 -0.12 -4.00 -0.13 -0.61 -0.02 -6.02 -0.20 -5.54 -0.18 -6.51 -0.19 30–39 years (4.75) (0.15) (4.50) (0.15) (4.85) (0.16) (9.72) (0.32) (11.15) (0.36) (11.35) (0.33) 0.57 0.04 1.85 0.12 0.87 0.05 0.87 0.05 Safe and orderly schools (1.03) (0.07) (1.38) (0.08) (1.68) (0.10) (1.62) (0.09) School emphasis on 2.92 0.20 2.40 0.16 1.80 0.13 5.76 0.36 academic success (1.11)** (0.07)** (1.57) (0.10) (1.70) (0.12) (2.48)* (0.13)* 0.05 0.00 -2.23 -0.21 -2.60 -0.24 -3.11 -0.25 School discipline (0.83) (0.07) (1.50) (0.14) (1.92) (0.18) (1.93) (0.15) Teacher years of -0.28 -0.07 -0.49 -0.12 -0.60 -0.13 experience (0.69) (0.16) (0.88) (0.21) (0.88) (0.19) -1.87 -0.08 -1.07 -0.05 -1.70 -0.07 Teacher job satisfaction (2.21) (0.09) (2.18) (0.09) (2.15) (0.08) Time assigned to 6.50 0.22 10.39 0.34 13.77 0.40 mathematics homework (5.12) (0.17) (6.03) (0.20) (6.21)* (0.17)* (15 minutes or less)b Poor teacher timekeeping (not a problem) 6.14 0.21 7.38 0.25 7.72 0.23 minor problem (7.12) (0.23) (8.34) (0.27) (7.64) (0.22) moderate or serious -1.58 -0.05 1.67 0.06 10.83 0.32 problem (11.47) (0.38) (15.21) (0.50) (14.07) (0.40) Teacher absenteeism (not a problem) -2.58 -0.09 -3.46 -0.12 -1.30 -0.04 minor problem (6.76) (0.22) (7.99) (0.26) (7.43) (0.22) moderate or serious -20.51 -0.68 -20.00 -0.66 -28.83 -0.84 problem (11.34) (0.37) (13.91) (0.45) (14.24)* (0.39)* 51 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) category) Teacher highest level of education (Up to ISCED 28.40 0.95 54.32 1.80 67.81 1.99 level 6 — bachelor’s or (13.35)* (0.46)* (15.71)** (0.56)** (17.26)*** (0.52)*** equivalent level)c Teacher major area of study (mathematics and mathematics education) mathematics but not -2.03 -0.07 -5.55 -0.18 -5.02 -0.15 mathematics education (6.44) (0.21) (7.31) (0.23) (7.25) (0.21) 4.00 0.14 -0.44 
-0.01 -5.21 -0.15 all other majors (8.74) (0.30) (9.16) (0.30) (9.65) (0.27)

| Class/school-level variables (reference category) | Step 5 B (SE) | Step 5 β (SE) | Step 6 B (SE) | Step 6 β (SE) | Step 7 B (SE) | Step 7 β (SE) |
|---|---|---|---|---|---|---|
| Professional development hours on mathematics (more than 35 hours) | | | | | | |
| 16–35 hours | -15.52 (7.44)* | -0.52 (0.24)* | -14.62 (8.62) | -0.48 (0.27) | -18.01 (8.67)* | -0.53 (0.24)* |
| 6–15 hours | -4.68 (7.23) | -0.15 (0.24) | -0.91 (7.97) | -0.03 (0.26) | -2.98 (7.67) | -0.08 (0.22) |
| less than 6 hours | -7.32 (8.94) | -0.25 (0.30) | -9.99 (8.94) | -0.33 (0.29) | -11.23 (8.69) | -0.33 (0.25) |
| none | -22.28 (11.69) | -0.75 (0.39) | -19.37 (11.70) | -0.64 (0.39) | -24.04 (12.16)* | -0.71 (0.35)* |
| Professional development on mathematics content (no) | 7.65 (4.86) | 0.25 (0.16) | 4.33 (5.31) | 0.14 (0.17) | 5.13 (5.06) | 0.15 (0.15) |
| Professional development on mathematics pedagogy (no) | -4.76 (6.15) | -0.16 (0.21) | -0.02 (7.05) | -0.01 (0.23) | 1.28 (6.95) | 0.03 (0.20) |
| Principal years of experience | | | -0.24 (0.38) | -0.06 (0.09) | -0.34 (0.39) | -0.07 (0.08) |
| Principal highest level of education (ISCED levels 7 & 8, master’s or doctorate degree)d | | | -4.32 (11.75) | -0.15 (0.39) | 1.36 (11.86) | 0.03 (0.35) |
| Principal qualification in educational leadership (no) | | | 12.85 (8.06) | 0.42 (0.27) | 11.58 (7.68) | 0.34 (0.22) |
| School library (no) | | | 9.32 (6.98) | 0.31 (0.22) | 9.76 (6.95) | 0.28 (0.19) |

| Interaction terms (reference category) | Step 7 B (SE) | Step 7 β (SE) |
|---|---|---|
| Student gender*School emphasis on academic success | -6.43 (3.04)* | -0.56 (0.26)* |

| Fit statistics | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 |
|---|---|---|---|---|---|---|---|
| Loglikelihood (H0) | -32290.17 | -27131.49 | -25465.23 | -23557.67 | -13670.53 | -12113.38 | -12109.54 |
| AIC | 64588.35 | 54288.99 | 50968.47 | 47163.34 | 27423.06 | 24316.77 | 24311.07 |
| BIC | 64614.93 | 54373.18 | 51090.67 | 47315.85 | 27661.20 | 24572.77 | 24572.76 |

Note: Null model: Intercept (SE): 394.83 (2.88), H0 = -32297.78, AIC = 64601.56, BIC = 64621.49. *p < .05; **p < .01; ***p < .001. a Higher scores indicate less frequent bullying; b Other category: 16 minutes or more; c Other category: ISCED levels 7 & 8 (master’s or doctorate degree); d Other category: ISCED Level 6 (bachelor’s or equivalent level).

Table A3.4.
Hierarchical two-level regression model for science achievement, grade 8, TIMSS 2019 Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 student-level (%) 9.6 11.8 23.1 23.0 24.0 25.5 27.2 R2 class/school-level (%) 37.9 39.1 45.6 65.5 69.8 79.0 Intercept (SE) 406.87 (4.78) 412.26 (5.99) 427.80 (6.01) 437.21 (6.16) 462.45 (13.44) 441.38 (19.49) 426.88 (24.75) Student-level variables B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) (reference category) 50.40 0.62 45.83 0.57 42.28 0.53 35.42 0.45 34.88 0.44 39.89 0.50 74.88 0.92 Student gender (male) (5.80)*** (0.07)*** (5.26)*** (0.06)*** (5.00)*** (0.06)*** (5.03)*** (0.06)*** (7.10)*** (0.08)*** (7.52)*** (0.09)*** (25.07)** (0.31)** Student immigration status (native) second-generation 25.22 0.31 22.15 0.28 20.70 0.26 33.19 0.42 31.81 0.40 31.84 0.39 immigrant students (6.31)*** (0.08)*** (5.85)*** (0.07)*** (6.13)** (0.08)** (7.73)*** (0.10)*** (8.36)*** (0.11)*** (8.36)*** (0.11)*** first-generation 53.93 0.67 42.57 0.53 44.95 0.57 44.76 0.56 43.75 0.55 43.76 0.54 immigrant students (7.72)*** (0.09)*** (7.72)*** (0.10)*** (8.34)*** (0.10)*** (8.96)*** (0.11)*** (9.67)*** (0.12)*** (9.57)*** (0.12)*** Home educational 7.03 0.13 4.17 0.08 4.16 0.08 5.16 0.10 5.39 0.10 5.44 0.10 resources (0.93)*** (0.02)*** (0.89)*** (0.02)*** (0.93)*** (0.02)*** (1.34)*** (0.03)*** (1.40)*** (0.03)*** (1.40)*** (0.03)*** Student owns mobile -0.25 0.00 -0.04 0.00 -1.35 -0.02 -3.04 -0.04 -0.32 0.00 -1.32 -0.02 phone (no) (3.97) (0.05) (4.05) (0.05) (4.12) (0.05) (5.90) (0.07) (6.32) (0.08) (6.53) (0.08) Student absenteeism (never or almost never) 2.46 0.03 2.54 0.03 -3.73 -0.05 -5.55 -0.07 -5.16 -0.06 once every two months (4.59) (0.06) (4.77) (0.06) (6.04) (0.08) (6.26) (0.08) (6.26) (0.08) -7.15 -0.09 -7.93 -0.10 -11.55 -0.15 -13.34 -0.17 -12.49 -0.15 once a month (4.32) (0.05) (4.29) (0.05) (5.26)* (0.07)* (5.34)* (0.07)* (5.24)* (0.07)* -10.48 -0.13 -12.70 -0.16 -9.67 -0.12 -10.40 
-0.13 -10.19 -0.13 once every two weeks (5.31)* (0.07)* (5.83)* (0.07)* (7.23) (0.09) (7.47) (0.09) (7.49) (0.09) -37.93 -0.47 -39.38 -0.50 -44.29 -0.56 -46.77 -0.58 -47.10 -0.58 once a week (4.58)*** (0.06)*** (4.68)*** (0.06)*** (6.50)*** (0.08)*** (6.44)*** (0.08)*** (6.46)*** (0.08)*** Student likes learning -0.38 -0.01 -0.02 0.00 -0.57 -0.02 -0.85 -0.02 -0.79 -0.02 science (0.86) (0.02) (0.91) (0.03) (1.11) (0.03) (1.17) (0.03) (1.18) (0.03) Student confident in 11.66 0.29 10.98 0.28 11.14 0.28 11.35 0.28 14.09 0.35 science (0.98)*** (0.02)*** (1.08)*** (0.03)*** (1.39)*** (0.03)*** (1.54)*** (0.04)*** (2.02)*** (0.05)*** Student sense of school -2.94 -0.07 -1.09 -0.03 -1.20 -0.03 -1.15 -0.03 belonging (0.85)** (0.02)** (1.29) (0.03) (1.45) (0.04) (1.45) (0.04) 3.73 0.10 1.77 0.05 1.49 0.04 1.32 0.03 Bullyinga (0.75)*** (0.02)*** (1.19) (0.03) (1.21) (0.03) (1.21) (0.03) Time spent on science -11.92 -0.15 -12.37 -0.15 -11.94 -0.15 homework (15 minutes or (3.65)** (0.05)** (3.99)** (0.05)** (3.99)** (0.05)** less)b 54 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) category) School mean of home 28.03 0.61 25.49 0.60 21.87 0.51 21.33 0.55 20.23 0.55 18.59 0.46 educational resources (4.16)*** (0.08)*** (4.12)*** (0.09)*** (4.54)*** (0.10)*** (4.13)*** (0.10)*** (4.66)*** (0.11)*** (4.37)*** (0.11)*** School location (urban) suburban/medium size -7.02 -0.21 -7.48 -0.25 -8.50 -0.28 -7.82 -0.27 -6.53 -0.24 -7.84 -0.26 city or large town (5.65) (0.17) (5.15) (0.17) (5.19) (0.17) (6.08) (0.21) (5.96) (0.22) (5.70) (0.19) small town or 0.35 0.01 -2.71 -0.09 -2.24 -0.07 0.61 0.02 3.12 0.11 -3.79 -0.13 village/remote rural (8.28) (0.25) (8.22) (0.27) (8.52) (0.27) (8.14) (0.28) (8.44) (0.31) (8.53) (0.28) Teacher age (40 years or older) -9.55 -0.29 -8.28 -0.27 -17.24 -0.56 1.49 0.05 9.78 0.35 3.56 0.12 29 years or younger (10.34) (0.31) (9.17) (0.30) (9.12) (0.30) (11.87) (0.41) 
(13.38) (0.48) (12.88) (0.43) -1.06 -0.03 -1.13 -0.04 -3.97 -0.13 -0.81 -0.03 -2.14 -0.08 -20.34 -0.68 30–39 years (4.80) (0.14) (4.59) (0.15) (4.72) (0.15) (6.27) (0.22) (6.47) (0.23) (7.79)** (0.23)** 2.45 0.16 3.01 0.20 2.29 0.16 1.96 0.13 Safe and orderly schools (1.24)* (0.08)* (1.31)* (0.08)* (1.40) (0.09) (1.44) (0.09) School emphasis on 2.50 0.17 0.45 0.03 0.07 0.01 0.46 0.03 academic success (1.13)* (0.08)* (1.35) (0.10) (1.24) (0.10) (1.29) (0.09) 0.13 0.01 0.07 0.01 0.64 0.07 0.13 0.01 School discipline (0.96) (0.08) (1.16) (0.11) (1.14) (0.12) (1.13) (0.11) Teacher years of 0.56 0.15 0.54 0.15 0.54 0.14 experience (0.47) (0.12) (0.48) (0.13) (0.51) (0.13) -3.47 -0.15 -3.19 -0.14 -3.35 -0.14 Teacher job satisfaction (2.41) (0.10) (2.47) (0.11) (2.36) (0.10) Time assigned to science -8.64 -0.30 -4.98 -0.18 -7.03 -0.23 homework (15 minutes or (6.37) (0.22) (6.32) (0.23) (5.91) (0.19) less)b Poor teacher timekeeping (not a problem) -0.02 0.00 2.15 0.08 0.82 0.03 minor problem (8.55) (0.30) (8.32) (0.30) (7.67) (0.26) moderate or serious 8.14 0.28 13.22 0.48 12.78 0.42 problem (11.38) (0.39) (11.46) (0.41) (10.64) (0.36) Teacher absenteeism (not a problem) -3.51 -0.12 1.19 0.04 2.17 0.07 minor problem (7.01) (0.24) (6.79) (0.25) (6.24) (0.21) moderate or serious -21.43 -0.75 -17.19 -0.62 -16.62 -0.55 problem (8.61)* (0.29)* (8.26)* (0.29)* (7.64)* (0.25)* 55 Class/school-level variables (reference B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) B (SE) β (SE) category) Teacher highest level of education (Up to ISCED -5.26 -0.18 -3.91 -0.14 -8.79 -0.29 level 6 — bachelor’s or (12.99) (0.45) (13.17) (0.48) (12.29) (0.40) equivalent level)c Teacher major area of study (science and science education) science but not science -10.46 -0.37 -11.49 -0.42 -11.88 -0.40 education (6.76) (0.24) (6.62) (0.24) (6.43) (0.22) 12.21 0.43 22.96 0.83 20.62 0.69 all other majors (11.37) (0.39) (10.73)* (0.37)* (9.38)* (0.31)* Professional 
development hours on science (more than 35 hours)

| Class/school-level variables (reference category) | Step 5 B (SE) | Step 5 β (SE) | Step 6 B (SE) | Step 6 β (SE) | Step 7 B (SE) | Step 7 β (SE) |
|---|---|---|---|---|---|---|
| 16–35 hours | 0.68 (7.57) | 0.02 (0.26) | -3.53 (7.63) | -0.13 (0.28) | -3.66 (7.06) | -0.12 (0.24) |
| 6–15 hours | 3.92 (6.53) | 0.14 (0.22) | -0.71 (6.72) | -0.03 (0.24) | -4.61 (6.57) | -0.15 (0.22) |
| less than 6 hours | 0.83 (8.63) | 0.03 (0.30) | -0.04 (8.80) | 0.00 (0.32) | 3.27 (8.49) | 0.11 (0.28) |
| none | 0.73 (14.42) | 0.03 (0.50) | 3.65 (15.01) | 0.13 (0.54) | 6.03 (14.52) | 0.20 (0.48) |
| Professional development on science content (no) | 7.01 (7.01) | 0.25 (0.25) | 8.62 (7.78) | 0.32 (0.28) | 11.07 (7.34) | 0.37 (0.24) |
| Professional development on science pedagogy (no) | -3.52 (8.32) | -0.12 (0.29) | -0.02 (8.96) | 0.00 (0.33) | 0.27 (7.98) | 0.01 (0.27) |
| Principal years of experience | | | -0.09 (0.35) | -0.02 (0.09) | -0.19 (0.35) | -0.05 (0.08) |
| Principal highest level of education (ISCED levels 7 & 8, master’s or doctorate degree)d | | | 4.73 (13.27) | 0.17 (0.48) | 6.23 (13.36) | 0.21 (0.45) |
| Principal qualification in educational leadership (no) | | | 3.02 (6.33) | 0.11 (0.23) | -0.05 (6.46) | 0.00 (0.22) |
| School library (no) | | | 6.66 (6.20) | 0.24 (0.23) | 9.22 (5.81) | 0.31 (0.20) |

| Interaction terms (reference category) | Step 7 B (SE) | Step 7 β (SE) |
|---|---|---|
| Student gender*Student confident in science | -4.88 (1.91)* | -0.35 (0.13)* |
| Student gender*Teacher age (40 years or older): 30–39 years | 32.15 (10.20)** | 0.40 (0.12)** |

| Fit statistics | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 |
|---|---|---|---|---|---|---|---|
| Loglikelihood (H0) | -32968.21 | -29618.39 | -27632.53 | -25284.64 | -14301.28 | -13016.61 | -13004.99 |
| AIC | 65944.43 | 59262.78 | 55303.05 | 50617.28 | 28684.55 | 26123.23 | 26103.99 |
| BIC | 65971.01 | 59262.78 | 55426.29 | 50770.87 | 28923.63 | 26381.44 | 26373.68 |

Note: Null model: Intercept (SE): 431.76 (3.69), H0 = -33003.98, AIC = 66013.96, BIC = 66033.89. *p < .05; **p < .01; ***p < .001. a Higher scores indicate less frequent bullying; b Other category: 16 minutes or more; c Other category: ISCED levels 7 & 8 (master’s or doctorate degree); d Other category: ISCED Level 6 (bachelor’s or equivalent level).
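
The log-likelihood, AIC, and BIC values reported under each table can be cross-checked against one another. The sketch below assumes the conventional definitions, AIC = -2·LL + 2k and BIC = -2·LL + k·ln(n); the paper does not state these formulas, the number of estimated parameters k, or the sample size n, so the values the code backs out are implied quantities rather than reported ones. The function names are illustrative, and the inputs are the null-model figures from the note to Table A3.4.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion under the conventional definition."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: float) -> float:
    """Bayesian information criterion under the conventional definition."""
    return -2.0 * loglik + k * math.log(n)


def implied_k(loglik: float, reported_aic: float) -> float:
    """Number of parameters implied by a reported log-likelihood and AIC."""
    return (reported_aic + 2.0 * loglik) / 2.0


def implied_n(loglik: float, reported_bic: float, k: int) -> float:
    """Sample size implied by a reported log-likelihood, BIC, and k."""
    return math.exp((reported_bic + 2.0 * loglik) / k)


# Null-model values from the note to Table A3.4 (grade 8 science).
NULL_LL, NULL_AIC, NULL_BIC = -33003.98, 66013.96, 66033.89

k = round(implied_k(NULL_LL, NULL_AIC))   # assumed to be an integer count
n = implied_n(NULL_LL, NULL_BIC, k)       # approximate, given rounding

print(f"implied parameters: {k}, implied sample size: about {n:.0f}")
```

Applying `implied_k` to the log-likelihood and AIC of any step gives the parameter count that step implies. Because the reported figures are rounded to two decimals, the results are only approximately integers.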