WHAT CAN BE LEARNED FROM BENCHMARKING EDUCATION ACROSS ECONOMIES?

There is a large debate over what inputs are needed to achieve better education outcomes. A large part of the debate arises from the difficulty in obtaining measures that adequately capture the quality of educational inputs. This has led to many governments and donors focusing primarily on observable measures of educational financing and expansion in years of schooling, with little attention to whether these inputs are actually leading to expansion in skills.
An increasing number of studies find that students in developing economies have learning outcomes that are well below standards, despite the implementation of a host of educational interventions, greater financing, and expansion in the years of education (e.g., Pritchett 2001, Glewwe et al. 2012, Hanushek and Woessmann 2016, World Bank 2018). While trillions of dollars are spent on education each year by governments and households, investing in inputs that are based only on loose notions of what matters, rather than solid evidence, could be a critical failure in ensuring that education investments are effective.[1]
The goal of this paper is to provide new evidence on the types of inputs that matter for improving educational outcomes by documenting educational inputs across economies, using a scoring system that attempts to capture quality dimensions. This allows us to identify critical inputs that have a significant relationship with skill outcomes and to focus on a smaller set of inputs that serve as valid benchmarks over which to compare and assess education systems. To this end, we developed, collected, and analyzed 171 indicators of basic and national-level education inputs in 69 economies globally.[2] The development of these indicators was based on an in-depth review of the literature that has found rigorous evidence of inputs that lead to improved learning outcomes. These indicators attempt to capture aspects of input quality that could be a major source of omitted indicator bias in prior studies. Using learning outcomes measured by an economy's average test score on the Programme for International Student Assessment (PISA) or Trends in International Mathematics and Science Study (TIMSS) test at age 15, we find that only six indicators matter for improved learning outcomes.[3] In particular, gross enrollment in secondary school, targeted public information that reveals student gaps, and strategic budgeting that provides programs for at-risk students are found to be the three most important factors. Teacher quality that ensures wages are high and incentives are aligned with learning outcomes; information collection that enables timely, data-driven decision-making; and curriculum content that is matched to student skills are also significant in the majority of our analysis. In contrast, there is relatively weak evidence that economies that systematically invest more in education as a share of gross domestic product (GDP) or have better student-teacher ratios have better educational outcomes. Our results show that some of the standard measures used to compare and contrast education systems are less meaningful for comparing economies with the aim of improving learning outcomes.
[1] It was estimated that governments and households in Asia alone spent $1.2 trillion and $690 billion, respectively ($ purchasing power parity [PPP]) in 2010 (ADB 2015).
[2] We collected indicators for an additional 11 Asian economies that did not take the PISA or TIMSS to have a more complete representation of economies. As shown in Table 1, our questionnaire comprises a total of 368 indicators. However, for the main analysis of this paper, we focus on the 229 indicators for which we have a complete set of data for the 69 economies with PISA or TIMSS scores, out of which 171 are focused on primary and secondary education.
Our benchmarking exercise shows that many developing economies systematically face gaps in the quality of their inputs, particularly in having systems for managing teachers effectively, providing career guidance, and having modernized curricula. This does not necessarily imply that economies need to obtain greater financing for education, but rather that attention should be given to strategically refocusing financial investments on areas that are critical to achieving enhanced skill outcomes. Moreover, the indicators reflect that underlying institutions are critical. Without quality institutions that make decisions based on evidence, it may be difficult to use finances efficiently to improve skill outcomes.
Our indicators provide an important methodological way forward for undertaking future benchmarking by complementing the extensive literature that relates educational inputs to educational outcomes.[4] First, we develop new benchmarking indicators that codify policies for a large set of developed and developing economies. This includes concrete objective measures that are typically covered in existing benchmarking exercises, but also measures that capture the quality of policy implementation and are likely a major source of omitted indicator bias in prior studies. Second, the significance of the relationship of these new indicators with educational outcomes is examined independent of an economy's level of development and existing human capital. Third, by identifying the subset of indicators that are significantly related to improved educational outcomes across economies based on empirical analysis, it is possible to document critical gaps in educational investments across economies. Fourth, the indicators provide evidence on important benchmarking indicators that are valid for comparison purposes and useful to continue to invest in collecting across economies.
The remainder of the paper is organized as follows. Section II provides a brief overview of existing benchmarking studies in terms of indicators and their limitations. Section III details the method for developing the questionnaire that is used as the basis for collecting and codifying policies and investments. Section IV provides an overview of how we documented the indicators for the different economies and the construction of the composite indexes, while section V describes the data sources for the key outcome measures and other control variables. Section VI describes the empirical approach that correlates various indicators with educational outcomes. Section VII introduces the results from the empirical analysis and summarizes where different economies stand relative to the top performing economies over the key indicators that were found to be important. Finally, section VIII concludes with some policy implications and directions for future research.

II. EXISTING BENCHMARKING EXERCISES
Benchmarking (that is, creating comparable measures of educational inputs and outcomes across economies, schools, and classrooms to identify investment gaps and evaluate performance) is pervasive in the education sector. Benchmarking exercises can range from qualitative to quantitative assessments. However, only a small set of benchmarking exercises covers a large set of developed and developing economies.
The Center on International Education Benchmarking takes a qualitative approach to benchmarking, using education expert reviews drawn from discussions with stakeholders within the most developed countries. These reviews are used to identify key areas on which education systems should focus. The Center on International Education Benchmarking emphasizes institutions and the targeting of greater funding to vulnerable populations to close gaps in access.
Benchmarking exercises based on outcome measures show how economies compare in terms of skill development or educational attainment, but do not always provide a clear direction for the types of policies and reforms that can effectively improve these outcomes. For example, Pearson's Learning Curve uses the average performance on the PISA or TIMSS tests and the level of educational attainment to rank 40 economies. In 2014, their index ranked the Republic of Korea and Japan first and second, and Mexico and Indonesia 39th and 40th.
Benchmarking exercises that use input measures have varied from comparable and easy-to-collect measures of inputs across economies to a more systematic analysis of education systems that includes laws and the functioning of institutions. Indicators of Education Systems, produced by the Organisation for Economic Co-operation and Development (OECD), primarily falls into the former group, collecting objective measures of inputs and educational outcomes for a large set of developed countries and a few developing economies. However, it neither collects measures on institutional features of education systems nor tries to relate the input measures to outcomes, limiting the conclusions that can be drawn.
The World Bank's Systems Approach for Better Education Results (SABER), in contrast, is a highly comprehensive documentation exercise that was designed to identify constraints and challenges faced in improving education systems in developing economies.[5] SABER developed ordinal measures that allow for codifying the quality of the education system in terms of institutions, legislation, and implementation. The extensive in-depth reviews required to document the education system are both time-intensive and costly, resulting in a more limited set of economies that can be compared to date for different education policies. While extensive, the measures are less likely to be comparable across economies because different experts undertake the economy assessments.
Of the existing benchmarking exercises, GEMS Education Solutions Efficiency Index takes an approach similar to the one undertaken in this paper by relating education inputs to learning outcomes. However, their approach is focused primarily on using inputs that are readily available, such as student-teacher ratios, financing, and teacher wages. The input indicators reviewed provide no clear theory for why they are evaluated, and the modeling specification appears to omit key variables such as an economy's level of income. This is likely to be a key source of omitted variable bias driving the finding that teacher salaries and student-teacher ratios are the two most important inputs for improving efficiency in learning outcomes.
In sum, many of the prior benchmarking exercises are limited in their ability to pinpoint which policies work and to identify measures that matter across economies. This makes it difficult to prioritize investments and turn indicators from benchmarking into actionable, evidence-based strategic priorities that improve learning outcomes. This could imply that extensive time, effort, and money are spent to collect, gather, and compare measures across economies that have little importance for improving educational outcomes. Given that the existing rigorous causal evidence relating educational inputs to outcomes rarely finds consistent results across economies or within regions, it is important to validate whether these benchmarking indicators are relevant for evaluating and assessing important gaps in educational investments. Our approach to benchmarking seeks to rectify these limitations and narrow down the set of indicators that should be evaluated and prioritized as part of educational policy.
[5] World Bank. Systems Approach for Better Education Results (SABER). http://saber.worldbank.org/index.cfm.

III. QUESTIONNAIRE STRUCTURE AND DESIGN
The questionnaire was undertaken during 2014-2016 for the full set of economies that have taken the PISA or TIMSS test since 2003. We started with a total of 369 indicators capturing all education systems from early childhood education to tertiary education. The aim was to systematically document features of education systems across economies. In total, basic and secondary education system inputs were documented completely for 171 indicators for 69 economies globally. Appendix Table A.1 lists the 69 economies and provides information on their development status and educational attainment.
Education system indicators for TVET and higher education were also documented for 25 Asian economies covering over 95% of the population in the region. However, it is worth noting that the dearth of rigorous evaluations for TVET and higher education makes it more difficult to formulate an extensive set of policies and lessons to include as part of the questionnaire, and this is considered a key area for further research.
The full questionnaire is in an online Appendix.[6] Table 1 shows the structure of the questionnaire and lists the broad indicator areas and components. A team of three research analysts reviewed the United Nations Educational, Scientific and Cultural Organization (UNESCO), World Bank SABER, UNESCO-UNEVOC International Center for Technical and Vocational Education and Training, OECD, and Ministry of Education documents to fill in the questionnaires. If information was difficult to obtain, or if there were conflicting reports by different sources, attempts were made to consult economy experts. The most recent source documents and databases were used for the documentation process (i.e., reports within the last 5 years). While this approach could limit the accuracy of the data, the advantage is that it provides greater consistency in evaluation and scoring across economies.
The documentation was complemented with more standard objective indicators available from the World Development Indicators and the UNESCO Institute for Statistics, which include the amount of public education expenditures as a share of GDP, teacher certification rates, enrollment rates in different levels of education by gender, and student-teacher ratios.
The questionnaire was designed based on an extensive review of the rigorous literature on the causal relationship between different policies, institutions, and inputs based on peer-reviewed journal articles and their effects on educational access and skill outcomes. The literature review primarily focused on randomized controlled trials and studies that exploit natural experiments or regression discontinuities (e.g., Ganimian and Murnane 2016, Glewwe and Muralidharan 2015, Kremer and Holla 2009, Glewwe et al. 2012, Hanushek and Woessmann 2011). These studies have clear identification strategies for interpreting the findings as causal and are less likely to suffer from omitted variable bias and underlying modeling assumptions. In addition, qualitative evidence on aspects that are consistently cited by those pushing for educational financing and reforms was also reviewed. This literature was reviewed for technical vocational education and training (TVET) and higher education, where the rigorous evidence is sparser. The literature was used to identify the characteristics of policies, projects, and investments that were found to be important and effective in the implementation of education reforms.
This information was used in designing a questionnaire that attempts to capture dimensions of quality educational inputs that could be critical to improved educational outcomes. The survey questionnaire was designed to systematically encode an economy's level of development using close-ended questions. Scores were assigned to different levels of quality, similar to the World Bank's SABER diagnostic tool, with higher values being associated with better quality or breadth of implementation. The questionnaire is designed to cover three major policy areas: governance and accountability that promotes financial efficiency, educational quality, and educational access. These policy areas cover national policy legislation and investments that span the different education levels of basic and upper secondary, TVET, and higher education.
The literature revealed a large set of inputs that could be essential to improving skill outcomes both across economies and within economies. However, heterogeneities in intervention design, implementation, and targeted population showed that it was rare that inputs were effective in every context. In the remaining discussion, we briefly summarize the key findings from the three major policy areas before introducing the broad input indicators captured by our benchmarking exercise.

A. Governance and Accountability
Governance and accountability are the foundation for achieving financial efficiency and ensuring that inputs translate into outcomes. In a variety of contexts, they are found to be important to improving learning outcomes (Glewwe et al. 2012; Ooghe and Schokkaert 2016; Woo, Lee, and Kim 2015). Good governance entails making evidence-based policy decisions. This relies on gathering credible and relevant information that can inform important policy-making and budgetary decisions to ensure that finances flow to the most worthwhile investments. For example, using public finances for school inputs that can more easily be financed by families, such as books, uniforms, and other basic materials, may have more limited effectiveness than expenditures on physical infrastructure, curriculum, and teachers that benefit a larger population (Rockoff and Turner 2010). This is because public spending on small inputs is likely to offset private household spending, leading to more limited effects of public financing (Das et al. 2013).
Good governance requires that administrators, schools, and teachers are accountable (Bishop 2006; Fuchs and Woessmann 2007; Woessmann 2003, 2011). However, a necessary condition for implementing accountability is the availability of credible and relevant information on student attendance and performance on standardized tests. This information is the basis for managing day-to-day operations of education systems that align schools' and teachers' incentives with learning outcomes. In the absence of accountability, teachers may be absent from school, thus adversely affecting student outcomes (Chaudhury et al. 2006, Luschei 2012).
Accountability can take a variety of forms. Governments that have flexibility and foresight to implement legislation can impose accountability by defining performance standards, measuring them, and allocating budgets based on these standards (Hanushek and Woessmann 2011). However, many developing economies face inflexible education systems that can prevent them from implementing performance-based legislation. In this case, school competition or significant private educational expenses (e.g., out-of-school tutoring) can improve accountability when combined with clear and concise information on school quality that allows parents and society to redirect their educational investments to better educational providers (Hoxby 2003, Rouse and Barrow 2009, Ganimian and Murnane 2016, Ehren et al. 2015).
Autonomy has the potential to complement accountability and improve the effectiveness of educational investments. In the absence of accountability, however, autonomy can result in worse learning outcomes (Hanushek, Link, and Woessmann 2013). Private sector involvement and decentralization of process decisions are two ways to create greater autonomy in education (Hanushek and Woessmann 2011). Autonomy gives school managers, who have better information on localized conditions, the flexibility to identify the most cost-effective and efficient ways to achieve targeted outcomes. It also gives them the freedom to import critical human resource management practices into schools. Ultimately, the main defining characteristic of schools and institutions that deliver better learning outcomes across economies is high-quality management practices (Bloom et al. 2015).

B. Educational Quality
Educational quality requires having the right curriculum and improved instruction that leverages motivated and skilled teachers. It involves pedagogical innovations that target curricula to a student's capabilities, such as technology-assisted instruction, remedial education, or tracking that splits students into different ability levels (Kremer and Holla 2009). All of these aspects leverage resources that are able to change the school environment and how children experience learning (Ganimian and Murnane 2016). However, the timing of tracking matters, as even the best tracking systems have shown that early tracking can lead to larger increases in the inequality of schooling outcomes (Hanushek and Woessmann 2006).
Instructional delivery traditionally involves attracting motivated and skilled teachers into the teaching profession and using the right policies to retain them. The quality of the teaching force is found to account for a large share of international differences in the level and equity of student achievement (Woessmann 2011, Rothstein 2010). Nevertheless, standard observable characteristics such as teacher training, certifications, and number of degrees are poor measures of teacher quality and have limited power to explain student learning outcomes (Woessmann 2011, Metzler and Woessmann 2012). This points to the difficulty of ensuring that investments in simple training and capacity building lead to better skills. A number of studies have found that for more quantitative subjects, such as math, the subject-specific skills of the teacher have a significant impact on student learning outcomes (Metzler and Woessmann 2012, Buddin and Zamarro 2009, Boyd et al. 2009). Moreover, industry experience for professional and vocational training degrees may be highly important to successful student outcomes. In the People's Republic of China, only teachers with industry experience were found to have a positive impact on student skills (Loyalka et al. 2013).
The absence of key skills does not mean that investments in training will be effective. Several studies have shown that in-service teacher training and feedback on more engaging instruction methods may do little to improve learning outcomes, even when teachers are lacking in relevant skills (Glewwe et al. 2012, Loyalka et al. 2013). The critical missing component could be quality human resource management policies in education that properly incentivize and motivate skilled individuals to join and remain in the teaching profession; such policies are found to be critical to learning outcomes across economies (Bloom et al. 2015).

C. Educational Access
Educational access is important to improving the equity of outcomes and ensuring better equality of opportunity. Sufficient school infrastructure and teachers to support the school-age population are a minimum requirement. Economies with a large population living in remote areas often resort to boarding schools to control costs of providing quality educational access. The Millennium Development Goals noted that basic access is increasingly less of an issue, with enrollment in primary and secondary school nearly universal, and the target of 97% enrollment rate reached across most developed and developing economies with the exception of Sub-Saharan Africa (United Nations 2015).
With universal access nearly achieved, understanding how inputs such as school quality, credit constraints, and behavioral factors interact to affect student enrollment, attendance, and retention becomes increasingly important to improving learning outcomes.
Demand-side interventions such as conditional cash transfers that reduce the uncertainty families face in evaluating the returns to school enrollment are a popular tool used by developing economies to increase time in school. However, conditional cash transfers can be costly and often assume that the benefits of additional schooling are worth the costs, with little attention to gains in learning outcomes. As students get older, the trade-off between school and work means that higher payments are typically needed to induce students to remain in school. As such, these programs tend to be less effective for older ages. Thus, greater benefits could be derived by providing financial assistance through low-risk loans and merit-based scholarships that have transparent selection criteria intended to help identify and support high-achieving, financially needy candidates to obtain access to higher levels of education (Kremer and Holla 2009).
Finally, publicly sharing information may be a simple and cost-effective way to provide access to higher or more relevant education. Less well-off families and students may not invest in further schooling simply because they are misinformed about the returns to education and perceive returns to be lower than they are in reality. As a result, providing information on the returns to an additional year of education or to types of training related to various occupations is shown to significantly increase student investments in schooling in contexts where expectations of education returns are low compared to reality (e.g., Jensen 2010, Hicks et al. 2013).

IV. CONSTRUCTION OF COMPOSITE EDUCATION INDEXES
To reduce concerns with multicollinearity between input indicators and ensure that we have identification in simple regressions, we combine our input indicators into composite indexes. These composite indexes intend to capture key inputs that are typically implemented as a package of investments or reforms. For example, information collection is often a necessary condition for implementing evidence-based budgeting.
To construct the composite indexes, we first identified inputs that were available for a majority of the 69 economies with PISA or TIMSS scores (229 indicators) and transformed them to values that range from 0 to 1. Next, we checked variables that had a nonmissing correlation with PISA or TIMSS scores and were focused on primary and secondary education. This reduced the number of indicators to 171. Next, we performed common factor analysis. Common factor analysis is used to prioritize indicators to include in each of the composite indexes by exploiting comovements in the indicators across economies. The factor analysis found that most of the variation in the indicators could be explained by 28 factors, based on these factors having an eigenvalue greater than 1.
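As a rough illustration of the first steps described above, the sketch below rescales indicators to the 0 to 1 range and counts the factors with an eigenvalue greater than 1. It runs on synthetic data and is not the procedure used in the paper: the function names `min_max_scale` and `count_factors_kaiser` are hypothetical, and the eigenvalues are taken from the plain correlation matrix, which is a simplification of common factor analysis.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to the range [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X - lo) / span

def count_factors_kaiser(X):
    """Count eigenvalues of the correlation matrix that exceed 1.

    Assumption: this uses the unadjusted correlation matrix, a common
    shortcut rather than a full common factor analysis.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # sorted from largest to smallest
    return int((eigvals > 1.0).sum()), eigvals

# Synthetic example: 69 economies, 6 indicators driven by 2 latent factors.
rng = np.random.default_rng(0)
n_economies = 69
latent = rng.normal(size=(n_economies, 2))
true_loadings = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0, 0.9, 0.8, 0.7]])
raw = latent @ true_loadings + 0.3 * rng.normal(size=(n_economies, 6))

scaled = min_max_scale(raw)
n_factors, eigvals = count_factors_kaiser(scaled)
print(n_factors, np.round(eigvals, 2))
```

Because correlations are unaffected by the rescaling, the eigenvalue count is the same whether it is computed on the raw or the rescaled indicators.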
We constructed two separate sets of indicators, "set 1" and "set 2," to better understand how variations in the underlying assumptions might affect results. The details of the different sets of indicators considered are provided in Table 2.
Set 1 is intended as a higher-level overview of different educational policies. We grouped indicators together based on factor loads above 0.5 and included all additional indicators that apply to a grouping to enable each composite index to have a more homogeneous representation (e.g., all variables related to "information" are included in the composite index, independent of whether they are significant in the factor analysis). Equal weights were applied to each of the indicators rather than assuming that the factor load weights were the correct representation. This procedure resulted in 10 distinct indexes covering the 171 indicators.[7]
Set 2, on the other hand, was automatically derived from the factor analysis and provides a greater degree of specificity in the indexes. Indicators with factor loads above 0.5 were again the starting basis for grouping indicators into composite indexes. If an indicator met the factor load criterion and had not already been included in a prior composite index, we included it in the index. Each indicator included in a composite index is similarly assigned an equal weight. This procedure resulted in 27 distinct indexes comprising 138 indicators. The number of indicators is smaller than in set 1, as a number of indicators had no factor load above 0.5 for any of the identified factors.[8]
The average values of the composite indexes for set 1 and set 2 are presented in Table 3. The results show that the average index is greater than 0.5 for most indicators, with the exception of information targeting to the disadvantaged, strategic budgeting, and information and communication technology.
There are also clear differences across regions, with high-income OECD economies having far higher measures of quality inputs compared to developing regions. In set 2, the indexes show that investments in curriculum, public information, accountability, and programs that increase education access by monitoring at-risk students are lower in developing than in developed economies.
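The set 2 grouping rule and the equal-weight averaging described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the loading matrix and scores are made-up numbers, the function names `group_by_loading` and `composite_index` are hypothetical, and partially missing indicators are handled by averaging over the nonmissing ones.

```python
import numpy as np

def group_by_loading(loadings, threshold=0.5):
    """Assign each indicator to the first factor on which its loading
    exceeds the threshold; indicators already assigned are skipped."""
    n_indicators, n_factors = loadings.shape
    assigned = set()
    groups = {}
    for f in range(n_factors):
        members = [j for j in range(n_indicators)
                   if j not in assigned and loadings[j, f] > threshold]
        if members:
            groups[f] = members
            assigned.update(members)
    return groups

def composite_index(scaled, members):
    """Equal-weight average over the nonmissing indicators in a group."""
    return np.nanmean(scaled[:, members], axis=1)

# Hypothetical loadings: 5 indicators (rows) on 2 factors (columns).
loadings = np.array([[0.80, 0.10],
                     [0.60, 0.70],   # loads on both; stays with factor 0
                     [0.20, 0.90],
                     [0.30, 0.20],   # never exceeds 0.5; left out
                     [0.55, 0.10]])

# Hypothetical 0-1 scores for 3 economies; NaN marks a missing indicator.
scaled = np.array([[1.0, 0.5, 0.2, 0.9, np.nan],
                   [0.0, 0.5, 0.4, 0.1, 1.0],
                   [0.5, np.nan, 0.6, 0.3, 0.0]])

groups = group_by_loading(loadings)
index_0 = composite_index(scaled, groups[0])
print(groups)
print(index_0)
```

Indicators that never exceed the threshold drop out of every index, which is why a set built this way can cover fewer indicators than the full list.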

V. LEARNING OUTCOMES AND OTHER CONTROL VARIABLES
Test scores of students who took the PISA and TIMSS tests from 2003 to 2012 are the primary outcome of interest in this study. Since a number of the economies in the sample took the PISA or TIMSS test only once during this period, we effectively have only a single cross-section of economies. In the case that an economy took a test more than once, or took both the PISA and TIMSS, we opted to use the most recent test results. While the PISA and the 8th grade TIMSS tests have slightly different focuses, the performance of economies that took both tests in 2003 was shown to be very highly correlated, with a correlation of 0.87 on the math test and 0.97 on the science portion (Hanushek and Woessmann 2012). Appendix Table A.2 provides detailed information and a comparison of the two major international student assessments. These test results were used to construct the average science-mathematics test score in the economy and the average science-mathematics test scores of those in the bottom 20% of socioeconomic status in the economy, based on an asset index value for students provided in the 2009 and 2012 PISA database. These outcome measures are the focus, under the presumption that it is cognitive skills or measures of student learning, rather than years of schooling, that should be the key outcome and measure by which to assess an economy's performance.
[7] Thirty-eight of the 229 indicators are not completely available for all 69 economies. When only partially available, the composite index is adjusted based on the number of nonmissing indicators.
[8] Because the appropriate threshold for factor loadings often rests on various assumptions (or a rule of thumb), we performed a robustness check by reducing the threshold of the factor load to 0.3 and weighting the observations by the factor load weight.
We also complemented the data with economy-level information on the average years of schooling of the population age 25-65 in 2010 from the Barro and Lee (2013) database and per capita GDP based on purchasing power parity (PPP) in 2010 using the Penn World Table 8.1. These are viewed as key variables to control for and could explain both differences in the quality of inputs and skills across economies.
The average values of education outcomes are reported in Table 3. The average score on the PISA or TIMSS test based on a scale of 1,000 is about 50 to 100 points lower in middle- and low-income economies compared to the OECD economies. Latin American economies have some of the lowest scores, conditional on taking the test, with average scores below 400. The average years of schooling of the 15-19-year-old population in 2010 is about 9.3 years in high-income OECD countries compared to around 8.3 years in developing Asian economies.

VI. RELATING EDUCATION SYSTEMS TO EDUCATION OUTCOMES
The objective is to identify education inputs that are important to enhanced learning outcomes, with \(Y\) representing a measure of average test scores in economy \(i\). We start with a simple education production function approach that assumes skills produced in an economy are a function of public inputs and private inputs. Specifically, a regression model of the following form would relate key inputs and factors to educational outcomes for each economy \(i\) in region \(r\):

\[ Y_{ir} = \alpha + \beta' I_{ir} + \gamma G_{ir} + \delta H_{ir} + \rho_r + \varepsilon_{ir} \tag{1} \]

In this model, \(\beta\) is the coefficient of interest, representing the estimated correlation of a vector of education input indicators, \(I\), that includes time spent in school, teacher quality, curriculum, and institutional structure with the outcome of interest; \(\alpha\) is a constant, \(\gamma\) and \(\delta\) are the coefficients on the controls, and \(\varepsilon\) is the unobserved error. In addition, these models control for an economy's income or level of development, as proxied by log GDP ($ 2010 PPP), \(G\), and current human capital characteristics, \(H\), captured by the average years of schooling of the population aged 25-65. Broad regional fixed effects, \(\rho_r\), are intended to capture cultural and economic differences, which vary across the different regions. These control measures are believed to be the most important factors affecting skill outcomes that should be controlled for, as economies have more difficulties in altering these factors in the short run, compared to the input indicators that are considered in the analysis.
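A regression of this kind can be estimated by ordinary least squares. The sketch below is a minimal illustration on synthetic data, assuming a single input indicator and three made-up regions labeled "A", "B", and "C"; the function name `fit_production_function` and all numbers are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def fit_production_function(y, indicator, log_gdp, schooling, region):
    """OLS of test scores on one input indicator, log GDP, schooling,
    and regional fixed effects (the first region is the omitted baseline)."""
    region = np.asarray(region)
    columns = [np.ones(len(y)), indicator, log_gdp, schooling]
    for r in np.unique(region)[1:]:
        columns.append((region == r).astype(float))  # regional dummy
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # coef[1] is the coefficient on the input indicator

# Synthetic data for 69 economies (all values are illustrative).
rng = np.random.default_rng(1)
n = 69
indicator = rng.uniform(0, 1, n)        # composite index scaled to 0-1
log_gdp = rng.normal(9.5, 1.0, n)
schooling = rng.normal(9.0, 2.0, n)
region = rng.choice(["A", "B", "C"], size=n)
region_effect = np.where(region == "A", 0.0,
                         np.where(region == "B", 10.0, -10.0))
y = (300 + 60 * indicator + 10 * log_gdp + 5 * schooling
     + region_effect + rng.normal(0, 1, n))

coef = fit_production_function(y, indicator, log_gdp, schooling, region)
beta_hat = coef[1]
print(round(float(beta_hat), 1))  # should be close to the true value of 60
```

Dropping one regional dummy avoids perfect collinearity with the constant, so the remaining regional coefficients are read relative to the omitted region.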
The small sample of economies observed for a single cross-section of time poses an econometric challenge in identifying the separate effects of the different input indicators in a single regression. Small sample size, multicollinearity between input indicators, and omitted variable bias are all concerns in estimation. To balance these concerns, we ran regressions that each included at most two indicators. This balances the need to better identify inputs that are key drivers of outcomes with the constraints of the sample size. It results in the specification in equation (2) for each economy i in region r, with input indicators I_m and I_n, where {m, n} are in set M and m≠n. Equation (2) modifies equation (1): the β's are the key coefficients of interest, G represents log GDP, H represents human capital characteristics, and ρ_r are regional fixed effects. Equation (3) averages the coefficient estimates for an indicator across the different regressions, where each regression is weighted equally and the sum is divided by N − 1, with N the total number of indicators evaluated in set M. We also documented the number of times that the estimates were statistically significant at the 10% level across the pairwise regressions, providing an indication of the importance and significance of the indicator in influencing skill outcomes.
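The pairwise procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the estimation code behind the reported results: it assumes ordinary least squares with conventional standard errors, uses a normal approximation (`t_crit=1.645`) for the two-sided 10% level, and runs on synthetic data. The names `ols`, `pairwise_average`, and the indicator labels are hypothetical.

```python
from itertools import combinations

import numpy as np


def ols(X, y):
    """Return OLS coefficients and conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def pairwise_average(y, indicators, controls, t_crit=1.645):
    """Regress y on every pair of indicators plus controls.

    indicators: dict mapping indicator name -> 1-D array (one value per economy)
    controls:   2-D array of control columns (e.g. log GDP, schooling,
                region dummies with one region omitted)
    Returns the average coefficient per indicator over its N - 1 regressions
    and the number of regressions in which |t| exceeded t_crit.
    """
    names = list(indicators)
    const = np.ones((len(y), 1))
    coefs = {m: [] for m in names}
    n_significant = {m: 0 for m in names}
    for m, n in combinations(names, 2):
        X = np.column_stack([const, indicators[m], indicators[n], controls])
        beta, se = ols(X, y)
        # Columns 1 and 2 hold the two indicators of this pair.
        for pos, name in ((1, m), (2, n)):
            coefs[name].append(beta[pos])
            if abs(beta[pos] / se[pos]) > t_crit:
                n_significant[name] += 1
    # Each indicator appears in exactly N - 1 pairs.
    average = {m: sum(v) / (len(names) - 1) for m, v in coefs.items()}
    return average, n_significant


# Synthetic data (hypothetical): only indicator "a" truly affects the outcome.
rng = np.random.default_rng(0)
n_obs = 60
indicators = {name: rng.normal(size=n_obs) for name in ("a", "b", "c")}
controls = rng.normal(size=(n_obs, 2))
y = 400 + 30 * indicators["a"] + 5 * controls[:, 0] + rng.normal(size=n_obs)

avg, sig = pairwise_average(y, indicators, controls)
```

With a small number of economies, a Student-t critical value for the regression's degrees of freedom would be a more careful choice than the normal approximation used here.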
The value of β is difficult to interpret due to differences in the means and variances of the input indicators. As a result, we use a simple simulation to evaluate the effect of moving the indicator from the average of the five lowest-tier economies to that of the five highest-tier economies in the sample. That is, the expected change in outcome ΔY_i from raising an indicator for input I_m from the level of the five bottom-tier economies, I_m^b5, to that of the five top-tier economies, I_m^t5, is the averaged coefficient from equation (3) multiplied by the difference I_m^t5 − I_m^b5.

Table 4 provides results from the pairwise regressions, along with the simulated gains from moving the input indicator from the average of the bottom five to the average of the top five economies. It finds that enrollment ratios in secondary education are significantly associated with better skill outcomes across economies. The analysis also found that the features of an education system with the largest apparent effect on average skill outcomes are: career guidance, information, and programs that focus on at-risk students; strategic budgeting that develops financing priorities based on evidence and on monitoring and evaluation; curriculum content that is well matched to student capabilities and emphasizes cognitive and noncognitive skills rather than rote memorization; and collection of information to target the disadvantaged and monitor accountability. For example, raising career guidance programs in the bottom five economies to the level of the top five economies implies, in the simulations for set 1, that students' test scores would improve by 60 points, or about 13% over the mean score. Improving set 1 inputs in strategic budgeting, teacher quality, or information collection up to the level of the top five countries in our data is expected to raise scores by 39 to 44 points. Very few other items were significant in the regressions.
In particular, public educational expenditures as a share of GDP did not have any significant relationship with improved skill outcomes. The results are somewhat similar when we use 'set 2,' which comprises a more disaggregated set of indexes. We find that strategic auditing and budgeting and information collection are highly significant. We also find that public funding for private education, increased gender parity, and support for secondary TVET courses focused on business are commonly significant factors in better education performance.
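The bottom-five to top-five simulation described above reduces to one multiplication. The following sketch assumes the tiers are formed by ranking economies on the indicator itself; the function name `simulated_gain`, the coefficient, the indicator values, and the mean score are all hypothetical and are not taken from Table 4.

```python
from statistics import mean


def simulated_gain(beta_avg, indicator_values, k=5):
    """Expected change in the outcome from moving an indicator from the
    average of the k lowest economies to the average of the k highest."""
    ordered = sorted(indicator_values)
    bottom = mean(ordered[:k])
    top = mean(ordered[-k:])
    return beta_avg * (top - bottom)


# Hypothetical indicator scores for 12 economies and a hypothetical
# averaged coefficient of 80 test-score points per unit of the indicator.
values = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0]
gain = simulated_gain(beta_avg=80.0, indicator_values=values)

# Express the gain relative to a hypothetical mean score of 480.
relative_gain_pct = 100 * gain / 480.0
```

In this made-up example the bottom-five average is 0.24 and the top-five average is 0.84, so the gain is about 48 points, or about 10% of the assumed mean score.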

A. Educational Inputs Associated with Skill Outcomes
The results across the two sets of indicators show that establishing institutions and mechanisms that improve education financing and collecting information for strategic budgeting are generally important to enhancing skill outcomes. Similarly, adopting a performance-based budgeting process (i.e., setting clear standards, goals, and performance indicators, alongside a government institution that guides the education sector budget) is relevant to higher learning outcomes. Monitoring and working on students' transition rates is also associated with better skills. Notably, we find no significant association of public education expenditure as a share of the economy's GDP with learning outcomes.
In general, the results indicate that the inputs most strongly related to skill outcomes in an economy involve collecting information, sharing it, and acting strategically on it. While there could be concern that the results of the regressions are driven by low coefficients of variation, all indexes examined had coefficients of variation above 20, suggesting that this was not a concern.
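The variation check mentioned above can be illustrated as follows. This sketch assumes the coefficient of variation is expressed in percent and computed with the sample standard deviation; the text does not state either convention, and the index values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)


# Hypothetical index scores across eight economies.
index_scores = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(index_scores)

# Under the assumed convention, an index passes the check if cv > 20.
has_enough_variation = cv > 20
```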
The education inputs that matter for the average student in the economy are also highly important for those from lower socioeconomic groups, with information and teacher quality having a slightly more significant relationship with average outcomes of these vulnerable populations.

B. Complementary Educational Inputs
There are some inputs that could complement each other and provide significant improvements in learning outcomes, even while they have little individual effect. To investigate the possibility that different inputs have a complementary effect, we modified the regressions in equation (2) to add an interaction between the two input indicators I_m and I_n, where m≠n. In this equation we are largely interested in cases where the marginal effect of an input, that is, its own coefficient plus its interaction term, is positive and where the interaction coefficient itself is positive.
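One way to write the interacted specification and the two conditions of interest is sketched below. The coefficient names β_m, β_n, and β_mn, and the symbols α, γ, and δ for the intercept and controls, are assumptions of this sketch rather than notation confirmed by the text.

```latex
Y_{ir} = \alpha + \beta_m I_{m,ir} + \beta_n I_{n,ir}
       + \beta_{mn}\,\bigl(I_{m,ir} \times I_{n,ir}\bigr)
       + \gamma\, G_{ir} + \delta\, H_{ir} + \rho_r + \varepsilon_{ir}

\frac{\partial Y}{\partial I_m} = \beta_m + \beta_{mn} I_n > 0
\qquad \text{and} \qquad
\frac{\partial^2 Y}{\partial I_m\, \partial I_n} = \beta_{mn} > 0
```

Under this form, two inputs are complements when raising one increases the payoff to the other, which is what a positive β_mn captures.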
Estimating the interactions between various inputs and outcomes, we found few cases with positive and significant interaction effects, as reflected by the coefficient on the interaction term. One of the few exceptions was the interaction between the quality of information collected by education institutions and vouchers, PPPs, and financial aid. The interaction effect between greater public expenditures and various inputs did not come out as significant in any of the regressions. These findings provide no evidence that increasing educational financing can consistently deliver better skills. What generally appears to be critical is implementation that emphasizes the collection of information that can inform decision-making. Without institutions that invest in credible, timely, and detailed information gathering, economies could find it difficult to deliver specific programs that result in improved learning outcomes.

C. Where Do Countries Stand in Terms of Development on Key Benchmarking Indicators?
The key indicators that come up as significant are potentially meaningful for assessing and comparing the quality of education institutions across countries. Figure 1 graphs the top four indicators, other than the gross secondary enrollment ratio, for a selected set of countries benchmarked against the top five countries globally. It shows that a few developing countries have large gaps in investments in career guidance and in program assistance to at-risk students. We have similar findings for teacher quality, where there is significant variation among the countries in the sample: Japan and Canada have teacher quality indicators that are close to those of the top countries, while India, the People's Republic of China, and Argentina have indicators on the lower end. In contrast, the gaps in curriculum content and strategic budgeting are less varied, and only a few of the selected countries have curricula that are far below those of the top countries. In Figure 2, we examine similar measures for a set of developing Asian countries. These indicators show that the gaps in investments are significant. While a country such as the People's Republic of China ranks low on key input indicators on a global scale, it tends to fall in the middle to top end in the Asian region and ranks highly on curriculum content.

D. Some Evidence on Gaps in Educational Investments in Technical Vocational Education and Training and Higher Education
As universal primary enrollment has been achieved across most developed and developing economies, the focus has started to turn toward investments and inputs in TVET and higher education. These systems are often more complex in at least two dimensions: first, they require greater coordination efforts to monitor and review different subject or technical areas; and second, they have a significantly higher share of private providers. Both features can make it a challenge to ensure improved learning outcomes. Moreover, the lack of consistent measures of outcomes that can be evaluated across economies makes it even more difficult to assess and measure performance than in basic education. Anecdotally, these education systems face large challenges in delivering on quality.
Identifying performance measures and implementing greater standardization of these systems could be critical to ensuring that investments in these areas are effective in the longer term. The OECD's Programme for the International Assessment of Adult Competencies is an initial step in the right direction. This test attempts to measure the performance of TVET and higher education systems. However, the cost of administering the test means that samples are small, which makes it difficult to directly link performance to more specific TVET and higher education programs. In the absence of clear outcome data that are comparable across economies, we use some of the key findings from basic education to form conjectures about the TVET and higher education systems. In particular, the finding that information was of critical importance to improving the effectiveness of inputs suggests that this could also be highly important for TVET and higher education. However, Figure 3 shows that the investment gaps in information collection relative to the best economies are far larger in both higher education and TVET than in basic education, which may reflect the challenges these systems are facing in delivering better quality learning outcomes.

VIII. CONCLUSIONS AND POLICY IMPLICATIONS
The results provide evidence that the key inputs that are consistently associated with higher learning outcomes across economies are ensuring that students stay in school, timely data collection, and curriculum content that fosters cognitive and noncognitive skills. These inputs are viewed as the foundation for a quality education system that can raise the level of student skills and are important for economies to continue to track and benchmark over time. We also show that more standard objective measures of investments, such as amount of education expenditure and student-teacher ratios, which are often used to benchmark and evaluate education systems, are less meaningful in determining skill outcomes.
This study is an important step in revising existing approaches to benchmarking so as to develop a set of education input indicators that measure quality and have a clearer linkage to key learning outcomes. This is believed to be an important step in improving the prioritization process of education investments. Nevertheless, there is room for future research to improve upon the current set of approaches. Continued investment in rigorous evaluations and experiments in education can help increase our understanding of how educational programs must be implemented to succeed. These results could be used to review and extend educational input indicators to better measure the quality of implementation, especially in cases where inputs were not found to have a significant effect. There is also value in developing a more systematic and sustainable way to collect and evaluate economy performance in the event that resource constraints prevent a full diagnostic. For example, while our study was primarily a desk study, investments could be made to automate the extraction of information from the Internet and so reduce concerns that differences in measurements are driven by differences in human evaluation. Finally, creating well-defined measures of quality inputs and skill outcomes for TVET and higher education is important to help define and set strategic priorities for these investments. These are all important areas for future research to drive more strategic and efficient investments in education.