Preparing a Road Map on the Use of Administrative Data for Compiling Employment Statistics (ADB Brief 179)


INTRODUCTION
People increasingly recognize that high-quality data are important inputs for guiding governments and policy makers, helping ensure that program implementation yields beneficial outputs, outcomes, and impact for societies. Data-driven policy-making and monitoring also help invigorate national and international development efforts. However, the budget constraints experienced by many national statistical offices (NSOs) in recent years have made it even more challenging to meet the increasing demand for high-quality data. Consequently, NSOs need to be more resourceful in producing such outputs, including by maximizing the use of all available data. The importance of high-quality data was further magnified by the global adoption of the Millennium Development Agenda in 2000 and the adoption in 2015 of the 2030 Agenda for Sustainable Development, which has four times as many indicators as its predecessor. Progress on the Millennium Development Goals (MDGs) and the Sustainable Development Goals (SDGs) has been monitored using corresponding indicators for each goal and target. It is imperative that the data behind these indicators are of good quality, as they help shape essential policies.
Countries gradually moved from setting their development priorities independently to integrating their national development strategies with the global development agenda. National statistical systems were prompted to strengthen their capacity and methodologies and to ensure that they adhere to international statistical standards for compiling development data. This further cemented the important role of data in socioeconomic development and underlined development challenges, such as the lack of reliable and internationally comparable data, that can undermine governments' ability to set goals, optimize investment decisions, and measure progress.
KEY POINTS
• Integration of data from multiple sources can provide more nuanced and meaningful information to meet the enormous data requirements of the Sustainable Development Goals (SDGs) more efficiently.
• The need to improve the efficiency of the statistical production process prompts official statisticians to explore alternatives to the traditional survey-based approach to collecting data. Administrative data sources, when used properly, can complement survey data by providing more timely and comprehensive information.
• Compilation of work and employment statistics is an area that can significantly benefit from integrating administrative data. In Asia and the Pacific, how administrative sources are used to produce work-related statistics varies considerably, further magnifying the need for statistical capacity building in this area, especially the ability of statistical systems to bear, recover from, and respond and/or adapt to external disturbance.
As policy makers' need for data expands, there is a need to revolutionize data collection, processing, and usage by capitalizing on different data sources and technological developments. To do so, it is important to understand the many ways in which data are being collected, compiled, analyzed, and disseminated. The four common sources of data for development are censuses, sample surveys, big data, and administrative data.
Censuses. By conducting a complete enumeration of all units in the population, a census provides reliable baseline data on the structure and key characteristics of the target population against which changes through time can be measured. However, given their scope and range, censuses entail high costs and robust staffing. Furthermore, the large number of respondents and differences in understanding of concepts, definitions, and instructions predispose censuses to data collection errors. As with other data sources, changes in census methodologies can also make it challenging to compare data from one census to another.
Sample surveys. Sample surveys, on the other hand, collect data from a fraction of the population and aim to draw inferences about the entire population. They are a more cost-effective means of collecting data where a complete assessment of the whole population is not required. Because they are administered in a more controlled manner, sample surveys can include detailed inquiries with multiple questions on the characteristics of interest in the target population. They also minimize nonsampling errors. However, although sample surveys generate data more cost-effectively, they depend on a good sampling design and an adequate sample size, and their estimates are subject to sampling errors. The success of a sample survey depends on the response rate and the quality of the responses, including the respondents' ability to recall, their honesty, and their motivation to respond to the set of questions. As with censuses, comparability over time is also a challenge because estimates of key variables may require similar designs and methods that are highly unlikely to be perfectly replicated. Furthermore, adequately trained staff are needed to administer the survey with minimal deviation from the standards.
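To make the link between sample size and sampling error concrete, the sketch below computes the standard error and an approximate 95% confidence interval for an estimated proportion, such as an unemployment rate, under simple random sampling. The function names, the 6% rate, and the sample sizes are illustrative assumptions and are not taken from any survey; actual labor force surveys use complex designs, so their sampling errors are usually larger than this simplified formula suggests.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1.0 - p) / n)


def confidence_interval_95(p: float, n: int) -> tuple[float, float]:
    """Approximate 95% interval using the normal approximation (z = 1.96)."""
    margin = 1.96 * proportion_standard_error(p, n)
    return (p - margin, p + margin)


if __name__ == "__main__":
    estimated_rate = 0.06  # hypothetical unemployment rate, for illustration only
    for sample_size in (500, 2000, 8000):
        low, high = confidence_interval_95(estimated_rate, sample_size)
        print(f"n={sample_size}: {low:.3%} to {high:.3%}")
```

Under this approximation, the sample must be four times larger to halve the sampling error, which is one reason precise estimates for small areas or subgroups are costly to obtain from surveys alone.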
Big data. As advances in information and communication technology (ICT) set off a data revolution, the world is experiencing an unprecedented upsurge in data capture, production, storage, access, analysis, archiving, and reanalysis. Although access to and storage of large volumes of data for analytics have existed for quite a while, the concept of big data started to gain popularity in the early 2000s. Big data is the "information asset characterized by such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value." In the context of the 2030 Agenda for Sustainable Development and its pledge that no one will be left behind, current statistical data collection systems need to be enhanced if NSOs are to use big data as a data source. Among other things, the hardware and software systems currently in place in NSOs require upgrading. As big data come in various formats, national statisticians also need stronger capacity to analyze unstructured, semi-structured, and structured data.
Administrative data. Several MDG and SDG indicators have used administrative data as a main or supplementary source of information. Administrative data are also widely used in setting up other data systems such as firm registration, health records, school enrollment, customs data, and tax records. The use of administrative data has several benefits. For instance, a complete count of units can be produced, and disaggregated data for smaller areas of interest can be derived. In addition, using existing data incurs lower costs than designing a specialized data collection initiative that is solely intended to serve specific data needs. Indirectly, using administrative data to obtain information that is generally sourced from surveys and censuses can also help reduce respondent burden and consequently minimize nonresponse bias. Within the international statistical community, there is also increasing recognition of the importance of strengthening administrative data collection systems. For instance, the Collaborative is an initiative established to strengthen the use of administrative records for statistical purposes; it aims to address the need for timely and disaggregated statistics to mitigate the impact of the COVID-19 pandemic. This partnership is supported by NSOs and regional and international agencies and is co-convened by the United Nations (UN) Statistics Division and the Global Partnership for Sustainable Development Data.
In 2010, the Asian Development Bank published a handbook on the use of administrative data sources for compiling MDG and related indicators, focusing on education, health, and registration systems, and featuring practices and experiences from selected countries. This brief supplements the handbook by providing additional insights on important elements of developing a road map on how national statistical systems can capitalize on administrative data sources to monitor the SDGs, particularly indicators related to the labor market and employment. These types of development indicators make for an interesting case study because labor statistics derived from administrative data, such as employment registration or labor inspection records, can provide information to formulate and evaluate action plans on unemployment and work conditions, and to gauge the prevalence of the working poor.
Understanding how countries use administrative data to compile work- and employment-related indicators to design labor policies, including the associated limitations and challenges with or without a crisis, can help address the increasing demand for information and insights. Such demand compels statistical agencies to look for alternative data sources. It is well recognized that quickly modifying existing traditional data sources, such as surveys, by adding new questions to produce new sets of data requires enormous government resources while also adding to respondents' burden. Building new traditional data sources is even more costly. Exploring the use of administrative data promises many advantages, including cost efficiency, as merging these sets of data can produce rich new data sources. Furthermore, it can provide important lessons for existing users and others who have yet to capitalize on administrative data.

SUSTAINABLE DEVELOPMENT GOAL LABOR INDICATORS
Dealing with work, productive activities, and workers and their characteristics, labor statistics encompass and reflect the attributes of the labor market and its operations. The scope of labor statistics is vast and includes both the supply of and demand for labor. International standards must be followed to ensure the comparability of labor statistics. Similarly, the quality of these statistics, anchored in the methodology and including the strengths and limitations of the data source, must also be guaranteed. At the core of inclusive economic growth and development are labor markets providing decent and dignified opportunities for people within countries and across borders.
Labor statistics help decision makers engage with and better understand common labor market problems and devise and promote actions to address them. They also play a vital role in obtaining labor-related information, analyzing key trends, identifying potential strengths and weaknesses, formulating policies and programs, and evaluating labor-related efforts. Their ability to monitor labor at the macroeconomic level is key to achieving SDG 8 of the UN 2030 Agenda: decent work for all.
SDG 8 pursues the realization of decent work for all men and women; productive, high-quality employment; and inclusive labor markets. It cuts across other goals and is connected with many targets in the agenda. The UN has defined 12 targets and 17 indicators for SDG 8 covering a wide range of labor-related topics. Most SDG labor market indicators relate to SDG 8, but some refer to other goals, such as SDG 1 (End poverty), SDG 4 (Ensure quality education), SDG 5 (Achieve gender equality), SDG 10 (Reduce inequality), SDG 14 (Conserve marine resources), and SDG 16 (Promote justice and institutions).
The International Labour Organization (ILO) is either the sole custodian agency, one of the custodian agencies, or a partner agency for the 17 SDG labor market indicators. There are potential merits in maximizing the use of administrative data for compiling work and employment statistics. When combined with survey data, administrative sources can provide more timely and granular information about labor indicators. In some instances, new labor indicators can also be integrated in existing administrative data collection systems more easily and at a lower cost than conducting a new survey.

DATA SOURCES OF LABOR STATISTICS
Availability of reliable data related to labor that can be used to monitor progress toward the achievement of the goals is of primary importance. Labor statistics can be derived from traditional sources, such as household and establishment surveys, or from other sources such as administrative data, national accounts, and official estimates.
Household surveys. Surveys of households are an important source of data for some SDG indicators and serve as supplements to other data collection efforts to strengthen policy-making and monitoring programs.
Establishment surveys. Establishment surveys collect statistics using establishments as the sampling unit. They are most often used to collect information about economic units. Hourly earnings and the pay gap (indicator 8.5.1) can usually be derived from establishment surveys. Data on occupational injuries (indicator 8.8.1) can also be sourced from these surveys. This type of survey can also be an alternative source of data to estimate the labor share in gross domestic product (indicator 10.4.1) using statistics on the compensation of employees.
Administrative data. Another potential source of data that can be utilized to derive labor statistics is administrative data. Administrative records are typically the most timely and comprehensive source of statistics on social protection coverage and social security benefits (indicator 1.3.1). Indicator 8.8.1, which assesses the incidence rate of fatal and nonfatal injuries, can be derived from various sources, including administrative records such as insurance records, labor inspection records, and records kept by the labor ministry or the relevant social security institution. While data for indicator 8.5.1 are often derived from establishment surveys, administrative records, such as social security records, can provide earnings data in the absence of establishment surveys.
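As a rough illustration of how an incidence rate such as indicator 8.8.1 could be computed from administrative counts, the sketch below divides recorded injury cases by the number of workers covered by the same records and scales the result. The function name, the default scaling per 100,000 workers, and all figures are assumptions made for illustration; they do not come from any country's records, and the appropriate reference group depends on the coverage of the administrative source.

```python
def injury_incidence_rate(injury_cases: int, covered_workers: int, per: int = 100_000) -> float:
    """Injury cases per `per` workers in the reference group covered by the records."""
    if covered_workers <= 0:
        raise ValueError("covered_workers must be positive")
    if injury_cases < 0:
        raise ValueError("injury_cases cannot be negative")
    return injury_cases * per / covered_workers


if __name__ == "__main__":
    # Hypothetical counts, for example from insurance or labor inspection records.
    fatal_cases = 150
    nonfatal_cases = 9_600
    covered_workers = 1_200_000  # workers covered by the same records

    print("Fatal injuries per 100,000 workers:",
          injury_incidence_rate(fatal_cases, covered_workers))
    print("Nonfatal injuries per 100,000 workers:",
          injury_incidence_rate(nonfatal_cases, covered_workers))
```

Using the workers actually covered by the source as the denominator, rather than total employment, keeps the numerator and denominator consistent when the records cover only part of the workforce.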
National accounts. National accounts are a comprehensive accounting framework compiled to measure economic performance and are a major source of statistics for macroeconomic indicators. The most important indicator derived from the national accounts is gross domestic product. Labor indicators 8.2.1 and 10.4.1 are best derived from national accounts.
Modeled estimates. In cases where there are no reliable statistics available from surveys or administrative records, estimated, imputed, or modeled data can be used. The ILO produces model-based estimates and projections of labor market indicators to provide the data needed by policy makers and researchers. The models estimate various SDG labor market indicators, such as working poverty rates (indicator 1.1.1), labor productivity (indicator 8.2.1), and unemployment rates (indicator 8.5.2). The ILO has also developed models to derive estimates for other topics related to the labor market, such as social protection coverage (indicator 1.3.1) and child labor (indicator 8.7.1).
Big data. More data are being produced because of technological advancement, the increasing number of electronic devices, and wider internet access. This has led to the introduction of big data. Big data based on job searches, vacancies, and skills is a potential source of labor statistics. However, there are still no internationally agreed methodologies and guidelines on deriving labor statistics from this source.

ADMINISTRATIVE DATA AS SOURCE FOR LABOR INDICATORS OF THE SUSTAINABLE DEVELOPMENT GOALS
Administrative data are derived from the administrative functions of an agency. Although they are not designed to produce statistics, administrative records offer rich data that can play a significant part in the statistical system, including labor statistics, making them a good alternative to address the data challenges of the national statistical systems.
Together with statistics from other sources, labor statistics collected from administrative records can provide governments and decision makers with information to formulate efficient labor policies. There are diverse types of administrative records from which labor statistics may be compiled. Table 2 lists the common administrative records and their possible statistical outputs.
There are several advantages to using administrative data in compiling statistics. Given that the data are already available, no additional data collection is required, so the cost of collecting data from administrative records is low compared with censuses and surveys. In addition, compilation is simpler for both the agency and the respondent: because the data that respondents have already provided for administrative purposes can also be used for statistical compilation, each manages or answers only a single enquiry. Furthermore, a complete count of units can be produced since all the records in the administrative system are available for statistical compilation, and from this count, statistics for smaller areas of interest can be derived.
However, the use of administrative data for statistical purposes presents several challenges. Administrative data are not collected primarily for statistical purposes; therefore, vital factors such as content, scope and coverage, and procedures may not be appropriate for statistical use. For instance, where labor inspection is not undertaken regularly, the records in each period may not constitute a representative sample of the working population. These nuances of administrative data must be addressed for the data to be useful for statistical purposes. Nevertheless, there is great potential for national statistical systems to leverage administrative data for compiling statistical indicators.

HOW COUNTRIES ARE USING ADMINISTRATIVE DATA FOR STATISTICAL COMPILATION
Promoting the use of administrative data to meet the data requirements of the SDGs requires an understanding of how countries use this type of data source to compile development statistics. Some countries in Asia and the Pacific are already using administrative data to compile and monitor labor market indicators. This section briefly enumerates initiatives in selected countries in the region.
In Malaysia, the Immigration Department collects data on net tourist arrivals; applications for the foreign worker quota; and refugees, asylees, and asylum seekers to obtain proxy information needed in estimating the number of irregular foreign workers.
In the Philippines, the Department of Labor and Employment conducted the JobsFit Labor Market Information study to address the jobs-skills mismatch. The report enumerates the industries that will create high-demand jobs in the future and the skills they will require. The study used administrative data, including records from other government sectors, the private sector, employer associations, labor groups, and academe, to complement the data from the national Labor Force Survey and the Survey on Overseas Filipinos.
New Zealand's Linked Employer-Employee Data report, which provides information on the interaction of the labor market and sources of income, uses administrative data from the tax system together with business data from the Statistics New Zealand Business Frame. Table 3 provides additional details.

Table 3
Data on expatriate employment are collected using two quota application forms, which should be completed to acquire or increase the quota to bring in expatriates, and the application for a work permit, which is needed to obtain permission to bring expatriates into the country.
New Zealand. New Zealand's Linked Employer-Employee Data. Tax data used in the annual statistics are obtained from three sources:
• the Inland Revenue's employer monthly schedule,
• annual tax returns, and
• the Ministry of Social Development's Benefit Dynamics Dataset.
The Business Frame is a regularly maintained list of all economically significant businesses and organizations (with a turnover greater than $30,000) engaged in the production of goods and services in New Zealand.
The international statistical community increasingly recognizes the importance of strengthening administrative data collection systems, especially during periods of uncertainty that require timely and disaggregated statistics. For instance, the Collaborative initiative, established to strengthen the use of administrative records for statistical purposes, aims to address the need for timely and disaggregated statistics to better assess and mitigate the impact of the COVID-19 pandemic. This partnership is supported by NSOs and regional and international agencies, and is co-convened by the UN Statistics Division and the Global Partnership for Sustainable Development Data.

COMMON CHALLENGES AND SOLUTIONS IN USING ADMINISTRATIVE DATA FOR LABOR- AND WORK-RELATED INDICATORS
The use of administrative records and other forms of administrative data has been growing considerably, including in policy-making, where it serves as input to help address pressing issues. In the labor sector in particular, administrative data can facilitate a more comprehensive and nuanced understanding of the work and employment landscape. However, when developing a road map, it is important to identify several challenges that need to be navigated in the context of using administrative data. These vary from country to country and are discussed in the following sections.
Coordination. Coordination is one of the most important elements in the production of administrative data for labor and work-related indicators as it greatly impacts how the other challenges are addressed. Coordination among statistical and other government agencies that produce administrative data is important to achieve an efficient statistical system. The NSOs and other relevant government agencies should work closely to strengthen the use of administrative records in compiling labor statistics. Policies must be established to determine whether labor indicators can be sourced from administrative records. Proper coordination ensures that activities in the production of administrative data are harmonized, creating an efficient use of government resources. A well-coordinated administrative system can even help indirectly bridge and address the inconsistencies in the definition and terms of work-related indicators through workshops in which stakeholders participate. Coordination with international partners is also deemed highly important as it provides the venue to share common standards, classifications, and norms implemented in many countries.
Consistency. Administrative data are collected for purposes that may be different from those behind the collection of survey and/or census data. Further, statistical agencies in many countries have no control over how administrative data are collected. As a result, one of the challenges in using administrative data sources is that the units used in administrative data do not directly correspond to the required statistical unit. For instance, the forms used to collect administrative data do not generally follow established standards and may cause inconsistencies and errors in reporting, ultimately leading to programs that do not solve social and economic problems. In addition, the definitions of variables in administrative systems often differ from those in statistical systems. For example, the standard statistical definition of unemployed is all persons above a specified age who during the reference period are without work, currently available for work, and seeking work; whereas the administrative definition of unemployment is usually based on the number of people claiming unemployment benefits or registered as looking for work. The differences in units and definitions may significantly affect the estimates. Hence, standardization of concepts is of great importance. Labor statistics from administrative records should be collected and compiled based on the standard national or international definitions and classifications.
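The gap between the two definitions of unemployment can be illustrated with a small sketch that applies both to the same set of people. The record fields, the minimum age of 15, and the four example persons are hypothetical and chosen only to show how the counts can diverge; they do not represent any country's data or official classification rules.

```python
from dataclasses import dataclass

MINIMUM_AGE = 15  # assumed age threshold; the applicable age is set nationally


@dataclass(frozen=True)
class Person:
    age: int
    has_work: bool
    available_for_work: bool
    seeking_work: bool
    registered_jobseeker: bool
    claims_unemployment_benefit: bool


def unemployed_statistical(p: Person) -> bool:
    """Statistical definition: of working age, without work, available, and seeking work."""
    return (p.age >= MINIMUM_AGE and not p.has_work
            and p.available_for_work and p.seeking_work)


def unemployed_administrative(p: Person) -> bool:
    """Administrative proxy: claims unemployment benefits or is registered as looking for work."""
    return p.claims_unemployment_benefit or p.registered_jobseeker


# Hypothetical people, for illustration only.
people = [
    Person(30, False, True, True, True, False),    # counted under both definitions
    Person(45, True, True, True, True, False),     # employed, registered to find another job
    Person(22, False, True, True, False, False),   # unemployed but never registered
    Person(60, False, False, False, False, True),  # claims benefit, not seeking work
]

if __name__ == "__main__":
    print("Statistical count:", sum(unemployed_statistical(p) for p in people))
    print("Administrative count:", sum(unemployed_administrative(p) for p in people))
```

In this toy example the two counts differ and only one person is counted under both definitions, which is why administrative figures usually need to be mapped to the statistical concept before they are compared with survey estimates.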
Scope. In some countries, the need to develop administrative data systems emerged during the second half of the 20th century. Significant milestones in this area have been achieved since then. However, a substantial amount of work is still needed to improve the scope of administrative data systems. Various administrative records may be incomplete or of limited scope for statistical use. The coverage of administrative records usually corresponds only to the defined administrative purposes and does not always coincide with the statistical needs. For instance, not all jobseekers are unemployed-some are seeking additional or alternative employment. Administrative data may also be prone to spatial bias. For example, they may not capture people living and working in remote areas.
Accessibility of data. Accessibility pertains to the ease of obtaining and collecting administrative records. Although it may be considered as the actual receipt of data from an agency, accessibility essentially involves several layers, including determining the availability of data, receiving the data, merging multiple data sets, and understanding what the data do and do not show.
Determining availability entails ascertaining whether the data of interest exist. Data users may assume that some agencies produce certain data. Asking an authorized member of staff is a practical step to determine whether the data sought are available. Receiving the data involves the user and the authorized staff member determining which data variables the user needs, establishing a data sharing agreement approved by an authorized staff member, and determining the data format. After receiving the data sets, the user may need to merge several data sets to arrive at the data of interest. Merging may be difficult if the unit of analysis is not consistent with the level of the data, or if a unique identifier is not available; in that case, data from different data sets need to be matched using several variables. Understanding what the data mean is a process that requires understanding the variable definitions, the data collection and data entry, and the policy on which data collection was anchored.
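Matching on several variables when no unique identifier exists can be sketched as follows. This is a minimal exact-match example under stated assumptions: the field names, the two sample data sets, and the choice of name, birth date, and district as matching variables are all hypothetical. Operational record linkage usually also needs approximate matching and clerical review, which this sketch does not attempt.

```python
def normalize(value) -> str:
    """Lowercase the value and collapse repeated whitespace so trivial differences do not block a match."""
    return " ".join(str(value).lower().split())


def match_key(record: dict, fields: tuple) -> tuple:
    """Build a composite key from several variables in place of a unique identifier."""
    return tuple(normalize(record[field]) for field in fields)


def link_records(left: list, right: list, fields: tuple):
    """Link each left record to the single right record that shares the same composite key."""
    index = {}
    for record in right:
        index.setdefault(match_key(record, fields), []).append(record)

    linked, unmatched = [], []
    for record in left:
        candidates = index.get(match_key(record, fields), [])
        if len(candidates) == 1:
            # Left-hand values take precedence for fields present in both records.
            linked.append({**candidates[0], **record})
        else:
            # No candidate or several candidates: leave for manual review.
            unmatched.append(record)
    return linked, unmatched


if __name__ == "__main__":
    # Hypothetical extracts from two administrative systems.
    social_security = [
        {"name": "Ana  Cruz", "birth_date": "1990-04-12", "district": "North", "monthly_earnings": 520},
        {"name": "Ben Lee", "birth_date": "1985-11-02", "district": "East", "monthly_earnings": 610},
    ]
    inspection = [
        {"name": "ana cruz", "birth_date": "1990-04-12", "district": "north", "workplace_injury": False},
    ]
    linked, unmatched = link_records(social_security, inspection, ("name", "birth_date", "district"))
    print(len(linked), "linked;", len(unmatched), "left for review")
```

Records with no match or with more than one candidate are set aside rather than forced into a link, since a wrong link can distort the resulting statistics more than a missing one.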
Some administrative data may be stored in paper form in different physical locations. Finding and digitizing these records can be time consuming. For digital records, data need to be extracted before sharing and integration into the statistical system.
Qualifications of staff. Most countries face a shortage of staff qualified to produce administrative data for labor-related indicators. Agencies managing administrative data systems in the labor sector commonly have staff assigned at their headquarters and field offices. However, unattractive salary and benefits packages make it difficult to retain qualified staff, and this issue has been detrimental to the quality of the administrative data produced. Poor management of field office staff is a further contributing factor in poor data quality.

SUMMARY AND CONCLUSION
Labor statistics are an important component of official statistics as they can be used to understand labor market conditions and help ease the plight of workers. The demand for quality, granular data for labor statistics has intensified. As a result, administrative data have played a stronger role in supplementing the data collected from statistical surveys and censuses and as an additional data source for work statistics. As countries start developing road maps on how to harness administrative data for this purpose, this brief has outlined several considerations. A wide and growing range of potential administrative sources that can be used to monitor and evaluate labor statistics is waiting to be tapped. While challenges remain in the use of administrative data, they can be overcome with effective data management and better interagency coordination.
It is important to continuously strengthen labor-related data systems by supplementing them with administrative data. The need to develop efficient sources of information, such as an efficient and reliable data system that is readily available during a crisis, cannot be overemphasized. Equally important in the effort to improve data systems is an emphasis on hiring and retaining highly qualified staff involved in data production. These staff need to be provided with continuous training, including on the policies behind the data produced. Properly recognizing staff by providing an attractive remuneration package and opportunities for skill upgrading and career advancement should help retain the right staff to keep the system running.
Cooperation among local and international agencies, which involves interaction among people involved in generating the data, is also vital in creating efficient labor data systems through use of administrative data. It will play a crucial role in setting high standards of data production through continued discussion of the best practices in the use of administrative data.
Lastly, creating a mindset in which data are highly regarded and used to formulate well-informed and timely decisions, especially during periods of uncertainty, could encourage further investments in exploring alternative data sources such as administrative data collection systems. Careful examination is nonetheless needed when using administrative data.