Asian Development Bank - Fighting Poverty in Asia and the Pacific

Asian Development Outlook 2008

Technical Note

A note on statistical discrepancies in the national income accounts of selected Asian economies

Introduction

 

In 2006, the Philippine economy grew by 5.4%. All of its components contributed positively to this growth, except for "statistical discrepancy," which deducted 3.9 percentage points from the 9.4 percentage points total contribution of consumption, investment, and net exports. In contrast, in 2002 the country posted growth of 4.6%, but this time the statistical discrepancy made a positive contribution of 78% of the total. What is this statistical discrepancy? Why does it occur? And how important is it when measuring growth in developing Asia?

Gross domestic product (GDP) can be estimated in three ways: via the expenditure approach (summation of private consumption, government expenditure, investment, and net exports); production approach (total value of final goods and services produced by the economy); and income approach (total income of the factors involved in the production process). In principle, these three approaches should yield identical GDP values because total expenditure on goods and services must by definition equal the value of goods and services produced, which in turn must equal total income paid to the factors that produced those goods and services. Yet as each measurement exercise is independent, there may be differences in the resulting estimates. The difference between these estimates is called the statistical discrepancy (SD).

There has been growing concern about the existence and magnitude of the SD in national accounts statistics. A large and persistent SD is seen as a reflection of the authorities' inability to generate accurate and reliable national accounts estimates. Considering that these numbers represent a country's economic activity, data dependability is of utmost importance.

This technical note examines the SDs in the national income accounts of six developing Asian countries: Bangladesh, India, Republic of Korea (Korea), Philippines, Thailand, and Viet Nam. These countries were selected on the basis of availability of sufficient time series national accounts data to allow rigorous analysis. Focusing on the expenditure and production approaches to estimating GDP, this note will assess the data collection methodologies of the selected countries to identify probable sources of the SD and its allocation across various GDP components.

Framework and methodology

The estimation procedure in this note is based on a framework proposed by Weale (1992), who showed how consistent estimates of the national accounts can be obtained by exploiting statistical information in the data. He applied the approach to allocate the SD between the measurement errors of aggregate expenditure and income estimates of the United States' real gross national product (GNP). These errors averaged 0.6% of GNP over his sample period. This note applies Weale's approach by examining, at a disaggregated level, the sources of SD using national accounts data of six Asian countries. (A similar exercise for Australia was undertaken by Bajada [2001].) Weale's method is described in Box 4.1.1. Essentially, it uses information from the variances and covariances of observed national accounts data to draw inferences about their reliability. It then attempts to minimize the "distance" between the initial estimates and the revised estimates, which are constrained to be consistent with the national accounts adding-up conditions, while allowing larger adjustments for elements that are believed to be less reliable. Underlying the approach are two assumptions: that initial estimates are subject to measurement errors; and that these measurement errors are independent of the true, but unknown, components. It follows from these assumptions that if there are two estimates of the same variable, the one that exhibits less variance can be considered the more reliable.

To illustrate the procedure described in Box 4.1.1, let Y1 and Y2 be two measures of the true value of GDP, Y*. Further, assume that Y1a and Y1b are components of Y1 while Y2a and Y2b are components of Y2. Let the covariance matrix of Y1a, Y1b, Y2a, and Y2b be the positive definite matrix:

As seen in its diagonal elements, estimates of the four variables have relatively large variances, suggesting that they are unreliable measures. The covariances (i.e., off-diagonal elements) also indicate substantial interrelations among the estimates.

The linear constraint is A = (1 1 -1 -1), thus:

and AWA' = 1,308.

Substituting the above in equation (12) in Box 4.1.1,

 

Box 4.1.1 Statistical methodology

In national accounts data, estimates of the true value of gross domestic product (GDP) from the expenditure approach should equal those from the value-added approach, i.e.,

$$GDP_{E} = GDP_{VA} \tag{1}$$
$$C + I + G + X - M = Ag + In + Se + Ta \tag{2}$$

where

GDPE = C + I + G + X - M = expenditure measure of GDP

GDPVA = Ag + In + Se + Ta = value-added measure of GDP

C = consumption spending
I = total investment
G = government spending
X = exports of goods and services
M = imports of goods and services
Ag = agriculture
In = industry
Se = services
Ta = indirect taxes less subsidies.

In reported data, however, GDPE may not equal GDPVA, creating a discrepancy. Differences in estimates arise from varying data collection methods and estimation procedures of GDP components. In general, the statistical discrepancy (SD) is reported on the expenditure side, such that equations (1) and (2) become:

$$GDP_{E} + SD = GDP_{VA} \tag{3}$$
$$C + I + G + X - M + SD = Ag + In + Se + Ta \tag{4}$$

Since SD can be defined as the sum of the measurement errors of each component of GDP, equation (4) can be written as:

(5)

where εC, εI, εG, εX, εM, εAg, εIn, εSe, and εTa are the components of SD.
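As a minimal sketch of the accounting in equations (1) to (4), the Python snippet below computes both GDP measures and the implied SD. All figures are hypothetical. The sign convention (SD added on the expenditure side, so that SD equals GDPVA minus GDPE) and the choice of GDPVA as the denominator are assumptions made for illustration.

```python
# Hypothetical component values (e.g., billions of local currency).
# None of these figures come from the note.
expenditure = {"C": 700.0, "I": 200.0, "G": 120.0, "X": 450.0, "M": 480.0}
value_added = {"Ag": 150.0, "In": 320.0, "Se": 480.0, "Ta": 60.0}


def gdp_expenditure(e):
    """Expenditure measure: GDPE = C + I + G + X - M."""
    return e["C"] + e["I"] + e["G"] + e["X"] - e["M"]


def gdp_value_added(v):
    """Value-added measure: GDPVA = Ag + In + Se + Ta."""
    return v["Ag"] + v["In"] + v["Se"] + v["Ta"]


gdpe = gdp_expenditure(expenditure)
gdpva = gdp_value_added(value_added)

# Assumed convention: SD is added to the expenditure side, so SD = GDPVA - GDPE.
sd = gdpva - gdpe

# SD as a percentage of GDP (GDPVA is used as the denominator by assumption).
sd_pct_gdp = 100.0 * sd / gdpva

print(f"GDPE={gdpe}, GDPVA={gdpva}, SD={sd}, SD as % of GDP={sd_pct_gdp:.2f}")
```

Under this convention, a positive SD means the value-added measure exceeds the expenditure measure.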

To show how SD can be allocated to each component, consider Y* to be a vector of true data, Y a vector of observed values, and ε a vector of measurement errors, i.e.,

$$Y = Y^{*} + \varepsilon \tag{6}$$

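With the assumptions that ε and the true data, Y*, are uncorrelated and that ε is identically normally distributed with mean 0 and variance V, the true value can be estimated by maximizing the following log-likelihood function of the observed data, Y:

$$\ln L = -\frac{N}{2}\ln|V| - \frac{1}{2}\sum_{t=1}^{N}\left(Y_t - Y_t^{*}\right)' V^{-1}\left(Y_t - Y_t^{*}\right) + \text{constant} \tag{7}$$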

where N is the number of observations, subject to the condition that:

$$A Y^{*} = 0 \tag{8}$$

where A is a linear constraint. This leads to an estimate, Ŷ, of the true data Y*, i.e.,

$$\hat{Y} = \left(I - V A' (A V A')^{-1} A\right) Y \tag{9}$$

where I is an identity matrix and V is the unknown variance-covariance matrix.
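The step from the constrained likelihood to equation (9) can be sketched as follows. This is a standard Lagrangian argument rather than a reproduction of Weale's own derivation. It treats V as given and considers a single observation, with Y the observed vector, Y* the true vector, and λ a vector of Lagrange multipliers (a symbol introduced here for illustration).

```latex
% Maximizing the normal log-likelihood for given V is equivalent to
% minimizing the weighted distance between Y and Y^{*} subject to A Y^{*} = 0.
\begin{align*}
\mathcal{L}(Y^{*},\lambda)
  &= \tfrac{1}{2}\,(Y - Y^{*})' V^{-1} (Y - Y^{*}) + \lambda' A Y^{*} \\[4pt]
\frac{\partial \mathcal{L}}{\partial Y^{*}} = 0
  &\;\Rightarrow\; -V^{-1}(Y - Y^{*}) + A'\lambda = 0
  \;\Rightarrow\; Y^{*} = Y - V A' \lambda \\[4pt]
A Y^{*} = 0
  &\;\Rightarrow\; A Y - A V A' \lambda = 0
  \;\Rightarrow\; \lambda = (A V A')^{-1} A Y \\[4pt]
\hat{Y}
  &= Y - V A' (A V A')^{-1} A Y
   = \left(I - V A' (A V A')^{-1} A\right) Y
\end{align*}
```

The adjustment applied to each element is proportional to the corresponding row of VA', so components with larger error variances absorb more of the discrepancy AY.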

Once estimates of the true values of the components are obtained, equation (2) will be satisfied, such that,

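$$\hat{C} + \hat{I} + \hat{G} + \hat{X} - \hat{M} = \widehat{Ag} + \widehat{In} + \widehat{Se} + \widehat{Ta} \tag{10}$$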

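Weale (1992) shows that WA', where W is the maximum likelihood estimate of the data covariance matrix, converges in probability to VA', i.e.,

$$\operatorname{plim}\; W A' = V A' \tag{11}$$

Thus, the estimate in equation (9) can be computed from the observed data Y alone:

$$\hat{Y} = \left(I - W A' (A W A')^{-1} A\right) Y \tag{12}$$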
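The intuition behind equation (11) can be sketched in a few lines. The symbol Σ* for the covariance matrix of the true data is introduced here for illustration. The argument assumes the sample covariance W is a consistent estimator of the population covariance of the observed data.

```latex
% Observed data: Y = Y^{*} + \varepsilon, with \varepsilon uncorrelated with Y^{*}.
\begin{align*}
\operatorname{Cov}(Y) &= \Sigma^{*} + V,
  \qquad \Sigma^{*} \equiv \operatorname{Cov}(Y^{*}) \\[4pt]
A Y^{*} = 0 \text{ for every observation}
  &\;\Rightarrow\; A\,\Sigma^{*} = 0
  \;\Rightarrow\; \Sigma^{*} A' = 0 \\[4pt]
\operatorname{Cov}(Y)\,A' &= \Sigma^{*} A' + V A' = V A' \\[4pt]
\operatorname{plim}\; W = \operatorname{Cov}(Y)
  &\;\Rightarrow\; \operatorname{plim}\; W A' = V A'
\end{align*}
```

Because the true data always satisfy the adding-up condition, their variation drops out once the covariance matrix is multiplied by A'. This is why the unknown V can be replaced by the observable W in the adjustment formula.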

From equation (12), the share of the SD allocated to each component is estimated from:

$$W A' (A W A')^{-1} \tag{13}$$

hence,

The above can be further simplified:

which leads to the final weights, shown in equation (13), that determine the allocation of SD to each component. Summing up Ŷ1a and Ŷ1b into Ŷ1 and Ŷ2a and Ŷ2b into Ŷ2 should lead to balanced values of Ŷ1 and Ŷ2, which are estimates of the true value of GDP.

In the illustrative example, the application of the maximum likelihood estimation suggests that 89% of the SD is due to measurement errors in Ŷ1 and 11% is due to Ŷ2.
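The allocation step can be sketched numerically in Python with NumPy. The covariance matrix and observed values below are hypothetical and are not the ones used in the note, so the resulting shares differ from the 89% and 11% reported above. The function name `balance` is an illustrative choice. The weights are computed as WA'(AWA')^{-1}, which is the usual form of this adjustment and is assumed here to correspond to equation (13).

```python
import numpy as np


def balance(y, W, A):
    """Adjust observed components y so that A @ y_balanced = 0.

    Returns the balanced values and the allocation weights W A' (A W A')^{-1}.
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))  # shape (m, n)
    WA = W @ A.T                                   # shape (n, m)
    weights = WA @ np.linalg.inv(A @ WA)           # shape (n, m)
    balanced = y - weights @ (A @ y)               # remove the discrepancy
    return balanced, weights


# Constraint: Y1a + Y1b - Y2a - Y2b = 0 (the two GDP measures must agree).
A = np.array([1.0, 1.0, -1.0, -1.0])

# Hypothetical positive definite data covariance matrix (not from the note).
W = np.array([
    [40.0, 5.0, 3.0, 2.0],
    [5.0, 30.0, 2.0, 1.0],
    [3.0, 2.0, 10.0, 1.0],
    [2.0, 1.0, 1.0, 8.0],
])

# Hypothetical observed components: measure 1 sums to 105, measure 2 to 100.
y = np.array([60.0, 45.0, 70.0, 30.0])

balanced, weights = balance(y, W, A)

# Share of the discrepancy attributed to each aggregate measure.
share_measure_1 = weights[0, 0] + weights[1, 0]
share_measure_2 = -(weights[2, 0] + weights[3, 0])

print("balanced components:", balanced)
print("measure 1 total:", balanced[0] + balanced[1])
print("measure 2 total:", balanced[2] + balanced[3])
print("shares:", share_measure_1, share_measure_2)
```

In this made-up case the components of the first measure have the larger variances, so most of the adjustment falls on them, and the two balanced totals coincide.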

Some properties of the statistical discrepancy

Figure 4.1.1 shows the SD level in the six countries. The data confirm that the SD has been volatile and has exhibited no signs of decline. As a ratio to GDP, the SD's volatility has become somewhat less pronounced, but it remains highly visible. In the last 4 or 5 years, the SDs of Bangladesh and the Philippines have generally been positive, meaning that income estimates of GDP exceed expenditure estimates. The opposite is true for India, Korea, and Viet Nam, where SDs have turned negative in recent years. Thailand's SD has been positive since 1996.

 

Table 4.1.1 presents summary statistics for the observed SDs. Bangladesh, India, and Viet Nam registered negative means ranging from -5.4% to -0.3% of GDP while Korea, Philippines, and Thailand posted positive means ranging from 0.1% to 1.6% of GDP. Since the mean as a measure of central tendency may give misleading results due to its sensitivity to the sign of the SD, mean absolute values of the SD are calculated as a proportion of GDP. Korea, Thailand, and Viet Nam have mean absolute rates of at most 1% of GDP. By contrast, India has overestimated its expenditure components by an average of 5.7% of GDP. Bangladesh and the Philippines have statistical discrepancies that average over 2% of GDP.
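The difference between the two summary measures can be seen in a small sketch. The series below is hypothetical and chosen so that positive and negative discrepancies cancel exactly.

```python
import numpy as np

# Hypothetical SD series, in % of GDP (not taken from Table 4.1.1).
sd_share = np.array([2.0, -1.5, 1.0, -2.5, 1.5, -0.5])

# The simple mean lets positive and negative discrepancies offset each other.
mean_sd = sd_share.mean()

# The mean absolute value measures the typical size regardless of sign.
mean_abs_sd = np.abs(sd_share).mean()

print(f"mean = {mean_sd:.2f}% of GDP, mean absolute = {mean_abs_sd:.2f}% of GDP")
```

Here the mean is zero even though the typical discrepancy is 1.5% of GDP, which is why the mean absolute value is the more informative gauge of size.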

 

Table 4.1.2 shows the impact of the SD on annual growth estimates. In India, for instance, the contribution of SD to GDP growth peaked at a remarkable 4,700% (in absolute terms) of GDP growth. This happened in 1966 when the Indian economy was estimated to have contracted by 0.03%, but in a context where the statistical discrepancy deducted 1.68 percentage points from growth. Indeed, there have been several instances in which the contribution to growth of SD has been larger than the contribution of the relevant expenditure components (among others, as mentioned in the first paragraph of this technical note, the Philippines).
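A rough sketch of the arithmetic behind such figures follows. It assumes the usual definition of a component's contribution to growth (its change divided by the previous period's GDP). The function names and all numbers are hypothetical and do not reproduce Table 4.1.2.

```python
def contribution_pp(current, previous, gdp_previous):
    """Contribution to GDP growth, in percentage points."""
    return 100.0 * (current - previous) / gdp_previous


def share_of_growth(contribution, growth):
    """Contribution expressed as a percentage of total GDP growth."""
    return 100.0 * contribution / growth


# Hypothetical levels: GDP rises from 2,000 to 2,100; SD moves from 20 to -10.
growth = contribution_pp(2100.0, 2000.0, 2000.0)   # GDP growth in %
sd_contrib = contribution_pp(-10.0, 20.0, 2000.0)  # SD contribution in pp
sd_share = share_of_growth(sd_contrib, growth)

# When measured growth is close to zero, the same contribution looks enormous.
near_zero_share = share_of_growth(-1.5, -0.05)

print(f"growth={growth:.2f}%, SD contribution={sd_contrib:.2f} pp, share={sd_share:.1f}%")
print(f"share when growth is -0.05%: {near_zero_share:.0f}%")
```

The second calculation shows why the share can reach thousands of percent: the denominator is growth itself, so a near-zero growth rate magnifies any contribution.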

 

A simple correlation analysis does not suggest that any particular GDP component is strongly related to the SD (Table 4.1.3). Indeed, except for Bangladesh and Thailand, the SDs of all the countries show no strong correlation with any single component of GDP, whether from the expenditure side or from the economic activity side.
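A simple correlation check of this kind might look like the following sketch, which uses NumPy's `corrcoef`. The series are hypothetical shares of GDP, the function name `sd_correlations` is illustrative, and the exact variables correlated in Table 4.1.3 are not specified here.

```python
import numpy as np


def sd_correlations(sd, components):
    """Pearson correlation between the SD series and each component series."""
    sd = np.asarray(sd, dtype=float)
    return {
        name: float(np.corrcoef(sd, np.asarray(values, dtype=float))[0, 1])
        for name, values in components.items()
    }


# Hypothetical annual series, in % of GDP (not from the note).
sd = [1.2, -0.8, 0.5, -1.5, 2.0, 0.3]
components = {
    "private_consumption": [60.1, 61.0, 59.8, 60.5, 61.2, 60.0],
    "investment": [22.4, 21.9, 23.0, 21.5, 22.1, 22.8],
    "agriculture": [18.0, 17.6, 17.9, 17.1, 16.8, 16.9],
}

correlations = sd_correlations(sd, components)
for name, rho in correlations.items():
    print(f"{name}: {rho:+.2f}")
```

A coefficient near zero for every component would echo the inconclusive pattern described above, although simple correlations cannot separate measurement error from genuine co-movement.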

 
 

Given the inconclusiveness of the correlation results, a more detailed analysis to identify sources of the SD is implemented.

Reconciling the estimates

Before applying the methods of Box 4.1.1, two statistical issues need to be dealt with. The estimate of the covariance matrix that is needed to apportion the SD is potentially vitiated by heteroskedasticity and by non-stationarity in the data. To circumvent these difficulties, the covariance matrix is estimated from the first differences of the component levels, each normalized by the lagged level of GDP. This transformation does not affect the accounting identity, but results in homoskedastic and stationary data from which the covariance can then be calculated. Results are presented in Table 4.1.4. The numbers report the shares of the observed SD that the estimation procedure attributes to the expenditure and income components of the national accounts.
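The transformation can be sketched as follows. The data are hypothetical, the function name `normalized_differences` is illustrative, and the use of production-side GDP as the normalizing series is an assumption. Because the transformation is linear, the adding-up constraint carries over unchanged to the transformed data.

```python
import numpy as np


def normalized_differences(levels, gdp):
    """First differences of each component, divided by lagged GDP.

    levels: array of shape (T, n), one column per component.
    gdp:    array of shape (T,), used with a one-period lag as the normalizer.
    """
    levels = np.asarray(levels, dtype=float)
    gdp = np.asarray(gdp, dtype=float)
    return (levels[1:] - levels[:-1]) / gdp[:-1, None]


# Hypothetical levels: two expenditure components, then two production components.
levels = np.array([
    [60.0, 40.0, 70.0, 32.0],
    [66.0, 44.0, 75.0, 33.0],
    [70.0, 50.0, 80.0, 43.0],
    [78.0, 52.0, 88.0, 41.0],
    [85.0, 55.0, 95.0, 47.0],
    [90.0, 62.0, 101.0, 50.0],
])
A = np.array([1.0, 1.0, -1.0, -1.0])  # expenditure total minus production total

# Assumption: production-side GDP is used as the normalizing series.
gdp = levels[:, 2] + levels[:, 3]

z = normalized_differences(levels, gdp)

# Data covariance matrix of the transformed series (columns are variables).
# Real applications need far more observations than this toy sample.
W = np.cov(z, rowvar=False)

print("transformed data shape:", z.shape)
print("covariance matrix shape:", W.shape)
```

The matrix `W` obtained this way plays the role of the data covariance matrix in equation (12).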

 
 

The yellow-shaded cells show the component with the highest contribution to SD. The green-colored cells show the component next in line. Either the trade or investment variables account for the highest proportion of SD. Notable exceptions are India and Viet Nam, where agriculture contributed the highest and second-highest proportion of SD, respectively, and Bangladesh where private consumption accounted for the second-highest proportion of SD.

For all the countries studied in this note, estimates of investment are sourced mainly from annual and periodic surveys of business and industries. In estimating investment, national statistics offices frequently use data extrapolations and manipulations to approximate population from sample data and to transform survey results to national income accounts data.

For example, the value of additions and modifications to buildings and other physical structures is estimated using data from surveys of building and construction activities, building permits, and building and construction materials. In Korea, where the highest proportion of the SD is attributed to capital formation, part of the investment data is estimated from input-output tables, which are compiled only every 5 years. Further, the buildings component of gross fixed capital formation is obtained directly from the output of the construction industry after some adjustments such as deductions for repairs. Meanwhile, Thailand, aside from investment data from the government and business sectors, uses related indicators (such as tax revenues to estimate machinery and office equipment) to produce investment estimates. Also, the commodity flow method (see endnote 1) is used to derive changes in stocks. Both of these require some interpolations and extrapolations, which contribute to measurement errors.

Although imports and exports are generally regarded as relatively well measured given that timely records of them are usually kept, the findings here indicate otherwise. Private consumption data tend to have fewer measurement errors than trade data, despite reliance on surveys and input-output tables, which are usually deemed less reliable than government records such as customs trade data. Possibly small-sample problems, particularly for Bangladesh and Thailand, detract from the robustness of the results. Several factors have been cited to account for errors in trade data. These include underestimation of transactions, incomplete data in documents submitted by exporters and importers, failure of traders to submit required documents, undervaluation of export shipments to lessen or avoid imposition of taxes and other duties, overvaluation of the costs of imports to reduce income tax liabilities, and inability to capture some import and export flows.

A case in point is the Philippines, where in August 2005 large adjustments to trade figures were made for the previous 3 years. The adjustments were due to intracompany import transactions as well as export transactions that were unreported (Habito 2005). A World Bank study, which looked into the trade statistics between India and Bangladesh, attributed the discrepancy between India's exports and Bangladesh's imports to "'tax evasion' schemes where Indian exporters understate the value of their exports and Bangladeshi importers overstate the costs" (World Bank 2006). In Viet Nam, exports and imports of goods and services are derived from the balance-of-payments estimates, with imports adjusted for unrecorded cross-border trade. The International Monetary Fund has noted that the compilation of the national accounts suffers from poor data collection practices and a lack of coordination and communication between data collection agencies. Further, data on invisibles are based largely on banking records, which provide incomplete coverage and identification of the types of transactions.

In India, the agriculture data suffer from several shortcomings. First, survey results are available only after a lag of about 2 years. For example, land-use statistics for the computation of value added in agriculture are supplied with a lag of 2 or 3 years. Prices collected by state governments also are available with a lag of 1 or 2 years. Second, there is no consistent data collection methodology across the various agencies gathering information. Third, statistics for some subsectoral components, such as forestry, are not systematically obtained, and rely simply on irregular, ad hoc publications. All these factors contribute to the reduced precision of estimates for agriculture data.

In Viet Nam, where the agriculture sector contributes the second-highest proportion of SD, output is estimated using annual surveys of agricultural and livestock production. Other than the inherent sampling errors present in the surveys, a possible source of discrepancy lies in the seasonal differences in agricultural production. The difficulty lies in measuring output of crops in a continuous manner throughout the year given that the length and timing of crop production vary (the problem of measuring work in progress).

Errors in the measurement of private consumption can be attributed to the fact that information is based on various surveys and administrative record systems and on a set of indicators. Household income and expenditure surveys, which provide benchmark estimates of private consumption, are carried out every 3 to 5 years. Data derived from the more frequent surveys are then used to interpolate and extrapolate the benchmark estimates. India, Korea, and Thailand are examples of countries that apply the commodity flow method to survey data to get estimates of personal consumption as well as other expenditure components of GDP.

As to the argument that data sourced primarily from government records tend to be more accurate than nongovernment sources, results do support this view. Government spending accounts for at most 4% of the measurement error.

In general, the SD is attributed more to expenditure components than to production. This supports the common notion that data support is stronger in the production accounts. Usually, estimates of GDP levels are based on the income estimates from the production side of the national accounts.

Figure 4.1.2 plots the differences between the calculated GDP levels and the old observed values. Positive values indicate underestimation of original observations. In the main, GDP estimates via the expenditure approach appear to be underrecorded for Bangladesh and the Philippines, while those for India, Korea, Thailand, and Viet Nam appear to be overestimated. True GDP values for Bangladesh and India seem to lie between observed values from the expenditure and production approaches. For Korea, Thailand, and Viet Nam, new GDP estimates are generally less than both the income and expenditure estimates. The reverse is true for the Philippines. However, while large differences in levels are apparent, revised growth rates are in line with old figures.

 
 

Conclusion

This note has characterized SDs in six developing Asian countries. It has provided a short description of why SDs exist in national income accounts and has analyzed which of the various components of GDP have contributed most to SDs. This has helped provide a clearer sense of how each economy performed and how each sector contributed to growth.

The presence of SDs highlights the need to strengthen the statistical capability of government data collection agencies. The results suggest possible areas where the quality of data is poor and where government efforts to improve data quality need to be directed.


This chapter was written by Shiela Camingue, Gemma Estrada, Juan Paolo Hernando, Edith Laviña, Nedelyn Magtibay-Ramos, Pilipinas Quising, and Lea Sumulong of the Economics and Research Department, ADB, Manila.
Endnote
1 A method of compiling national accounts in which the total supplies and uses of individual types of goods and services have to be balanced with each other (United Nations 2001).
References

Bajada, C. 2001. "An Examination of the Statistical Discrepancy and Private Investment Expenditure." Journal of Applied Economics 4(1):27-61.

Habito, C. 2005. "Why the trade deficit was understated." Philippine Daily Inquirer. 8 August. Available: http://www.census.gov.ph/data/technotes/Habito_article.html.

International Monetary Fund. Special Data Dissemination Standards, various countries.

Organisation for Economic Co-operation and Development. 2001. "Quarterly National Accounts in Asia: Sources and Methods." Proceedings of Joint OECD-ADB-ESCAP Workshop on Quarterly National Accounts. Bangkok, October.

United Nations. 2001. The 1993 System of National Accounts. Available: http://unstats.un.org/unsd/sna1993/introduction.asp.

Weale, M. 1992. "Estimation of Data Measured with Error and Subject to Linear Restrictions." Journal of Applied Econometrics 7(2):167-74.

World Bank. 2006. "India-Bangladesh Bilateral Trade and Potential Free Trade Agreement." Bangladesh Development Series. Paper No. 13. World Bank Office, Dhaka. Available: http://siteresources.worldbank.org/INTBANGLADESH/Resources/Trade.pdf.

 
© 2008 Asian Development Bank