Urban Agglomeration and Firm Innovation: Evidence from Asia

The findings, interpretations, and conclusions expressed in this paper do not necessarily reflect the views of the Asian Development Bank (ADB), its Board of Governors, or the governments they represent. The ADB does not guarantee the accuracy of the data included in this document and accepts no responsibility for any consequence of their use. The mention of specific companies or products of manufacturers does not imply that they are endorsed or recommended by ADB in preference to others of a similar nature that are not mentioned.


I. INTRODUCTION
Urban agglomeration plays an important role in fostering innovation. Endogenous growth theory holds that innovation and knowledge are significant contributors to the economic growth of nations (Romer 1986). As innovation investment has become more globalized, emerging countries in Asia are increasingly basing their economic growth on the capacity to generate new product and process innovations. Alongside ongoing economic development and rising innovation, these countries are also experiencing rapid urbanization. However, most evidence on the effect of urban agglomeration on innovation activities comes from developed economies. There is little systematic evidence on whether urban agglomeration affects innovation in the developing world, including Asia, and, if so, through which channels. This paper examines the effects of city size on three types of firm innovation activities: product innovation, process innovation, and research and development (R&D). We have constructed a unique dataset that defines agglomeration boundaries, or "natural cities", using nighttime light (NTL) imagery across 25 developing Asian countries. For this study, "natural cities" have two advantages over administratively defined ones: (i) they are consistently defined and measured across countries, and (ii) they better represent the urban agglomerations in which firms operate, since agglomerations of human activity often extend beyond administrative boundaries. We have also identified the population sizes and firm innovation activities within these boundaries, which allows us to assess the relationship between the two. To address endogeneity, we take an instrumental variable (IV) approach, using historical population figures as the source of exogenous variation in current city populations. We find statistically significant effects of city population size on all three firm innovation activities.
All else being equal at our sample average level, doubling city size would increase a firm's propensity for process innovation by 3.1 percentage points, for product innovation by 4.5 percentage points, and for R&D by 2.0 percentage points. Since the biggest natural city in our data for developing Asia is more than 2^10 times as large as the smallest one (that is, more than ten successive doublings apart), the effects of agglomeration on firm innovation propensity can be substantial.
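As a quick check on the size comparison between the biggest and smallest natural cities, the population range reported in the summary statistics (8,000 to 35.8 million) implies a size ratio of about 4,475, or roughly 12 successive doublings. The short Python sketch below reproduces that arithmetic; the variable names are illustrative.

```python
import math

# Population range of the sample natural cities (2010 values)
smallest_city = 8_000
largest_city = 35_800_000

ratio = largest_city / smallest_city   # how many times larger the biggest city is
doublings = math.log2(ratio)           # successive doublings separating the two

print(f"size ratio: {ratio:,.0f}")     # size ratio: 4,475
print(f"doublings: {doublings:.1f}")   # doublings: 12.1
```

Chaining the per-doubling effects across a dozen doublings would only be a rough linear approximation, since probit marginal effects are evaluated locally around the sample means.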
We also estimate agglomeration effects by country development level and for large individual countries. The results show that an agglomeration effect is present in both the upper middle-income and the low- and lower middle-income country groups. Across countries, we find the agglomeration effect most clearly in the People's Republic of China (PRC) and India, the two most populous countries in the region, and in countries belonging to regional cooperation organizations such as the Greater Mekong Subregion (GMS) and the Association of Southeast Asian Nations (ASEAN). We do not find an agglomeration effect in countries of the Central Asia Regional Economic Cooperation (CAREC) other than the PRC.
… flow rapidly (Porter 1993 and Strange 2002). 7 Gerlach, Ronde, and Stahl (2009) find that, because of this, firms located in clusters also take greater risks in their R&D choices than spatially isolated firms do.
The second channel of the agglomeration effect is matching. It results from thick local labor markets, in which the quality of matches between firms and workers is improved (Helsley and Strange 1990; Berliant, Reed, and Wang 2006; and Strange, Hejazi, and Tang 2006).
The third channel, knowledge spillover, refers to intellectual gains from the exchange of information for which the producer of the knowledge receives no direct compensation; such spillovers result in higher degrees of innovation where economic activity is dense. Knowledge spillovers are particularly important in explaining the concentration of innovation because innovation depends on new knowledge more than other economic activities do.
One approach to studying localized knowledge spillover is to examine the colocation of innovation within highly innovative cities. This approach was pioneered by Jaffe, Trajtenberg, and Henderson (1993), who use patent citations within a given city. They find that patents are 5-10 times more likely to cite patents from spatially closer innovators than comparable patents from innovators located farther away. Further refinements can be found in Jaffe et al. (2000), Thompson and Fox-Kean (2005), and Murata et al. (2014).
An alternative approach is to examine the localized knowledge spillover created by the presence of a university, or by its R&D activities, on firms' commercial innovations (Jaffe 1989; Audretsch and Feldman 1996; Audretsch and Stephan 1996; Anselin, Varga, and Acs 1997; Carlino et al. 2007; and Andrews 2017). 8 Knowledge can spill over between universities and firms in several ways: (i) university-firm collaborations through market-mediated interactions; (ii) unintended knowledge flows from university-based research (D'Este and Iammarino 2010 and D'Este and Patel 2007); 9 and (iii) the university's role as a producer of human capital (Toivanen and Vaananen 2016). The quality of academic research also indirectly affects the quality of firm innovation (Baba, Shichijo, and Sedita 2009; Maietta 2015; Mansfield and Lee 1996; and Zucker et al. 1998).
Although both strands of literature have developed quickly, the evidence is drawn mostly from the US and other advanced economies. Evidence from developing countries is still extremely thin, and considerable knowledge gaps remain (Duranton 2014). One exception is Nieto Galindo (2007), which finds that a significant share of firms in Colombia engage in product and process innovation but generate few patents. More than 70% of these innovations were concentrated in three main cities, which hosted less than 40% of the country's population. Cross-country evidence for the developing world is even scarcer. Moreover, innovation activity in developing countries may differ considerably from the experience of developed countries. Acemoglu, Aghion, and Zilibotti (2006) distinguish between developed economies located at the technological frontier and developing countries within this frontier. They argue that the growth problem for developed economies is to push the frontier, where formal innovations (such as patents) and R&D play a crucial role. For developing countries, however, the problem is one of catching up with the frontier. To some extent, this depends on firms' capacity for product and process innovation, i.e., to produce new and better products and to produce them more efficiently.
It is therefore important to study the relationship between urban agglomeration and innovation in developing countries.
7 Helsley and Strange (2002) develop a dynamic model of innovation to illustrate the idea that a dense network of input suppliers facilitates innovation by lowering the cost of bringing new ideas to realization.
8 Most of the evidence here is from the state or metropolitan level in the US.
9 Geographical proximity plays a fundamental role in determining both the influence of university-based research (Fischer and Varga 2003) and the intensity of collaborations (D'Este, Iammarino, and Guy 2013), with heterogeneous effects on formal and informal knowledge transmission (Audretsch and Feldman 1996; Morgan 2004; and Leten, Landoni, and Van Looy 2014).
This paper contributes to the existing literature in three ways. First, it fills the gap with compelling evidence on the existence and magnitude of agglomeration effects on innovation activities in developing Asian countries. We do this using a unique dataset that contains consistent measures of both city size and firm innovation for nearly 500 cities in 25 countries, and by using historical urban population data to address endogeneity. Second, it shows the importance of the knowledge spillover channel in creating an agglomeration effect on innovation. It demonstrates how top universities matter and explores how returns to firm R&D vary by city size.
The remainder of this paper is organized as follows. The next section introduces our data; section III describes our empirical strategy; and section IV presents the main results on agglomeration effects on firm innovation activities, followed by robustness tests and heterogeneous estimates. Section V reports and discusses findings on potential channels. The final section concludes.

II. DATA
We assemble a unique cross-sectional dataset covering nearly 500 cities of 25 developing economies across Asia and the Pacific. The firm data come from World Bank Enterprise Surveys (WBES), which contain firms' innovation activities as well as their basic characteristics. These data are merged with city data constructed using NTL satellite imagery. City characteristics such as population, presence of university, weather, and geography are estimated or obtained from various sources.

A. Data on Firm Innovation
WBES are firm-level surveys conducted in developing economies. Each survey consists of a representative cross-sectional sample of firms from an economy's private sector, excluding the agriculture sector and all state-owned enterprises. Surveyed firms are selected through stratified random sampling by sector of activity and firm size. Since 2002, the data have been collected through face-to-face interviews with high-level managers or company owners.
The survey questions cover a wide range of topics, with some of them country specific. The World Bank offers a harmonized dataset that extracts and standardizes common information from each survey. This is the dataset we use for this study. In addition to a firm's basic information-such as number of employees, age, sector, and foreign direct investment (FDI) share-the dataset records information about a firm's innovation activity through three questions: (i) During the last 3 years, has this establishment introduced any new or improved process? These include methods of manufacturing products or offering services; logistics, delivery, or distribution methods for inputs, products, or services; or supporting activities for processes. (ii) During the last 3 years, has this establishment introduced new or improved product or service? (iii) During last fiscal year, did this establishment spend on research and development activities, either in-house or contracted with other companies, excluding market research surveys?
The answers to these questions, coded as binary indicators, measure a firm's engagement in process innovation, product innovation, and R&D input, respectively. Because the surveys are subjective, one may question the validity of the self-reported innovation variables, or of WBES in general, since they may not measure innovation as objectively as externally vetted sources such as patents. However, these innovation measures and WBES can be considered reliable for several reasons. First, the survey organizer (the World Bank) has taken several steps to ensure reliability, which Ayyagari, Demirgüç-Kunt, and Maksimovic (2011) describe in detail. Firm identifiers are kept confidential, and the survey is conducted by private organizations independent of government. Various logistical measures are taken before, during, and after the survey to ensure that questions elicit valid answers, including translation and localization checks, two interviewees for the same firm, and post-survey consistency checks. Second, innovation may be interpreted differently depending on a country's development level or technological advancement; to the extent that this creates systematic measurement error in the innovation variables, we include country and year fixed effects in all our specifications to mitigate the concern. Third, detailed descriptions of product and process innovation reveal that firms have valid reasons to consider their reported products or practices "innovative", such as increased profit, reduced cost, or reduced production time as a direct result. Fourth, in robustness tests we use two additional innovation measures based on the raw measures, and the results are not sensitive to them. The first measure is "main market product innovation", from a follow-up question to question (ii) on product innovation: Were any of the new or improved products or services also new for the establishment's main market? The answer to this question is also yes or no.
We consider this question a stronger version of product innovation. 12 The second measure is "firm innovation", constructed from questions (i) and (ii); the variable takes the value 1 if the firm reported product or process innovation and 0 otherwise. Last, a large number of studies use data from the survey (Djankov et al. 2003, Barth et al. 2009, and Besley 2018), including studies using its innovation variables (Ayyagari et al. 2011, Paunov 2016, and Paunov and Rollo 2016).
To study how urban agglomeration affects firms' innovation activities, we need to know firms' urban locations, ideally measured consistently across countries. WBES indicate the geographic unit in which each firm is located. However, the scope of these units varies by country, and they do not necessarily correspond to well-defined urban areas. For example, the Kazakhstan data show only the region (e.g., north or south) of the surveyed firms, and the Indian data show only states. Even where the geographic units represent cities, they are not comparable across countries, since official definitions of cities differ greatly (United Nations 2018).
Alternatively, the WBES conducted since 2012 contain the geographic coordinates of the surveyed firms. In the publicly available data, firm locations are randomly shifted by up to 2 kilometers (km) to mask their true positions. Based on these coordinates, we match firms with the geocoded natural city data to associate each firm with the city in which it is located.

B. Data on Natural Cities Based on Nighttime Satellite Imagery
We construct a dataset of more than 1,500 cities across Asia and the Pacific using NTL satellite data available since 1992. The imagery tracks the footprints of these cities from 1992 to 2016 as contiguous illuminated areas. Individual city characteristics, including population, presence of a university, weather, geography, and historical population, are obtained from various data sources. For example, to estimate population, we overlay the delineated urban area with gridded population data from LandScan, available at approximately 1 km spatial resolution, and then sum all cells falling within, or intersecting, the city contour. We refer to these urban agglomerations as natural cities to distinguish them from administratively defined cities. 14 The advantages of using natural cities, at least for this study, are threefold. First, natural cities are uniformly defined, and their characteristics are measured consistently across countries and over time. Second, they represent the urban agglomerations in which firms operate better than administratively defined cities do (many of which contain both dense urban areas and sparsely populated rural areas); the urban areas of natural cities often extend beyond administrative boundaries and spread over multiple administrative units. Third, natural cities are geocoded, so they can readily be merged with WBES firm data.
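The delineation and population-tally steps can be illustrated with a minimal sketch. It assumes, hypothetically, that the NTL imagery and the gridded population data have already been resampled onto the same grid, that a natural city is a 4-connected cluster of cells whose radiance meets a chosen threshold, and that city population is the sum of the population cells inside that cluster. The actual procedure (Appendix 1 and Jiang 2019) works with city contours and may use different thresholds and connectivity rules; the function names, toy grids, and threshold below are illustrative only.

```python
from collections import deque


def delineate_natural_cities(lights, threshold):
    """Label contiguous lit cells (4-connectivity) as natural cities.

    lights: 2D list of nighttime radiance values (hypothetical input).
    Returns a 2D list of labels: 0 = unlit, 1..k = natural city id.
    """
    rows, cols = len(lights), len(lights[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if lights[r][c] >= threshold and labels[r][c] == 0:
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                # Breadth-first flood fill over lit, unlabeled neighbours
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and labels[ny][nx] == 0
                                and lights[ny][nx] >= threshold):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels


def city_populations(labels, population):
    """Sum gridded population over the cells assigned to each natural city."""
    totals = {}
    for label_row, pop_row in zip(labels, population):
        for label, pop in zip(label_row, pop_row):
            if label:
                totals[label] = totals.get(label, 0) + pop
    return totals


# Toy 3x4 grids: radiance and population per cell (made-up numbers)
lights = [
    [30, 35, 0, 0],
    [25, 0, 0, 40],
    [0, 0, 0, 45],
]
population = [
    [1000, 1200, 50, 10],
    [900, 80, 20, 3000],
    [40, 30, 10, 2500],
]
labels = delineate_natural_cities(lights, threshold=20)
print(city_populations(labels, population))  # {1: 3100, 2: 5500}
```

Unlit cells between the two clusters keep them separate, which mirrors the idea that a natural city ends where continuous illumination ends rather than at an administrative line.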
12 There is no corresponding follow-up question on process innovation.
13 Literature has also found that some subjective firm responses from the survey are highly correlated with objective outcomes from external data sources (Hallward-Driemeier and Aterido 2009).
14 For a more detailed account of how the dataset was developed, see Appendix 1 or Jiang (2019).

We map the WBES firms into natural cities using geographic information system software. To maximize the matching rate, we use 2016 natural city boundaries and assign firms that fall 2 km or less outside a natural city boundary to that city. Table A1 in Appendix 1 shows the matching results by country. The overall success rate is 87%, with matching rates for individual countries ranging from 40% (Mongolia) to 100% (Myanmar and the PRC). The process matches a total of 21,857 firms to 489 natural cities, an average of 45 firms per city. India alone contributes 8,100 firms, or 37% of the total sample, distributed across 207 natural cities. Figure 1 shows the geographic distribution of our natural cities with WBES firms.
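A sketch of the firm-to-city assignment rule follows, under stated assumptions: coordinates are already projected to a planar system measured in km, each natural city boundary is a simple polygon, and a firm outside every polygon goes to the nearest city whose boundary lies within 2 km (the tie-breaking rule is an assumption). In practice this step is done in GIS software; the pure-Python helpers `point_in_polygon`, `distance_to_boundary`, and `assign_firm` are hypothetical stand-ins for it.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in km."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle on each edge crossed by a ray cast to the right of the point
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_boundary(x, y, polygon):
    """Shortest planar distance (km) from a point to any polygon edge."""
    best = math.inf
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        seg_len2 = dx * dx + dy * dy
        # Project the point onto the segment, clamped to its endpoints
        t = 0.0
        if seg_len2 > 0:
            t = ((x - x1) * dx + (y - y1) * dy) / seg_len2
            t = max(0.0, min(1.0, t))
        px, py = x1 + t * dx, y1 + t * dy
        best = min(best, math.hypot(x - px, y - py))
    return best


def assign_firm(x, y, cities, tolerance_km=2.0):
    """City containing the firm; otherwise the nearest city whose boundary
    is within tolerance_km; otherwise None (hypothetical helper)."""
    for city_id, polygon in cities.items():
        if point_in_polygon(x, y, polygon):
            return city_id
    best_id, best_dist = None, math.inf
    for city_id, polygon in cities.items():
        d = distance_to_boundary(x, y, polygon)
        if d <= tolerance_km and d < best_dist:
            best_id, best_dist = city_id, d
    return best_id


# Two hypothetical square "natural cities", 10 km wide and 10 km apart
cities = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
print(assign_firm(5, 5, cities))     # A    (inside A)
print(assign_firm(11.5, 5, cities))  # A    (1.5 km outside A)
print(assign_firm(15, 5, cities))    # None (5 km from both)
```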

C. Summary Statistics
Table 1 reports the summary statistics of the firm-level and city-level variables used in the study (upper panel for the full sample). Around 50% of the surveyed firms reported having introduced new or improved processes in the past 3 years. The proportion falls to 36% for new or improved products or services (21% for products that are also new to the establishment's main market) and to 24% for any expenditure on R&D.
Following the existing literature, we define firms that had operated for fewer than 10 years as young firms (Reyes, Roberts, and Xu 2017); those with 50 or fewer permanent employees as small firms (Beck, Demirgüç-Kunt, and Maksimovic 2008 and Reyes et al. 2017); and those with 10% or more of their ownership from FDI as FDI firms. Across the whole sample, about 21% of firms were young, 65% were small, and only 6% were FDI firms. About two-thirds of firms are in the manufacturing sector, with the rest in services. About 41% of the firms are headquarters, and the average share of skilled workers is around 35%.
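The firm-type definitions above translate directly into boolean flags. A minimal sketch with a hypothetical `Firm` record follows; it applies the cutoffs as stated in this paragraph (fewer than 10 years in operation, 50 or fewer permanent employees, 10% or more foreign ownership).

```python
from dataclasses import dataclass


@dataclass
class Firm:
    years_in_operation: int
    permanent_employees: int
    foreign_share_pct: float  # share of ownership from FDI, in percent


def classify(firm: Firm) -> dict:
    """Young / small / FDI flags using the cutoffs described in the text."""
    return {
        "young": firm.years_in_operation < 10,
        "small": firm.permanent_employees <= 50,
        "fdi": firm.foreign_share_pct >= 10.0,
    }


print(classify(Firm(years_in_operation=4, permanent_employees=50,
                    foreign_share_pct=10.0)))
# {'young': True, 'small': True, 'fdi': True}
```

The description of firm-level controls in the baseline model words the size and FDI cutoffs with strict inequalities, so firms with exactly 50 employees or exactly 10% foreign ownership are the only cases where the two wordings would classify differently.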
We use 2010 values for time-variant city characteristics, such as population and weather, given that the innovation activities captured by the surveys took place mostly in 2012-2016. The sample cities are highly diverse along the dimensions measured. City population ranges from 8,000 to 35.8 million, with a mean of 1.4 million and a median of 438,000. About 20% of the 489 cities host at least one university ranked among the top 500 in Asia; the remaining 80% have none. The average distance from a city to the nearest seaport is 400 km, with a standard deviation of 500 km. Minimum temperature varies more than maximum temperature, and the most rugged city has a ruggedness index 15 times the sample mean.

D. Spatial Concentration of Innovation
One stylized fact that is documented in the existing literature is that innovation is highly spatially concentrated. Within a given country, there are generally a few cities acting as "innovation hubs". These hubs host a large share of the country's innovative activities, and this share is typically disproportionate to each city's share of the country's total population. For instance, Nieto Galindo (2007) found that over 70% of Colombia's innovations are concentrated in three main cities, which together host less than 40% of the country's population. In the US, Moretti (2019) shows that the top 10 innovative cities in computer science, semiconductors, and biology and chemistry account for 70%, 79%, and 59% of inventors, respectively.
Similar patterns are clearly observed in our data. We calculate each city's share of the country's firms with process innovation, product innovation, and R&D expenditure, and compare these figures with the city's share of the national population. A city whose share of innovating firms exceeds its share of the total population reflects a concentration of innovation activities. In the PRC, the top 10 cities host 64% of the country's firms involved in process innovation, 72% of firms involved in product innovation, and 64% of firms investing in R&D, while together accounting for only 55% of the national population. The contrast is even starker for India, where the top 10 cities host 70% of firms involved in process innovation, 76% of those involved in product innovation, and 76% of those undertaking R&D, while together accounting for only 43% of the country's population. Figure 2 plots the cumulative share of firms against the cumulative population share for India, Indonesia, Kazakhstan, Malaysia, and the PRC, the five countries with the most natural cities in our sample. Each data point in the figure represents one natural city, and cities are ranked by their share of firms involved in each of the three innovation activities. All the curves across the three panels show a Lorenz-type convex shape, implying consistent spatial concentration of innovation across countries and innovation activities. In general, the curves for India and Malaysia are more convex than that for the PRC, suggesting a higher degree of innovation concentration in those two countries.
Figure 2 also shows that many large cities have a high share of innovative firms (segments with a relatively long projection on the horizontal axis and a slope greater than 45 degrees). However, some megacities have a disproportionately low share of innovative firms (segments with a long projection on the horizontal axis and a slope less than 45 degrees). For instance, Kuala Lumpur in Malaysia, Jakarta in Indonesia, and Almaty in Kazakhstan each have a share of the national population larger than their share of innovative firms.
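The top-10 comparison described above can be computed as follows. This sketch assumes a hypothetical input of (city population, number of innovating firms) pairs for one country; it ranks cities by their number of innovating firms (equivalently, their share of the country's innovating firms) and divides the top cities' population by the national population rather than by the population of the sampled cities only.

```python
def top_k_concentration(cities, national_population, k=10):
    """Shares hosted by the k cities with the most innovating firms.

    cities: list of (city_population, innovating_firms) for one country
    (hypothetical input format). Returns (share of the country's innovating
    firms, share of the national population) for those k cities.
    """
    total_firms = sum(firms for _, firms in cities)
    top = sorted(cities, key=lambda city: city[1], reverse=True)[:k]
    firm_share = sum(firms for _, firms in top) / total_firms
    pop_share = sum(pop for pop, _ in top) / national_population
    return firm_share, pop_share


# Hypothetical toy country: four natural cities, 20 million people overall
toy_cities = [(5_000_000, 40), (2_000_000, 30), (1_000_000, 5), (500_000, 25)]
firm_share, pop_share = top_k_concentration(toy_cities, 20_000_000, k=2)
print(f"top-2 cities: {firm_share:.0%} of innovating firms, "
      f"{pop_share:.0%} of population")
# top-2 cities: 70% of innovating firms, 35% of population
```

A firm share well above the population share, as in this toy case, is the pattern the text reports for the PRC and India.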
How does city size affect firms' innovative activities? To what extent is this relationship causal? Through which channels does the agglomeration effect operate? The rest of this paper is designed to answer these questions.

III. EMPIRICAL STRATEGY

A. Baseline Probit Model
Our main aim is to investigate the impact of urban agglomeration on a firm's propensity to undertake product innovation, process innovation, and R&D. Given this research question and the nature of our data, the econometric analysis is most appropriately undertaken with a probit model. This allows us to estimate the likelihood of a firm engaging in a particular type of innovation activity as a function of characteristics of the firm itself and of the city in which it is located. In the probit model, we control for country fixed effects, which account for unobservable differences in the innovation environment across the 25 developing Asian countries, including variations in institutions, culture, and nationwide policies that promote innovation. We also control for year fixed effects, which capture differences across years, such as macroeconomic trends that affect firm dynamics across countries and sectors, and for sector fixed effects. Our results therefore derive from linking firm innovation across different cities to city-level characteristics, while holding country, year, and sector factors constant.
The key explanatory variable of interest is log city population. To partially mitigate potential bias from endogeneity or reverse causation, we use population in 2010, 2-6 years before the firm surveys were administered. A number of first-nature geographic characteristics could affect both innovation and city population (Combes et al. 2010). These include average rainfall, maximum temperature, minimum temperature, average terrain ruggedness, and the city's distance to the nearest port, which we include as city-level controls. Firm-level controls include firm size (small firm if total employment is fewer than 50 people), firm age (young firm if in operation for fewer than 10 years), and whether the firm has a significant FDI component (over 10% foreign ownership).
We estimate three baseline models, one for each type of firm innovation activity: product innovation, process innovation, and R&D. The latent-variable form of our probit model can be written as:

$$y^{*}_{icjst} = \beta_1 \ln P_{c} + Z_{c}'\beta_2 + W_{i}'\beta_3 + \mu_{j} + \lambda_{s} + \tau_{t} + \varepsilon_{icjst}, \qquad y_{icjst} = \mathbf{1}\left[ y^{*}_{icjst} > 0 \right]$$

Subscripts $i$, $c$, $j$, $s$, and $t$ represent firm, city, country, sector, and year, respectively.
$y_{icjst}$ denotes the outcome variable of interest, which equals 1 if the firm responded positively to the question on product innovation, process innovation, or R&D expenditure, and 0 otherwise. $y^{*}_{icjst}$ is the corresponding latent variable, which can be interpreted as the firm's propensity to innovate; it is not observed.
$\ln P_{c}$ is the natural log of the 2010 population of city $c$. Of primary interest is its coefficient, $\beta_1$; a positive estimate suggests that agglomeration effects on firms' innovation activities exist.
$Z_{c}$ is a vector of city controls: average rainfall, maximum and minimum temperature, average terrain ruggedness, and the city's distance to the nearest port. These variables are measured in 2010 and treated as time invariant.
$W_{i}$ is a vector of firm controls: firm age, size, and FDI ownership.
$\mu_{j}$, $\lambda_{s}$, and $\tau_{t}$ are country, sector, and year fixed effects.
$\varepsilon_{icjst}$ is the random error term, assumed standard normal. We cluster standard errors at the country level to account for factors that induce correlation in the unobservables, such as national technology policies.

Let $X$ be the vector of covariates and $\beta$ the vector of coefficients listed above, so that $y^{*} = X'\beta + \varepsilon$. Denote by $\Phi(\cdot)$ and $\phi(\cdot)$ the CDF and PDF of the standard normal distribution. The probability of firm innovation activity conditional on $X$ can then be written as

$$\Pr(y = 1 \mid X) = \Phi(X'\beta)$$

and the marginal effect of a proportional (percentage) change in city population on the innovation probability is

$$\frac{\partial \Pr(y = 1 \mid X)}{\partial \ln P_{c}} = \phi(X'\beta)\,\beta_1$$

Results in the next section present probit coefficients as well as marginal effects of city population.
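The probit probability and the marginal effect with respect to log population can be evaluated with Python's standard library. In this sketch, `index` stands for X′β and `beta_pop` for the coefficient on log city population; the value 0.085 is hypothetical and not an estimate from the paper, and the index is chosen so that the predicted propensity is 0.427, matching the figure reported later for process innovation. For comparison, the code also computes the discrete change in probability when log population rises by ln 2 (an exact doubling).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def probit_prob(index):
    """Pr(y = 1 | X) = Phi(X'beta), where index = X'beta."""
    return STD_NORMAL.cdf(index)


def marginal_effect_log_pop(index, beta_pop):
    """Derivative of Pr(y = 1 | X) with respect to ln(population):
    phi(X'beta) * beta_pop."""
    return STD_NORMAL.pdf(index) * beta_pop


def discrete_doubling_effect(index, beta_pop):
    """Change in Pr(y = 1 | X) when ln(population) rises by ln 2
    (an exact doubling), other covariates held fixed."""
    return probit_prob(index + beta_pop * math.log(2)) - probit_prob(index)


# Index chosen so the predicted propensity at the means is 0.427
index_at_means = STD_NORMAL.inv_cdf(0.427)
beta_pop = 0.085  # hypothetical coefficient, not an estimate from the paper

print(round(probit_prob(index_at_means), 3))  # 0.427
print(round(marginal_effect_log_pop(index_at_means, beta_pop), 4))
print(round(discrete_doubling_effect(index_at_means, beta_pop), 4))
```

The derivative is a change per unit of ln P, while an exact doubling raises ln P by ln 2 ≈ 0.693, which is why the last two printed numbers differ.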

B. Instrumental Variable
Endogeneity could arise in our baseline specification, biasing the estimated coefficients. One concern is that a city's size and its firms' innovation activities may be determined simultaneously (Moomaw 1981). This could happen if firms' innovative outputs expand their production scale, thereby attracting more employment to the city and resulting in reverse causality (Duranton 2007 and …). Another source of simultaneity is omitted local variables that are correlated with both city size and innovation. For example, highly skilled workers at innovative firms could be attracted to large cities by amenities not adequately controlled for in our regressions.
Following Ciccone and Hall (1996) and Combes et al. (2010), we perform instrumental variable probit (IV-Probit) regressions to address these endogeneity concerns, using historical populations from World Urbanization Prospects (WUP) to instrument for the 2010 population used in the baseline specification. WUP collects urban populations for over 200 countries and covers cities with populations over 300,000 from 1950 onward. A detailed discussion of the pros and cons of WUP data quality can be found in Buettner (2015). Despite their limitations, the data have proven useful in studying urban issues, especially in the developing world, where available datasets are scarce (Henderson 2000, Montgomery et al. 2004, and World Bank 2008). Cities in the WUP data are defined by administrative boundaries. We reconcile them with our natural cities by mapping the WUP cities into natural city boundaries using their latitude and longitude coordinates. Of the 789 WUP cities available across 23 of the 25 countries in our study, 767 could be mapped into 287 of our 489 natural cities. The 202 natural cities without a corresponding WUP city are mostly small, with an average 2010 population of 277,829, compared with 2,227,414 for the other 287 natural cities (Table 1). As a result, although our IV analysis is restricted to 287 natural cities, we retain 88% of our WBES firm sample, because most surveyed firms are not located in the smaller cities. We consider this slight loss of estimation power a small price for dealing with endogeneity. For our instrument, we use the annual average population over 1950-1959, the earliest period available in WUP. Using the earliest period strengthens the case for exogeneity, and averaging over 10 years helps smooth out data inaccuracies and outliers in the WUP historical data.
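The instrument construction can be sketched as follows, assuming a long-format table of WUP city-year populations and a precomputed lookup from each WUP city to the natural city that contains it (both hypothetical structures). Where several WUP cities fall in one natural city, the sketch sums their populations year by year, which is an assumption for illustration; the helper name `build_instrument` is likewise hypothetical.

```python
import math
from collections import defaultdict


def build_instrument(wup_rows, wup_to_natural, first=1950, last=1959):
    """Log of the average first-last population for each natural city.

    wup_rows: iterable of (wup_city_id, year, population) records.
    wup_to_natural: maps a WUP city id to the id of the natural city that
    contains it; WUP cities outside every natural city are simply absent.
    WUP cities sharing a natural city are summed year by year (an
    illustrative assumption).
    """
    by_city_year = defaultdict(lambda: defaultdict(float))
    for wup_id, year, pop in wup_rows:
        natural_id = wup_to_natural.get(wup_id)
        if natural_id is None or not first <= year <= last:
            continue
        by_city_year[natural_id][year] += pop
    # Average over the available years, then take logs
    return {
        natural_id: math.log(sum(yearly.values()) / len(yearly))
        for natural_id, yearly in by_city_year.items()
    }


# Hypothetical records (population in thousands)
example_rows = [
    ("w1", 1950, 100.0), ("w1", 1951, 110.0),
    ("w2", 1950, 20.0), ("w2", 1951, 30.0),
    ("w3", 1950, 50.0),   # not mapped to any natural city
    ("w1", 1960, 999.0),  # outside the 1950-1959 window
]
example_mapping = {"w1": "nc1", "w2": "nc1"}
instrument = build_instrument(example_rows, example_mapping)
print(round(instrument["nc1"], 4))  # 4.8675, the log of the average 130
```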
The validity of this instrument relies on two conditions. First, there is some persistence in the spatial distribution of population. Second, the local drivers of the dependent variable, in our case innovation activity, during the sampling period (2012-2016) differ from those of the past (Combes et al. 2010). The historical population data we chose should satisfy these two conditions.
For the first condition to hold, the instrument must be strongly correlated with the 2010 population it instruments for. The correlation between log population (1950-1959 average) and log population (2010) is 0.87 and significant at the 1% level, indicating persistence in the spatial distribution of population. The first-stage IV regression, shown later, further confirms the instrument's relevance.
For the second condition to hold, our instrument must be orthogonal to the error term; in other words, the instrument must affect innovation only through the spatial distribution of population. We argue that our instrument satisfies this exogeneity condition for the following reasons.
First, long-lagged values of the same variable remove simultaneity and reverse causality bias. It is highly implausible that the historical distribution of city population in the 25 developing Asian countries was determined by firm innovation activities in the 2010s or by any other contemporaneous local shocks. That said, some permanent geographic characteristics of cities, such as climate or proximity to the coast, may explain both the past spatial distribution of population and current innovation outcomes. We address this by directly controlling for these first-nature city characteristics, e.g., temperature, rainfall, and distance to port, in the vector of city controls.
Second, after controlling for time-invariant geographic characteristics, the drivers of urban agglomeration in the 1950s are unrelated to the determinants of local innovation activities in the 2010s, because the economies of the 25 developing Asian countries today are very different from what they were 60 years earlier. Take the two biggest countries in our sample, India and the PRC. In the early 1950s, both had just experienced drastic regime changes: India had just become independent from British colonial rule, while the PRC was recovering from decade-long wars and its main economic activity was reconstruction. Both countries then underwent significant reforms in the latter half of the 20th century. The PRC initiated its reform program and opened its economy in 1978; these reforms are directly responsible for the country's structural transformation and for the 500 million increase in its urban population from 1978 to 2012 (World Bank 2014). More relevant for this study, this period saw the rise of some of the PRC's most innovative cities, such as Shenzhen, a small aquaculture port city in the 1970s, illustrating significant spatial change in both urban population concentration and innovation activities. India has also undergone a series of economic reforms, including the liberalization program initiated in 1991 (Bhagwati and Panagariya 2013), and the sectoral composition of its economy has changed significantly: the agriculture share of gross domestic product fell from 55% in the 1950s to 22% in the 2000s, while the services share rose from 30% to 54% (Mukherjee 2013).
Other countries in our sample also experienced significant economic changes. For instance, most countries in Central Asia and West Asia were part of the former Soviet Union until its dissolution. With so much change across these developing Asian countries, it is highly plausible that past determinants of the spatial distribution of population are not major drivers of current innovation activities.

Table 3 presents our baseline results. The first three columns report three baseline probit models estimated on the full sample. The last three columns report the same baseline models estimated only on observations with historical urban population data (hereafter, the IV sample). The IV sample probit estimates provide a direct comparison with the IV estimates and the robustness check results.

A. Baseline and IV Estimates
The coefficients on our main variable of interest, the logarithm of population in 2010, are positive and significant at the 1% level in all three models. In other words, holding everything else constant, firms located in cities with larger populations are more likely to engage in R&D and to implement both product and process innovations. For process innovation, the predicted propensity when the independent variables are at their mean values is 42.7%, and the coefficient corresponds to a marginal effect of 0.0336. This implies that if the population of a city doubles while the other independent variables are held at their mean values, the propensity of a firm to implement process innovation increases on average by 3.36 percentage points, a 7.9% increase from the predicted propensity of 42.7%. For product innovation, the marginal effect is 4.21 percentage points, a 13.4% increase from the predicted propensity of 31.4%. For R&D, the marginal effect is 2.68 percentage points, a 13.7% increase from the predicted propensity of 19.5%. In columns (4), (5), and (6) of Table 3, we run the same regressions as in columns (1), (2), and (3) on the IV sample and obtain comparable results. The marginal effects and predicted propensities differ slightly: if population size doubles, the propensity increases by 7.0% for process innovation, 16.4% for product innovation, and 11.4% for R&D.
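The percentage increases quoted above are the marginal effect (in percentage points) divided by the predicted propensity. The minimal sketch below reproduces that arithmetic using only the figures reported in the text; it does not re-estimate any model, and the function name is illustrative.

```python
def relative_increase(marginal_effect_pp: float, predicted_pct: float) -> float:
    """Express a marginal effect (percentage points) as a percent of the predicted propensity."""
    return 100.0 * marginal_effect_pp / predicted_pct


# Full-sample probit figures reported in the text: (marginal effect in pp, predicted propensity in %).
baseline = {
    "process innovation": (3.36, 42.7),
    "product innovation": (4.21, 31.4),
    "R&D": (2.68, 19.5),
}

for outcome, (me_pp, p_pct) in baseline.items():
    print(f"{outcome}: {relative_increase(me_pp, p_pct):.1f}%")
# process innovation: 7.9%
# product innovation: 13.4%
# R&D: 13.7%
```

The same function applied to the IV figures reported later (3.14 on 44.0, 4.54 on 32.6, 1.84 on 20.9) gives 7.1%, 13.9%, and 8.8%.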
Other city characteristics are also correlated with innovation activities. The coefficients on average precipitation are negative and significant; the coefficients on maximum temperature are positive and significant in the models for process innovation and R&D; the coefficients on minimum temperature are positive and significant; and, perhaps surprisingly, the coefficients on both terrain ruggedness and distance to the nearest port are positive in all three models. In other words, firms based in inland cities and/or cities with more rugged terrain are, on average, more likely to innovate.
In terms of firm characteristics, younger firms are more likely to implement process innovations and conduct R&D; the corresponding coefficient in the product innovation model is positive but not significant. Firm size matters for all three innovation activities: smaller firms are less likely to invest in R&D or to implement either product or process innovation. Firms with foreign ownership are more likely to invest in R&D and generate product innovations, and headquarters are more likely to implement product innovation. These findings on firm characteristics are consistent with the existing literature (Bertschek 1995, Huergo and Jaumandreu 2004, Yang 2017).
In Table 4, we instrument city population size with its 1950-1959 average. First, the instrument is strong, with first-stage F-statistics above 5,000, which further supports the first condition for a valid instrument (relevance).
Second, a larger city size is still associated with a higher propensity for firm innovation, even after addressing endogeneity. Compared with the corresponding probit coefficients in columns (4), (5), and (6) of Table 3, the second-stage coefficients for process and product innovation remain significant at the 1% level, but the marginal effects change slightly. For process innovation, if population size doubles, the predicted propensity increases by 3.14 percentage points from an average propensity of 44.0%, a 7.1% increase. For product innovation, the relative effect is smaller than in the IV sample probit (16.4%): an increase of 4.54 percentage points from a predicted average propensity of 32.6%, or 13.9%. For firm R&D, the IV coefficient is significant at the 5% level; the marginal effect is an increase of 1.84 percentage points from a predicted propensity of 20.9%, an 8.8% increase.
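To illustrate why instrumenting current population with historical population helps, the sketch below simulates hypothetical data in which an unobserved local shock raises both current city size and innovation. It is a simplified stand-in, not the paper's estimator: it uses a linear, just-identified IV (Wald) estimator with a continuous outcome and no controls, whereas the paper estimates IV probit models with city and firm controls. All coefficients and variable names are assumptions chosen for illustration.

```python
import random


def simulate(n, seed=0):
    """Hypothetical data: z = historical log population, x = current log population, y = outcome."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)                          # instrument
        ui = rng.gauss(0.0, 1.0)                          # unobserved local shock (confounder)
        xi = zi + 0.5 * ui + rng.gauss(0.0, 0.5)          # endogenous city size
        yi = 0.03 * xi + 0.3 * ui + rng.gauss(0.0, 0.5)   # true effect of x is 0.03
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from regressing y on a constant and x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def first_stage_f(z, x):
    """With one instrument, the first-stage F equals the squared t-statistic on z."""
    n = len(z)
    b = ols_slope(z, x)
    a = sum(x) / n - b * sum(z) / n
    ssr = sum((xi - a - b * zi) ** 2 for zi, xi in zip(z, x))
    se_b = (ssr / (n - 2) / (cov(z, z) * (n - 1))) ** 0.5
    return (b / se_b) ** 2


z, x, y = simulate(20_000)
ols_est = ols_slope(x, y)
iv_est = iv_slope(z, x, y)
f_stat = first_stage_f(z, x)
print(f"OLS slope: {ols_est:.3f}  IV slope: {iv_est:.3f}  first-stage F: {f_stat:.0f}")
```

In this simulated setting, OLS is biased upward by the shared shock, while the IV estimate is close to the true 0.03 and the first-stage F is far above conventional weak-instrument thresholds.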
To summarize, after accounting for endogeneity, we still find economically and statistically significant effects of city size on firms' innovation inputs and outputs. When city size doubles, the propensity of firms to engage in process innovation, product innovation, and R&D increases by between 1.8 and 4.5 percentage points, or by 7% to 14%. These estimates are larger than the elasticity of wages or firm productivity with respect to city employment or urban density, which is found to be between 2% and 10% (Duranton 2014), but smaller than the 20% elasticity of patent intensity with respect to employment density (Carlino et al. 2007).

B. Robustness Testing and Country Heterogeneity
To examine factors that might affect our results, we perform robustness checks using restricted samples and alternative specifications. We first consider the possibility that megacities dominate our results, and then consider alternative definitions of innovation. Most results are unaffected in the tested specifications.
First, we test whether our results might be driven by a few megacities (such as Guangzhou in the PRC and Delhi in India) that host a large number of innovative firms; that is, whether the agglomeration effect contributes significantly to firm innovation activities only in a few very large cities and much less elsewhere. We check for this potential outlier effect by excluding the 13 cities with populations of over 10 million. 19 This leaves us with 274 natural cities and 13,567 observations. The results are presented in columns (1), (2), and (3) of Table 4. The first-stage F-statistics fall to just over 3,600 but still indicate a very strong instrument. In the second stage, none of the significance levels for population size is affected. The marginal effects for process innovation and R&D are much larger, indicating a possibly stronger agglomeration effect for non-mega cities.
Second, we use two alternative innovation measures to test whether our results are sensitive to the definition of innovation. The first measure comes from a follow-up to the product innovation question, which asks whether the product innovation is "also innovative to the establishment's main market". In the sample, 21% of firms reported main market product innovation, compared with 36% that reported product innovation. The difference indicates that some firms consider their new products innovative for themselves but not for the market (for example, they could be adding existing product modifications to their lineup), while the rest consider their new products truly innovative, at least in their main market. This follow-up question therefore provides a stricter measure of product innovation. The coefficient on population remains significant, and the percentage increase from the predicted probability is 14.2% if city size doubles (Table 4, column 4), only slightly different from the 13.9% obtained with the original product innovation measure. The second measure is an "aggregated innovation" dummy that takes the value 1 if a firm conducts either product or process innovation. The coefficient is significant at the 1% level, and the marginal effect is 5.0 percentage points, an 8.5% increase from the predicted probability.
There is considerable heterogeneity in development progress among our 25 selected countries. To test the extent to which the agglomeration effect on firm innovation varies with a country's stage of development, we divide our sample into upper middle-income countries and low- or lower middle-income countries, using the latest World Bank country classifications. 20 We then perform endogenous probit analysis on each group (Table 5). For low- and lower middle-income countries, the agglomeration effects on product innovation and R&D are statistically significant, with effects slightly smaller than for the pooled sample, but the coefficient for process innovation is small and insignificant. For upper middle-income countries, the agglomeration effects on process and product innovation are broadly consistent with the baseline IV results, while the effect on R&D is positive but not significant. Overall, the agglomeration effects on R&D investment and process innovation differ across stages of country development, whereas agglomeration effects on product innovation are present regardless of a country's development level.

19 Natural cities in our sample with populations over 10 million include Bangkok, Beijing, Delhi, Dhaka, Guangzhou, Ho Chi Minh, Jakarta, Karachi, Kolkata, Lahore, Mumbai, Quezon City, and Shanghai.
20 When our analysis was performed, the World Bank defined low-income economies as those with a gross national income (GNI) per capita (calculated using the World Bank Atlas method) of $1,025 or less in 2018; lower middle-income economies as those with a GNI per capita between $1,026 and $3,995; upper middle-income economies as those with a GNI per capita between $3,996 and $12,375; and high-income economies as those with a GNI per capita of $12,376 or more.
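The income thresholds listed in footnote 20 translate directly into a classification rule. The sketch below encodes them and maps each group to the two subsamples used in Table 5; the function names are illustrative, and GNI per capita is assumed to be supplied in current US dollars (Atlas method).

```python
def income_group(gni_per_capita_usd: float) -> str:
    """World Bank income group using the 2018 GNI per capita thresholds quoted in footnote 20."""
    if gni_per_capita_usd <= 1025:
        return "low income"
    if gni_per_capita_usd <= 3995:
        return "lower middle income"
    if gni_per_capita_usd <= 12375:
        return "upper middle income"
    return "high income"


def analysis_subsample(gni_per_capita_usd: float) -> str:
    """Map an income group to the two subsamples compared in the heterogeneity analysis."""
    group = income_group(gni_per_capita_usd)
    if group in ("low income", "lower middle income"):
        return "low- or lower middle-income"
    if group == "upper middle income":
        return "upper middle-income"
    return "not in sample"  # high-income economies are outside the developing Asia sample


for gni in (900, 2_000, 9_500):
    print(gni, income_group(gni), "->", analysis_subsample(gni))
```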
The 25 countries in our sample are diverse: they include the world's two most populous countries as well as groups of socioeconomically similar countries. Even though we include country fixed effects in our previous analysis, it is worth examining how the agglomeration effects on firms' innovation differ in India, the PRC, and the countries of several regional cooperation groups, such as the Greater Mekong Subregion (GMS), the Association of Southeast Asian Nations (ASEAN), and Central Asia Regional Economic Cooperation (CAREC). The results are summarized in Table 6. For India, all three coefficient estimates are sizable and significant at the 1% level, suggesting substantial effects of city size on Indian firms' innovation activities. For the PRC, we find agglomeration effects on both process and product innovation, but not on R&D investment, which is largely in line with the results for upper middle-income countries. Because the remaining countries each have a limited number of cities, we group them by regional cooperation organization. Pooling GMS countries and ASEAN countries, respectively, we find agglomeration effects for all innovation activities, except for R&D in ASEAN. The same cannot be said for CAREC countries other than the PRC, where we detect no agglomeration effect on any of the innovation activities.

V. CHANNELS FOR AGGLOMERATION
So far, we have established the agglomeration effect of city size on firm innovation. In this section, we investigate the potential channels for this agglomeration effect. We test two hypotheses at the city level: (i) that the agglomeration effect differs between R&D firms and non-R&D firms and (ii) that the presence of a top-ranked university is associated with a higher propensity for firms to implement product and process innovations. Both hypotheses relate to the knowledge spillover channel studied in the existing literature.
To benchmark the analysis of the knowledge spillover channel, we first include a firm R&D dummy as an explanatory variable in the process innovation and product innovation models; here, we focus on innovation outputs. The results are reported in columns (1) and (2) of Table 7. Not surprisingly, firm R&D has significant effects on innovation outputs, while the agglomeration effects remain positive and significant. 21 Local characteristics have been found to affect the output of innovative and non-innovative firms differently (Yang 2017). In the context of this paper, city size may augment the returns to firms' R&D efforts for two reasons. First, as shown in the previous analysis, firms in larger cities are more likely to invest in R&D, and these firms could achieve more efficient R&D investment through formal collaborations or informal exchanges of information with other R&D firms in the same locality. Second, a larger city often implies a larger pool of experts because of its larger population and greater number of R&D job opportunities, so firms in larger cities can be more efficiently matched with the most suitable R&D specialists.
To test this hypothesis, we add to the models the interaction term between city size and the R&D dummy, which we instrument with the interaction between firm R&D and historical population. A positive coefficient on this term implies that R&D firms reap more agglomeration benefits than non-R&D firms do. Results for process and product innovation are reported in columns (2) and (4) of Table 7, respectively. The first-stage estimates for both endogenous variables are strong. In the second stage, two results are of interest. First, for process innovation, the coefficient on the interaction term is small and insignificant, suggesting little difference in the agglomeration effect on process innovation between R&D and non-R&D firms; including the interaction term also has little effect on the coefficient on population. Second, for product innovation, the coefficient on the interaction term is positive and significant at the 1% level, indicating that the effect of R&D on innovative product outputs increases with city size. Meanwhile, the estimated effect of population size falls by half compared with the specification without the interaction.
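The instrumenting scheme described above (current population and its interaction with R&D, instrumented by historical population and its interaction with R&D) can be sketched as linear two-stage least squares. This is a simplified stand-in for the paper's IV probit: the outcome is continuous, there are no controls, and the data-generating process and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data (assumptions, not the paper's data).
rd = (rng.random(n) < 0.3).astype(float)           # firm R&D dummy (treated as exogenous)
z = rng.normal(size=n)                              # historical log population (instrument)
u = rng.normal(size=n)                              # unobserved local shock
x = z + 0.5 * u + rng.normal(scale=0.5, size=n)     # current log population (endogenous)
y = 0.1 + 0.2 * rd + 0.02 * x + 0.04 * rd * x + 0.3 * u + rng.normal(scale=0.5, size=n)

# Regressors: constant, R&D dummy, log population, R&D x log population.
X = np.column_stack([np.ones(n), rd, x, rd * x])
# Instruments: the same columns with historical population replacing current population.
Z = np.column_stack([np.ones(n), rd, z, rd * z])


def two_sls(y, X, Z):
    """2SLS point estimates: regress X on Z, then y on the fitted X.

    Note: standard errors from the naive second-stage regression are not valid 2SLS
    standard errors, so only point estimates are returned here.
    """
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)   # first stage
    X_hat = Z @ gamma
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)  # second stage
    return beta


beta_iv = two_sls(y, X, Z)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("2SLS [const, rd, logpop, rd*logpop]:", np.round(beta_iv, 3))
print("OLS  [const, rd, logpop, rd*logpop]:", np.round(beta_ols, 3))
```

A positive estimate on the last coefficient corresponds to the text's reading that R&D firms gain more from city size; in this simulation the true value is 0.04 by construction.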
Our second hypothesis is that the identified agglomeration benefits may result from the presence of one or more top-tier higher education institutions in the local area. Universities, especially prestigious ones, push the knowledge frontier and explore uncharted territory, and countries allocate substantial resources to their top-ranked universities to boost innovation capacity. Geographical proximity to a university can therefore play a fundamental role in firm innovation. Spillovers can take place through several forms of university-firm interaction: university-firm collaborations through market-mediated interactions, unintended knowledge flows from university-based research (D'Este and Iammarino 2010, D'Este and Patel 2007), and the university's role as a producer of human capital (Toivanen and Väänänen 2016).
To measure the availability of quality tertiary education, the study uses the recent QS World University Rankings, which identify the top 500 universities in Asia. Of these, 248 are mapped into 99 cities in nine developing Asian countries (Table 8). The spatial pattern shows a high degree of concentration of top universities. First, top universities are unevenly distributed across countries. Only nine of the 25 developing Asian countries have a top university, and 211 (85% of the total) are located in five countries: the PRC (78), India (64), Malaysia (25), Pakistan (22), and Indonesia (22). Second, within countries, universities are also unevenly distributed across cities, with some countries more concentrated than others. For example, 14 of the 25 top universities in Malaysia are located in Kuala Lumpur and 12 of the 18 in Thailand are located in Bangkok, making these the cities with the second and third most top universities in developing Asia. 22 By contrast, concentration is lower in India: its 64 top universities are distributed across 43 cities, with Delhi hosting the most (7).
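The concentration figures above are simple shares. The sketch below recomputes them from the counts given in the text, as a quick consistency check rather than a new analysis.

```python
def share(part: int, whole: int) -> float:
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


# Country counts of mapped top-500 universities, as reported in the text.
country_counts = {"PRC": 78, "India": 64, "Malaysia": 25, "Pakistan": 22, "Indonesia": 22}
TOTAL_MAPPED = 248

top5_total = sum(country_counts.values())
print(top5_total, f"{share(top5_total, TOTAL_MAPPED):.0%}")  # 211 85%

# Within-country concentration in the leading city: (universities in city, universities in country).
leading_city = {
    "Kuala Lumpur (Malaysia)": (14, 25),
    "Bangkok (Thailand)": (12, 18),
    "Delhi (India)": (7, 64),
}
for city, (n_city, n_country) in leading_city.items():
    print(f"{city}: {share(n_city, n_country):.0%} of the country's top universities")
```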
The high concentrations of universities and of firm innovation activities described above suggest that the two could be correlated. To investigate whether firms in cities with a top university are more likely to innovate, we estimate a logit model with firm innovation activity as the dependent variable and a university dummy as the main explanatory variable. The university dummy indicates whether the city has one or more top universities. The results show that the presence of a top university is strongly associated with a higher propensity for product and process innovation (Table 10). The coefficients are positive and significant in all specifications. The odds that firms innovate in new production processes are 72% higher in cities with at least one top university than in cities without any (Table 10, column 1). After controlling for firm characteristics and sector, year, and country fixed effects, the odds are slightly higher, at 77% (Table 10, column 3). The positive association between top universities and process innovation is even more pronounced after controlling for city characteristics (Table 10, column 4): firms in cities with a top university have about twice the odds of undertaking process innovation as firms in cities without one. For product innovation, the association is smaller (30% higher odds of innovating) but remains significantly positive (Table 10, column 9).
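Because these results are stated in terms of odds, it may help to make the conversions explicit. A logit coefficient b implies an odds ratio of exp(b), so "72% higher odds" corresponds to b = ln(1.72). The sketch below also shows, for a hypothetical baseline propensity of 40% (an assumption, not a figure from the paper), that doubling the odds raises the probability by less than 100%, which is why odds ratios should not be read as "times more likely".

```python
import math


def odds_change_pct(coef: float) -> float:
    """Percentage change in the odds implied by a logit coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)


def coef_from_odds_change(pct: float) -> float:
    """Logit coefficient implied by a reported percentage change in the odds."""
    return math.log(1.0 + pct / 100.0)


def prob_after_odds_ratio(p0: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds p0 / (1 - p0) by odds_ratio."""
    odds1 = odds_ratio * p0 / (1.0 - p0)
    return odds1 / (1.0 + odds1)


b = coef_from_odds_change(72.0)              # about 0.542
print(round(odds_change_pct(b), 1))          # 72.0

# Hypothetical 40% baseline propensity: doubling the odds gives 4/7, about 57%.
print(round(prob_after_odds_ratio(0.40, 2.0), 3))  # 0.571
```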
To investigate whether additional universities further promote local firm innovation, we apply the same model with the number of top universities as the main explanatory variable. Each additional top university in a firm's city is associated with a 5%-8% increase in the odds of process innovation (Table 5, column 4) and a 4%-7% increase in the odds of product innovation, depending on the model specification.

To summarize, in examining the knowledge spillover channel for the identified agglomeration effects, we find that the agglomeration effects benefit R&D firms more in terms of product innovation. Firms located in cities with a top university are more likely to innovate in new products and processes. Holding everything else constant, firms in cities with a local top university have 111% higher odds of undertaking process innovation and 32% higher odds of undertaking product innovation. In addition, a larger number of top universities is associated with a higher likelihood of process innovation by local firms.

VI. CONCLUSION AND DISCUSSION
The main objective of this paper is to identify the effect of urban agglomeration on firm innovation activities in developing Asian countries. The results show that doubling city size would, on average, increase a firm's propensity for process innovation by 3.14 percentage points, for product innovation by 4.54 percentage points, and for R&D investment by 1.84 percentage points, holding all other variables at their mean values. Because Asian city sizes vary widely, the implied agglomeration gains in firm innovation propensity are substantial.
The second objective of this paper is to identify the channels through which this agglomeration effect operates. We examine two such channels at the city level: firm R&D and the local presence of a top university. For the first channel, firms that invest in R&D reap a larger agglomeration benefit in the form of a higher propensity for product innovation. For the second channel, the presence of a top university is significantly correlated with innovation activities by firms in the same city.
We found heterogeneity in the agglomeration effect across countries, with cities in India and the PRC enjoying agglomeration benefits that were not evident for the rest of the countries. However, the reasons for such heterogeneity remain largely unidentified. To investigate this question precisely, one might need to examine other sources of variation, such as industrial composition or institutional differences between countries. Another direction worth exploring is the mechanism of knowledge spillovers from university presence. Detailed data, such as innovation characteristics and the precise distance between firms and universities, are needed to further identify and refine the spillover mechanism. These are topics for future research on urban agglomeration in the developing world.
Because of a problem known as "blooming" or "overglowing", the boundaries of illuminated areas in the OLS imagery appear blurry. Blooming is caused by the relatively coarse spatial resolution of the OLS sensors, the large overlap in the footprints of adjacent OLS pixels, and the accumulation of geolocation errors in the compositing process (Small et al. 2005). The post-2013 VIIRS images have much less blooming because of their higher resolution. We applied the methodology developed in Abrahams et al. (2018) to deblur the imagery.
With the deblurred NTL data, we delineated polygons consisting of pixels with positive luminosity values (i.e., a threshold of 0) as human settlements, 1 and then merged polygons separated by gaps of one pixel into a single polygon to allow for measurement errors as well as unlit areas (such as roads) within an integrated human settlement. The exercise yielded between 88,000 and 187,000 geocoded polygons for Asia in various years. The middle panel of Figure A1 shows the identified human settlements in the Metro Manila area.
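The delineation rule above (lit pixels above a zero threshold, with settlements separated by a one-pixel gap merged) can be sketched as a connected-component search on a raster. This is an illustrative sketch, not the authors' code: it assumes "one-pixel gap" means two lit pixels belong together when their Chebyshev distance is at most 2, it works on a small in-memory grid rather than georeferenced imagery, and the unlit gap pixels themselves are left unlabeled.

```python
from collections import deque


def label_settlements(ntl, threshold=0.0, gap=1):
    """Group lit pixels into settlements, bridging gaps of up to `gap` unlit pixels.

    ntl: 2-D list of deblurred luminosity values.
    A pixel is lit if its value exceeds `threshold`.
    Lit pixels within Chebyshev distance gap + 1 of each other share a label.
    Returns (labels, k): a 2-D list with 0 for unlit pixels and 1..k for settlements.
    """
    rows, cols = len(ntl), len(ntl[0])
    labels = [[0] * cols for _ in range(rows)]
    reach = gap + 1
    k = 0
    for r in range(rows):
        for c in range(cols):
            if ntl[r][c] > threshold and labels[r][c] == 0:
                k += 1
                labels[r][c] = k
                queue = deque([(r, c)])
                while queue:  # breadth-first search over nearby lit pixels
                    i, j = queue.popleft()
                    for di in range(-reach, reach + 1):
                        for dj in range(-reach, reach + 1):
                            ni, nj = i + di, j + dj
                            if (0 <= ni < rows and 0 <= nj < cols
                                    and ntl[ni][nj] > threshold and labels[ni][nj] == 0):
                                labels[ni][nj] = k
                                queue.append((ni, nj))
    return labels, k


grid = [
    [5, 0, 3, 0, 0, 0],   # (0,0) and (0,2) are separated by a one-pixel gap: merged
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 7],   # too far from (0,2): a separate settlement
    [0, 0, 0, 0, 0, 2],
]
labels, k = label_settlements(grid)
print(k)  # 2
```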
The majority of the polygons obtained are very small and discrete, likely representing rural settlements. The second step is to identify urban areas among all human settlements, for which we referred to the database of the Global Rural Urban Mapping Project (GRUMP). 2 This database contains geocoded centers, names, populations, and the upper administrative divisions they belong to, for over 70,000 human settlements across the world. We focused on over 1,900 GRUMP units in Asia and the Pacific, all of which had a population greater than 100,000 in 2000. We identified more than 1,400 independent NTL polygons in 1992 that either cover these units or turned out to be the most relevant upon visual inspection. These were treated as our natural cities and were named after the corresponding GRUMP unit, or after the unit with the largest population if a natural city contained multiple units. 3 To maximize country coverage and include large cities missing from the GRUMP database, we added 115 polygons that either were related to major cities in small countries (mostly in the Pacific) or had an area greater than 100 km² in 2000, even though the associated GRUMP units had populations below 100,000 in 2000. With this approach, we reached a final set of 1,527 natural cities in Asia and the Pacific. 4 As shown in the right panel of Figure A1, there were nine natural cities in the area: Angeles, Batangas, Lipa, Lucena, Metro Manila, Olongapo, San Pablo, San Pedro, and Tarlac. The largest, Metro Manila, contained 29 GRUMP units, including Makati, Manila, and Quezon City.

1 A pixel with a positive luminosity value is illuminated. Different values are adopted as thresholds to draw urban boundaries in the existing literature. Examples include 5 in Zhang and Seto (2011); 13 in Ellis and Roberts (2015) and in Zhou, Hubacek, and Roberts (2015); 33 in Tewari et al. (2017); and 35 in Harari (2016). A positive threshold is needed if one uses the pre-deblurring data to delineate urban extent. However, a uniform threshold across years may not yield a consistent definition of urban extent, given that different sensors (OLS versus VIIRS) were used over the period and the same sensor performed differently over its life cycle. Moreover, a proper threshold value to define urban extent, if it exists, should probably vary across regions. In view of these issues, we considered it less arbitrary to define human settlements as all illuminated pixels.
2 GRUMP data were generated by the Columbia University Center for International Earth Science Information Network in collaboration with the International Food Policy Research Institute, the World Bank, and Centro Internacional de Agricultura Tropical, by combining census and geospatial datasets. They can be accessed at http://sedac.ciesin.columbia.edu/data/collection/grump-v1. The units in the GRUMP database are not at the same administrative level, either within or across countries.
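The naming rule described above (a natural city takes the name of the most populous GRUMP unit whose center it covers) can be sketched as follows. This is a hypothetical illustration: it assumes unit centers have already been converted to pixel coordinates on a city label raster (0 meaning outside any natural city), the unit names and populations are made up, and the visual-inspection step in the text is not automated here.

```python
def name_natural_cities(labels, units):
    """Name each natural city after the most populous unit whose center falls inside it.

    labels: 2-D list of natural city labels (0 = not in any natural city).
    units: list of (name, row, col, population), with centers in pixel coordinates.
    Returns {label: (name, number_of_units_covered)}.
    """
    best = {}    # label -> (name, population) of the largest unit seen so far
    counts = {}  # label -> number of units whose center falls in the city
    for name, r, c, pop in units:
        lab = labels[r][c]
        if lab == 0:
            continue  # unit center not covered by any natural city polygon
        counts[lab] = counts.get(lab, 0) + 1
        if lab not in best or pop > best[lab][1]:
            best[lab] = (name, pop)
    return {lab: (best[lab][0], counts[lab]) for lab in best}


labels = [[1, 1, 0],
          [0, 0, 2]]
units = [  # hypothetical units and populations
    ("Unit A", 0, 0, 120_000),
    ("Unit B", 0, 1, 450_000),
    ("Unit C", 1, 2, 90_000),
    ("Unit D", 1, 0, 300_000),  # center falls outside every natural city
]
print(name_natural_cities(labels, units))
```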
Some natural cities expanded and became connected with each other over time. To retain individual natural cities as our primary units of analysis, we separated connected cities along the line of lowest luminosity between them to obtain the footprint of each. This yielded a balanced panel of natural cities from 1992 to 2016.
The third step in developing the dataset was to assign characteristics to the natural cities, including population, presence of a university, weather, distance to seaports, and historical population. We filled the delineated areas of the natural cities with grid population data from LandScan. LandScan provides global population counts at approximately 1-kilometer spatial resolution, generated through spatial modeling and image analysis with inputs from census data, high-resolution imagery, land cover, and other spatial data such as boundaries, coastlines, elevations, and slopes. 5 We overlaid the natural city polygons on the grid population data. The population of a natural city is the sum of all cells falling within or intersecting the city contour.
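The "within or intersecting" rule above can be sketched on rasters. The sketch assumes, for illustration only, that the natural city labels sit on a finer grid than the population data and that each population cell covers an exact `scale` by `scale` block of label pixels; real LandScan and NTL grids would need georeferenced resampling. A population cell is added to every city that touches it, so a cell straddling two cities counts for both, which is what a literal reading of the rule implies.

```python
def city_populations(labels, population, scale):
    """Sum population cells that fall within or intersect each natural city.

    labels: fine 2-D label raster (0 = outside any natural city).
    population: coarse 2-D population grid; each cell covers scale x scale label pixels.
    Returns {city_label: total_population}.
    """
    n_rows, n_cols = len(labels), len(labels[0])
    totals = {}
    for pr, pop_row in enumerate(population):
        for pc, pop in enumerate(pop_row):
            # Cities touched by any label pixel inside this population cell.
            touched = {
                labels[r][c]
                for r in range(pr * scale, min((pr + 1) * scale, n_rows))
                for c in range(pc * scale, min((pc + 1) * scale, n_cols))
                if labels[r][c]
            }
            for lab in touched:
                totals[lab] = totals.get(lab, 0) + pop
    return totals


labels = [
    [1, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
population = [  # hypothetical counts, one cell per 2 x 2 block of label pixels
    [100, 50],
    [30, 10],
]
print(city_populations(labels, population, scale=2))  # {1: 100, 2: 60}
```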
To measure the availability of quality tertiary education in the city, we used the recent QS World University Rankings, which identify the top 500 universities in Asia based on six metrics: academic reputation, employer reputation, faculty-to-student ratio, citations per faculty, international faculty ratio, and international student ratio. We mapped these universities into natural cities using their geolocations obtained from Google Maps. Of the top 500 universities in Asia, 248 were found in 99 cities in nine developing Asian countries. A binary indicator was created equal to one if a natural city hosted at least one top-500 university.

3 The number of natural cities was lower than the number of GRUMP units because some GRUMP units were located close to each other and were thus covered by the same polygon. Such cases arose in the relatively advanced areas of developing countries, such as the Pearl River Delta area centered on Guangzhou in the People's Republic of China (PRC), as well as in developed economies, such as the metropolitan areas surrounding Tokyo in Japan or Taipei in Taipei,China.
4 Please note that the population threshold adopted refers to the population of the GRUMP units. This helped us capture most sizable urban agglomerations in the region. However, it by no means implies that our natural cities had populations over 100,000 before 2000, or even after 2000, because natural cities could cover very different geographic areas from those of GRUMP units, and their populations were estimated from grid population data.
5 Essentially, the census population counts were disaggregated to each cell with a multivariate dasymetric modeling approach. Data precision was improved through manual verification and modification as well as refinements to the input datasets. LandScan data have been widely used in fields such as demographics, urban planning, and remote sensing.
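Mapping geocoded universities into natural cities produces both the binary indicator and the university count used in the channel analysis. The sketch below assumes that each university's geolocation has already been converted to a pixel position on the natural city label raster; that conversion, and the example data, are hypothetical.

```python
def university_indicators(labels, university_pixels):
    """Count top-500 universities per natural city and build the binary indicator.

    labels: 2-D list of natural city labels (0 = outside any natural city).
    university_pixels: list of (row, col) pixel positions of geocoded universities.
    Returns (counts, has_top) where has_top[city] is 1 if the city hosts at least one.
    """
    counts = {}
    for r, c in university_pixels:
        lab = labels[r][c]
        if lab:  # universities outside every natural city are ignored
            counts[lab] = counts.get(lab, 0) + 1
    cities = {lab for row in labels for lab in row if lab}
    has_top = {lab: int(counts.get(lab, 0) >= 1) for lab in cities}
    return counts, has_top


labels = [[1, 1, 0],
          [0, 2, 2]]
university_pixels = [(0, 0), (0, 1), (0, 2)]  # two in city 1, one outside any city
counts, has_top = university_indicators(labels, university_pixels)
print(counts, has_top)
```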
For weather indicators, we used data from the United Kingdom's Climatic Research Unit, which publishes global monthly gridded weather data at a 0.5-degree resolution from 1901 onward. We used these monthly data to obtain the annual average daily precipitation and the annual maximum and minimum temperatures for each grid cell. The averages across the grid cells surrounding the centroid of a natural city were taken as the weather measures for that city.
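The weather construction above can be sketched in three steps: annual measures per grid cell, selection of the cells around a city centroid, and averaging. Several details are assumptions rather than the paper's stated choices: precipitation is taken as monthly totals in millimeters (so average daily precipitation is the annual total divided by the number of days), annual maximum (minimum) temperature is taken as the highest (lowest) monthly value, 0.5-degree cells are assumed to have edges at multiples of 0.5 degrees, and "surrounding" is interpreted as the 3 by 3 block of cells centered on the centroid's cell.

```python
import math


def annual_weather(monthly_pre_mm, monthly_tmx, monthly_tmn, days_in_month):
    """Annual measures for one grid cell from 12 monthly values (see assumptions above)."""
    avg_daily_pre = sum(monthly_pre_mm) / sum(days_in_month)
    return avg_daily_pre, max(monthly_tmx), min(monthly_tmn)


def surrounding_cells(lat, lon, res=0.5, radius=1):
    """Centers of the (2*radius + 1)^2 grid cells around the cell containing (lat, lon)."""
    i, j = math.floor(lat / res), math.floor(lon / res)
    return [((i + di + 0.5) * res, (j + dj + 0.5) * res)
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)]


def city_weather(cell_values):
    """Average the per-cell (precipitation, max temp, min temp) tuples across cells."""
    n = len(cell_values)
    return tuple(sum(v[k] for v in cell_values) / n for k in range(3))


DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
cell = annual_weather([30.0] * 12,
                      [30, 31, 33, 35, 34, 32, 31, 31, 31, 31, 30, 29],
                      [20, 21, 22, 24, 25, 25, 24, 24, 24, 23, 22, 20],
                      DAYS)
print(cell)
print(surrounding_cells(14.6, 121.0)[:3])
```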
Two geographic factors were considered. The first was the distance to the nearest seaport. A list of seaports was obtained from the World Port Index developed by the US National Geospatial-Intelligence Agency. The distance was measured as the Euclidean distance from the centroid of the natural city to the nearest domestic seaport; for cities in landlocked countries, it was the distance to the nearest seaport abroad. The second geographic factor is citywide ruggedness. Nunn and Puga (2012) calculated the Terrain Ruggedness Index, originally devised by Riley, DeGloria, and Elliot (1999), to quantify topographic heterogeneity for every 30 by 30 arc-second cell across the world. We obtained the grid Terrain Ruggedness Index from https://diegopuga.org/data/rugged/ and averaged all grid cells in each natural city to obtain citywide ruggedness.
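The two geographic measures can be sketched as follows. The sketch assumes city centroids and ports are given in projected planar coordinates (for example, kilometers), because a Euclidean distance on raw latitude and longitude degrees would distort distances away from the equator; it also assumes the ruggedness grid is aligned with the natural city label raster. Port coordinates, country codes, and grid values are hypothetical.

```python
import math


def port_candidates(ports, country):
    """Domestic ports if the country has any; otherwise (landlocked) all ports abroad."""
    domestic = [(x, y) for x, y, c in ports if c == country]
    return domestic if domestic else [(x, y) for x, y, _ in ports]


def nearest_port_distance(centroid, ports, country):
    """Euclidean distance from a city centroid to the nearest eligible seaport."""
    cx, cy = centroid
    return min(math.hypot(px - cx, py - cy) for px, py in port_candidates(ports, country))


def citywide_ruggedness(labels, tri, city):
    """Average Terrain Ruggedness Index over the grid cells belonging to `city`."""
    values = [t for lab_row, tri_row in zip(labels, tri)
              for lab, t in zip(lab_row, tri_row) if lab == city]
    return sum(values) / len(values)


ports = [(0.0, 0.0, "A"), (3.0, 4.0, "A"), (1.0, 1.0, "B")]  # (x_km, y_km, country)
print(nearest_port_distance((3.0, 0.0), ports, "A"))  # 3.0: nearest domestic port
print(nearest_port_distance((3.0, 0.0), ports, "C"))  # landlocked: nearest port abroad
print(citywide_ruggedness([[1, 1], [2, 0]], [[10.0, 20.0], [5.0, 7.0]], city=1))  # 15.0
```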