Data in Transport: Global South Cities
Image by: Delhi Metro Rail
Tackling the Challenge of Data Collection, Storage and Accessibility in the Global South
Introduction
A briefing from the European Parliament published in 2020 posed a strange but relevant question: is data the new oil? As the digital economy continues to grow at unprecedented rates, it is only to be expected that questions about the importance of data emerge everywhere, in academia and beyond. Data has been around since the dawn of ancient civilization, and today it has a strong influence on decision making at both the global and the individual level. Advances in technology and global digitalisation have greatly reduced the cost of collecting and storing data (Tatah, 2020).
Data collection, storage, and accessibility have improved significantly around the world (Arora, 2016). However, this growth has not been equal. Countries in the global north have made significant strides in data collection and distribution, which has translated into better policies, but ‘in the context of the Global South, there is a bias in the framing of data as an instrument of empowerment’ (Arora, 2016). ‘Compared to the global North, the global South has weaker infrastructures for routine data collection, as well as lacks sufficient skilled analysts to exploit these new kinds of data’ (Tatah, 2020). A very visible victim of this inefficient data collection is the urban sector. Urban regions and cities are gateways to a global world, and yet they are worlds within themselves. Within the global south, regions and cities deal with specific geographic, historic and societal issues. While data can be instrumental in bringing positive change to policy creation and implementation (OECD, 2024), there is a significant lack of data on global south cities (Mehmood, 2021).
There is indeed an obvious gap in data collection and accessibility in developing economies. The lack of infrastructure and capacity when it comes to data collection, accessibility and storage is evident to researchers and other interested parties (Arora, 2016; Asher, 2013; Tatah, 2020). Data in the urban sector is deeply affected by this lack of capacity. The specificities of urban issues restrict the use of more generalized patterns of data collection. Urban data include datasets and indicators distinct to urban issues, such as road safety data, urban transport data, housing and informal settlement data, urban population trends, access to basic services, and data on urban economics. This data is essential for municipalities, state governments, non-governmental organisations, the private sector and even international organisations to develop better policies and have a substantial positive impact on urban regions. However, within cities of the global south, these data are not efficiently collected, stored, or distributed. Since these datasets are extremely specific from city to city, there need to be well-defined collection methods, specific to cities and countries, which cater to the distinct needs of these regions. Urban data in the global south is sparse and restricted. A lack of capacity in collection methodology affects the entire cycle of collection, retention, and distribution. This blog therefore looks at the problem statement: ‘How can the lack of data be navigated by local governments and international organisations, and how can efficient data collection be stimulated in cities of the Global South? Additionally, what are the challenges of implementing those recommendations in South Asia?’ Some academic attempts have been made by researchers to address and navigate this issue (Asher, 2013; Lebel & McLean, 2018; Mehmood, 2021).
Better methodologies for collecting data specific to global south cities have been recommended; these highlight the importance of involving local researchers, better cooperation between sub-national and national governments, and the involvement of international organisations.
‘Any data-collection process represents a complex chain of communications’ (Asher, 2013), and it is essential to develop a framework that can assess the capacity of the collection method and of the actors involved. Data collection can be made more efficient through cooperation with international organisations. Other essential factors such as language should be taken into consideration, and surveys should avoid ambiguity. A major issue with data collection and distribution in developing economies is the lack of accountability. Actors such as municipalities, sub-national and national governments, and ministries are not held accountable. This lack of responsibility amongst government bodies has contributed significantly to the lack of data collection and processing. A system in which government bodies are held accountable for lack of data, by internal and external forces, is essential. Accountability, along with frameworks tailor-made for global south cities, is essential to dealing with the problem of lack of data.
Data in the Global South: What Obstacles?
It has been established that data on global south cities is scarce, but to better navigate this issue, it is essential to identify the root of the problem in the first place. The obstacles to data collection and availability in the global south are multi-faceted. Issues revolve around lack of infrastructure and resources, linguistic diversity, lack of a framework which matches international requirements, political instability, and geographical challenges. These challenges and obstacles also affect different sectors differently.
The urban sector often suffers from lack of data; this can be attributed to the lack of a framework that defines specific indicators, and to the lack of capacity to adapt international indicators to specific countries and cities. ‘An obstacle that researchers often face is how pooling together all the valid available data also requires expert knowledge in each pathway’ (Tatah, 2020). Capacity to collect, retain, and distribute data is therefore a central problem in global south cities. Many countries in the Global South face a shortage of skilled professionals with expertise in data collection, analysis, and interpretation. This can not only reduce the availability and accessibility of data, but also hinder the quality and reliability of collected data.
Cities of the global south encompass a wide range of cultures, languages, and ethnicities. Collecting data across such diverse contexts requires culturally sensitive approaches and multilingual capabilities, which can be challenging to implement. According to Ethnologue, over 424 living languages are spoken in India, over 273 languages are spoken in Cameroon, and 284 living languages are spoken in Mexico. Such linguistic diversity requires nuance and trained professionals who are not only proficient in some of the major languages spoken in countries of the global south, but also have the capacity and capability to collect and compile reliable data.
Political instability also deeply impacts data collection. In countries with a lack of transparency and higher corruption, governments may interfere with data collection efforts for political reasons. Government bodies in global south countries can participate in censorship or manipulate data to control information flow and maintain power. Disinformation is a growing issue in the global south (Davison, 2022). In politically volatile environments, there is a risk of data manipulation or falsification for political gain. This can include inflating or deflating statistics to support specific narratives or agendas. For instance, governments may manipulate economic data to exaggerate growth or downplay inflation to maintain public confidence (Pallero and Ayorro, 2018). Along with this, biases amongst different sectors can lead to the neglect of data in certain sectors. Tatah (2020), in his research on travel behaviour surveys, noted that ‘while many cities in high-income countries have repeated travel survey data, only a few sporadic travel surveys in African cities. Some of the largest Sub-Saharan cities with over four million residents, including Lagos (est. pop. 21M), Kinshasa (est. pop. 11.8M), and Dar es Salaam (est. pop. 5.5M), simply do not have this type of data available. Obtaining this data where they exist requires that researchers tap into personal and professional networks. Other workarounds, like crunching Google Street View data as a proxy for estimating travel behaviour, are still in their early stages’ (Tatah, 2020). Travel data of this kind requires household surveys, census data, and data from public transport operators. Unlike more easily available data such as per capita income or employment rates, more nuanced datasets may be disregarded because they require more work and are not seen as data of interest. These datasets are often not useful for political gain or for developing narratives, and are therefore not given priority. Urban data tends to be more specific, which has led to a large gap in urban data in global south cities.
Moreover, data collection and storage are costly. Most global south countries do not have funds directed specifically towards data collection, storage or analysis. ‘Data collection, processing, and storage on interconnected systems requires continuous investment to prevent loss through accidental damage or cyber-attacks’ (Tatah, 2020). For low-income economies, the multiple layers of resources required for data maintenance are often perceived as an additional cost and dismissed as an unnecessary expenditure. The lack of data is experienced by most researchers who explore issues of the global south (Ade & Ciuffa, 2023). This is even more evident for students, who do not have the resources and means to conduct field research, and must rely on data available online, in books or in archives.
Navigating Lack of Data in the Global South
Identification of problems of data collection and storage is an essential step towards solving the issues, but how are international organisations, such as the ITF, and even national governments, meant to navigate the lack of data? Policy making and development are priorities for both of these bodies. While efforts for better data availability should be pursued, it is essential to navigate lack of data and still produce reports and provide analysis.
1) Establishing International Relations:
International organizations can negotiate and request access to member state data through diplomatic channels. Agreements on data sharing, transparency, and collaboration in data collection can be facilitated through diplomatic efforts, and are often used as a gateway for receiving reliable data from countries. Multilateral agreements are essential to data collection and storage. By signing treaties and agreements, international and national organisations have been able to access exclusive data on countries, and have also facilitated data collection in countries of the global south. Continuous contact and conversations with stakeholders and key actors are key to establishing a network of data sharing. These networks enable international organizations to access data through partnerships with member states and regional organizations.
2) Expand database through multiple indicators:
International and national organisations need to continuously expand their databases to include more indicators and regions. Increasing the number of indicators under each specific topic can give multiple perspectives on a single problem, thereby tackling the issue of lack of data. For road data, covering multiple indicators such as road vehicular traffic, road infrastructure expenditure, road safety indicators, road accidents, and road goods transport can provide better answers than relying on a single indicator.
3) Use of historic data:
Accessing historical data can be immensely helpful in tackling the lack of current data. Historical data allows analysts to identify trends and patterns over time. By examining how variables have changed in the past, policy makers can make informed predictions about future developments even when current data is limited. Along with this, historical data provides a baseline for comparison with current conditions. By comparing present-day metrics to historical benchmarks, researchers can assess the magnitude and direction of changes, even in the absence of up-to-date data. This comparison is used to identify areas of concern or progress and subsequently lead to decision making.
4) Data Imputation:
One of the main ways international organisations deal with lack of data is data imputation, which is the process of estimating missing values in a dataset. Historical data can be used to impute missing values in current datasets. Statistical techniques such as interpolation or extrapolation are applied to historical data to estimate missing values, providing a more complete picture of the current situation. Other methods include mean/median imputation, where missing values are replaced with the mean or median of the available data for that variable. More commonly, organisations may use the Last Observation Carried Forward method, where within time-series data, missing values are replaced with the value from the most recent observation. This assumes that the last known value is still valid for the missing period.
5) Private Partnerships:
Lastly, organisations and academic institutions should facilitate collaboration between governments and the private sector. The presence of a designated team dealing with private partners, through corporate board partnership, may help immensely in facilitating public-private partnership. Private partnerships enhance data collection efforts by leveraging the resources, expertise, and technologies of the private sector to complement government-led initiatives. In case of lack of data, organisations may tap into resources held by their private partners, especially at a global level.
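The imputation methods described in the list above (mean/median imputation, Last Observation Carried Forward, and interpolation) can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the procedure of any particular organisation: the function names and the yearly values are hypothetical, missing observations are marked with None, and interpolation is assumed to be linear between neighbouring observed years.

```python
from statistics import mean, median
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing observation


def impute_average(series: Series, how: str = "mean") -> Series:
    """Replace every missing value with the mean or median of the observed ones."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    fill = mean(observed) if how == "mean" else median(observed)
    return [fill if v is None else v for v in series]


def impute_locf(series: Series) -> Series:
    """Last Observation Carried Forward: reuse the most recent observed value."""
    filled: Series = []
    last: Optional[float] = None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first observation exists
    return filled


def impute_linear(series: Series) -> Series:
    """Linear interpolation between neighbouring observed points."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled  # gaps before the first or after the last observation stay None


# Hypothetical yearly values of one indicator, with three missing years.
yearly = [100.0, None, None, 130.0, None, 150.0]

print(impute_average(yearly, "median"))  # [100.0, 130.0, 130.0, 130.0, 130.0, 150.0]
print(impute_locf(yearly))               # [100.0, 100.0, 100.0, 130.0, 130.0, 150.0]
print(impute_linear(yearly))             # [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
```

On a series with a clear upward trend, the three methods give noticeably different estimates for the same missing years. Last Observation Carried Forward holds the indicator flat across each gap, which is why the assumption that the last known value is still valid matters more the longer the gap becomes.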
Facilitating Data Collection and Storage in the Global South
Lack of data in the global south can be attributed to multifaceted issues, and therefore the approach to facilitating better data collection and storage must be diverse, and should combine technological innovation, capacity building, and collaboration. Efficient data collection and storage in the global south can be facilitated by addressing the problems identified earlier and suggesting possible strategies to deal with them. Expanding and upgrading data infrastructure is essential for improving data collection and storage. While it is a costly endeavour and may not produce short-term benefits, it is essential for long-term benefits. Improving data infrastructure could use multiple approaches, including improving internet connectivity, investing in storage facilities, establishing data collection frameworks, and promoting data sharing. Access to reliable internet is fundamental for data collection, sharing, and analysis. In regions with limited connectivity, investing in infrastructure such as broadband networks, satellite internet, and mobile data coverage can significantly enhance access to data resources. Improving internet services also facilitates big data collection and storage. The Asian Development Bank (ADB), in its report ‘Harnessing the Potential of Big Data in Post-Pandemic Southeast Asia’, mentions that ‘public institutions have embraced big data because of its analytical power to turn voluminous datasets into actionable insights that can help them respond swiftly to crises, improve their services, and enhance resilience to future shocks’ (ADB, 2022).
Along with this, investing in storage and processing facilities is essential for cities of the global south. As digitalization grows at a global scale, it is imperative that emerging economies invest in digital storage. Moreover, establishing local data storage facilities can enhance data sovereignty and reduce reliance on external providers (Bolsunovskaya et al., 2020). Setting up an efficient data collection framework is essential for gathering accurate and reliable data. This may involve deploying sensors, conducting surveys, or leveraging existing administrative data systems. Investing in a system which enables data collection is essential. For example, some countries are better at providing road safety data than others because their data collection frameworks are better developed within the traffic police department.
Along with this, governments, NGOs, and businesses must promote open data policies that facilitate the sharing of data with the public and between cities of the global south. South-south cooperation can improve transparency, foster innovation, and enable collaboration among stakeholders involved in development in emerging economies (Rinaldi, 2023). By making data more accessible between global south cities, policymakers can benefit by learning from each other's experiences. Global south cities, although very distinct, may face similar issues. Higher populations, geographical similarity, and colonial history can all lead to similar problems, and sharing knowledge helps decision makers adopt strategies which may be beneficial for multiple cities spread across the world (Grynspan, 2019).
Moreover, it is essential to validate research on the global south by local researchers. The cities of the global south are most vulnerable to the effects of global warming, so developing context-appropriate adaptation plans is essential. These plans rely on localized data on intricate elements like seasonal patterns, biodiversity, community viewpoints, and political inclination. Local researchers can gather, curate, analyse, and publish these data. In certain instances, it is imperative that the task is completed by local researchers. They are regarded and trusted in communities, speak the required languages, comprehend customs and culture, and can thus access the traditional knowledge needed to interpret historical change (Lebel and McLean, 2018). Therefore, rather than adapt to frameworks created by the west, it is essential to validate frameworks created by global south researchers, and hold them in the same regard.
Furthermore, it is imperative to invest in global south research. While emerging economies are full of potential, with researchers who have excellent ideas and the capability to innovate, the lack of funding is a huge barrier. Data scientists are paid much less in global south countries than in the west, which discourages many young professionals from committing to research in the global south. Brain drain is a growing issue in emerging economies, and it is imperative that jobs in data collection be made a lucrative option.
Capacity building through training programs on data collection, analysis, and interpretation is crucial for building local capacity. Specialized courses can also enhance the position of data scientists in job markets. Additionally, at a more general level, integrating data literacy into formal and informal education curricula can foster the understanding that data is essential for development amongst children and young adults (Halliday, 2019). By equipping students with data skills, educational institutions can prepare the next generation of data professionals and empower communities to harness the power of data for positive change. ‘Scaling up of digital data literacy and internet infrastructure development as well as power accessibility’ is imperative, especially in developing countries (Kabatangare, 2021).
Challenges in Implementing Recommendations: The Case of South Asia
While the approach mentioned above can be useful for many countries, it is also essential to note that certain countries may have even more specific problems which may hinder the development of better data collection systems. There are several obstacles that South Asian countries face when efforts to develop a better data infrastructure are made. Firstly, it is essential to identify the actors involved. In South Asian countries such as India, Pakistan and Bangladesh, historically, government bodies have played an essential role in developing all kinds of infrastructure. The economies of India and Bangladesh have grown exponentially since 1990, and so have data infrastructure levels. Yet, these countries face issues when it comes to data availability, accessibility, and collection methods. As mentioned earlier, diversity and language barriers have posed challenges when it comes to developing more robust frameworks of data collection in global south countries. In India, with a population of over 1.4 billion, data collection is a mammoth task. The country's regional diversity and size have also, in the past, made data collection difficult. However, strides in better data infrastructure have been made, and further growth is necessary. When it comes to the obstacles to implementing this change, there are issues of limited financial resources and capacity constraints, which may make upscaling data infrastructure difficult. Although South Asian economies are growing at an unprecedented rate, the multiplicity of issues that this region faces makes acquiring funds for data infrastructure difficult. It is essential for South Asian countries to encourage research and development, and to form collaborations with the private sector as well as with international governments. Without a push from national governments and other actors involved in data collection and storage, funding for data infrastructure will continue to be an afterthought.
Additionally, South Asian countries face a typical issue when it comes to data collection and storage: bureaucracy. Ineffective bureaucratic processes for acquiring licenses, permits, or approvals have in the past caused delays in data collection efforts in South Asia (Munshi, 2012). Inefficiency in bureaucracy also hinders growth, and may delay the development of data infrastructure. Along with this, South Asian countries also face fragmented bureaucratic structures and a lack of coordination among government agencies. This is especially true in India, where bureaucratic and political systems are decentralised. India has a ‘decentralised setup, so most of the data collection is done by the relevant sector ministries’, notes the director general of India’s Central Statistics Office. Therefore, development in data collection can be fragmented in South Asia and cause inefficiency. Regardless, it is essential for the region to continue the pursuit of better data infrastructure. As a rapidly growing economy, the region needs to invest in data, without which policy decisions leading to well-rounded development will be difficult. A constant push for better funding, more cooperation, and capacity building is crucial.
1. Arora, P. (2016). Bottom of the data pyramid: Big data and the global south. International Journal of Communication. https://ijoc.org/index.php/ijoc/article/view/4297
2. Asher, J. (2010). Collecting data in challenging settings. CHANCE, 23(2), 6–13. https://doi.org/10.1080/09332480.2010.10739799
3. Asian Development Bank. (2022, August 23). Southeast Asia Big Data use to generate over $100 billion in health, Jobs, Social Protection - ADB. https://www.adb.org/news/southeast-asia-big-data-use-generate-over-100-billion-health-jobs-social-protection-adb
4. Bansal, S. (2017, September 21). From missing data to unreliable numbers, India’s statistical ecosystem needs an overhaul. Hindustan Times. https://www.hindustantimes.com/india-news/from-missing-data-to-unreliable-numbers-india-s-public-health-ecosystem-can-use-a-booster-shot/story-hXxUNmW9DBf7KdbIzlq1LP.html
5. Bolsunovskaya, M. V., Shirokova, S. V., Loginova, A. V., & Uspenskiy, M. B. (2020, September 1). Development of tools for improving the data storage systems reliability as a part of digital transformation strategy. IOP Conference Series: Materials Science and Engineering. https://iopscience.iop.org/article/10.1088/1757-899X/940/1/012010/meta
6. Ciuffa, C., & Ade, J. (2022, December 14). How can we make research more inclusive of the Global South? Elsevier. https://www.elsevier.com/connect/how-can-we-make-research-more-inclusive-of-the-global-south
7. Davison, L. (2023, October 24). Addressing disinformation in the Global South. Tech Policy Press. https://www.techpolicy.press/addressing-disinformation-in-the-global-south/
8. Frers, V. (2023, January 31). How to reduce global asymmetries in open government data in the Global South: A Latin American perspective. CIPPEC. https://www.cippec.org/how-to-reduce-global-asymmetries-in-open-government-data-in-the-global-south-a-latin-american-perspective/
9. Grynspan, R. (2024, February 1). The importance of south-south cooperation. International Labour Organization. https://www.ilo.org/resource/news/importance-south-south-cooperation
10. Halliday, S. D. (2019). Data Literacy in Economic Development. The Journal of Economic Education, 50(3), 284–298. https://doi.org/10.1080/00220485.2019.1618762
11. Javier Pallero, V. A. (2023, January 13). Your data used against you: Reports of manipulation on WhatsApp ahead of Brazil’s election. Access Now. https://www.accessnow.org/your-data-used-against-you-reports-of-manipulation-on-whatsapp-ahead-of-brazils-election/
12. Kabatangare, T. G. (2021). Data Literacy Integration into development agenda. A catalyst to achieving the Sustainable Development Goals (SDGs). IASSIST Quarterly, 45(3–4). https://doi.org/10.29173/iq1003
13. Lebel, J., & McLean, R. (2018, July 4). A better measure of research from the Global South. Nature News. https://www.nature.com/articles/d41586-018-05581-4
14. Mehmood, H. (2021). Data drought in the Global South. Our World. https://ourworld.unu.edu/en/data-drought-in-the-global-south
15. Munshi, N. (2012, January). Indian data collection: A long way to go. Financial Times. https://www.ft.com/content/00883e86-f066-39e5-9ef2-cb0ffdb9a62f
16. OECD. (2024). Data for policy. OECD Regional, rural and urban development. https://www.oecd.org/regional/regional-statistics/data-for-policy.htm
17. Pontille, D., & Denis, J. (2021, October). Demystifying and Repoliticizing Urban Data. PCA. https://www.pca-stream.com/en/explore/demystifying-and-repoliticizing-urban-data/
18. Rinaldi. (2023, August 12). The vital role of south-south cooperation and its importance for the summit of the future. Global Governance Forum. https://globalgovernanceforum.org/vital-role-south-south-cooperation-importance-summit-of-the-future/
19. Szczepański, M. (2020, January). Is Data the new oil? Competition issues in the Digital Economy. European Parliamentary Research Service.
20. Tatah, L. (2020, November 21). Data poverty amid abundance: Challenges for the global south. Medium. https://medium.com/good-data-initiative/data-poverty-amid-abundance-challenges-for-the-global-south-bbf7eceb8e70
21. UN Habitat. (2024). Urban Indicators Datasets. Urban Indicators Database. https://data.unhabitat.org/pages/datasets
22. World Bank Data. (2023). Digital Progress and Trends Report 2023 - Digital Services Sector Growth. Digital Progress and Trends Report. https://www.worldbank.org/en/publication/digital-progress-and-trends-report