Alternative data in China
By Hinesh Kalian, Man Group
Published: 30 November 2021
As the world’s second-largest economy with roughly 1.3 billion tech-savvy consumers, high rates of penetration for mobile internet and rapidly increasing disposable incomes, China is an enticing prospect for equity investors. Indeed, for some, it’s the final frontier, a vast untapped market with the divergence to drive alpha returns and the economic growth to support beta. But these characteristics don’t just make China an attractive equity market; they also mean that China generates a wealth of alternative data. As China opens its A-shares market to foreign investors, it has become the perfect breeding ground for alternative data strategies.
In this article, we provide an overview of the growth of alternative data in China and the need to use local proxies instead of more established global alternative data providers.
Big, big, bigger
The first thing to note is the sheer scale of the growth of the Chinese alternative data market. The size of the Chinese big data market has grown by nearly 600% since 2015 (Figure 1). Indeed, on data scouting platform Neudata, there are now more than 1,100 China-specific data sets. Likewise, the number of China-related alternative data providers has also grown rapidly over the past few years (Figure 2), showing the symbiosis of the two: the more that the size of the data market increases, the bigger the opportunity for alternative data providers.
Figure 1: Chinese big data market
Figure 2: Number of Neudata China-Focused Altdata Providers
What makes this scale important is that it is somewhat lopsided compared with the size and sophistication of the Chinese equity market. China generates data in line with its status as the world’s second-largest economy, creating datasets which are large and robust enough to have predictive power. In contrast, its equity markets have yet to reach the critical mass of institutional investors required to erode alpha, even though the data required to run sophisticated strategies is now plentiful, in our view.
Local versus global
However, all the datasets in the world are not enough if investors cannot tell which have predictive power and which do not. The most important factor to understand is the unique way in which data is generated in China: the 1.3 billion Chinese consumers do not generate data via Google, Twitter or online forums such as Reddit’s WallStreetBets in the same way that consumers do in the West. Instead, there is usually a Chinese proxy which fulfils the same function, generating equivalent types of alternative data (Figure 3). As a result, investors who use alternative data signals based on Google search trends or WallStreetBets in their global portfolios will need to consider a local proxy to generate similar insights in the Chinese market.
Figure 3: Global versus local data generators
|Alt Data Taxonomy||Example Global Source||Example Local Source||Use Case Examples|
|Using blogs or forums to identify retail sentiment|
|Online Search||Google Search Trends||Baidu||Gauging consumer trends|
|Gauging overall consumer spend of goods and services|
|Identifying relative pricing of e-commerce products|
|Dianping||Determining sentiment for goods and services|
|Using companies’ employment metrics as a proxy for growth|
||Gauging overall consumer trends, from sentiments to spend|
Source: Man Group; as of May 2021.
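The idea of swapping a global source for its local proxy can be sketched as a simple lookup. The snippet below is a minimal illustration, not the authors' tooling: the pairings are the ones this article states or implies (Google Search Trends and Baidu, WallStreetBets and Guba, Amazon and Tmall), while the names `LOCAL_PROXIES` and `local_proxy` are hypothetical.

```python
# Hypothetical lookup table: global data source -> Chinese local proxy.
# The pairings come from the article's text; the structure is an assumption.
LOCAL_PROXIES = {
    "Google Search Trends": "Baidu",
    "WallStreetBets": "Guba",
    "Amazon": "Tmall",
}


def local_proxy(global_source: str) -> str:
    """Return the mapped Chinese proxy, or raise KeyError if none is mapped."""
    try:
        return LOCAL_PROXIES[global_source]
    except KeyError:
        raise KeyError(f"no local proxy mapped for {global_source!r}") from None


print(local_proxy("Google Search Trends"))  # Baidu
```

Raising on an unmapped source, rather than silently falling back to the global one, keeps a signal built for Western data from being applied unchanged to the Chinese market.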
To give an example of how this data can be applied, consider a Chinese hog producer included in the CSI 300 Index. A company with its characteristics is unlikely to appear on WallStreetBets or on popular global job websites. However, by using local equivalents, we can get near real-time insights into the company’s activity. In 2019, posts on China’s internet stock message board website Guba showed increased retail interest towards the stock and accurately predicted a frenzy of buying. Likewise, rising numbers of job posts for the company indicated growth throughout early 2020, despite the ongoing pandemic (Figures 4-5).
Figure 4: Weekly Guba posts – Chinese hog producer
Figure 5: Monthly Job Posts – Chinese Hog Producer
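One simple way to turn weekly post counts like those in Figure 4 into a signal is to flag weeks where activity jumps well above its recent history. The sketch below is an illustration under stated assumptions, not the method used in this article: it uses a trailing z-score, the `window` and `threshold` values are arbitrary, and the counts are synthetic rather than real Guba data.

```python
from statistics import mean, stdev


def spike_weeks(counts, window=8, threshold=3.0):
    """Return indices of weeks whose count exceeds the trailing mean
    by more than `threshold` trailing standard deviations."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]  # the `window` weeks before week i
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where the z-score is undefined.
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Synthetic weekly post counts (illustrative only, not real Guba data).
weekly_posts = [100, 110, 95, 105, 98, 102, 107, 99, 101, 400, 104]
print(spike_weeks(weekly_posts))  # [9]
```

A flag of this kind only marks a surge in attention. Whether that surge precedes buying, as it did in the example above, would need to be tested against price and volume data.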
Having a focus on local data can also provide an insight into changing consumer tastes. Data from Tmall (a business-to-consumer sales platform that is a Chinese Amazon-equivalent) showed how Chinese consumers shifted their purchases from international sportswear brands such as Nike and Adidas to domestic brands such as Anta and Li Ning (Figure 6). Again, this insight (and any subsequent effect on stock prices) wouldn’t be available using the normal alternative data channels, which focus on global consumption.
Figure 6: Tmall sportswear sales – by brand
Similarly, industry-wide trends can be monitored effectively by using Chinese alternative data. In this case, we use data from Ctrip and Qunar, two travel apps which cater to Chinese consumers. Figure 7 shows the number of daily active users, total time spent on the app and time per user. As we would expect, usage fell dramatically with the onset of COVID-19. However, by monitoring ongoing usage, investors are able to observe the extent to which Chinese consumers have retained interest in travelling, monitoring its rise and fall in line with changing restrictions and the progress of new variants and cases.
Figure 7a: Chinese travel apps – daily active users
Figure 7b: Chinese travel apps – time spent
Figure 7c: Chinese travel apps – time spent per user
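The three panels of Figure 7 are related: assuming time per user is simply total time divided by daily active users, it can be derived from the other two series. The sketch below shows that arithmetic with synthetic numbers; the function names and the figures are hypothetical, and the actual vendor definitions may differ.

```python
def time_per_user(total_minutes, daily_active_users):
    """Average minutes per active user; None when there are no active users."""
    if daily_active_users == 0:
        return None
    return total_minutes / daily_active_users


def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Synthetic app-usage figures (illustrative only, not real Ctrip or Qunar data).
pre = {"dau": 1_000_000, "total_minutes": 12_000_000}
post = {"dau": 400_000, "total_minutes": 3_600_000}

pre_tpu = time_per_user(pre["total_minutes"], pre["dau"])
post_tpu = time_per_user(post["total_minutes"], post["dau"])

print(pre_tpu, post_tpu)
print(pct_change(pre["dau"], post["dau"]))
print(pct_change(pre_tpu, post_tpu))
```

Looking at users and time per user separately helps distinguish fewer people opening the app from the remaining users engaging less.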
Considerations when using Chinese alternative data
So, Chinese alternative data can provide investors with unique insights into the Chinese equity market. However, to handle the data effectively, firms must account for four factors:
- Local knowledge: Local knowledge is required to know where valuable nuggets of data can be found and, perhaps more importantly, to judge data quality and vendor methodologies. This knowledge can be quite nuanced, such as knowing the difference between Alibaba’s Tmall and Pinduoduo when using e-commerce data, or the terms consumers use when searching for luxury goods.
- Language skills: These are required across a variety of touch points in the alternative data life cycle, from reaching out to a small local vendor, to understanding data dictionaries and error messages, to, ultimately, understanding the data itself. Depending on the kind of analysis the user intends to perform, analysts may also benefit from a tech stack that can support a variety of Chinese dialects for natural language processing.
- Local vendors: Some interesting, smaller vendors may not be as experienced as their global counterparts and may have different standards when it comes to data and compliance. Because the space is evolving quickly, vendors also risk becoming obsolete. Analysts must therefore have a deep understanding of the local vendor space, keeping abreast of both local trends and best practice.
- Local regulation: The use of Chinese alternative data is subject to an evolving legal and regulatory regime, including the Data Security Law, which came into force in September 2021. Practitioners must be aware of regulation that covers cross-border transfer of certain data types.
These challenges indisputably add to the complexity of exploring alternative data in China and raise the barriers to entry. As more data and vendors enter the space, firms that are able to invest the time and resources, both in skilled analysts and in data platforms, give themselves the best chance of extracting signals from the ever-increasing noise.
China is already one of the largest markets in the world when it comes to equities. It also remains an opportunity-rich market, which gives rise to a growing demand for data. While some global alternative datasets may be accurate for the onshore market, as more and more Chinese data is created, local alternative data is becoming an increasingly important source of insight.
To use this new data well, investors should seek to adapt their processes to take account of the different way that Chinese data is generated: partnering with local data providers, looking at unfamiliar but popular websites instead of those more common globally, and ensuring that technology stacks and researchers are able to handle the nuances of the new Chinese data.