Summary of Project Plan
In this project, we seek to explore the mapping and tracking of rumours and characterise how information is diffused and propagated in populations.
Non-technical Description
In particular, we wish to make use of recent advances in computer science, computational linguistics, biology and social networks that have allowed researchers to build elaborate and sophisticated surveillance and early warning systems and to apply these to the monitoring of infectious diseases, riots and earthquakes. We plan to explore how this type of methodology can be employed to study rumours more generally, with an emphasis on their effects on real and financial markets. Specifically, we wish to quantify information diffusion and relate it to movements in key financial variables such as share prices and trading volumes. As a proof of principle, we will conduct a study focusing on rumours about M&A activity.
Technical Description
Information lies at the core of scientific disciplines such as economics and finance, which study the behaviour of individuals and the functioning of markets, and is also important for other social sciences, such as political science and sociology, that study social processes, elections, revolutions and uprisings. While there have been some attempts to study the detailed workings of information dissemination from a theoretical perspective, there is a fundamental trade-off that has seriously curtailed progress on this front. In traditional economics and finance, information diffusion is inferred from observed changes in either share prices or in real decisions such as investment choices, but information itself is rarely quantified directly. Our project seeks to address this issue by explicitly mapping information diffusion over time and contrasting it with the type of information diffusion that can be inferred in the traditional manner. This will allow us to directly assess the extent to which markets are efficiently aggregating and disseminating information.
Our work sits at the intersection of two separate literatures. On the one hand, standard (mathematical) rumour theory assumes that individuals are automata that mechanically spread (or cease to spread) rumours (see e.g. Pearce, 2000). This approach, while workable from a practical and mathematical perspective, is undesirable from an economics perspective because it makes strong assumptions about behaviour. On the other hand, even very small and seemingly trivial departures from the assumption of non-optimising behaviour on the part of individuals make the analysis almost intractable. This is the case even under the assumption that individuals are myopic, as they will typically have to form beliefs about a large number of possible (partially unobserved) histories of the rumour and about the behaviour and decision rules employed by people who have passed on the rumour in the past. Tractable frameworks have been analysed by Banerjee (1993), Bloch et al. (2014), Duffie et al. (2009) and Hong et al. (2010). All these papers build on very special assumptions that combine some decision making with tractability. It is extraordinarily challenging to extend these analyses beyond the narrow set of assumptions imposed by those authors, and a number of central issues remain unresolved. Banerjee et al. (2013) study information diffusion in the context of microfinance in rural India.
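To make the "automata" assumption of mathematical rumour theory concrete, the following is a minimal sketch of a Maki-Thompson-style simulation, one standard model in that literature. The choice of model, the function name simulate_maki_thompson and the population size are illustrative assumptions; this is not necessarily the formulation analysed by Pearce (2000). Each individual is an ignorant, a spreader or a stifler, and spreaders act purely mechanically: they tell a randomly chosen person, and give up spreading as soon as they meet someone who already knows the rumour.

```python
import random

IGNORANT, SPREADER, STIFLER = 0, 1, 2


def simulate_maki_thompson(n, seed=0):
    """Run one rumour to extinction; return the fraction left ignorant.

    Assumed rules (Maki-Thompson variant): a random spreader contacts a
    random other individual. An ignorant contact becomes a spreader;
    otherwise the calling spreader becomes a stifler.
    """
    rng = random.Random(seed)
    state = [IGNORANT] * n
    state[0] = SPREADER          # a single initial rumour source
    spreaders = [0]
    while spreaders:
        caller = rng.choice(spreaders)
        # Draw a contact uniformly from everyone except the caller.
        other = rng.randrange(n - 1)
        if other >= caller:
            other += 1
        if state[other] == IGNORANT:
            state[other] = SPREADER
            spreaders.append(other)
        else:
            state[caller] = STIFLER
            spreaders.remove(caller)
    return sum(1 for s in state if s == IGNORANT) / n


if __name__ == "__main__":
    print(simulate_maki_thompson(2000, seed=1))
```

In large populations, the classical result for this model is that roughly a fifth of individuals never hear the rumour. No individual in the simulation forms beliefs or weighs costs and benefits, which is the feature that is unattractive from an economics perspective.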
For these reasons, we wish to explore rumour propagation from a somewhat different perspective, made possible by recent advances in linguistics, computer science and the ubiquity of social media. In the past few years, it has been widely recognised that social media like Twitter, Facebook and Instagram can be a valuable addition to traditional media outlets such as TV, radio and the print press. In particular, researchers have found that trawling the internet for information on particular issues has improved the speed with which important events are recorded. For example, there are presently early warning systems for the detection of earthquakes (Earle, 2010 and Sakaki et al., 2010), symptom-based influenza (Dalton et al., 2009) and more generally for epidemics (Cheng et al., 2009 and Thapen et al., 2016). These automated systems rely on large-scale collection and classification of information contained in both news feeds and social media communication (see Atefeh and Khreich, 2015 for a survey of such techniques).
One particularly interesting system, which we think would be a good starting point, is BioCaster, developed by Nigel Collier (a co-applicant based at the Faculty of Modern and Medieval Languages at the University of Cambridge). Together with collaborators (Collier et al., 2008, 2011), he has developed a system that autonomously, and in real time, collects information from thousands of news sources. The system relies on a sophisticated set of algorithms that categorise information on a set of pre-specified topics (in their case, symptoms and diseases), analyse it and map it onto Google Maps. This allows the analyst to follow trends in information flows, both geographically and temporally, in real time. A particularly powerful aspect of the system is that it is trained to deal effectively with ambiguous and informal information, such as that passed on between individuals in shorthand communication, and to extract information from context, and to do so in eight different languages. The system can also discern whether information is “new” or simply a reposting of an existing news item. Although the system was developed to aid the surveillance of disease outbreaks and to serve a public health objective, its rule engine could become a powerful tool for analysing any kind of digital rumour before applying propagation and mapping analysis.
There are many potential applications of this technology for the analysis of financial and real markets. A good starting point would be to extend the system to track and map information about, say, S&P 500 companies. One could track information about the sectors in which they operate, the products they produce, the technology on which they rely, the firms and sectors with which they are related vertically and horizontally, and the countries in which they are based and with which they trade. This information could subsequently be contrasted with real-time share prices and trading volume information, to determine statistical dependencies and the speed with which new information is incorporated into prices.
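One simple way to look for such statistical dependencies is to correlate a rumour-intensity series with a market series at different time lags. The sketch below is only an illustration under stated assumptions: the function names, the choice of plain Pearson correlation and the toy numbers are all hypothetical, and a real analysis would need proper econometric controls.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlations(signal, response, max_lag):
    """Correlation between signal[t] and response[t + lag] for each lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = signal[:len(signal) - lag]   # drop the last `lag` observations
        y = response[lag:]               # drop the first `lag` observations
        result[lag] = pearson(x, y)
    return result


# Toy, made-up series: rumour mentions per period and a trading-volume
# measure that echoes the mentions one period later.
mentions = [0, 1, 0, 5, 0, 0, 2, 0, 0, 1]
volume = [0, 0, 1, 0, 5, 0, 0, 2, 0, 0]

correlations = lagged_correlations(mentions, volume, max_lag=3)
best_lag = max(correlations, key=correlations.get)
```

In this toy example the strongest correlation occurs at a lag of one period, which is how the series were constructed. With real data, the lag at which the dependence peaks gives a crude indication of how quickly information is incorporated into prices or volumes.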
Another interesting set of questions concerns the birth and death of rumours. For example, what is it that causes some rumours to catch on and propagate through the population, while others quickly die out? Also, what causes rumours to stop spreading? Is it because their veracity or otherwise is finally established and made commonly known, or do people just stop spreading them? Do rumours always originate from a single source and spread from there, or are they based on many sources? Is it possible to say something about the nature of the propagation of rumours that are true versus those that are false? In other words, are there features of rumours that turn out to be false that can be detected early on? Last, it would be interesting to find out how individuals make decisions in respect of rumours. Do they mindlessly spread the rumours that are the most entertaining or surprising, or only those they believe to be true? Similarly, when and why do they stop spreading them? Furthermore, do individuals make any attempt to verify the truthfulness of rumours before deciding whether to pass them on? The study of this type of question is made possible by methods developed recently by Zubiaga et al. (2016) for the analysis of non-financial rumours. No doubt a multitude of additional questions will arise, but the above outline gives a flavour of the kind of issues that can potentially be addressed. Doing so could yield very important insights into the functioning of markets and could inspire additional research on the topic.
For our pilot study, we intend to focus on merger and acquisition (M&A) activity. This provides a well-defined test case and is also of independent interest as a core field in financial and industrial economics. The main phases of our proposed research are as follows:
- First, we will carefully create a vocabulary of key words on M&A activity, such as {M&A, mergers, acquisitions, takeovers, restructuring, hostile takeover, proxy contests, leveraged buyouts, …}. This ontology will serve to create the dataset extracted from the different sources as described below.
- Second, we will collect the relevant consumer share transaction data from both social media sources (such as Twitter, financial blogs etc.) and from more traditional sources such as major newspapers (through services such as Factiva and LexisNexis), Bloomberg, Yahoo Finance and Google Finance.
- Third, the data will be cleaned and systematised in a format that is amenable to further analysis. In particular, we will identify rumour sources (i.e. news items or Tweets that are the originators of particular rumours) and subsequent responses that can be directly or indirectly linked to the rumour source. The entire structure of relationships between sources and responses will be recorded, thus creating a collection of rumour conversations.
- Fourth, the rumour conversations will be carefully annotated, i.e. we will evaluate the content of each message along several dimensions as follows: degree of support (did the message support or deny the rumour, did it request further information or simply provide a comment), degree of veracity (did the message express confidence in the opinion given or not, or was it neutral) and evidentiality (did the message link to external evidence, did it offer an argument for the opinion or did it not provide any evidence). This information will be collected in order to characterise ex post whether the veracity of the rumour (when this can be conclusively determined) influences the pattern of diffusion. The annotation scheme will be based on that for RumourEval 2017, a community-inspired veracity detection task being run in the Computational Linguistics community.
- Fifth, we will perform some basic statistical/econometric analysis of the collected data and contrast and combine it with data obtained from traditional sources, such as information on public announcements and on prices and trading volumes.
- Finally, we will do some additional technical analysis by building a machine learning model and training it on the textual data we have collected and annotated. This model will serve as a tool to provide forecasts of corporate restructuring making use of future rumour information.
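The first, third and fourth phases above can be sketched in code as follows. This is a minimal illustration under our own assumptions, not a description of the system that was built: the names VOCABULARY, Message, mentions_m_and_a and build_conversations are hypothetical, the keyword list simply lower-cases the example terms given above (the source list is open-ended), the matching is naive substring search, and the annotation fields are free-text labels rather than the actual RumourEval 2017 label set.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Example key words from the plan, lower-cased; the real vocabulary is larger.
VOCABULARY = (
    "m&a", "mergers", "acquisitions", "takeovers", "restructuring",
    "hostile takeover", "proxy contests", "leveraged buyouts",
)


@dataclass
class Message:
    msg_id: str
    parent_id: Optional[str]              # None marks a rumour source
    text: str
    # Annotation dimensions (hypothetical free-text labels).
    support: Optional[str] = None         # e.g. "support", "deny", "query", "comment"
    certainty: Optional[str] = None       # e.g. "confident", "not confident", "neutral"
    evidentiality: Optional[str] = None   # e.g. "link", "argument", "none"


def mentions_m_and_a(text: str) -> bool:
    """Naive filter: does the text contain any vocabulary term?"""
    lowered = text.lower()
    return any(term in lowered for term in VOCABULARY)


def build_conversations(messages: List[Message]) -> Dict[str, List[str]]:
    """Group message ids under the source of their thread.

    Follows parent links up to the root; assumes the links contain no cycles.
    A message whose parent was not collected is treated as its own source.
    """
    by_id = {m.msg_id: m for m in messages}
    threads: Dict[str, List[str]] = {}
    for m in messages:
        root = m
        while root.parent_id is not None and root.parent_id in by_id:
            root = by_id[root.parent_id]
        threads.setdefault(root.msg_id, []).append(m.msg_id)
    return threads


# Made-up example messages, for illustration only.
messages = [
    Message("a", None, "Hearing talk of acquisitions involving Firm X"),
    Message("b", "a", "No way, management denied this", support="deny"),
    Message("c", "b", "Any source for that?", support="query"),
    Message("d", None, "Nice weather today"),
]
sources = [m for m in messages if m.parent_id is None and mentions_m_and_a(m.text)]
threads = build_conversations(messages)
```

Recording each response with a pointer to its parent keeps the full tree of direct and indirect replies, so that diffusion patterns can later be compared across rumours with different annotated characteristics.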
Project Outputs
Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter, Conforti, C., Berndt, J., Pilehvar, M. T., Giannitsarou, C., Toxvaerd, F. and Collier, N., (2020), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
STANDER: An Expert-Annotated Dataset for News Stance Detection and Evidence Retrieval, Conforti, C., Berndt, J., Pilehvar, M. T., Giannitsarou, C., Toxvaerd, F. and Collier, N., (2020), Findings of the Association for Computational Linguistics: EMNLP 2020
Synthetic Examples Improve Cross-Target Generalization: A Study on Stance Detection on a Twitter corpus, Conforti, C., Berndt, J., Pilehvar, M. T., Giannitsarou, C., Toxvaerd, F. and Collier, N., (2020), Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Adversarial Training for News Stance Detection: Leveraging Signals from a Multi-Genre Corpus, Conforti, C., Berndt, J., Pilehvar, M. T., Giannitsarou, C., Toxvaerd, F. and Collier, N., (2020), Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Incorporating Stock Market Signals for Twitter Stance Detection, Conforti, C., Berndt, J., Pilehvar, M. T., Giannitsarou, C., Toxvaerd, F. and Collier, N., (2022), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
References
Atefeh, F. and W. Khreich (2015): A Survey of Techniques for Event Detection in Twitter, Computational Intelligence, 31(1), 132-164.
Banerjee, A. V. (1993): The Economics of Rumours, Review of Economic Studies, 60(2), 309-327.
Banerjee, A., A. G. Chandrasekhar, E. Duflo, and M. O. Jackson (2013): The Diffusion of Microfinance, Science, 341(6144).
Bloch, F., G. Demange and R. Kranton (2014): Rumors and Social Networks, mimeo.
Cheng, C. K., E. H. Lau, D. K. Ip, A. S. Yeung, L. M. Ho and B. J. Cowling (2009): A Profile of the Online Dissemination of National Influenza Surveillance Data, BMC Public Health, 9(339).
Collier, N., N. T. Son and N. M. Nguyen (2011): OMG U got flu? Analysis of Shared Health Messages for Bio-Surveillance, Journal of Biomedical Semantics, 2(Suppl 5): S9.
Collier, N. et al. (2008): BioCaster: Detecting Public Health Rumors with a Web-Based Text Mining System, Bioinformatics, 24(24), 2940-2941.
Dalton, C., D. Durrheim, J. Fejsa, L. Francis, S. Carlson, E. T. d'Espaignet and F. Tuyl (2009): Flutracking: A Weekly Australian Community Online Survey of Influenza-Like Illness in 2006, 2007 and 2008, Communicable Disease Intelligence, 33(3), 316-322.
Duffie, D., S. Malamud and G. Manso (2009): Information Percolation with Equilibrium Search Dynamics, Econometrica, 77(5), 1513-1574.
Earle, P. (2010): Earthquake Twitter, Nature Geoscience, 3(4), 221-222.
Hong, D., H. Hong and A. Ungureanu (2010): Diffusion of Opinions and Price-Volume Dynamics, mimeo.
Pearce, C. E. M. (2000): The Exact Solution of the General Stochastic Rumour, Mathematical and Computer Modelling, 31(10-12), 289-298.
Sakaki, T., M. Okazaki and Y. Matsuo (2010): Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors, Proceedings of the 19th international conference on World Wide Web, 851-860.
Thapen, N., D. Simmie, C. Hankin and J. Gillard (2016): DEFENDER: Detecting and Forecasting Epidemics Using Novel Data-Analytics for Enhanced Response, PLOS ONE, 11(5), e0155417.
Zubiaga, A. et al. (2016): Analysing How People Orient to and Spread Rumours in Social Media by Looking at Conversational Threads, PLOS ONE, March 4, 2016.