Collecting, Visualising and Analysing Twitter Data with the Collaborative Online Social Media Observatory (COSMOS)


The preceding articles outlined how to collect Twitter data within R, the statistical software environment. Using R is a good way to gather the data you require for research, but it comes with a steep learning curve. Academics, researchers and students may not have the time to learn the fundamentals, let alone master the software. This is particularly the case when it comes to cleaning and analysing data sets, which can be complex and time-consuming.

However, the Social Data Science Lab at Cardiff University (an Economic and Social Research Council Data Investment) has created free-to-use software called the Collaborative Online Social Media Observatory (COSMOS). The COSMOS software allows Twitter data to be collected, visualised and analysed in a user-friendly and feature-rich environment. I use COSMOS myself, in tandem with R, on a routine basis to collect, visualise and analyse data. Some of the features I find particularly helpful within COSMOS are the ability to construct network analysis graphs, frequency graphs and word clouds, the built-in sentiment scores powered by SentiStrength, and the capability to link other data (such as census data) to the social media data already gathered.

As you can see in my guide on collecting Twitter data using keywords with R, some may consider the process of collecting Twitter data in R complicated and long-winded: you have to write out confusing-looking lines of code, and the code may produce an error message which you have no idea how to interpret or overcome. While I believe problem solving helps researchers become familiar with and fluent in computational software like R, the time constraints which come with conducting research can be a determining factor when deciding which software to use for data collection and analysis. Thus, if you are a social scientist branching into the intersecting realm of social and computer science methods, choosing the right computational tools can be a daunting process.

Fear not: collecting data within COSMOS is a seamless experience. It is as easy as logging in with your Twitter credentials and inputting the keywords you wish to search for. In addition, when collecting Twitter data over an extended period of time (let’s say two months), COSMOS allows you to take snapshots (in seconds, minutes and hours) of the data without stopping the collection. This provides the opportunity to look at a sample of the data before the collection is stopped. Hence, the COSMOS software can be viewed as an attractive and useful computational tool for collecting Twitter data for research projects.

What is more, the software really shines when it comes to visualising and analysing data (as I have briefly outlined above). It allows the user to do many things that are difficult to replicate with such ease in other software environments. Another great feature built into COSMOS is that, once the Twitter data has been collected, it can be exported in a variety of popular formats (such as CSV, JSON and Excel). This means the data can easily be imported into other software, such as R or SPSS, if required.
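
As a rough illustration of what working with an exported file might look like outside COSMOS, the sketch below reads CSV data with Python's standard library and counts tweets per day. The column names (`created_at`, `text`), the date format and the sample rows are hypothetical stand-ins, not the actual COSMOS export layout, so check the headers of your own export before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a CSV export.
# Real column names and date formats may differ.
SAMPLE_EXPORT = """created_at,text
2016-03-01 09:15:00,First example tweet
2016-03-01 17:40:00,Second example tweet
2016-03-02 08:05:00,Third example tweet
"""


def tweets_per_day(csv_file, date_column="created_at"):
    """Count rows per calendar day, assuming dates start with YYYY-MM-DD."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        day = row[date_column][:10]  # keep only the YYYY-MM-DD part
        counts[day] += 1
    return counts


daily_counts = tweets_per_day(io.StringIO(SAMPLE_EXPORT))
for day, n in sorted(daily_counts.items()):
    print(day, n)
```

For a real export, the in-memory sample would be replaced with a file opened via `open(path, newline="", encoding="utf-8")`, where `path` points to your own exported file.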

Therefore, if you are planning to use Twitter data, or are already using such data in your research, I would definitely recommend COSMOS. The software is free and available to those in academia, government and third-sector organisations. All that is required is to fill out the request form on the Social Data Science Lab software page.