Cloudy with a Chance of Text Analysis

Having created datasets using the previously discussed TAGS and Altmetric Explorer programmes, this week it was time to ‘screw around’ with them and have a go at a bit of basic text analysis.

As with the discussion over the use of altmetrics for measuring academic impact (see previous post), there is a continuing debate over whether the kinds of analysis and visualisations I am about to demonstrate should be accepted as serious research tools (Jacob Harris, for one, thinks word clouds should not be). Tied in with this is the worry that by moving towards ‘distant reading’ and having the ability to easily mine huge amounts of data in a quantitative way, we risk losing the qualitative benefits of more traditional ‘close reading’.

Although they may attract criticism, and despite their ubiquity across the web (e.g. the way my tags are displayed on this blog), I enjoyed creating word clouds with a couple of different tools. Perhaps the most (in)famous of these is Wordle, and this was where we began in our DITA lab session:

[Image: Wordle]

This word cloud was created using the text from my TAGS dataset, showing all tweets over a certain period which included ‘#citylis’. Wordle automatically filters out certain ‘stop words’, and also allows you to select individual words in the cloud and remove them. In this example the only additional term I removed was ‘#citylis’, as this unsurprisingly dominated the whole cloud and did not need to be there. You can also play with the colours/font/layout etc. (which for me basically involved clicking ‘randomize’ until I got something looking half-decent).
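For anyone curious about what sits underneath a word cloud, here is a minimal Python sketch of the counting step: split the text into words, drop the stop words, and tally what is left. It is not how Wordle itself is implemented. The stop-word list is a tiny made-up one, the function name `word_frequencies` is my own, and the three tweets are invented examples rather than rows from the TAGS dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real tools use far longer ones).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "at", "with"}


def word_frequencies(texts, extra_stop_words=()):
    """Count words across texts, ignoring stop words and any extra terms."""
    stop = STOP_WORDS | {w.lower() for w in extra_stop_words}
    counts = Counter()
    for text in texts:
        # Keep #hashtags and @mentions together as single tokens.
        tokens = re.findall(r"[#@]?\w+", text.lower())
        counts.update(t for t in tokens if t not in stop)
    return counts


# Invented example tweets, standing in for the text column of a TAGS sheet.
tweets = [
    "RT @alice: Great session on text analysis #citylis",
    "Playing with word clouds in the lab #citylis",
    "RT @bob: Text analysis is fun #citylis",
]

# Remove the search hashtag as well, since it would otherwise dominate.
freqs = word_frequencies(tweets, extra_stop_words=["#citylis"])
print(freqs.most_common(5))
```

The size of each word in a cloud is then just a function of these counts. Leaving ‘rt’ in the tally, as here, mirrors the choice of keeping ‘RT’ visible in the cloud.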

As a quick way to gain an insight into #citylis tweeting behaviour this cloud does seem useful, e.g. allowing you to easily see which individuals have tweeted the most (Ernesto proudly dominating at the top in this example). Also I could have removed frequently used terms such as ‘RT’ and ‘MT’, but here I thought it was a useful visualisation of how much Twitter activity involves sharing and remixing information within a network. However, compared to the more advanced functionality and adaptability of a tool such as the TAGSExplorer, where you can not only see who has been tweeting about what, but also create a visualisation of the network and how people have interacted, Wordle does not really offer much.

Another text analysis tool we tested was Voyant Tools, which I used to analyse the dataset I created with Altmetric Explorer using the search term ‘discovery tools’. The text I extracted from this dataset was the list of article titles returned by my Altmetric search. Here is the resulting word cloud (a feature which Voyant have aptly named ‘Cirrus’):

[Image: Voyant]

This aspect of Voyant is not particularly different to that offered by Wordle, although it does allow you to edit and save lists of stop words, allowing for easier manipulation of the data. In terms of my own research I am interested in how the success of discovery tools in libraries is being evaluated, and comparisons between different services, so it is helpful to see terms such as ‘usability’, ‘service’ and ‘comparing’ cropping up in the cloud.

I would also have expected to see the names of specific discovery tools and their providers (such as Primo or EBSCO) appearing more prominently, but the only one in the cloud is ‘Summon’. However, we must remember that this set of text only includes the titles of the articles, and not the content, so we cannot draw too many conclusions.

Where Voyant goes further than Wordle is that it offers several additional tools beyond the initially useful, but ultimately shallow, word cloud:

[Image: Voyant2]

This allows you to do much more with the text, for example viewing a selected word in context (an example of concordancing), and visualising the frequency of words throughout the text with the ‘Word Trends’ feature. Voyant also allows you to export the data created by its different features in a variety of ways, meaning you are not limited to their own visualisations. My dataset here was perhaps a bit limited in size and scope, only featuring article titles rather than large bodies of text, so I was not particularly stretching the potential of Voyant’s features, but I still found it useful in terms of identifying, locating and analysing potentially interesting words and trends within these titles.
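To make the idea of concordancing a little more concrete, here is a rough Python sketch of a ‘keyword in context’ lookup: for every occurrence of a chosen word, it returns the few words either side. This is only an illustration of the general technique, not Voyant's own code. The function name `concordance`, the `width` parameter and the two article titles are all invented for the example.

```python
import re


def concordance(texts, keyword, width=3):
    """Return (left context, keyword, right context) for each match.

    Each text is searched separately, so context never spills
    over from one title into the next.
    """
    keyword = keyword.lower()
    hits = []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


# Invented titles, standing in for the article titles from Altmetric Explorer.
titles = [
    "Comparing the usability of discovery tools",
    "Discovery tools in academic libraries",
]

for left, word, right in concordance(titles, "discovery", width=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} [{word}] {right}")
```

Lining the keyword up in a column like this is what makes patterns in its neighbours easy to spot, which is where the ‘close reading’ starts to creep back in.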

Overall it seems that a word cloud can point you towards initial areas of interest when analysing text, but then additional tools such as those offered by Voyant can help you to delve further into whether what you have discovered is relevant or not, as well as potentially leading to new arguments and interpretations – combining quantitative data with qualitative analysis.


One response to “Cloudy with a Chance of Text Analysis”

  1. Really well explained – a good summary of the pros and cons of the tools available. My take on Wordle is that for it to be useful, it just requires a bit of thought as to: (a) whether it’s actually appropriate for the task at hand; and (b) which words should be omitted in order to make the resulting visualisation more meaningful.
