Twitter, research and content analysis software tools

Content analysis, or textual analysis, is a research method developed to examine the content of communication. In 1948, Harold Lasswell defined content analysis as a set of questions that help identify the core elements of communication content: “Who says What through What channel to Whom and with What effect?”

Heinderyckx[1] (1999) calls this set of questions “Lasswell’s paradigm of the 5W” (p. 34), but also points out that it leaves out the context and the Why (p. 35).

Benoit[2] (2011) discusses the basics of content analysis in political communication, previous definitions coined by scholars, content analysis categories, content coding, and human coding versus Computer-Assisted Qualitative Data Analysis Software (CAQDAS). He critically reviews the existing definitions of content analysis and concludes that “political communication research places texts and their content to the forefront of theory and research” (p. 276). He further argues that comparing human coding with CAQDAS raises a number of points worth weighing. For instance, a researcher who wants to employ CAQDAS must determine whether the raw data are available, or can be digitised, in a form the software can process. Before deciding between human analysis and CAQDAS, a researcher should ask: “Is a content analysis programme available that will be able to test the hypothesis or answer the research questions posited in the research?” (p. 275).

According to Benoit, CAQDAS has a number of benefits over human content analysis, but some limitations must nevertheless be taken into account (Table 1, adapted from Benoit, 2011, pp. 268-277).

Table 1: Human content analysis and CAQDAS - Benefits vs. limitations

There is a large variety of software applications to assist a researcher in analysing qualitative data. CAQDAS tools are meant to assist with transcription analysis, coding and text interpretation, content analysis, discourse analysis and grounded theory methodology, to name but a few.

Baugh et al.[3] (2010) warn that a researcher should not let the CAQDAS control the data analysis: in their view, no tool can replace the human capacity to go through the data and develop conclusions. CAQDAS does, however, save time and enables one to focus on making a “deeper and richer data evaluation” (p. 70). Even with CAQDAS, there is always a risk of being unable to cope with the large amounts of data collected via the Internet, for example.

Einspänner et al.[4] (2014) identify five major benefits and limitations of employing CAQDAS (Table 2, adapted from Einspänner et al., p. 106).

Table 2: CAQDAS software – Benefits vs. limitations

There is no CAQDAS application that does the job effortlessly, as each application was designed to serve a certain purpose and deliver specific outcomes. Each application therefore has its own limitations and cannot be employed for all types of qualitative analysis. According to a researcher’s needs, Lowe[5] (2003) classifies CAQDAS into three major groups: a) dictionary-based applications, b) development environments and c) annotation aids.

This list is not intended to cover all existing CAQDAS applications; it presents some I discovered while identifying what software could help with my MA research project.

a) Dictionary-based applications

These applications function on the basis of one or more dictionaries. They analyse content by mapping, organising and classifying it according to the dictionaries incorporated in the software. A number of applications allow the user to customise dictionaries.
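The dictionary-based approach can be illustrated with a short sketch. The category names and word lists below are toy assumptions of mine, not taken from any of the tools listed; the point is simply that the software maps each word in a text onto the dictionary categories it belongs to and tallies the results.

```python
# Minimal sketch of dictionary-based content analysis: count how often
# words from each (hypothetical) category dictionary occur in a text.
import re
from collections import Counter

DICTIONARIES = {  # toy dictionaries for illustration only
    "positive": {"good", "great", "happy"},
    "negative": {"bad", "sad", "awful"},
}

def categorise(text):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocab in DICTIONARIES.items():
            if word in vocab:
                counts[category] += 1
    return dict(counts)

print(categorise("A great day turned bad, then awful."))
# {'positive': 1, 'negative': 2}
```

Real tools add much more on top of this (word stems, weighting, normalisation per 100 words), but the core mechanism is this word-to-category lookup.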

  • CATPAC[6] is a commercial application. Its manufacturer claims it is based on the human brain model to establish meaningful connections between text units that are linked by patterns of similarity following the way they appear in text. The programme does not need pre-coding.
  • Concordance[7] is well-known for its potential to create and establish concordances for literary texts, including expressions and lemmatisation[8]. Concordance is able to handle texts in almost all languages supported by the Windows operating system.
  • Condor 3[9] measures and visualises the content, structure and sentiment of social communication networks over time.
  • Diction[10] can establish the tone of a verbal message with five main features: certainty, activity, optimism, commonality and realism.
  • LEXA[11] Corpus Processing Software processes text corpora linguistically and enables tagging and lemmatisation. The application provides lexical analysis, data processing, information retrieval, database and corpus management, and a number of other utilities.
  • LIWC[12] (Linguistic Inquiry and Word Count) is designed by James Pennebaker and his team. It works on the basis of a set of dictionaries. Any software user can build customised dictionaries. The application analyses the way people use words in different contexts – emails, speeches, poems etc. – and detects the use of positive and negative emotions, self-references, causal words and many others. The output consists of about 80 language dimensions.
  • WordStat[13] is a text analysis application which enables extracting themes and trends from a text corpus. It is part of a software package, which works alongside other components: SimStat (statistical data analysis tool) and QDA Miner (a qualitative data analysis software tool).
  • WORDij[14] is an application that determines word co-occurrences and visualises these co-occurrences in network analysis terms.
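The co-occurrence counting that tools such as WORDij perform can be sketched as follows. This is an illustrative sliding-window counter of my own, not WORDij’s actual algorithm: every pair of words appearing within a chosen distance of each other is counted as one co-occurrence, and the resulting pair counts form the edges of a word network.

```python
# Sketch of word co-occurrence counting in the spirit of network-analysis
# tools like WORDij (illustrative only): count word pairs that appear
# within a sliding window of a given size.
import re
from collections import Counter

def cooccurrences(text, window=2):
    words = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i in range(len(words)):
        # pair word i with every word up to `window` positions ahead
        for j in range(i + 1, min(i + window + 1, len(words))):
            pair = tuple(sorted((words[i], words[j])))
            pairs[pair] += 1
    return pairs

net = cooccurrences("twitter data twitter research data analysis")
print(net[("data", "twitter")])  # → 3
```

The window size is the key design choice: a small window captures tight syntactic links, a large one looser thematic association.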
b) Development environments

This category of applications enables building dictionaries, grammars and other text tools automatically. They are not analysers in themselves. They may require some knowledge of computer programming, since their behaviour differs from the easy-to-use content analysis tools in the first category of dictionary-based applications.

  • DIMAP[15] (DIctionary MAintenance Programs) is a rich application package that enables the creation and maintenance of dictionaries based on natural language processing and language technology applications.
  • Visual Text[16] is a complex application that mainly offers high-level information extraction capabilities and fewer options for traditional content analysis.
  • TABARI[17] (Textual Analysis By Augmented Replacement Instructions) makes an automatic coding of content based on three information keys: actors, verbs and phrases. The application identifies proper nouns, verbs and direct objects rather than using full syntactical analysis.
c) Annotation aids
  • ATLAS.ti[18] enables a large variety of annotation styles and note-keeping functions.
  • NVivo[19] is a suitable application for organising and analysing non-numerical or unstructured data, by classifying, sorting and arranging information. It also determines relationships in the data and provides analysis of data elements.

In a state-of-the-art report, Einspänner et al. (2014) summarise best practices for employing CAQDAS to research Twitter content. Among others, they refer to software for automated coding (QDA Miner, ATLAS.ti, NVivo) and focus on a concrete example of employing QDA Miner to analyse speech acts on Twitter. In this context the authors mention LIWC, which cannot handle misspelt or abbreviated words, a major limitation of the tool.

Twitter data are created “without being motivated by any research intent, unlike elicited data from interviews, survey etc.”, and while CAQDAS is not the most widely used approach to Twitter research, “it can in fact make a content analysis more efficient” (p. 99).

Thelwall et al.[20] (2011) used the SentiStrength[21] software to analyse tweet bodies. They claim this was the most appropriate solution since it could measure the strength of positive and negative sentiment in tweets, which are short texts that may include informal language.
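The dual-scale idea behind SentiStrength — that a single tweet receives both a positive strength (1 to 5) and a negative strength (-1 to -5), rather than one net score — can be sketched as below. The lexicon and scoring are toy assumptions of mine, not SentiStrength’s actual algorithm or word list.

```python
# Illustrative sketch of dual-scale sentiment strength scoring, inspired by
# SentiStrength's output format (not its real lexicon or rules): each text
# gets a positive score in 1..5 and a negative score in -1..-5.
import re

# Hypothetical toy lexicon mapping words to sentiment strengths.
LEXICON = {"love": 4, "great": 3, "ok": 2, "bad": -3, "hate": -4}

def sentiment_strength(tweet):
    words = re.findall(r"[a-z']+", tweet.lower())
    # strongest positive word found, floor of 1 (neutral)
    pos = max([LEXICON[w] for w in words if LEXICON.get(w, 0) > 0], default=1)
    # strongest negative word found, ceiling of -1 (neutral)
    neg = min([LEXICON[w] for w in words if LEXICON.get(w, 0) < 0], default=-1)
    return pos, neg

print(sentiment_strength("love the talk but the wifi was bad"))  # (4, -3)
```

Reporting both scores lets mixed-sentiment tweets, which a single net score would flatten to roughly zero, remain visible in the analysis.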

Danowski[22] (2012) surveys existing methods and tools for sentiment analysis. He mentions tweet corpora as an example: they can be analysed either as whole tweet bodies or broken into smaller units, the words composing a tweet. Danowski describes his experiences with CAQDAS and makes three recommendations:

  • Amazon Mechanical Turk: a paid online service provided by Amazon.com, available in the USA only. A requester submits one or more tweets and asks the Turkers (workers employed through the service) to judge to what degree a tweet carries positive and/or negative emotions.
  • WORDij: a software suite developed by Danowski himself. One of its components, WordLink, is able to run a network analysis of a given piece of content. One could select a seed word and the application “traces the shortest paths across all words in the entire network to each of the sentiment words found in the network” (p.2).
  • Linguistic Inquiry and Word Count (LIWC): a licensed piece of software which provides an output of 80 linguistic categories, including attributes about positive and negative emotions. Danowski emphasises that LIWC is suitable for content “not sensitive to the linguistic context for the word” (p. 2).
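The seed-word technique Danowski describes for WordLink can be sketched in a few lines: build a word network from co-occurrences, pick a seed word, and measure the shortest path from it to each sentiment word. The toy network below and the plain breadth-first search are my own illustration of the idea, not WordLink’s implementation.

```python
# Sketch of the seed-word shortest-path idea: given a word co-occurrence
# network, find how many steps separate a seed word from sentiment words.
# Toy adjacency data and a plain BFS, for illustration only.
from collections import deque

# Hypothetical undirected word network (adjacency list).
NETWORK = {
    "economy": {"jobs", "crisis"},
    "jobs": {"economy", "good"},
    "crisis": {"economy", "terrible"},
    "good": {"jobs"},
    "terrible": {"crisis"},
}

def shortest_path_length(start, goal):
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in NETWORK.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path between the words

for sentiment_word in ("good", "terrible"):
    print(sentiment_word, shortest_path_length("economy", sentiment_word))
# good 2
# terrible 2
```

The path lengths indicate how closely a seed concept is associated with positive or negative vocabulary in the corpus: shorter paths to one pole suggest a stronger sentiment association.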

Given my research object and subject, I found WORDij and LIWC closest to my needs. I will introduce these two CAQDAS tools in one of my next articles.

[1] Heinderyckx, F. (1999), Une introduction aux fondements théoriques de l’étude des médias, Liège, Cefal-Sup.

[2] Benoit, W. L. (2011), “Content Analysis in Political Communication” in Bucy, E. P. and Holbert, R. L. (eds.) The Sourcebook for political communication research: methods, measures, and analytical techniques, New York, Routledge

[3] Baugh, J. et al. (2010), “Computer assisted qualitative data analysis software: A practical perspective for applied research”, in Revista del Instituto Internacional de Costos, 6, pp. 69-81

[4] Einspänner, J. et al. (2014), “Computer-Assisted Content Analysis of Twitter Data” in Weller K. et al. (eds.) Twitter and Society, New York, Peter Lang Publishing

[5] Lowe, W. (2003), Content Analysis Software: A Review Technical Report for the Identity Project, Weatherhead Center for International Affairs, Harvard University

[6] http://www.galileoco.com/N_catpac.asp

[7] http://www.concordancesoftware.co.uk/

[8] Determining a headword or lemma, under which its various forms and related words are placed in order to analyse them as a single item.

[9] https://sites.google.com/site/coincourse2013/tools

[10] http://www.dictionsoftware.com/

[11] http://icame.uib.no/lexainf.html

[12] http://www.liwc.net/

[13] http://provalisresearch.com/

[14] http://wordij.net/

[15] http://www.clres.com/software.html

[16] http://textanalysis.com/

[17] http://eventdata.psu.edu/software.html

[18] http://www.atlasti.com/

[19] http://www.qsrinternational.com/

[20] Thelwall, M. et al. (2011), “Sentiment in Twitter Events”, in Journal of the American Society for Information Science and Technology, 62(2), pp. 406-418

[21] http://sentistrength.wlv.ac.uk/

[22] Danowski, J. A. (2012). “Social network size and designers’ semantic networks for collaboration” in International Journal of Organization Design and Engineering 2(4/2012), pp. 343-361
