Twitter content analysis: WORDij and LIWC software tools

WORDij and LIWC are two Computer-Assisted Qualitative Data Analysis Software tools (CAQDAS) that could be successfully employed to run a content analysis of tweets.

WORDij

WORDij[1] includes a number of components that serve different research purposes. One of them, WordLink, extracts word pairs, which “are the basis for creating networks” (Danowski[2], 2012, p.3). The words are the network nodes; the links between them carry “link strengths” that reflect how frequently the two words co-occur.

Word pairs are made based on word proximity, which, according to Danowski, is calculated by a “window that slides through the text, counting all word pairs inside as it moves from word in the full text” (p.4). The application counts “pairs 3 positions before each focal word, as well as those within 3 words after it”, which means that the key parameter is 3 by default. The parameter value can be customised according to one’s needs. An example of a semantic network is pictured in Figure 1.
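The sliding window described above can be sketched in a few lines of Python. This is only an illustration of the idea, not WORDij's actual code: the function name `count_word_pairs` is hypothetical, and treating pairs as directed (focal word first) and ignoring sentence boundaries are my own assumptions.

```python
from collections import Counter

def count_word_pairs(words, window=3):
    """Count directed word pairs (focal word -> later word) within `window` positions."""
    pairs = Counter()
    for i, focal in enumerate(words):
        # Pairing each focal word with the words that follow it also covers
        # the "before" side: every earlier neighbour has already paired
        # itself with this word when it was the focal word.
        for other in words[i + 1 : i + 1 + window]:
            pairs[(focal, other)] += 1
    return pairs

text = "social europe needs a strong social europe"
pairs = count_word_pairs(text.split())
print(pairs[("social", "europe")])  # link strength of the pair
```

Changing the `window` argument corresponds to customising the default value of 3 mentioned above.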

Figure 1: A spring-embedded graph of nodes and arrows forming a semantic network (top 20 nodes, minimum link value of 3), designed with the WORDij software suite, for the Twitter communications of Social Europe in 2012

The WORDij software suite has mainly been used to carry out “text mining and semantic network analysis” (Yuan et al., forthcoming, p.1). The word pair co-occurrence is explained by Danowski[3] (2013) as follows:

“Defining word-pair link strength as the number of times each word occurs closely in text with another, all possible word pairs have an occurrence distribution whose values range from zero on up. This ratio scale of measurement allows the use of sophisticated statistical tools from social network analysis toolkits. These enable the mapping of the structure of the word network. They identify word groups, or clusters, and quantify the structure of the network at different levels. Using these word-pair data as input to network analysis tools, you map the language landscape. On the map, instead of cities, the nodes are words. Rather than roads, there are links or edges among words” (2013, online).

In a paper presenting research that compared the professional behaviour of high-contact and low-contact designers on the LinkedIn platform, Danowski (2012) describes how WORDij detects proximate word pairs without resorting to the “bag of words” approach. A “bag of words” represents a text corpus as the bag of its words, ignoring grammar and word order but preserving word occurrence; applied to word pairs, it counts “words as paired that appeared anywhere in the same profile document” (Danowski, 2012, p.9).

WordLink, by contrast, detects word pairs “within 3 word positions on either side of each word in the text” (p.9). Danowski used a “stop list” to remove common function words from the content body. He then discarded words and word pairs with frequencies of 1 and 2, as this “is supported by empirical research in natural language processing” (p.9). Further, he dropped numerals and punctuation and normalised contractions. He then used advanced WORDij functions to “test for differences in word-pair frequencies” (p.9).
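The cleaning steps listed above (stop list, numerals, punctuation, contractions, low frequencies) could look roughly like this in Python. The stop list, the two contraction rules and the function names are illustrative assumptions of mine; Danowski's actual lists and WORDij's options are not reproduced here.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}  # illustrative only

def tokenize(text):
    """Lowercase, expand a few contractions, drop numerals and punctuation."""
    text = text.lower().replace("can't", "cannot").replace("n't", " not")
    tokens = re.findall(r"[a-z]+", text)  # keeps alphabetic runs only
    return [t for t in tokens if t not in STOP_WORDS]

def drop_rare(counts, min_count=3):
    """Keep items seen at least `min_count` times (drops frequencies 1 and 2)."""
    return Counter({item: n for item, n in counts.items() if n >= min_count})

tokens = tokenize("The EU can't wait: 3 jobs plans in 2012, and jobs, jobs, jobs!")
print(drop_rare(Counter(tokens)))
```

The same `drop_rare` filter can be applied to a counter of word pairs as well as to single words.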

Danowski (2012) argues that WORDij could be employed for automatic link coding with proximities since it overcomes the problems that occur when employing the “bag of words” method:

“While word bags are useful for document retrieval they blur social meaning by ignoring the relationships of social units within the texts, whether these units are words, people, or other entities” (p.217).

LIWC

In support of employing CAQDAS, Gluesing et al.[4] (2009) provide clear evidence that “it is possible to design research that takes full advantage of information technologies to gather large amounts of data for data mining and network analysis, but also to embed qualitative methods in parallel and in a measured, targeted way to maximize the richness of results while minimizing the costs usually involved in long-term, labour-intensive ethnographic studies” (p.25).

Derczynski et al.[5] (2013) consider Twitter a “noisy environment” given users’ SMS-like behaviour: people tend to use word abbreviations and SMS conventions. To overcome or reduce this linguistic noise, researchers need to normalise the data, that is, to transform abbreviations and other conventions into their full-word equivalents, which adds work to any analysis of Twitter content. The work can be done in two stages:

1) identification of orthographic errors and

2) the correction of the errors (p.7).
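The two stages can be illustrated with a toy Python example. The lookup table `ABBREVIATIONS` and both function names are hypothetical; a real study would need a much larger lexicon or a dedicated normalisation tool, and this sketch makes no claim about how Derczynski et al. implemented either stage.

```python
# Hypothetical lookup table; a real study would need a far larger lexicon.
ABBREVIATIONS = {"u": "you", "gr8": "great", "2moro": "tomorrow", "pls": "please"}

def find_nonstandard(tokens):
    """Stage 1: return positions of tokens listed as non-standard forms."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in ABBREVIATIONS]

def normalise(text):
    """Stage 2: replace each flagged token with its full-word equivalent."""
    tokens = text.split()
    for i in find_nonstandard(tokens):
        tokens[i] = ABBREVIATIONS[tokens[i].lower()]
    return " ".join(tokens)

print(normalise("c u 2moro pls"))
```

Tokens missing from the table (such as "c" here) pass through unchanged, which shows why the quality of the lexicon matters.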

Elson et al.[6] (2012) state that until 2012 researchers studied social media content manually: they focused on certain pieces of content and on specific users, then interpreted and reported the findings. The authors mention that such researchers might have analysed irrelevant pieces of content, so that their findings could apply to particular cases and situations but not to mass communication on Twitter.

The authors propose to employ LIWC[7] (The Linguistic Inquiry and Word Count) as a “computerized method to study the content of social media”, which may compensate for the limitations of the “manual” methods and therefore reduce “the chance that their biases affect their interpretations of social media texts” (p.xii).

Despite its later popularity as a CAQDAS tool, LIWC was, at the time the report was published in 2012, “largely untested in political contexts”.

The authors employed LIWC to “investigate how men and women communicate differently” and realised that the software application “has not been widely applied to understanding a non-Western political context” (p.xii).

Elson et al. (2012) analysed the opinions of Iranian Twitter users: what they felt about the 2009 election in Iran and what their attitudes were towards certain countries. The researchers wanted to validate the LIWC methodology on a particular body of Twitter content, which had not been done previously.

It is worth noting that Elson et al. reviewed some of the previous research projects that employed LIWC and claim that it provides clear evidence of people’s behaviours, attitudes and emotions. For example, “greater use of first-person singular pronouns… has been shown to suggest feelings of depression” while “second-person or plural pronouns indicate reaching out to others… and a sense of community or group identity” (p.xiii). LIWC has proven to be a reliable method to identify “emotions in written language… and the results it generates are comparable to those produced with other content-analysis methods” (p.xiii). Furthermore, “LIWC has been successfully applied more recently to various forms of social media” (p.xiii) and the procedure “holds much promise”, conclude Elson et al. (p.xvii).

The LIWC software application was developed by the psychologist James W. Pennebaker and his team. LIWC processes texts and provides information about 80 linguistic categories contained in the analysed texts. Its strength lies in its ability to report on the positive and negative emotions carried by the texts. Its authors nonetheless describe it as a transitional tool:

“LIWC represents only a transitional text analysis program in the shift from traditional language analysis to a new era of language analysis” (Tausczik and Pennebaker[8], 2010, p. 38).

LIWC goes through any type of text, including blogs, novels, speeches, poems etc. and checks each word against a set of dictionaries, which are embedded in the application. The dictionaries define a particular word category and “capture different psychological concepts” (Pennebaker[9], 2011, p. 6). Once the programme has checked and counted all the words, it then “calculates a ratio of the number of words in each word category” (Servi and Elson[10], 2012).
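The check-count-ratio procedure described above can be sketched as follows. The three tiny dictionaries are placeholders I made up for illustration; LIWC's own dictionaries are proprietary and far larger, and the function name `category_ratios` is hypothetical.

```python
from collections import Counter

# Toy dictionaries standing in for LIWC's own (hypothetical content).
CATEGORIES = {
    "posemo": {"happy", "good", "love"},
    "negemo": {"sad", "bad", "hate"},
    "i": {"i", "me", "my"},
}

def category_ratios(text):
    """Return, per category, the share of words in `text` that belong to it."""
    words = text.lower().split()
    hits = Counter()
    for word in words:
        for category, vocabulary in CATEGORIES.items():
            if word in vocabulary:
                hits[category] += 1
    total = len(words)
    return {c: (hits[c] / total if total else 0.0) for c in CATEGORIES}

print(category_ratios("i love my job but i hate mondays"))
```

Because every word is looked up in isolation, a sketch like this shares the limitation discussed next: it cannot see irony, sarcasm or context.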

Pennebaker (2011) acknowledges some LIWC limits. According to Pennebaker, word counting programmes cannot detect linguistic nuances such as irony or sarcasm. LIWC also fails “to capture the context of language. One word, for example, can have very different meanings, depending on how it is used” (p.8). However, researchers aim to create “smarter word-count programs that will eventually take into account syntax, grammar, and context in general” (p.9).

References

[1] http://wordij.net/

[2] Danowski, J. A. (2012). “Social network size and designers’ semantic networks for collaboration” in International Journal of Organization Design and Engineering 2(4/2012)

[3] Danowski, J. A. (2013). WORDij version 3.0: Semantic network analysis software, Chicago: University of Illinois at Chicago, Available from http://wordij.net/

[4] Gluesing, J. et al. (2009), “Mixing ethnography and information technology data mining to visualize innovation networks in global networked organizations” in Dominguez S. and Hollstein B. (eds.) Mixed methods in studying social networks, Cambridge University Press

[5] Derczynski, L. et al. (2013), “Microblog-Genre Noise and Impact on Semantic Annotation Accuracy” in Proceedings of the 24th ACM Conference on Hypertext and Social Media

[6] Elson, S.B. et al. (2012), Technical Report: Using Social Media to Gauge Iranian Public Opinion and Mood After the 2009 Election, Rand Corporation

[7] http://liwc.net/

[8] Tausczik, Y. R. and Pennebaker J. W. (2010), “The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods”, in Journal of Language and Social Psychology, 29(1), 24-54, Sage Publications

[9] Pennebaker, J.W. (2011), The Secret Life of Pronouns: What our words say about us, New York, Bloomsbury Press

[10] Servi, L. and Elson, S. B. (2012) “A Mathematical Approach to Identifying and Forecasting Shifts in the Mood of Social Media Users”, in MITRE Technical Report #120090 p. 27-30

Twitter, research and content analysis software tools

Content analysis or textual analysis is a research method, which was developed to examine the content of communication. In 1948 Harold Lasswell defined content analysis as a set of questions that help identify the core communication content elements: “Who says What through What channel to Whom and with What effect?”.

Heinderyckx[1] (1999) calls this set of questions “Lasswell’s paradigm of the 5W” (p.34), but also draws attention to what it leaves out: the context and the Why (p.35).

Benoit[2] (2011) discusses the basics of content analysis in political communication: earlier definitions coined by scholars, content analysis categories, content coding, and human analysis versus Computer-Assisted Qualitative Data Analysis Software (CAQDAS). He critically reviews the existing definitions of content analysis and concludes that “political communication research places texts and their content to the forefront of theory and research” (p. 276). Furthermore, he argues that the choice between human analysis and CAQDAS raises a number of points worth comparing. For instance, a researcher willing to employ CAQDAS must determine whether the raw data is available in, or can be converted to, a digital form that the software can process. Before deciding between human analysis and CAQDAS, a researcher should ask: “Is a content analysis programme available that will be able to test the hypothesis or answer the research questions posited in the research?” (p.275).

According to Benoit, CAQDAS has a number of benefits over human content analysis, but some limitations must nevertheless be taken into account (Table 1, adapted from Benoit, 2011, pp. 268-277).

Table 1: Human content analysis and CAQDAS - Benefits vs. limitations

There is a large variety of software applications to assist a researcher while analysing qualitative data. The CAQDAS are meant to assist with transcription analysis, coding and text interpretation, content analysis, discourse analysis and grounded theory methodology, to name but a few.

Baugh et al.[3] (2010) warn that a researcher should not let the CAQDAS control the data analysis: according to them, no tool can replace the human capacity to go through the data and develop conclusions. However, CAQDAS helps save time and enables one to focus on making a “deeper and richer data evaluation” (p.70). Even with CAQDAS, there is always a risk of being unable to cope with a large amount of data, collected via the Internet for example.

Einspänner et al.[4] (2014) identify five major benefits and limitations of employing CAQDAS (Table 2, adapted from Einspänner et al., p. 106).

Table 2: CAQDAS software – Benefits vs. limitations

No CAQDAS application does the job effortlessly, as each was designed to serve a certain purpose and deliver specific outcomes. Each application therefore has its own limitations and cannot be employed for all types of qualitative analysis. According to a researcher’s needs, Lowe[5] (2003) classifies CAQDAS into three major groups: a) dictionary-based applications, b) development environments and c) annotation aids.

This list is not intended to cover all existing CAQDAS applications; it includes those I came across while identifying software that could help with my MA research project.

a) Dictionary-based applications

These applications function on the basis of one or more dictionaries. They analyse the content by mapping, organising and classifying it according to the dictionaries they incorporate. A number of applications allow dictionaries to be customised.

  • CATPAC[6] is a commercial application. Its manufacturer claims it is based on the human brain model to establish meaningful connections between text units that are linked by patterns of similarity following the way they appear in text. The programme does not need pre-coding.
  • Concordance[7] is well-known for its potential to create and establish concordances for literary texts, including expressions and lemmatisation[8]. Concordance is able to understand texts in almost all languages supported by Windows, the computer operating system.
  • Condor 3[9] measures and visualises the content, structure and sentiment of social communication networks over time.
  • Diction[10] can establish the tone of a verbal message with five main features: certainty, activity, optimism, commonality and realism.
  • LEXA[11] Corpus Processing Software processes text corpora linguistically and enables tagging and lemmatisation. The application provides lexical analysis, data processing, information retrieval, database, corpus management and a number of other utilities.
  • LIWC[12] (Linguistic Inquiry and Word Count) is designed by James Pennebaker and his team. It works on the basis of a set of dictionaries. Any software user can build customised dictionaries. The application analyses the way people use words in different contexts – emails, speeches, poems etc. – and detects the use of positive and negative emotions, self-references, causal words and many others. The output consists of about 80 language dimensions.
  • WordStat[13] is a text analysis application which enables extracting themes and trends from a text corpus. It is part of a software package, which works alongside other components: SimStat (statistical data analysis tool) and QDA Miner (a qualitative data analysis software tool).
  • WORDij[14] is an application that determines word co-occurrences and visualises these co-occurrences in network analysis terms.

b) Development environments

This category of applications enables building up dictionaries, grammar and other text tools in an automatic manner. They are not analysers. They may require some knowledge of computer programming since their behaviour is different from the easy-to-use content analysis tools falling in the first category of dictionary-based applications.

  • DIMAP[15] (DIctionary MAintenance Programs) is a rich application package that enables the creation and maintenance of dictionaries based on natural language processing and language technology applications.
  • Visual Text[16] is a complex application that mainly offers high-level information extraction capabilities and fewer options for traditional content analysis.
  • TABARI[17] (Textual Analysis By Augmented Replacement Instructions) makes an automatic coding of content based on three information keys: actors, verbs and phrases. The application identifies proper nouns, verbs and direct objects rather than using full syntactical analysis.

c) Annotation aids

  • Atlas-ti[18] enables a large variety of annotation styles and note-keeping applications.
  • NVivo[19] is a suitable application for organising and analysing non-numerical or unstructured data, by classifying, sorting and arranging information. It also determines relationships in the data and provides analysis of data elements.

In a state-of-the-art report, Einspänner et al. (2014) summarise the best practices of employing CAQDAS to research Twitter content. Among others, they refer to software for automated coding (QDA Miner, ATLAS.ti, NVivo) and focus on a concrete example of employing QDA Miner to analyse speech acts on Twitter. In this context the authors mention LIWC, which fails to process words that are misspelt or abbreviated. This is a major limitation of LIWC.

Twitter data is created “without being motivated by any research intent, unlike elicited data from interviews, survey etc.”, and although CAQDAS is not the most widely used approach in Twitter research, “it can in fact make a content analysis more efficient” (p.99).

Thelwall et al.[20] (2011) used the SentiStrength[21] software to analyse tweet bodies. They claim that this was the most appropriate software solution since it can measure the strength of positive and negative sentiment in short texts such as tweets, which may include informal language.
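To make the idea of measuring sentiment strength more concrete, here is a toy lexicon-based scorer in Python. It is not SentiStrength and does not use its word lists: the mini-lexicon, the dual scale (1 to 5 for positive, -1 to -5 for negative) and the rule of taking the strongest term on each side are all assumptions of this sketch.

```python
# Hypothetical mini-lexicon: positive terms score 2..5, negative terms -2..-5.
LEXICON = {"good": 2, "love": 4, "awesome": 5, "bad": -2, "hate": -4, "awful": -5}

def sentiment_strength(text):
    """Return (positive, negative) strengths: the strongest term on each side.

    A text with no positive term scores 1; with no negative term, -1.
    """
    scores = [LEXICON.get(word, 0) for word in text.lower().split()]
    positive = max([s for s in scores if s > 0], default=1)
    negative = min([s for s in scores if s < 0], default=-1)
    return positive, negative

print(sentiment_strength("love the show but hate the awful ads"))
```

Reporting two numbers rather than one allows a short text to be both positive and negative at once, which a single polarity label would hide.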

Danowski[22] (2012) introduces the existing methods and tools for sentiment analysis. He mentions tweet corpora as an example, which can be analysed either tweet by tweet, taking each tweet body as a whole, or broken into smaller units, the words composing a tweet. Danowski describes his experiences with CAQDAS and makes three recommendations:

  • Amazon Mechanical Turk: a paid online service provided by Amazon.com, available in the USA only. A requester submits one or more tweets and asks the Turkers (workers recruited through the service) to judge to what degree each tweet carries positive and/or negative emotions.
  • WORDij: a software suite developed by Danowski himself. One of its components, WordLink, is able to run a network analysis of a given piece of content. One could select a seed word and the application “traces the shortest paths across all words in the entire network to each of the sentiment words found in the network” (p.2).
  • Linguistic Inquiry and Word Count (LIWC): a licensed piece of software which provides an output of 80 linguistic categories, including positive and negative emotions. Danowski emphasises that LIWC is suitable for content “not sensitive to the linguistic context for the word” (p.2).
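The seed-word tracing mentioned for WORDij in the list above can be pictured as a shortest-path search over the word network. The sketch below uses a plain breadth-first search that counts hops; whether WORDij weights paths by link strength is not stated in the source, so the unweighted treatment, the sample links and the sentiment word list are all assumptions.

```python
from collections import deque

def shortest_path_lengths(links, seed):
    """Breadth-first search over an undirected word network from `seed`."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        for nxt in neighbours.get(word, ()):
            if nxt not in distance:
                distance[nxt] = distance[word] + 1
                queue.append(nxt)
    return distance

links = [("europe", "jobs"), ("jobs", "crisis"), ("crisis", "bad"), ("europe", "good")]
SENTIMENT_WORDS = {"good", "bad"}  # hypothetical sentiment list
dist = shortest_path_lengths(links, "europe")
print({w: dist[w] for w in sorted(SENTIMENT_WORDS) if w in dist})
```

A seed word that sits closer to positive than to negative sentiment words in the network could then be read as framed more positively in the corpus.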

Given my research object and subject, I found WORDij and LIWC closest to my needs. I will introduce these two CAQDAS tools in one of my next articles.

[1] Heinderyckx, F. (1999), Une introduction aux fondements théoriques de l’étude des médias, Liège, Cefal-Sup.

[2] Benoit, W. L. (2011), “Content Analysis in Political Communication” in Bucy, E. P. and Holbert, R. L. (eds.) The Sourcebook for political communication research: methods, measures, and analytical techniques, New York, Routledge

[3] Baugh, J. et al. (2010), Computer assisted qualitative data analysis software: A practical perspective for applied research, in Revista del Instituto Internacional de Costos, 6, 69-81

[4] Einspänner, J. et al. (2014), “Computer-Assisted Content Analysis of Twitter Data” in Weller K. et al. (eds.) Twitter and Society, New York, Peter Lang Publishing

[5] Lowe, W. (2003), Content Analysis Software: A Review Technical Report for the Identity Project, Weatherhead Center for International Affairs, Harvard University

[6] http://www.galileoco.com/N_catpac.asp

[7] http://www.concordancesoftware.co.uk/

[8] Determining a headword or lemma, under which its various forms and related words are placed in order to analyse them as a single item.

[9] https://sites.google.com/site/coincourse2013/tools

[10] http://www.dictionsoftware.com/

[11] http://icame.uib.no/lexainf.html

[12] http://www.liwc.net/

[13] http://provalisresearch.com/

[14] http://wordij.net/

[15] http://www.clres.com/software.html

[16] http://textanalysis.com/

[17] http://eventdata.psu.edu/software.html

[18] http://www.atlasti.com/

[19] http://www.qsrinternational.com/

[20] Thelwall, M. et al. (2011), “Sentiment in Twitter Events”, in Journal of the American Society for Information Science and Technology, 62(2), pp. 406-418

[21] http://sentistrength.wlv.ac.uk/

[22] Danowski, J. A. (2012). “Social network size and designers’ semantic networks for collaboration” in International Journal of Organization Design and Engineering 2(4/2012), pp. 343-361

Twitter communication patterns

In a recent study Bruns and Stieglitz[1] (2014) refer to the standard 1/9/90 distribution in the context of online communication. The 1/9/90 distribution is a new concept, which was born with the Internet, and which explains the distribution of content usage and creation in an online collaborative space/social media platform.

According to this concept, 1% of the participants act as content creators, 9% as editors and 90% as content consumers:

“In most online communities, 90% of users are lurkers who never contribute, 9% of users contribute a little, and 1% of users account for almost all the action” (Nielsen[2], 2006, online).

Nielsen’s claim is confirmed by other researchers who studied online communities. They conclude that 1% are leading users, “those individuals that dominate” (Kardara et al.[3], 2012), such as opinion leaders; 9% are highly active users; and 90% are the least active users (Bruns and Stieglitz, p.166), who are “plain participants” or “information consumers” (Kardara et al., 2012, online).
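To check how closely a dataset follows the 1/9/90 pattern, one can rank users by the number of posts and compare the share of content produced by each group. The sketch below is one possible way of doing so, under assumptions of my own: the function name is hypothetical, the groups are cut by simple integer percentiles, and only users who posted at least once are counted (pure lurkers leave no trace in such data).

```python
from collections import Counter

def activity_shares(authors):
    """Share of all posts written by the top 1%, next 9% and remaining 90% of users."""
    counts = sorted(Counter(authors).values(), reverse=True)
    n_users, total = len(counts), sum(counts)
    top1 = max(1, n_users // 100)      # size of the leading group
    top10 = max(top1, n_users // 10)   # leading + highly active groups
    groups = (counts[:top1], counts[top1:top10], counts[top10:])
    return tuple(sum(group) / total for group in groups)

# 100 users: one posts 20 times, nine post 10 times each, ninety post once.
authors = ["u0"] * 20
authors += [f"u{i}" for i in range(1, 10) for _ in range(10)]
authors += [f"u{i}" for i in range(10, 100)]
print(activity_shares(authors))
```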

Zappavigna[4] (2012) also confirms the standard 1/9/90 distribution and names the three categories of users as “information sources, friends and information seekers” (p.30) where the “information sources” are the content creators, “friends” are the editors and “information seekers” are the content consumers. Citing Naaman, Boase et al., Zappavigna groups Twitter users into two categories:

1) “meformers, largely concerned with self, and

2) informers, interested in sharing information” (p.30).

Waters and Williams[5] (2011) compared the Twitter usage by individual politicians and national bodies against Grunig’s four models of public relations. They found that U.S. national bodies use Twitter for a double purpose:

1) “to release information in a one-way manner” and

2) to “foster relationship growth with other Twitter users” (p.361).

These bodies pay less attention to interactivity and conversation since they “have more on their communicative plates” (p.361).

Waters and Williams warn that, contrary to researchers’ recommendations, one-way communication models still dominate current practice, where public relations practitioners pay less attention to two-way communication approaches.

The researchers demonstrate that the findings of their study contradict the communication “practitioners’ claims of interactivity on Twitter” (p. 353). Furthermore, Waters and Williams adapt the traditional public relations models to the behaviour of users on Twitter, and thereby create a Twitter version of the four models (Table 1, adapted from Waters and Williams, 2011, pp. 356-357).

Table 1: Twitter communication models based on the traditional public relations models

Waters and Williams (p.356) use a relevant metaphor to describe the two-way symmetrical model:

“Scholars and consultants state that public affairs communications should ideally incorporate the fourth and final model into their communication as often as possible to move toward becoming an audience-centric organization. Just as doves coo to one another to woo the other into a romantic relationship, two-way symmetrical communication promotes a balanced dialogue between an organization and its publics to encourage an open, mutually beneficial relationship”.

Waters and Williams explain why organisations prefer the one-way distribution of messages to mass audiences: they “are not going to abandon the control that they maintain in one-way communications for give-and-take conversations on issues where external stakeholder input is not warranted” (p.359). The authors note, nevertheless, that “public relations scholars have slowly pushed these issues to the periphery in favour of focusing solely on relationship-building approaches and dialogue” (p.359). In terms of Twitter communication practice, Waters and Williams advise communicators and organisations to:

  • personalise their Twitter presence and profile with logo and website URL;
  • balance the number of followees and followers;
  • tweet key information and not overtweet; engaging is crucial;
  • avoid overusing the agentry model leading to publishing overly promotional content;
  • avoid focusing too much on their own content: retweeting or sharing others’ content increases credibility and helps build a community;
  • avoid ignoring followers’ comments, questions and direct messages;
  • link consistently to the company website for detailed information;
  • interlink social media/network accounts to expand the network communities, strengthen corporate identity, and increase credibility and visibility (Adapted from Waters and Williams, p. 360).

Summarising a study by Hargittai and Litt, Weigel[6] (2013, online) draws attention to the risk of excluding other populations when analysing samples of Twitter users. The findings of such research may hold only for Twitter users and should not be generalised, since not all of the population is familiar with Twitter.

While Twitter has become a strong competitor of the traditional media, Weigel, citing Lampe et al., raises the point of a number of barriers to the use of social media:

“The ability to realize these potential benefits faces inherent barriers in terms of perceptions of social media, ability of administrators to make effective use of social media tools, and the design of software used to operationalize social media”.

[1] Bruns, A. and Stieglitz St. (2014), “Metrics for Understanding Communication on Twitter in Study” in Weller K. et al. (eds.), Twitter and Society, New York, Peter Lang Publishing

[2] Nielsen, J., (2006), Participation Inequality: Encouraging More Users to Contribute

[3] Kardara, M. et al. (2012), Influence Patterns in Topic Communities of Social Media, in WIMS ’12 Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics

[4] Zappavigna, M. (2012), The Discourse of Twitter and Social Media, Continuum International Publishing Group, New York

[5] Waters, D., R. and Williams, M., J. (2011), “Squawking, tweeting, cooing, and hooting: analysing the communication patterns of government agencies on Twitter” in Journal of Public Affairs, vol. 11, No. 4, pp. 353-363

[6] Weigel, M. (2013), “Twitter, politics and the public: Research roundup”, in Journalist’s Resource

Twitter functions

Although limited to 140 characters, a retweet (RT) can become a powerful piece of content to be consumed and shared. Notwithstanding “why users embrace retweeting, through broadcasting messages, they become part of a broader conversation” (Boyd et al.[1], 2010, p.10).

Bruns and Stieglitz[2] (2014, pp. 70-73) discuss the basic functions and metrics of Twitter. A tweet itself includes the author (username), a timestamp and the sender’s profile picture. The body may also include one or more hashtags, one or more reference URLs, one or more @mentions, and an indication of the tweet’s origin: either genuine (generated by the user) or retweeted from another user.
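The elements listed above can be pulled out of a tweet's text with a few regular expressions. The sketch below is a simplified illustration: the `ParsedTweet` structure and `parse_tweet` function are hypothetical names, the timestamp and profile picture are left out, and retweets are recognised only by the manual "RT @user" convention.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ParsedTweet:
    author: str
    text: str
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)
    urls: list = field(default_factory=list)
    is_retweet: bool = False

def parse_tweet(author, text):
    """Extract hashtags, @mentions and URLs; flag manual 'RT @user' retweets."""
    return ParsedTweet(
        author=author,
        text=text,
        # Simple patterns: a '#' or '@' inside a URL would also match.
        hashtags=re.findall(r"#(\w+)", text),
        mentions=re.findall(r"@(\w+)", text),
        urls=re.findall(r"https?://\S+", text),
        is_retweet=text.startswith("RT @"),
    )

tweet = parse_tweet("alice", "RT @bob: Jobs report out #EU #jobs http://example.org/r")
print(tweet.hashtags, tweet.mentions, tweet.urls, tweet.is_retweet)
```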

Boyd et al. (2010) examined retweeting, one of the major Twitter functions, as a “conversational practice” (p.1), which is strongly connected to “authorship, attribution, and communicative fidelity” that “are negotiated in diverse ways” (p.1).

Seen as an equivalent of email forwarding, retweeting is “the act of copying and rebroadcasting” that is at the heart of building a “conversational ecology” (p.1) on Twitter. According to the authors, retweeting purposes are twofold:

1) information spreading and

2) validating what others say and/or engaging with them.

Given the nature of Twitter and its attributes, retweeting may raise concerns about “authorship, attribution and communicative fidelity” (p.1). The simplicity of Twitter’s messaging functions can thus be both an asset and a drawback.

To support this statement, based on their case study and empirical research, Boyd et al. (2010, p.5) identify two ways of retweeting behaviour:

1) either by preserving or slightly editing the tweet content (in which case authorship becomes questionable) and

2) shortening the content through deletion.

The authors also looked at the reasons why people retweet. They listed ten such reasons:

  • to amplify or spread tweets
  • to entertain
  • to add comments to a tweet
  • to make one’s presence as a listener visible
  • to publicly agree with someone
  • to validate others’ thoughts
  • to prove friendship and loyalty
  • to refer to less popular people/content
  • to gain followers or reciprocity from more visible users
  • to save tweets for future personal access (Adapted from Boyd et al., 2010, p.6).

Even though Twitter has expanded its functions in the past few years, the platform still bears some ambiguities concerning content modification, acknowledgement of authorship and “communicative fidelity”. On the other hand, these same attributes have made Twitter a transparent platform where content is public and accessible to anyone.

Bruns and Stieglitz[3] (2012) carried out a comparative study that focused on more than 40 cases covering major topics and events on Twitter: elections, natural disasters, corporate crises and others. Based on the research outcomes and a number of communicative metrics they conclude that “thematic and contextual factors influence the usage of different communicative tools available to Twitter users, such as original tweets, @replies, retweets, and URLs” (p.160) while the communication patterns employed in the context of major topics and events are steady.

“Twitter activities […] appear to be governed by a number of standard practices” (p.178) where Twitter users tend to find, share and re-share breaking news and other “acute events”. Twitter is the “backchannel” where “original commentary […] does not engage with the tweets of the others” (p.179) or offer links to additional information. Data gathering involved capturing 40 hashtag datasets covering a wide range of major topics and events using yourTwapperkeeper, a tool which was apparently discontinued by Twitter in 2011.
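Communicative metrics of the kind used in that comparison (shares of original tweets, @replies, retweets and tweets with URLs) can be approximated from tweet texts alone. The classification rules below are my own simplification, not the authors' definitions: a retweet is any text starting with "RT @", a reply is any text starting with "@", and everything else counts as original.

```python
def tweet_type(text):
    """Classify a tweet as 'retweet', 'reply' or 'original' from its text alone."""
    if text.startswith("RT @"):
        return "retweet"
    if text.startswith("@"):
        return "reply"
    return "original"

def communication_metrics(tweets):
    """Return the share of each tweet type and of tweets containing a URL."""
    total = len(tweets)
    shares = {"original": 0, "reply": 0, "retweet": 0, "with_url": 0}
    for text in tweets:
        shares[tweet_type(text)] += 1
        if "http://" in text or "https://" in text:
            shares["with_url"] += 1
    return {k: v / total for k, v in shares.items()} if total else shares

sample = [
    "Flood warning issued #flood http://example.org",
    "RT @news: Flood warning issued #flood",
    "@news thanks for the update",
    "Stay safe everyone #flood",
]
print(communication_metrics(sample))
```

Computing the same shares for each hashtag dataset would allow the kind of cross-event comparison the study describes.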

References

[1] Boyd, D. et al. (2010), Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43

[2] Bruns, A. and Stieglitz St. (2014), “Metrics for Understanding Communication on Twitter in Study” in Weller K. et al. (eds.), Twitter and Society, New York, Peter Lang Publishing

[3] Bruns, A. and Stieglitz, St. (2012), Quantitative Approaches to Comparing Communication Patterns on Twitter, in Journal of Technology in Human Services, 30:3-4, 160-185

Produsage and content polarity on Twitter

What is produsage?

The concept of “produsage”, a term recently coined by Axel Bruns that combines “production” and “usage”, entails “the assumption that there is a move away from the traditional value production chain of ‘producer – distributor – consumer’ towards a triadic simultaneity of each category” (Horan[1], 2013, p.1).

Horan (2013) analyses “produsage” within social media platforms through a semantic network analysis of Twitter communications.

The traditional production chain does not apply to what happens on Twitter where the content as such is mainly produced by consumers while the platform owners “facilitate modes for distribution”. This does not happen in mass media “where the content is produced by the platform owners and sold to the consumers” (p.1). Horan argues that the business model applies differently to social media platforms “where consumers perform labour for themselves” (p.3).

The study findings indicate that produsage as such has a low value for most users, who are passive information consumers, and a significantly higher value for active users.

Content polarity and Losada Line

Over the past number of years, researchers have analysed the content polarity embedded in tweets. Citing Pang and Lee, Thelwall et al.[2] (2011) state that “the research field of sentiment analysis, also known as opinion mining, has developed many algorithms to identify whether an online text is subjective or objective, and whether any opinion expressed is positive or negative” (p.406).

Thelwall et al. looked at sentiment polarity (positive, negative or neutral) on Twitter and at whether popular events could be linked to an increase in sentiment strength over a given period of time. They found that popular events “are normally associated with increases in negative sentiment strengths and some evidence that peaks of interest in events have stronger positive sentiment than the time before the peak” (p.406). In other words, popular events on Twitter tend to coincide with stronger negative sentiment; the finding is an association and does not show that negative sentiment generates such events.

Danowski[3] (2012) suggests comparing polarity values against the Losada Line, also known as the “critical positivity ratio”. Following a number of studies focused on the ratios of positive to negative in communications, Fredrickson and Losada[4] (2011) found evidence that 2.9 is the “flourishing” point on the line. Danowski explains that “to flourish means to live within the optimal range of human functioning, one that connotes goodness, generativity, growth, and resilience” (p.2). The values below 2.9 show less effectiveness and are labelled as “languishing”, meaning “distress, impairment and limitations in activities” (p.2), while any value above 11.6 proved to lead to the weakening of the system.

Danowski concludes that the Losada Line and WORDij could be successfully combined to extract relevant information about the polarity of a given text corpus:

“For a system to be flourishing there must be at least 2.9 times more positive than negative communication. Below that ratio, the system is languishing” (p.1).
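The comparison against the Losada Line amounts to simple arithmetic, which the following minimal Python sketch illustrates. It assumes that the counts of positive and negative items have already been obtained by some other means (for example, from a sentiment word list); the function names and labels are hypothetical and are not part of WORDij. Only the thresholds 2.9 and 11.6 come from the text above.

```python
# Thresholds as quoted in the text above.
LOSADA_LINE = 2.9    # at least this ratio: "flourishing"
UPPER_BOUND = 11.6   # above this ratio the system is said to weaken


def positivity_ratio(positive: int, negative: int) -> float:
    """Return the ratio of positive to negative items.

    Hypothetical helper: the counts are assumed to come from a prior
    sentiment-coding step that is not shown here.
    """
    if positive < 0 or negative < 0:
        raise ValueError("counts must be non-negative")
    if positive == 0 and negative == 0:
        raise ValueError("no sentiment-bearing items to compare")
    if negative == 0:
        return float("inf")  # only positive items were found
    return positive / negative


def classify(ratio: float) -> str:
    """Label a ratio relative to the Losada Line (labels are illustrative)."""
    if ratio < LOSADA_LINE:
        return "languishing"
    if ratio > UPPER_BOUND:
        return "weakening"
    return "flourishing"


# Example with made-up counts: 120 positive and 40 negative items.
ratio = positivity_ratio(positive=120, negative=40)
print(ratio, classify(ratio))
```

With these made-up counts the ratio is 3.0, which sits just above the Losada Line. How positive and negative items are counted in the first place is left open here, since that step depends on the coding scheme applied to the corpus.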

References

[1] Horan, T.J. (2013), ‘Soft’ versus ‘Hard’ News on Microblogging Networks: Semantic Analysis of Twitter Produsage, in Information, Communication & Society, Taylor & Francis, Vol. 16/1

[2] Thelwall, M. et al. (2011), “Sentiment in Twitter Events”, in Journal of the American Society for Information Science and Technology, 62(2), pp. 406-418

[3] Danowski, J. A. (2012), Sentiment Network Analysis of Taleban and Radio Free Europe/Radio Liberty (RFE/RL) Open-Source Content About Afghanistan, in Open-Source Intelligence and Web Mining conference [OSINT-WM 2012], Odense, Denmark

[4] Fredrickson, B.L. and Losada, M.F. (2011), “Positive Affect and the Complex Dynamics of Human Flourishing” in Am Psychol, American Psychological Association

Twitter: Influence versus passivity

According to Kardara et al. (2012) there is “no standard method in the literature for evaluating the outcomes of an influence criterion”, but recent studies revealed that Twitter influencers may be “the users who produce original content that is frequently retweeted”, though “they avoid getting into discussions or reproducing others’ opinions” (online[1]).

Romero et al.[2] (2011) investigated to what extent individuals, governments and companies got the attention of popular users and influencers to spread their “ideas, policies, products and commentary” (p.113) on Twitter. The influence as such comes from the way certain users retweet and therefore re-distribute content that draws the attention of their followers.

Romero et al. employed an algorithm to measure influence on Twitter. The algorithm was applied to a corpus containing 22 million tweets with URLs, posted on Twitter in the 300 hours following 10 September 2009. Each URL was timestamped and the processed data led to the following conclusions:

  • it is rather hard to get the attention of Twitter users to “rise to the most trending” topics (p.114) given the large amount and frequency of the information distributed on the platform;
  • popularity and influence do not necessarily contribute to information dissemination on Twitter. Their “correlation is weaker than it might be expected” (p.114);
  • the information distribution on Twitter could be propagated better if content authors and their followers “actively engage rather than passively read it and cease to act on it” (p.114).

According to Romero et al. the influence is determined by four factors:

1) content novelty

2) resonance of the content published by the followed users and the content of the followers

3) content quality

4) frequency of the content users create (p.113).

A major stumbling block to the content propagation on the network is the followers’ passivity, even if influencers have a significant number of followers. Romero et al. state that this stumbling block is “often hard to overcome” (p.113).

Romero et al. attempt to introduce a new definition of “influence on social media”, which is not based on individual statistics – number of followers and retweets (RTs) – but on the structural properties of the Twitter platform along with the users’ behaviour and their passivity. The authors explain that the influence “depends on not only the size of the influenced audience, but also on their passivity” (p.113).
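The idea that influence depends on the passivity of an audience as well as on its size can be illustrated with a toy calculation in Python. This is not the algorithm used by Romero et al., which is not described in detail here; the definition of passivity as the share of seen items that a user does not forward, and the score built on it, are assumptions made purely for illustration.

```python
def passivity(seen: int, forwarded: int) -> float:
    """Share of items a user saw but did not forward.

    Hypothetical definition for illustration only: 1.0 means the user
    never forwards anything, 0.0 means the user forwards everything.
    """
    if seen <= 0 or not 0 <= forwarded <= seen:
        raise ValueError("need seen > 0 and 0 <= forwarded <= seen")
    return 1 - forwarded / seen


def expected_forwards(follower_passivities) -> float:
    """Toy influence score: expected number of followers forwarding one item."""
    return sum(1 - p for p in follower_passivities)


# Made-up audiences: a popular account with 1000 very passive followers
# and a small account with 50 fairly active followers.
popular = [passivity(seen=100, forwarded=1)] * 1000
small = [passivity(seen=100, forwarded=50)] * 50

print(expected_forwards(popular))
print(expected_forwards(small))
```

In this made-up example the small account scores higher than the popular one (about 25 expected forwards against about 10), because the larger audience is almost entirely passive.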

Furthermore, the authors state that “high popularity does not necessarily imply high influence and vice-versa” (p.113). Twitter users somewhat compete for attention on the platform while distributing a significant amount of information, which increases from one day to the next. Some users manage to get the attention of the others and this may lead to increased popularity.

Bakshy et al.[3] (2011) studied the influence of 1.6 million Twitter users and the 74 million events they generated on the platform over two months in 2009. They discovered that “the largest cascades tend to be generated by users who have been influential in the past and who have a large number of followers” (p.1). The authors define Twitter “influencers” as users “who exhibit some combination of desirable attributes – whether personal attributes like credibility, expertise, or enthusiasm, or network attributes such as connectivity or centrality – [that] allows them to influence a disproportionately large number of others, possibly indirectly via a cascade of influence” (p.1). Bakshy et al. consider that both ordinary people and experts (journalists and other public figures) could be influencers on Twitter, depending on the configuration of their networks of followers and the role of their tweet content.

Pfitzner et al.[4] (2012) found that “highly emotional diverse tweets can have up to almost five times higher chances of being retweeted” (p.546). Following a study where sentiment extraction techniques were employed, they claim that Twitter in practice mainly involves two major actions: “information creation and subsequent distribution (tweeting) and pure information distribution (retweeting), with pronounced preference to the first” (p.543). The tweets carrying a “high emotional diversity have a better chance of being retweeted, hence influencing the distribution of information” (p.543).

References

[1] Kardara, M. et al. (2012), Influence Patterns in Topic Communities of Social Media, in WIMS ’12 Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics

[2] Romero, D. M. et al. (2011), “Influence and passivity in social media” in WWW ’11 Proceedings of the 20th international conference companion on World Wide Web, pp. 113-114, ACM New York

[3] Bakshy, E. et al. (2011), Everyone’s an influencer: Quantifying influence on Twitter, in Proceedings of the fourth ACM international conference on Web search and data mining (2011), pp. 65-74

[4] Pfitzner, R. et al. (2012), Emotional Divergence Influences Information Spreading in Twitter, in ICWSM, The AAAI

Twitter, political communication and research

This article covers some key aspects of communication worldwide with a closer look at the social, cultural and political dimensions of the communication on Twitter.

An introduction to political communication research across the world

Holtz-Bacha and Kaid[1] (2011) make a critical review of the research methodologies employed by worldwide researchers to study political communication. They also discuss classifications of political, media and cultural systems, methodological issues in international comparisons, research issues across countries and measuring political advertising and political debates across cultures.

Hashtag cloud featuring the hashtags employed by EU Commissioner László Andor in his Twitter communications, in 2012

They stress that, in the context of the political communication research, the employed methodology varies from one country to another as the research context mirrors specific cultural settings. Therefore it is difficult to compare the research outcomes coming from different countries, since they embed different cultural perceptions and behaviours.

Considering previous research reports, Holtz-Bacha and Kaid recommend adapting the research methodology to each cultural context in terms of political communication. In Asia, for instance, it is often not possible to collect public opinion data. Japanese citizens tend to “be socially cohesive and the expression of any kind is not likely due to the desire to avoid open confrontations” (p.01). Moreover, according to the “spiral of silence” theory citizens whose opinion is in the minority tend not to speak because of fear of isolation from the society.

Holtz-Bacha and Kaid mention the case of the European Parliamentary elections, which provide a good research context because the shared election cycle generates constant variables across the different cultural settings of European Union countries. The event is followed by studies that provide rich information on its impact and results at European scale and at national level in the participating countries. Citing Schlesinger, Heinderyckx[2] (1998) points out that trying to communicate at European level means first acknowledging the existence of considerable diversity. He also adds a further dimension to what Wolton says about the particular European context[3]: “And Wolton also states that ‘there is no European public space even if there is a political area’ where we can also consider its economic dimension” (p.222).

Holtz-Bacha and Kaid conclude that in recent decades researchers have been able to enhance traditional research methods in political communication with innovative elements. While the American approach has lately been based on sophisticated computer software, the European traditions “have favoured more qualitative and interpretive approaches to political language analysis” (p. 407). Both approaches have benefited from the facilities offered by the digital technologies which enable storing and processing of raw data – transcripts, media databases, scanners and others. This is a significant contribution to making the research more “feasible, efficient, accurate and less time-bound” and enabling researchers to report “robust findings and accurate insights” (p. 407).

In today’s world, online content overlaps with both traditional media and interpersonal information sources. In contrast with traditional media, which are monolithic, the Internet hosts many more information channels arranged in distinct layers of sources: websites, blogs, email, repositories, social media and social network platforms (Bellur and Sundar[4] citing Sundar and Nass, 2001, p.489).

Bellur and Sundar also emphasise that in the case of blogging, and by extension micro-blogging, both blog author and blog readers serve as information sources. The traditional message model (sender, message, receiver) is adapted to present conditions, which enables sender and receiver to switch places and supports interactivity.

Political communication on Twitter

Maireder et al. (2012) conducted a content analysis on the Austrian political Twittersphere from February to October 2011. They identified the structure of the Austrian political Twittersphere and looked at three types of Twitter users – politicians, journalists and other actors – who interacted with people outside the professional political sphere. Their research method involved 1) collecting, processing and visualising the tweets containing the most popular hashtags and 2) mapping the @mention network between the users who interacted.

Maireder et al.[5] took a “user-centred approach” (p.6) as most tweets did not contain hashtags and therefore hashtagging was “not used consistently” (p.6). They identified 374 account holders, whom they grouped in four user categories: politicians, journalists, others (experts, lobbyists, strategists) and ordinary citizens. The researchers analysed the @mentions employed in tweet bodies, and, as this was not a “sufficient indicator for influence” (p.7), they also analysed the users’ activity on Twitter, number of users mentioned, and the messages classified by profession.
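Mapping an @mention network of this kind can be sketched in a few lines of Python. The sketch below is an illustration and not the procedure used by Maireder et al.: the input format (pairs of author and tweet text), the function names and the sample tweets are hypothetical, and the regular expression is a simplifying assumption about what counts as a mention.

```python
import re
from collections import Counter

# Assumption: a mention is "@" followed by 1-15 word characters and not
# preceded by a word character (which excludes e-mail addresses).
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")


def mention_edges(tweets):
    """Count directed edges (author -> mentioned user).

    `tweets` is an iterable of (author, text) pairs. Each mentioned user
    is counted once per tweet, and self-mentions are ignored.
    Retweets written as "RT @user" are counted as mentions here.
    """
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        targets = {name.lower() for name in MENTION_RE.findall(text)}
        for target in targets:
            if target != source:
                edges[(source, target)] += 1
    return edges


def users_mentioned(edges):
    """Number of distinct users each author mentioned (out-degree)."""
    out_degree = Counter()
    for source, _target in edges:
        out_degree[source] += 1
    return out_degree


# Made-up sample data.
sample = [
    ("alice", "@Bob thanks, cc @carol"),
    ("bob", "@alice @alice agreed"),
    ("alice", "RT @bob: news"),
    ("carol", "write to carol@example.com"),
]
edges = mention_edges(sample)
print(edges)
print(users_mentioned(edges))
```

The resulting edge counts can be passed to any network analysis or visualisation tool; the out-degree corresponds to the “number of users mentioned” indicator referred to above.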

They conclude that 1) politicians, journalists and other professional actors “use Twitter to interact with both actors of their own profession and other spheres” (p.11), with politicians being the most active, and 2) Twitter enables ordinary citizens to become more active in the public debate than “in traditional contexts of interactions” (p.11).

Citing Bruns, Maireder et al. (2012) state that Twitter has become “the world’s second most important social media platform” (p.3) which may increase interaction between politicians and citizens who are “exposed to a lot of different opinions” (p.4) even though previous studies report that politicians may use “Twitter for self-promotion and simple information diffusion rather than conversations” (p.5).

The purpose of the study carried out by Lilleker[6] (2013) was to identify peers’ patterns of usage and communication on Twitter. The research subject was a sample of 850 tweets published by the Labour Party Peers in the House of Lords, in the UK in 2012. The researcher applied a mixed research methodology that combined semantic analysis, social network analysis and quantitative analysis.

Some of Lilleker’s research findings are worth noting: 1) content could be shaped by the norms of politics, by the Twitter medium or by a mix of the two; 2) political agenda and debates could be led by the use of @ and #; 3) Twitter content is linked to mainstream media; 4) individuals involved in debates focus on key subjects. In his tentative conclusions, Lilleker emphasises that: 1) Twitter usage is determined by interests and personality; 2) Twitter could act as a communication central point (hub); and 3) “Twitter has high potential but dependent on individual usage” (Adapted from Lilleker, slide 19).

Weigel[7] (2013) summarises a set of studies focusing on how Twitter use “may shape political and civic space and discourse” in a time when “the microblogging platform is increasingly being used as a vehicle for shaping political debates by actors who have their own motivations and who do not necessarily represent the grassroots of the citizenry” (online).

While youth and minorities get involved in politics “to a certain extent”, Twitter is the place where “highly partisan and politically engaged citizens appear to dominate the social media outlet” (online).

Van Dijck[8] (2013) questions the neutrality of Twitter as a social media platform, where “some users are more equal than the others” (p.74), despite its “town hall” image, which houses the voices of individuals and the opinions of organisations. Van Dijck explains this statement through the platform architecture itself, which “privileges certain influential users who can increase tweet volume, and whom thus garner more followers” (p.74).

References

[1] Holtz-Bacha, C. and Kaid, L. L. (2011), “Political Communication across the World: Methodological Issues Involved in International Comparisons” in Bucy, E. P. and Holbert, R. L. (eds.) The Sourcebook for political communication research: methods, measures, and analytical techniques, New York, Routledge

[2] Heinderyckx, François (1998), L’Europe des médias, Editions de l’Université libre de Bruxelles, (p. 222), « Or, comme l’affirme Schlesinger, ‘essayer de communiquer sur le plan européen implique d’abord et avant tout de reconnaître la réalité d’une diversité considérable’. Et Wolton rappelle de même qu’il n’y a ‘pas d’espace public européen, même s’il existe un espace politique’ et, pourrait-on rajouter, économique ».

[3] Heinderyckx, F. (1998), L’Europe des médias, Editions de l’Université libre de Bruxelles, Bruxelles, Institut de sociologie

[4] Bellur, S. and Sundar, S. S. (2011), “Concept Explication in the Internet Age: The Case of Political Interactivity” in Bucy, E. P. and Holbert, R. L. (eds.) The Sourcebook for political communication research: methods, measures, and analytical techniques, New York, Routledge

[5] Maireder, A. et al. (2012), “Mapping the Austrian Political Twittersphere: How politicians, journalists and political strategists (inter-)act on Twitter”, in Proceedings of CeDem12 Conference for E-Democracy and Open Government, Krems: Danube University, pp. 151-164

[6] Lilleker, D. (2013), “Elite tweets: Analysing the twitter communication patterns of Labour Party Peers in the House of Lords” in A session at Twitter and Microblogging: Political, Professional and Personal Practices

[7] Weigel, M. (2013), “Twitter, politics and the public: Research roundup”, in Journalist’s Resource

[8] Van Dijck, J. (2013), “Twitter and the Paradox of Following and Trending” in The Culture of Connectivity: A Critical History of Social Media, Oxford University Press, pp.68-88