Twitter linguistic patterns

The European Commission’s EMPL presence on Twitter in 2012: Linguistic patterns

As mentioned in the previous articles, I used LIWC as one of the software tools to establish the linguistic patterns developed on Twitter by the subjects of this research.


LIWC recognised about 69% of the words in the input provided by Social Europe, EURes and Commissioner Andor in 2012, based on its dictionaries. The remaining 31% of words may be part of EU terminology that is not covered by the LIWC dictionaries.

Out of the 80 information categories and sub-categories in the output, I selected only the 35 that are relevant to this research. They fall under three groups: linguistic processes, psychological processes and personal concerns.

Each category is introduced in the next paragraphs.

Linguistic processes

  • Word count: the largest tweet corpus belongs to Commissioner Andor (17,782 words in 716 tweets), followed by Social Europe (15,843 words in 934 tweets) and EURes (6,837 words in 398 tweets). Social Europe appears to have used the most condensed communication style: its average is about 17 words per tweet, while EURes had slightly more than 17 and Commissioner Andor about 25 words per tweet.
  • Dictionary words: on average across the three users, 69% of the words were captured by the program, based on its incorporated dictionaries. The remaining 31% may represent EU terminology, which the LIWC dictionaries may not contain, as they are based on informal rather than specialised vocabulary.
  • Total function (or style) words: only 34% of the tweet corpora represent function words. A possible explanation would be that due to the 140 character limit, Twitter users rely more on content words to convey a clear message and keep function words (such as pronouns, articles and prepositions) to a minimum. LIWC distinguishes between content and function words. Content words are the backbone of a message (nouns, verbs, adjectives and adverbs), while function words connect, shape, and organise content words (pronouns, articles, prepositions, auxiliary verbs, conjunctions, negations, quantifiers and common adverbs).

Function words do not have any meaning by themselves, but they allow a psychological insight into how people think, feel and connect with others. They are short and used at very high rates, yet they are hard to detect in a conversation flow or written text. Function words also require social skills and social knowledge to be used properly: in the words of Pennebaker[1] (2011, p. 33), “the speaker assumes that the listener knows the context”.
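
The words-per-tweet averages quoted under Word count can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the word and tweet totals reported above; the variable and function names are my own, chosen for illustration, and it is not part of the LIWC output.

```python
# Word and tweet totals as reported in the Word count bullet above.
corpora = {
    "Commissioner Andor": (17782, 716),
    "Social Europe": (15843, 934),
    "EURes": (6837, 398),
}


def words_per_tweet(words: int, tweets: int) -> float:
    """Average number of words per tweet for one account."""
    return words / tweets


# Illustrative helper structure: account name -> average words per tweet.
averages = {
    name: words_per_tweet(words, tweets)
    for name, (words, tweets) in corpora.items()
}

for name, average in averages.items():
    print(f"{name}: {average:.1f} words per tweet")
```

Rounded to whole words, these averages match the figures given above: about 25 for Commissioner Andor and about 17 for both Social Europe and EURes, with EURes slightly higher than Social Europe.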

  • Total pronouns: EURes leads with 8%, followed by Commissioner Andor (4%) and Social Europe (3%), while the user average is 4%. In terms of personal pronoun use, EURes is in the leading position again (6%), followed by Commissioner Andor (2%) and Social Europe (2%).

Personal pronouns captured my attention. For example, “people who pay a great deal of attention to other people tend to use personal pronouns at high rates” (Pennebaker, 2011, p. 291). First person singular pronouns were not used often (less than 1% user average). First person plural pronouns reflect a social connection to a group. I noticed a 2% use for EURes and less than 1% for the two others, which may indicate that EURes is more “inclusive” and creates a convivial and cooperative environment. Second person pronouns represented about 3% use for EURes, who often address their followers directly in order to engage in a more spontaneous way and get closer to the audience. Since EURes offers practical solutions, their tweets often include questions such as “Are you interested in working in Norway? You can read more about it here”.
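
To illustrate how such percentages come about, here is a minimal Python sketch that counts the share of second person pronouns in the example tweet above. It assumes that a category rate is simply the number of matching words divided by the total number of words; the word list and the tokenizer are hypothetical simplifications of my own and are not the actual LIWC dictionary or algorithm.

```python
import re

# Hypothetical word list for illustration only; NOT the LIWC dictionary.
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rate(text: str, category_words: set[str]) -> float:
    """Percentage of tokens in the text that belong to the given word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return 100.0 * hits / len(tokens)


tweet = "Are you interested in working in Norway? You can read more about it here"
rate = category_rate(tweet, SECOND_PERSON)
print(f"{rate:.1f}% second person pronouns")
```

A single tweet that addresses the reader twice in 14 words scores far higher than the 3% measured for EURes over the whole year, which is a reminder that the figures in this article are averages over an entire corpus.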

  • Verbs: the three account holders prefer the present tense (5% user average) and use the past and future tenses about equally (less than 1%). The use of the present tense suggests a dynamic communication style.

Psychological processes

  • Affective processes: both positive and negative emotions (posemo and negemo) will be introduced in the next article as a Losada line. It is worth noting that, according to the LIWC output, Commissioner Andor’s content ranks highest in both the positive and the negative categories (see posemo and negemo in Table 1). The negative side could be explained by the use of specific words associated with unemployment, in the context of the crisis. The positive side, however, is visible in the efforts to propose relevant legislation in order to overcome the crisis.
  • Cognitive processes involve perception, learning, and reasoning to facilitate thinking and remembering. Less than 11% of the content shows a certain degree of cognitive processes, which is visible in the Twitter messages (13% Commissioner Andor, 11% EURes and 9% Social Europe). Commissioner Andor’s tweets often included quotes from his speeches and some of his reflections on the subject of the policies. The account administrator confirmed that the Commissioner liked to tweet relevant quotes from his speeches, which were enhanced with personal reflections.

It is also important to note the inclusive dimension of the cognitive processes, which is represented by words such as “and”, “with”, or “include” and shows a high use of 3.2% across all three account holders. This may mean that specific vocabulary covering inclusion policy is well employed in the tweet corpora. The exclusive dimension is minor.

  • Relativity conveys information on motion, space and time. The size of both space and time lexical fields is significant: 7% user average for space and 6% user average for time. It is apparent that European Union countries, regions and cities are well represented in the communications. Time-wise, there are many references to event dates throughout the year 2012.

Personal concerns

Personal concerns cover information on work, achievements, home and money.

  • Work includes information about jobs and careers. With 9% user average, this category is quite remarkable (10% EURes, 9% both Commissioner Andor and Social Europe). It is obvious that this category is well represented as all account holders talk about job opportunities and job-related events.
  • Achievement covers information on earnings, winning and successes and represents 4% user average, which may reflect the efficiency of the policy communication as well as the considerable achievements related to the events and guidelines. The peak of 4.2% for the Commissioner may be related to his missions and official visits abroad, which were successful in 2012 according to the statements in the tweets.
  • Money represents 2% of the content and covers discussions about poverty, salary rights and others. Commissioner Andor is above the user average, with 3%.

Table 1: Selected LIWC categories

However, even though LIWC offers insights into the language patterns of a user, we should also keep in mind that Twitter style is very different from everyday interaction. For instance, the three account holders may not have used many future tense verbs to reflect a future perspective but they are certainly goal-oriented.

[1] Pennebaker, J.W. (2011), The Secret Life of Pronouns: What Our Words Say About Us, New York, Bloomsbury Press.
