Tweet me a URL and make your communication richer

The European Commission’s EMPL presence on Twitter in 2012: URLs

Some previous studies suggested that URLs in tweets are a sign of engagement, since a URL is meant to provide additional information and thereby engage a follower with the content of the linked page.

Although this might be the case, clicks on links are difficult to trace and monitor unless the URL shortening service also provides analytics. Therefore the URL category was not treated as an engagement parameter in this research project.

“Sharing links is a central practice in Twitter” (Boyd et al., 2010, p.3)[1]. It is therefore common practice to add links (URLs) to a tweet body to enable followers to find out more about the tweet content.

URLs in tweets are normally shortened, either by the tweet author (using a URL shortening tool) or by Twitter itself.

The URLs in a tweet are a way to promote information in all formats such as pictures, video and texts. The URLs employed by Social Europe, EURes and Commissioner Andor provide relevant information about what type of content they recommended on Twitter, and in what format this content was delivered to the audience.

The three account holders placed 1415 URLs in 1255 tweets out of 2048 (the total number of tweets). I discarded 21 URLs that were either broken or mistyped and considered the remaining 1394 URLs, which I validated myself by visiting the webpages to identify their content category and format.

URLs and tweets with URLs vs. total of tweets

In terms of tweets employing URLs vs. tweet volume per user, EURes leads with 77%, followed by Commissioner Andor and Social Europe, 62% and 54% respectively, while 61% represents the user average (Table 1).

Table 1: URL overview by account

The number of URLs exceeds the number of tweets containing URLs, meaning that some tweets contained more than one URL (Table 1). Commissioner Andor led with 120%, followed by Social Europe with 107% and EURes with 104%, while the user average is 111%.
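The overall ratios can be re-derived from the totals quoted above. The sketch below is only an illustration: it assumes that the 61% and 111% figures are ratios of the combined totals (rather than means of the three per-account percentages), and the variable names are made up for the example.

```python
# Totals quoted in the article for the three accounts combined.
total_tweets = 2048
tweets_with_urls = 1255
urls_found = 1415
urls_discarded = 21  # broken or mistyped URLs

valid_urls = urls_found - urls_discarded

# Share of all tweets that carry at least one URL.
share_with_urls = 100 * tweets_with_urls / total_tweets

# Valid URLs per 100 URL-bearing tweets; a value above 100 means
# that some tweets hold more than one URL.
urls_per_100_tweets = 100 * valid_urls / tweets_with_urls

print(f"valid URLs: {valid_urls}")
print(f"tweets with URLs: {share_with_urls:.0f}%")
print(f"URLs vs. tweets with URLs: {urls_per_100_tweets:.0f}%")
```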

Figure 1: Links and tweets with links vs. total of tweets

It is worth mentioning that 61% of tweets published by the three account holders contained 1394 URLs, meaning 1394 webpages. This is a remarkable volume of information which was distributed through Twitter to the three account holders’ followers in 2012. It would be interesting to find out how many people clicked on the URLs and why, but this is not the subject of this research.

Websites linked in the tweet body

While visiting the websites linked by the URLs in the tweet bodies, I identified nine major groups of websites to which the 1394 URLs pointed (Table 2). The information in the table is sorted in descending order of the URL total.

Table 2: Website categories

The three account holders together had a particular preference for recommending other websites (20%, 284 URLs), employment policies (16%, 228 URLs) and employment policy news (16%, 222 URLs) published by the European Commission. The statistics in both Table 2 and Figure 2 reveal that EURes, Social Media Networks, other EU institutions, EC other departments, EY2012 and Commissioner Andor websites follow on in the next six positions.

Figure 2: Categories of websites linked in the tweet body

In terms of individual use of URLs, the results indicate that Social Europe focused more on promoting “EC EMPL Social Europe” (11% policies), the European Commission (EC) news covering mainly its policies and EURes activities. EURes paid more attention to its own website (10%). In second position came the “Social Media Networks” category, where EURes was more active, while the “Other websites” category came third. The preferences of Commissioner Andor were “Other websites” (14%) in first position, followed by “EC News” in second position and “EC EMPL Social Europe” (employment policies) in third position.

Websites’ languages

In terms of user average the website language options were as follows: approximately 68% of the URLs pointed to English content, 13% pointed to content in 23 languages and 9% pointed to bilingual content (Figure 3).

Figure 3: Websites' languages

At individual level, the results illustrate different preferences: 1) Social Europe with 69% to content in 23 languages, 34% to content in English and 33% to bilingual content; 2) EURes with 87% of content in 25 languages, its own website, 44% of content in 4-22 languages and 24% English content; 3) Commissioner Andor with 61% trilingual content, 54% bilingual content and 42% English content. The figures validate the statements made by the three account administrators during the interviews. They explained that their preference for English content was based on website content being available mainly in English and also on the available human resources to handle this content on Twitter.

Content categories

In terms of content category the first preference of the three account holders was to link their tweets to mixed content (69%), while their second preference was the “Text only” category (17%). The third preference was the “Video only” category with 12%, mainly video content on Youtube and EC Audio-visual gallery (Figure 4).

Figure 4: Content categories

Social media and social networks are well represented among the nine website categories. The three account holders gave equal attention to linking their tweets to mixed content (69%). It appears that the most efficient communication occurred when followers were redirected to a combination of text, pictures, videos and, rarely, sound. The second option was to link to the “text only” category (17%), mainly documents and other text-based communications (Figure 4). The third option was the “video” category (12%), which the account administrators believe has a significant impact.

Content format

Most of the content recommended by the three account holders to their audiences (Figure 5) was in HTML format (93%), followed by PDF format (7%, mainly policy documents) and PPT format (0.1%).

Figure 5: Content format distribution by user

About 3/4 of the tweets were linked primarily to webpages with content covering employment, social affairs and inclusion policies. This is an important achievement which shows the need to bridge the EC’s communication needs and readers’ expectations. The websites most featured through Twitter URLs are those managed by the departments of the three accounts. The other websites contained information related to the tweets’ subjects, which will be introduced in the next sections. It is worth noting the volume of information that was handled through a 140-character content unit. The tweets occasionally contained more than one link, depending on the communication needs of the users.

The three account holders made balanced use of links to HTML content (25%, 18%, 25%), which accounts for 93% of the user average, while PDF, the most utilised format for documents and other publications, came in second position (7%). Surprisingly, Commissioner Andor occasionally linked to PowerPoint presentations (0.1%).

English was the most balanced language group, in which the three account holders were closest to one another: Social Europe with 34%, EURes with 24% and Commissioner Andor with 42%.

Previous articles on the same subject

Case study: The European Commission’s EMPL presence on Twitter in 2012

The European Commission’s EMPL presence on Twitter in 2012: Time metrics

The European Commission’s EMPL presence on Twitter in 2012: Content languages and hashtagging

Why mentions on Twitter help people communicate: The European Commission’s EMPL presence on Twitter in 2012

[1] Boyd, D. et al. (2010), Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43

Why mentions on Twitter help people communicate

The European Commission’s EMPL presence on Twitter in 2012

Twitter mentions help users communicate and give more visibility to the account mentioned. The mention format is the Twitter username (handle) preceded by the “@” symbol (@username). The mention can be placed anywhere in the tweet body. Replies are also considered mentions since they carry the same mark (“@”). The statistics on mentions, their occurrences and the tweets employing mentions of the three Twitter account holders, namely Social Europe, EURes and Commissioner Andor, are presented in Figure 1. (Previous information describing the results of this research project is available in these articles.)

Figure 1: Mentions, occurrences, and tweets with mentions

The information in Figure 1 introduces the following categories:

  1. mentions
  2. their occurrences, which are higher as certain users were mentioned more than once, and
  3. the number of tweets where mentions were part of their bodies.

There were 159 accounts mentioned by the three account holders, of which 153 were valid accounts (6 were discarded for being either closed or unidentified in 2012). Counted per account holder, the 159 accounts add up to 194 mentions. The difference of 35 arises from accounts being mentioned by two or even all three account holders.
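The gap between the per-holder count (194) and the number of distinct accounts (159) is an overlap between sets. A minimal sketch with made-up handles, which are hypothetical and not taken from the study, shows how the two counts relate:

```python
# Hypothetical handles for illustration only; not the accounts from the study.
mentioned_by = {
    "Social Europe": {"@alpha", "@beta", "@gamma"},
    "EURes": {"@beta", "@delta"},
    "Commissioner Andor": {"@alpha", "@beta", "@epsilon"},
}

# Per-holder count: an account mentioned by two holders is counted twice.
per_holder_total = sum(len(handles) for handles in mentioned_by.values())

# Distinct count: each account is counted once across all holders.
distinct_accounts = set().union(*mentioned_by.values())

# The difference is the number of repeated appearances across holders.
overlap = per_holder_total - len(distinct_accounts)

print(per_holder_total, len(distinct_accounts), overlap)
```

In this toy data the per-holder total is 8 and the distinct total is 5, so the overlap is 3; in the study the corresponding figures are 194, 159 and 35.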

The occurrences of mentions amount to 462, which means that mentions were recorded 462 times in 357 tweet bodies. Commissioner Andor leads the statistics for mentions with 169 tweets (47% of the total tweets with mentions), in which he used 97 mentions (50% of the total of mentions) with 230 occurrences (50% of the total occurrences). Social Europe comes in second position with 140 tweets (39% of the total tweets with mentions), in which 64 mentions were used (33% of the total of mentions) with 174 occurrences (38% of the total occurrences). EURes comes in third position with 48 tweets (13% of the total tweets with mentions), in which 33 mentions appear (17% of the total of mentions) with 58 occurrences (13% of the total occurrences).

Top mentions with at least 5 occurrences

There are 16 accounts (57%) that were mentioned by the three account holders in their tweets at least 5 times. The top three are Commissioner Andor (96 occurrences), EU Commission (21 occurrences) and Eurofound (19 occurrences). Apart from one NGO (@ESUtwt), an international body (@ILOnews) and a Twitter facility to hide a tag (@hidetag, used during the chat to avoid flooding the feeds of the followers of the chat host, Commissioner Andor), the others are EU public figures and EU bodies. While Commissioner Andor was in first position, EUResjob and Social Europe came in fourth and sixth place respectively.

Figure 2: Top mentions with at least 5 occurrences

Distribution of mentions by country of the mentioned users

The statistics on the country of mentioned users (Figure 3) place Belgium on top with 73 users, as Brussels is home to most EU institutions and EU public figures. Belgium is followed by countries outside the EU (22), the UK (11), Italy (9), Ireland (8), Spain (7), France and the Netherlands (5), Croatia, Greece and Sweden (2) and the remaining countries with 1 user each.

Figure 3: Distribution of mentions by country of the mentioned users

The top mentions with at least five occurrences and the distribution of mentions by country are a sign of top level collaboration and of an interest in certain key EU public figures and EU bodies in relation to topics discussed by the three account holders on Twitter.

This is the case with the Employment Package, where supporting job creation in the EU countries, making labour markets more dynamic and improving EU governance involve not only social but also other policies. The leading position of Commissioner Andor in terms of mentions could also be explained by followers mentioning him while interacting with him during the chat on Youth Employment in December 2012.

The remaining top mentions are located in countries that were either visited by Commissioner Andor or hosted certain events. Among the top mentions there is also Eurofound (Ireland), an EU agency working closely with DG EMPL.

The mentions are a relevant indicator when discussing the engagement of the three account holders with their followers in the forthcoming articles.

Previous articles on the same subject

Case study: The European Commission’s EMPL presence on Twitter in 2012

The European Commission’s EMPL presence on Twitter in 2012: Time metrics

The European Commission’s EMPL presence on Twitter in 2012: Content languages and hashtagging

The European Commission’s EMPL presence on Twitter in 2012: Content languages and hashtagging

Previous articles on the same subject

Case study: The European Commission’s EMPL presence on Twitter in 2012
The European Commission’s EMPL presence on Twitter in 2012: Time metrics

Tweet languages

The tweet content of the three account holders, namely Social Europe, EURes and Commissioner Andor, was in nine languages. About 96% of the content was in English, followed by Spanish, French, Italian, Dutch, German, Hungarian (Commissioner’s mother tongue), Croatian and Latvian (Figure 1).

Figure 1: Tweet languages

With about 96% of the content in English the three account holders are officially not in compliance with the Commission policy on multilingualism. The policy emphasises the access to key content in all the EU official languages to enable citizens to understand policy outcomes in their mother tongue. The account administrators stated that, for a number of reasons, they could not equally communicate content in all the 23 EU official languages (in 2012). One of the main reasons is that the Commission works in three languages (English, French, German) and the content is produced in English, in the first stage, for practical reasons. Another reason is the level of human resources to manage multilingual content. Unlike the Commission, the European Parliament has a large team of editors and social media managers covering all the EU official languages.

Certain languages (Figure 1) were used in relation to some national events or official visits performed by the Commissioner. The tweets in those languages were localised (translated) either by the EURes team and its national counterparts or by European Commission’s representations in the respective countries.

Hashtagging

A hashtag should be relevant and notable so that people can easily remember it. The hashtags were used to mark and classify certain pieces of tweet content. They were processed to extract information on hashtag volume, occurrences and most used hashtags. The extracted information was triangulated with other information categories, described in the forthcoming articles, when establishing the trending topics of the Twitter communications managed by the three account holders. Figure 2[1] shows the hashtags of the three unified tweet corpora, namely of Social Europe, EURes and Commissioner Andor.

Figure 2: Hashtag cloud with all unified hashtags

Hashtag volume and occurrences: User overview vs. total

Figure 3 shows the hashtag volume (the set of hashtags included in the tweet bodies, each hashtag being counted once) and the hashtag occurrences. Occurrences exceed volume because certain hashtags were used multiple times to tag content and classify it in a hashtag category, making it searchable, visible and well-contextualised when other readers looked for it.
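The distinction between volume and occurrences can be illustrated with a short sketch. The tweet bodies below are hypothetical, the regular expression is a simplification of how Twitter recognises hashtags, and lower-casing the tags is an assumption made for the example rather than the method used in the study.

```python
import re
from collections import Counter

# Hypothetical tweet bodies for illustration; not taken from the corpora.
tweets = [
    "Join us at #Jobs4Europe today #EU",
    "Live from #Jobs4Europe: the Employment Package",
    "Active ageing matters #EY2012 #EU",
]

# Simplified pattern: '#' followed by letters, digits or underscores.
HASHTAG = re.compile(r"#\w+")

# Lower-case the tags so that #EU and #eu count as one hashtag (an assumption).
occurrences = Counter(
    tag.lower() for body in tweets for tag in HASHTAG.findall(body)
)

volume = len(occurrences)                      # distinct hashtags, each counted once
total_occurrences = sum(occurrences.values())  # every use of a hashtag counted
rate = total_occurrences / volume              # average uses per distinct hashtag

print(volume, total_occurrences, round(rate, 1))
print(occurrences.most_common(2))
```

The ratio of occurrences to volume is the kind of occurrence rate reported per account, and `most_common` gives a ranking of the most used hashtags.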

EURes recorded the highest occurrence rate, using each of its hashtags 4.6 times on average, followed by Commissioner Andor with 4.4 and Social Europe with 4.3, while the average across all hashtags was 4.4. It is worth mentioning that Social Europe accounts for the largest share of the total hashtag volume with 58%, followed by Commissioner Andor with 23% and EURes with 19%.

Figure 3: Hashtag volume and occurrences: User overview vs. total

The results on hashtag volume and occurrences indicate a professional and efficient approach to hashtagging content on Twitter, even though hashtagging was inconsistent for certain time periods, mainly in the case of EURes, which was confirmed in the interview. It is also worth noting that the three account holders began communicating on Twitter shortly before 2012 and therefore needed time to understand how the platform works and its novelties.

Ranking of most used hashtags (#)

  1. #EY2012 (245 occurrences)

The most popular topic generated by the three account holders on Twitter in 2012 was the 2012 European Year for Active Ageing and Solidarity between Generations. DG EMPL and national bodies managed the European Year as a pan-European project. The shared management approach may explain the popularity of the topic following the interest of Twitter followers to promote projects, events and other activities managed under the 2012 European Year umbrella.

  2. #Jobs4Europe (231 occurrences)

Jobs 4 Europe was the Employment Policy Conference held in Brussels on 6-7 September 2012. The conference focused on the Employment Package, employment policy, the functioning of European labour markets, wage developments, flexicurity[2] and inequalities. Key European public figures as well as experts and policy makers attended the event, which was organised by DG EMPL.

  3. #EJD (150 occurrences)

European Job Days (EJD) was a set of recruitment events that brought jobseekers and employers together in some hundred events all over Europe in 2012. The events were jointly organised by EURes and national authorities and encouraged workers’ mobility throughout the European Union by attempting to match the right candidates to the right jobs.

  4. #YouthEmpl (136 occurrences)

This was the hashtag used to promote the Twitter Chat with Commissioner Andor on 7 December 2012 (between 14:15 and 15:00). Becoming very popular, the hashtag was used to tag content covering any references to youth employment.

  5. #EOJD (97 occurrences)

This is the online version of the EJD event, which took place in face-to-face settings. The organisers made European Job Days accessible to as many people as possible by creating an online version, The European Online Job Days, to enable employers and jobseekers to meet online.

  6. #EU (88 occurrences)

This is the hashtag for the European Union (EU).

  7. #EURes (81 occurrences)

EURes is one of the Twitter account holders, which is a subject of this research project. EURES (EURopean Employment Services) is a free service for both jobseekers and employers, a network composed of public employment services. Trade unions and employer organisations also participate as partners. EURes encourages the free movement of workers within 31 countries (27 EU countries in 2012, plus Norway, Liechtenstein, Iceland and Switzerland).

  8. #EmplPackage (80 occurrences)

The Employment Package, launched in April 2012, is a set of policy documents that establishes a cross-policy approach. It supports job creation in EU countries and makes labour markets more dynamic.

  9. #poverty12 (75 occurrences)

This hashtag was created to tag the Second Annual Convention of the Platform against Poverty and Social Exclusion, a high level event that took place in Brussels, between 5 and 7 December 2012. The Platform is one of the Europe 2020[3] leading initiatives, which comprise actions on reducing the number of people at risk of poverty or social exclusion by at least 20 million by 2020. The purpose of the Convention was to review the progress in this regard and provide recommendations, comments and suggestions for consultation on the Social Investment Package.

  10. #ageing (63 occurrences)

This is one of the hashtags used to tag content covering the 2012 European Year for Active Ageing and Solidarity between Generations. EY2012 is the most popular hashtag related to the European Year and it came in the first position on this list.

The remaining hashtags will be discussed in the forthcoming articles, when analysing the tweet corpora and establishing the trending topics of the Twitter communications managed by DG EMPL and EU Commissioner Andor in 2012.

[1] Hashtag cloud designed with Tagxedo

[2] Flexicurity is a term that combines flexibility and security to express a well-being social prototype that implies a pro-active labour market policy.

[3] Europe 2020, the EU’s growth strategy for the current decade

The European Commission’s EMPL presence on Twitter in 2012: Time metrics

The quantitative results were grouped into eleven information entities that were created after processing the 14 information categories collected into a spreadsheet: tweet date, time, language, tweet body, hashtag (#), mentions (@), URL in the tweet body, retweets (RT), RT author, favourite, favourite author, replies, reply author and reply body. Initially the “carbon copy” (cc) category was included in the list, but later discarded during the research as there were few relevant occurrences. The time metrics are introduced and discussed in this article.

Figure 1: Aggregate daily tweets

Aggregate daily tweets

The aggregate daily tweets (Figure 1) reveal key information on the Twitter behaviour of the three account holders: Social Europe, EURes and Commissioner Andor. The highest daily tweet density, which shows the busiest day of the week, was recorded on Thursday for Social Europe, on Saturday for EURes and on Wednesday for Commissioner Andor. EURes had constant Twitter activity: its daily tweet distribution shows a consistent density throughout the week, with a peak on Saturday and its lowest level on Sunday. Social Europe’s Twitter activity increased gradually from Monday to Thursday and decreased from Friday to Sunday. Commissioner Andor’s Twitter activity had two peaks, the first on Wednesday and the second on Friday. His Wednesday peak might be explained by the need to share important news and decisions following the weekly meetings of the European Commission’s College (all 27 Commissioners in 2012), which take place on Wednesdays.

The lowest tweet density was recorded on Sunday for all the accounts: Social Europe with 8 tweets, EURes with 4 tweets and Commissioner Andor with 8 tweets. Sunday was therefore the least busy day of the week.

The daily and weekly averages were calculated by dividing the total number of tweets by the number of days and weeks in 2012. The results place Social Europe on top with an average of 2.5 tweets per day and 17.9 per week, followed by Commissioner Andor with 1.9 tweets per day and 13.7 per week and EURes with 1.1 tweets per day and 7.6 per week. Wednesday was the busiest day when the three accounts are counted together. The combined average of 5.6 tweets per day for all three account holders shows regular daily behaviour on Twitter.
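The calculation can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the yearly totals (934, 716 and 398 tweets) are the corpus sizes reported in the case study article of this series, and the divisors (366 days, since 2012 was a leap year, and 366/7 weeks) are assumed, as the article does not state them.

```python
# Tweets published in 2012, as reported in the case study article of this series.
tweets_2012 = {"Social Europe": 934, "Commissioner Andor": 716, "EURes": 398}

DAYS_IN_2012 = 366                # 2012 was a leap year (assumed divisor)
WEEKS_IN_2012 = DAYS_IN_2012 / 7  # about 52.3 weeks (assumed divisor)


def averages(total_tweets):
    """Return (tweets per day, tweets per week) for a yearly total."""
    return total_tweets / DAYS_IN_2012, total_tweets / WEEKS_IN_2012


for account, total in tweets_2012.items():
    per_day, per_week = averages(total)
    print(f"{account}: {per_day:.2f} per day, {per_week:.1f} per week")

combined = sum(tweets_2012.values())
combined_per_day, combined_per_week = averages(combined)
print(f"All accounts: {combined_per_day:.1f} per day, {combined_per_week:.1f} per week")
```

The last digit of a daily average can differ from the figures quoted above depending on whether values are rounded or truncated.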

The aggregate tweet volume also varies from one user to another. The aggregate weekly distribution is about 40 tweets per week, meaning approximately 13 weekly tweets published by each user.

The results are in line with Waters and Williams’ recommendations[1] (2011, pp.360-361) on posting content with moderation and not engaging in over-tweeting. The 5.6 tweets per day mean an average of about 2 daily tweets per user. The statistics also show constant daily and weekly activity, which is important when maintaining and expanding a network of followers and feeding them with relevant and timely content.

Aggregate hourly tweets

The busiest hour of the day (Figure 2) was between 12:00 and 13:00 for Social Europe, between 09:00 and 10:00 for EURes and between 14:00 and 15:00 for Commissioner Andor. On average the busiest hour of the three account holders was between 12:00 and 13:00. They started their daily Twitter activity in the morning, sometimes as early as between 06:00 and 07:00. Social Europe and EURes closed their activity after office hours (18:00), while Commissioner Andor stopped later, around 21:00.
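Daily and hourly aggregation of this kind can be sketched as follows. The timestamps are hypothetical, and the approach (parsing each tweet’s date and time, then counting by weekday and by hour) is an illustration rather than the spreadsheet procedure used in the study.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps for illustration; not taken from the corpora.
timestamps = [
    "2012-09-05 10:15",
    "2012-09-06 12:30",
    "2012-09-06 12:45",
    "2012-09-08 09:05",
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

parsed = [datetime.strptime(ts, "%Y-%m-%d %H:%M") for ts in timestamps]

# Aggregate by day of the week (datetime.weekday() returns 0 for Monday).
tweets_per_weekday = Counter(WEEKDAYS[dt.weekday()] for dt in parsed)

# Aggregate by hour of the day; key 12 stands for the slot 12:00-13:00.
tweets_per_hour = Counter(dt.hour for dt in parsed)

busiest_day, _ = tweets_per_weekday.most_common(1)[0]
busiest_hour, _ = tweets_per_hour.most_common(1)[0]
print(busiest_day, busiest_hour)
```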

Figure 2: Aggregate hourly tweets

The hourly Twitter behaviour of the three account holders could be categorised as normal. The statistics show at which hours of the day they were most and least active. Occasionally Commissioner Andor had some activity during the night, but that occurred when he visited the USA and Mexico to attend two events. The account administrators stated that the statistics show regular activity performed during typical office hours. The lunch breaks are visible, although some intense activity was recorded during the breaks when running certain campaigns and events.

Annual tweet timeline

The total number of tweets published in each month of 2012 is presented in Figure 3.

Figure 3: Annual tweet timeline

The annual Twitter timeline indicates that individual peaks were recorded for both Social Europe (221 tweets) and Commissioner Andor (113 tweets) in September, while the peak of EURes (112 tweets) occurred in October. The peak activity of all three account holders together occurred in September (407 tweets). The least active month was July for both Social Europe (5 tweets) and Commissioner Andor (24 tweets) and May for EURes (3 tweets). The lowest activity of all account holders together was recorded in July, with a total of 40 tweets.

From the statistics it appears that after the summer break the autumn was the most intense period of the year when major events and activities took place. In September, Social Europe and Commissioner Andor focused on policies and a high level event, the Jobs4Europe conference, while EURes focused on European (Online) Job Days in October. The account administrators confirmed the results.

Related article: Case study: The European Commission’s EMPL presence on Twitter in 2012

References

[1] Waters, D., R. and Williams, M., J. (2011), “Squawking, tweeting, cooing, and hooting: analyzing the communication patterns of government agencies on Twitter” in Journal of Public Affairs, vol. 11, No. 4, pp. 353-363

Case study: The European Commission’s EMPL presence on Twitter in 2012

I would like to share the results of a MA research project that I conducted between 2012 and 2014. The project focused on the Twitter communications carried out by the European Commission’s Directorate-General for Employment, Social Affairs and Inclusion Policies in 2012.

The main research question was to identify the purposes and features of Twitter communications focusing on Employment, Social Affairs and Inclusion policies, managed by the EU Commissioner Andor and the EU Commission’s DG EMPL in 2012.

The main question was composed of a number of smaller parts to enable me to build solid research answers. These parts are:

  1. What was the content of communications managed by Commissioner Andor and DG EMPL on Twitter in 2012?
  2. What were the trending topics and what communication patterns did they develop?
  3. Social networking vs. social media: What content categories and format were distributed on Twitter by Commissioner Andor and DG EMPL in 2012?
  4. Who were among their information multipliers?

All these parts and the linking elements are introduced in Figure 1.

Figure 1: Research main question, sub-questions and linking elements

Why such a research project?

Because of its intuitive and easy-to-use interface, Twitter has been embraced by both individuals and organisations, which over the past years has helped raise its popularity and place it among the reference public virtual communicative spaces.

Using Twitter to communicate with others, either from a desktop computer or a mobile device, has become a daily activity. Twitter thus offers a fast communication service to a wide range of institutions and people, from celebrities, politicians and journalists to scientists, marketers, activists, researchers, educationalists and ordinary people.

Twitter therefore influences the way we live and the society we live in. Twitter has taught us how to refine a message to fit into 140 characters. We have also learned how to polish a message and concentrate ideas in a manner that makes communications more efficient.

The beauty of Twitter and its uniqueness come from this 140-character restriction, which was imposed by its programmers to enable both computers and mobile devices to use the platform and “speak” a common language to each other.

My research focused on the Twitter communications on Employment, Social Affairs and Inclusion policies managed by the European Commission in 2012 through its specialised department, namely DG EMPL and the Commissioner responsible for these policies.

Little information covering institutional communications via Twitter has been published to date. Existing blog and news articles treat specific aspects and do not look at the strategic points of communication or at the multilingual content that is of interest to the European Union, a large area where the intended communication audience lives. Furthermore, this type of communication is unique and does not resemble a national communication pattern, where national bodies design their strategies and content based on specific cultural settings.

That is why I believe that the subject matter of this research and some of its aspects may prove challenging but, in return, may shed light on the particularities of the Twitter communications managed by a European institution that addresses multilingual audiences with content that might bypass the national layers. The research context is the year 2012, when signs of the crisis were still evident and affecting the European Union as a whole.

The findings of this research may contribute to understanding the role of Twitter as a communication channel adopted by the European Commission. They may also offer a communication model that other institutions could adopt to benefit from this example of practice, in a world transformed by social media.

Research object, research methods and tools

The object of this research consists of two representative pieces of content:

  1. Three corpora of 2048 tweets published by the three account holders in 2012 as follows: 934 tweets by Social Europe and 398 tweets by EURes for DG EMPL, and 716 tweets by Commissioner Andor, including 21 tweets which were answers to the participants’ questions in a chat hosted by himself on 7 December 2012.
  2. A set of four face-to-face interviews with the administrators of the three Twitter accounts as well as with a social media coordinator from the European Commission.

I chose two research methods to process and analyse the raw data. The first method is a quantitative analysis of the tweet metrics that are related to each tweet body. The second method ensures the qualitative dimension of the research. It consisted of:

  1. analysing the tweet bodies in the three corpora and
  2. conducting a set of face-to-face interviews with the Twitter account administrators.

The tweet corpora were analysed by the use of spreadsheet software and by employing two CAQDAS tools, WORDij and LIWC, which were introduced earlier. The interview outcomes were analysed by applying the recursive abstraction method, which did not imply coding but a manual content distillation of the notes taken during the interviews. The quantitative dimension of the research will be visible in the Twitter metrics (numerical values and parameters), which are detailed in the forthcoming articles.

Given the purpose of interviewing the account administrators, namely to compare the Twitter data I collected against what the account administrators intended in 2012, I chose to follow the methodology of a “standardized open-ended interview” (Cohen et al[1]., 2002, p.271).

The three account administrators, namely of Social Europe, EURes and Commissioner Andor, were asked identical questions so that similar pieces of information could be analysed and compared. The questionnaire prepared for the corporate account administrator contained some of the same questions, as well as questions covering the Commission’s strategic Twitter communications and corporate approach.

The interview outcomes enabled data triangulation: facts and opinions derived from the Twitter corpora and Twitter parameters could be corroborated, validating or invalidating the outcomes of this research.

I prepared my interview questionnaires following Denscombe’s (2003[2]) advice:

“In reality, interviewing is no easy option. It is fraught with hidden dangers and can fail miserably unless there is good planning, proper preparation and a sensitivity to the complex nature of interaction during the interview itself” (p.164).

More content to come in the forthcoming articles.

References

[1] Cohen, L. et al. (2002), Research Methods in Education, 5th edition, London, RoutledgeFalmer

[2] Denscombe, M. (2003), The Good Research Guide for small-scale social research projects, second edition, Maidenhead, Open University Press

Twitter: Semantic user and network profiles

Social media platforms and some well-known websites such as Google and Amazon customise the webpages they display to their users, based on their preferences and browsing habits. Therefore, the information in the browser window is personalised and it is based on a so-called “user profile”.

With the introduction of the Semantic Web concept, researchers have looked at the users’ profiles from a semantic angle.

Gentile et al[1]. (2011) carried out research to profile users from a semantic angle, based on informal communication exchange through email. They claim their study proposes a solution for modelling user expertise from regular email exchange in organisations, where lost knowledge could very often be easily rescued and stored for re-use, enabling quick transfer to newcomers and increasing efficiency in the workplace:

“Extracting information from informal communication exchanges could be hugely beneficial for knowledge management inside an organisation, as it offers means to recover buried knowledge without any additional effort from individuals and respecting their natural communication patterns” (p. 13).

“Given the variety and recency of topics people discuss on Twitter, semantic user profiles generated from Twitter posts moreover promise to be beneficial for other applications on the Social Web as well” (Fabian et al[2]., 2011, p.375).

Fabian et al. introduce a method to analyse tweets and the information linked to tweets via URLs. They created semantic user profiles, which they then enriched with relevant elements from the webpages that the tweets link to through their URLs. They found that Twitter user profiles, where URLs are employed in the tweet bodies, could fit three types of model, based on hashtag, topic and entity.

To model the profiles, the researchers applied strategies to discover tweet-news relations and URL categories, content analysis (employing the Bag of Words method), hashtag analysis, and entity identification, using OpenCalais[3] to extract entities and topics from both the tweet corpus and the linked webpages. They conclude that analysing tweet bodies could be a way of semantically profiling a Twitter user.

Semantic network analysis

Drieger[4] (2013) proposes to employ a semantic network model to represent “visual text analytics to support knowledge building” (p.4): “Semantic networks allow to model semantic relationships (Sowa, 1991) that are represented in a graph with labelled nodes and edges” (p.4).

Figure 1: A semantic network part of EURes Twitter communications in 2012 (The “website” node is not connected to the network as it did not qualify for three minimum links with the other nodes)

The lexical dimension of a semantic network consists of nodes (the words), links (the connections between the words) and labels (the word denomination). Doerfel and Barnett[5] (1999) provided a complete definition of semantic network analysis when CAQDAS[6] was in its early years:

“Semantic network analysis, similar to network analysis, is both a research method and a theoretical framework. Semantic network analysis differs from traditional network methods because it focuses on the structure of a system based on shared meaning rather than on links among communication partners. In other words, two nodes are connected in a semantic network to the extent that their uses of concepts overlap” (p.589).

Furthermore, semantic network analysis implies:

“…a theoretical foundation based on cognitive processes. Learning theorists argue that words are hierarchically clustered in memory. Thus, spatial models that illustrate the relationships among words are representative of meaning. As a result, studies have turned to analysis of text with network analysis techniques (Danowski, 1982; Jang & Barnett, 1995; Rice & Danowski, 1993; Stohl, 1993)” (p.590).

Doerfel and Barnett (1999) acknowledge the use of the WORDij[7] software suite, and in particular of its component, WORDLINK, which has been in place since 1993 to assist with the semantic network analysis:

“Semantic network analysis requires a content analysis of textual data to determine the most frequently used symbols. The analysis then provides the relationship among these symbols and how they co-vary with the members of the social system. Although this process traditionally has been conducted by hand, computer-based analysis software has been developed and used to describe the semantic structure of textual data. For example, see WORDLINK (Danowski, 1993)” (p.591).

The optimal visual representation of a semantic network would be a spring embedded graph, which consists of nodes and arrows. The nodes are connected by arrows “in a two dimensional plane with some separation, while attempting to keep connected nodes reasonably close together. Each node in the graph is modelled as a charged particle, thereby causing a repulsive force between every pair of nodes. Each edge is modelled as a spring that exerts an attractive force between the pair of nodes it connects” (Mutton and Golbeck[8], 2003, p.300).

References

[1] Gentile, A.L. et al. (2011), Extracting Semantic User Networks From Informal Communication Exchange, in The 10th International Semantic Web Conference, Bonn, Germany

[2] Fabian, A. et al. (2011), “Semantic Enrichment of Twitter Posts for User Profile Construction on the Social Web” in The Semantic Web: Research and Applications, Lecture Notes in Computer Science Volume 6644, pp 375-389

[3] http://www.opencalais.com/

[4] Drieger, Ph. (2013), “Semantic Network Analysis as a Method for Visual Text Analytics” in Procedia – Social and Behavioral Sciences, Vol. 79, pp. 4–17, 9th Conference on Applications of Social Network Analysis (ASNA)

[5] Doerfel, M.L. and Barnett, G.A. (1999), “A Semantic Network Analysis of the International Communication Association” in Human Communication Research, Vol. 25/4, pp. 589-603

[6] CAQDAS stands for Computer Assisted Qualitative Data Analysis Software

[7] http://wordij.net/

[8] Mutton, P. and Golbeck, J. (2003), “Visualization of Semantic Metadata and Ontologies” in IV ’03: Proceedings of the Seventh International Conference on Information Visualization, p.300

Twitter content analysis: WORDij and LIWC software tools

WORDij and LIWC are two Computer-Assisted Qualitative Data Analysis Software tools (CAQDAS) that could be successfully employed to run a content analysis of tweets.

WORDij

WORDij[1] includes a number of components that serve different research purposes. WordLink is part of the WORDij suite. It extracts word pairs, which “are the basis for creating networks” (Danowski[2], 2012, p.3). The words are the network nodes, and the connections between them carry “link strengths” determined by the word co-occurrence frequency.

Word pairs are made based on word proximity, which, according to Danowski, is calculated by a “window that slides through the text, counting all word pairs inside as it moves from word in the full text” (p.4). The application counts “pairs 3 positions before each focal word, as well as those within 3 words after it”, which means that the window size is 3 by default. This value can be customised according to one’s needs. An example of a semantic network is pictured in Figure 1.
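
As a rough illustration of the sliding-window idea described above, the following Python sketch counts ordered word pairs within a window of 3 positions. It is not WORDij's implementation: the function name `count_word_pairs`, the whitespace tokenisation and the sample sentence are assumptions made for this example. Pairing each focal word only with the words that follow it covers the same positions as looking both before and after, without counting any pair twice.

```python
from collections import Counter


def count_word_pairs(text, window=3):
    """Count ordered word pairs that occur within `window` positions of each other."""
    words = text.lower().split()  # naive tokenisation, assumed for this sketch
    pairs = Counter()
    for i, focal in enumerate(words):
        # Pair the focal word with each of the next `window` words.
        for neighbour in words[i + 1 : i + 1 + window]:
            pairs[(focal, neighbour)] += 1
    return pairs


pairs = count_word_pairs("jobs for young people in europe")
print(pairs[("jobs", "people")])  # 3 positions apart, so the pair is counted
print(pairs[("jobs", "in")])      # 4 positions apart, so the pair is not counted
```

The resulting counts play the role of link strengths: the more often two words occur close together, the stronger the link between their nodes.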

Figure 1: A spring embedded graph consisting of nodes and arrows making a semantic network of the top 20 nodes and 3 minimum link values, designed with the WORDij software suite, for the Twitter communications of Social Europe in 2012

The WORDij software suite has mainly been used to carry out “text mining and semantic network analysis” (Yuan et al., forthcoming, p.1). The word pair co-occurrence is explained by Danowski[3] (2013) as follows:

“Defining word-pair link strength as the number of times each word occurs closely in text with another, all possible word pairs have an occurrence distribution whose values range from zero on up. This ratio scale of measurement allows the use of sophisticated statistical tools from social network analysis toolkits. These enable the mapping of the structure of the word network. They identify word groups, or clusters, and quantify the structure of the network at different levels. Using these word-pair data as input to network analysis tools, you map the language landscape. On the map, instead of cities, the nodes are words. Rather than roads, there are links or edges among words” (2013, online).

In a paper presenting research that compared the professional behaviour of high-contact vs. low-contact designers who are members of the LinkedIn platform, Danowski (2012) describes how WORDij detects proximate word pairs rather than applying the “bag of words” approach, which counts “words as paired that appeared anywhere in the same profile document” (Danowski, 2012, p.9). The “bag of words” is a technique where a text corpus is represented as a bag containing its words, ignoring grammar and word order but preserving how often each word occurs.
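
To make the “bag of words” representation concrete, here is a minimal Python sketch. The function name `bag_of_words` and the sample sentences are assumptions for illustration only; the point is that two texts with the same words in a different order become indistinguishable.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as word counts, discarding grammar and word order."""
    return Counter(text.lower().split())


# Two texts with different word order produce the same bag.
first = bag_of_words("Europe needs jobs")
second = bag_of_words("jobs needs Europe")
print(first == second)
```

This loss of word order is the limitation that proximity-based word pairs are meant to address.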

WordLink detects word pairs “within 3 word positions on either side of each word in the text” (p.9). Danowski used a “stop list” to remove common function words from the content body. He then discarded words and word pairs with frequencies of 1 and 2, as this “is supported by empirical research in natural language processing” (p.9). Further, he dropped numerals and punctuation, and normalised contractions. He then used advanced WORDij functions to “test for differences in word-pair frequencies” (p.9).
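
The cleaning steps described above could be sketched in Python as follows. This is an assumption-laden illustration, not Danowski's procedure: the stop list is a hypothetical, deliberately tiny one, the helper names `clean_tokens` and `drop_rare` are invented for this example, and the normalisation of contractions is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; real stop lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "for"}


def clean_tokens(text, stop_words=STOP_WORDS):
    """Lower-case the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())  # drops numerals and punctuation
    return [t for t in tokens if t not in stop_words]


def drop_rare(counts, min_count=3):
    """Discard items occurring fewer than `min_count` times (frequencies of 1 and 2)."""
    return Counter({item: n for item, n in counts.items() if n >= min_count})


tokens = clean_tokens("The 3 jobs, the jobs and the jobs for youth!")
frequent = drop_rare(Counter(tokens))
print(frequent)
```

The same `drop_rare` filter can be applied to word-pair counts as well as to single-word counts.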

Danowski (2012) argues that WORDij could be employed for automatic link coding with proximities since it overcomes the problems that occur when employing the “bag of words” method:

“While word bags are useful for document retrieval they blur social meaning by ignoring the relationships of social units within the texts, whether these units are words, people, or other entities” (p.217).

LIWC

In support of employing CAQDAS, Gluesing et al[4]. (2009) provide clear evidence that “it is possible to design research that takes full advantage of information technologies to gather large amounts of data for data mining and network analysis, but also to embed qualitative methods in parallel and in a measured, targeted way to maximize the richness of results while minimizing the costs usually involved in long-term, labour-intensive ethnographic studies” (p.25).

Derczynski et al[5]. (2013) consider Twitter a “noisy environment” given the SMS-like behaviour of users, who tend to use word abbreviations and SMS conventions. To overcome or reduce linguistic noise, one needs to normalise the data, which requires additional work from researchers analysing Twitter content: they have to transform abbreviations and other conventions into their full-word equivalents. The work can be done in two stages:

1) identification of orthographic errors and

2) the correction of the errors (p.7).
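
The two stages can be sketched in Python with a simple lookup table. This is only an illustration under strong assumptions: the `ABBREVIATIONS` table and the function names are hypothetical, and a dictionary lookup is a much cruder technique than the normalisation methods a real study would need.

```python
# Hypothetical lookup table; a real study would need a much larger resource.
ABBREVIATIONS = {"u": "you", "pls": "please", "thx": "thanks", "2moro": "tomorrow"}


def find_noisy_tokens(tokens, lookup=ABBREVIATIONS):
    """Stage 1: return the positions of tokens recognised as abbreviations."""
    return [i for i, token in enumerate(tokens) if token.lower() in lookup]


def normalise(text, lookup=ABBREVIATIONS):
    """Stage 2: replace each identified token with its full-word equivalent."""
    tokens = text.split()
    for i in find_noisy_tokens(tokens, lookup):
        tokens[i] = lookup[tokens[i].lower()]
    return " ".join(tokens)


print(normalise("pls apply 2moro"))
```

Keeping identification and correction in separate functions mirrors the two stages and makes it possible to inspect which tokens were flagged before anything is changed.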

Elson et al[6]. (2012) state that until 2012 researchers studied social media content manually: focusing on certain pieces of content and on specific users, then interpreting and reporting the findings. The authors mention that the researchers might have analysed irrelevant pieces of content, so that their findings could apply to particular cases and situations but may not apply to Twitter mass communication.

The authors propose to employ LIWC[7] (The Linguistic Inquiry and Word Count) as a “computerized method to study the content of social media”, which may compensate for the limitations of the “manual” methods and therefore reduce “the chance that their biases affect their interpretations of social media texts” (p.xii).

Despite its later popularity as a piece of CAQDAS, LIWC was, in the authors’ words, “largely untested in political contexts” when their report was published in 2012.

The authors employed LIWC to “investigate how men and women communicate differently” and realised that the software application “has not been widely applied to understanding a non-Western political context” (p.xii).

Elson et al. (2012) analysed Iranian Twitter users’ opinions: what they felt about the 2009 election in Iran and what their attitudes were towards certain countries. In doing so, the researchers wanted to validate the LIWC methodology on a specific body of Twitter content, which had not been done previously.

It is worth noting that Elson et al. reviewed some of the previous research projects that employed LIWC as a CAQDAS and claim that LIWC provides clear evidence on people’s behaviours, attitudes and their emotions. For example “greater use of first-person singular pronouns… has been shown to suggest feelings of depression” while “second-person or plural pronouns indicate reaching out to others… and a sense of community or group identity” (p.xiii). LIWC has proven to be a reliable method to identify “emotions in written language… and the results it generates are comparable to those produced with other content-analysis methods” (p.xiii). Furthermore “LIWC has been successfully applied more recently to various forms of social media” (p.xiii) and the procedure “holds much promise” conclude Elson et al. (p.xvii).

The LIWC software application was developed by the psychologist James W. Pennebaker and his team. LIWC processes texts and provides information about 80 linguistic categories contained in the analysed texts. Its strength lies in its ability to output information about the positive and negative emotions carried by the texts:

“LIWC represents only a transitional text analysis program in the shift from traditional language analysis to a new era of language analysis” (Tausczik and Pennebaker[8], 2010, p. 38).

LIWC goes through any type of text, including blogs, novels, speeches, poems etc. and checks each word against a set of dictionaries, which are embedded in the application. The dictionaries define a particular word category and “capture different psychological concepts” (Pennebaker[9], 2011, p. 6). Once the programme has checked and counted all the words, it then “calculates a ratio of the number of words in each word category” (Servi and Elson[10], 2012).
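
The dictionary-based counting described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not LIWC itself: the two category dictionaries are hypothetical toy examples (the real LIWC dictionaries are far larger), and the names `CATEGORIES` and `category_ratios` are invented for this illustration.

```python
import re

# Hypothetical toy dictionaries standing in for LIWC's word categories.
CATEGORIES = {
    "posemo": {"good", "happy", "great"},
    "negemo": {"bad", "sad", "crisis"},
}


def category_ratios(text, categories=CATEGORIES):
    """Return, per category, the share of words in the text that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in categories}
    return {
        name: sum(word in vocabulary for word in words) / total
        for name, vocabulary in categories.items()
    }


ratios = category_ratios("Good news: the crisis is over")
print(ratios)
```

Because each word is checked in isolation, a sketch like this shares the limitation that word-counting programmes have in general: it cannot tell whether “good” is meant sincerely or ironically.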

Pennebaker (2011) acknowledges some LIWC limits. According to Pennebaker, word counting programmes cannot detect linguistic nuances such as irony or sarcasm. LIWC also fails “to capture the context of language. One word, for example, can have very different meanings, depending on how it is used” (p.8). However, researchers aim to create “smarter word-count programs that will eventually take into account syntax, grammar, and context in general” (p.9).

References

[1] http://wordij.net/

[2] Danowski, J. A. (2012). “Social network size and designers’ semantic networks for collaboration” in International Journal of Organization Design and Engineering 2(4/2012)

[3] Danowski, J. A. (2013). WORDij version 3.0: Semantic network analysis software, Chicago: University of Illinois at Chicago, Available from http://wordij.net/

[4] Gluesing, J. et al. (2009), “Mixing ethnography and information technology data mining to visualize innovation networks in global networked organizations” in Dominguez S. and Hollstein B. (eds.) Mixed methods in studying social networks, Cambridge University Press

[5] Derczynski, L. et al. (2013), “Microblog-Genre Noise and Impact on Semantic Annotation Accuracy” in Proceedings of the 24th ACM Conference on Hypertext and Social Media

[6] Elson, S.B. et al. (2012), Technical Report: Using Social Media to Gauge Iranian Public Opinion and Mood After the 2009 Election, Rand Corporation

[7] http://liwc.net/

[8] Tausczik, Y. R. and Pennebaker, J. W. (2010), “The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods”, in Journal of Language and Social Psychology, 29(1), pp. 24-54, Sage Publications

[9] Pennebaker, J.W. (2011), The Secret Life of Pronouns: What our words say about us, New York, Bloomsbury Press

[10] Servi, L. and Elson, S. B. (2012) “A Mathematical Approach to Identifying and Forecasting Shifts in the Mood of Social Media Users”, in MITRE Technical Report #120090 p. 27-30