News and Misinformation Consumption in Europe: Results and Discussion

7 Jun 2024

Authors:

(1) Anees Baqir, Ca’ Foscari University of Venice, Italy;

(2) Alessandro Galeazzi, Ca’ Foscari University of Venice, Italy;

(3) Fabiana Zollo, Ca’ Foscari University of Venice, Italy and The New Institute Centre for Environmental Humanities, Italy.

3. Results and Discussion

In this section, we present the results of our analysis, organized as follows. First, we provide an overview of the information landscape in selected European countries over the three years. This step is crucial for identifying key topics that are widely shared among countries and distinguishing between questionable and reliable sources, enabling a coherent comparison. Next, we examine both commonalities and differences among countries in their online discussions of these topics, focusing on user engagement and consumption patterns.

3.1 The evolution of Public Discourse across Countries

To compare the landscapes of public discourse in the selected countries, we first identify common topics extensively discussed in all four countries and by both questionable and reliable sources. To this end, we employ BERTopic (Grootendorst, 2022) to perform topic modeling on the content produced by news outlets’ accounts over a three-year period (see Section 2 for further details). To identify suitable topics for our analysis, we divide the dataset by year and by country and run the BERTopic algorithm on each subset. The results reported in Figure 1 show the most debated topics for each year by country and source category. The size of each topic represents the number of news sources contributing to it, while its position reflects its relevance to the overarching topics. The flow diagrams show the topic’s prevalence in news outlets over time.

Figure 1 highlights how the attention of news outlets to different topics varied across countries and types of news sources. Notably, in addition to certain topics of common interest, news outlets tended to prioritize subjects of national relevance, such as protests, the influence of foreign countries, religion, electric cars, and drug legalization. We also observe disparities in the topics covered by questionable and reliable sources within the same country. For instance, the fraction of news outlets reporting on the coronavirus vaccine in Italy was higher for reliable sources than for questionable ones. Furthermore, certain topics were exclusive to one type of source, like “Flights” (Italy, reliable), “Water management” (France, reliable), or “Palestinian struggle” (UK, questionable). These findings indicate that the level of interest was influenced both by the country and the type of source considered, with questionable sources displaying a broader range of interests and reliable ones focusing more on topics common to all countries.

Crucially, our analysis highlights the presence of topics common to the questionable and reliable debates of all countries. Specifically, three topics appeared consistently in debates across all countries: “Brexit” (2019), “Coronavirus” (2020), and “Covid Vaccine” (2021). Therefore, in the subsequent analysis, we focus exclusively on these topics for a cross-country examination of the discourse. The rationale behind this choice is to spotlight the differences and similarities in how these topics were reported and consumed by news outlets and users from various countries, thereby minimizing the impact of topic-specific variations on our analysis. Additionally, these topics have been extensively discussed at the European level, making our analysis valuable for understanding how subjects of European significance are perceived across different countries.
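The selection of topics shared by every country and source type can be illustrated with a set intersection. The snippet below is a minimal sketch, not the authors' pipeline: the dictionary `topics_by_subset`, the function `common_topics`, and the topic labels "Elections" and "Immigration" are hypothetical, and the topic sets are invented rather than taken from the BERTopic output.

```python
# Hypothetical topic labels per (country, source type) subset.
# The values are invented for illustration, not the paper's results.
topics_by_subset = {
    ("Italy", "reliable"): {"Brexit", "Flights", "Elections"},
    ("Italy", "questionable"): {"Brexit", "Immigration"},
    ("UK", "reliable"): {"Brexit", "Elections"},
    ("UK", "questionable"): {"Brexit", "Palestinian struggle"},
}


def common_topics(topics_by_subset):
    """Return the topics that appear in every subset."""
    topic_sets = list(topics_by_subset.values())
    if not topic_sets:
        return set()
    return set.intersection(*topic_sets)


shared = common_topics(topics_by_subset)
print(sorted(shared))
```

In this toy input only one topic survives the intersection, which mirrors how a single shared topic per year is retained for the cross-country comparison.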

Figure 1: Topic modeling results on questionable and reliable news sources content across countries. The size of each topic is given by the proportion of unique news sources contributing to it. The flows represent the interest shift of news outlets in different topics over time.

To underscore the relevance of the three chosen topics in online public debates and validate the accuracy of the time frames assigned to each topic, we conduct a Google Trends analysis of search interest in Brexit, Coronavirus, and Covid Vaccine in France, Germany, Italy, and the UK from 2019 to 2021, as shown in Figure 2.

The analysis of Google Trends confirms that the selected topics attracted the highest attention during the specified time frames in the broader online context. Thus, going forward, our analysis focuses on these three topics (Brexit, Coronavirus, and Covid Vaccine) to examine the differences and similarities in news production and consumption within the European landscape. To conduct our analysis exclusively on these topics, we filter the timelines of news outlets to select only tweets relevant to the chosen topic within the respective time range (see Section 2 for details).

3.2 User engagement and community structures

Figure 2: Google Trends analysis of search interest in Brexit, Coronavirus, and Covid Vaccine in France, Germany, Italy, and the UK from 2019 to 2021. The plots display how search interest for each topic evolved over time, with each row representing one topic. Interest trends reveal that Brexit was most popular in 2019, followed by a sharp decline in 2020 and 2021 with some exceptions at the end of 2020. Coronavirus peaked in early 2020 and declined thereafter, while Covid Vaccine gained momentum in early 2021, reached the maximum in mid-2021, and saw another surge at the end of 2021. Brackets represent the time span taken into account in the analysis for each topic.

We continue our study by comparing the engagement with content related to the identified topics on social media platforms. Figure 3 shows the distribution of tweet interactions by country, computed as the sum of likes, retweets, quotes, and replies, for reliable news sources (blue) and questionable news sources (orange), as classified by NewsGuard (see Section 2), for each of the three topics. Despite minor geographical variations, user interactions follow a similar long-tailed distribution for all three topics: a small number of tweets receive a large number of interactions, while the majority receive very few. Reliable news sources typically obtained more interactions than questionable sources, as shown by their wider distribution along the x-axis. However, a few exceptions are observed, such as the UK in Covid Vaccine discussions and France in Coronavirus debates. Furthermore, in the Brexit discourse, questionable sources have a notable presence in the tail of the distribution in Germany and Italy, although they are less prominent in other discussions. Overall, the presence of questionable sources and the engagement they generate vary with both the country and the specific topic under consideration.
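The interaction count and its long-tailed shape can be sketched in a few lines. This is an illustrative example only: the field names (`likes`, `retweets`, `quotes`, `replies`), the helper names, and the four toy tweets are assumptions, and the complementary cumulative distribution is just one common way to inspect a heavy tail, not necessarily how Figure 3 was drawn.

```python
from collections import Counter


def total_interactions(tweet):
    """Sum the four interaction types for one tweet (field names are assumed)."""
    return tweet["likes"] + tweet["retweets"] + tweet["quotes"] + tweet["replies"]


def ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the empirical distribution."""
    n = len(values)
    counts = Counter(values)
    remaining = n  # number of observations with value >= current x
    points = []
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points


# Toy data: three low-engagement tweets and one viral tweet.
tweets = [
    {"likes": 0, "retweets": 1, "quotes": 0, "replies": 0},
    {"likes": 2, "retweets": 0, "quotes": 0, "replies": 0},
    {"likes": 1, "retweets": 0, "quotes": 0, "replies": 0},
    {"likes": 300, "retweets": 150, "quotes": 20, "replies": 30},
]
interactions = [total_interactions(t) for t in tweets]
points = ccdf(interactions)
print(points)
```

Plotting such points on log-log axes, separately for reliable and questionable sources, would make the tail of each distribution visible.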

We then turn our attention to news consumption patterns to highlight the differences and similarities in the news outlets’ audiences. Analyzing Twitter data on Brexit, Coronavirus, and Covid Vaccine, we explore whether news outlets of the same type are consumed by similar audiences. We define a metric based on cosine similarity (see Section 2), computed on retweeters, to quantify the similarity between news outlets in terms of audiences. News outlets sharing a high percentage of retweeters have a high similarity (close to 1), while outlets with only a few shared retweeters have a low similarity (close to 0).
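As a rough illustration of an audience-similarity score of this kind, the sketch below computes the cosine similarity between two outlets represented as sparse vectors of retweet counts per user. The exact definition used in the paper is given in Section 2 and is not reproduced here, so the vector representation, the function name `cosine_similarity`, and the toy outlets are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (dict: user -> retweet count)."""
    dot = sum(count * b[user] for user, count in a.items() if user in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an outlet without retweeters is similar to nothing
    return dot / (norm_a * norm_b)


# Toy outlets: A and B share the same audience in the same proportions,
# while C is retweeted by a disjoint set of users.
outlet_a = {"u1": 3, "u2": 1}
outlet_b = {"u1": 6, "u2": 2}
outlet_c = {"u3": 4}

print(cosine_similarity(outlet_a, outlet_b))
print(cosine_similarity(outlet_a, outlet_c))
```

Because retweet counts are non-negative, the score stays between 0 and 1, matching the range described above.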

We then build an undirected network in which news outlets are represented as nodes and weighted edges indicate the level of similarity among them. We create one network for each country and topic considered to enable a fair comparison. The resulting networks are visualized in Figure 4. To highlight only the stronger connections, we discard edges with weights lower than the overall median of the edges of each network (see Figures 1 and 2 of SI for the results with the complete networks).
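The median-based pruning step can be sketched as follows. The edge dictionary, outlet names, and the function `prune_below_median` are hypothetical, and the sketch assumes that the median is taken over the edges actually present in one network.

```python
from statistics import median


def prune_below_median(edges):
    """Keep edges whose weight is at least the median edge weight of the network."""
    if not edges:
        return {}
    threshold = median(edges.values())
    return {pair: w for pair, w in edges.items() if w >= threshold}


# Toy undirected network: keys are outlet pairs, values are similarity weights.
edges = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.1,
    ("B", "C"): 0.4,
    ("C", "D"): 0.7,
}
strong = prune_below_median(edges)
print(strong)
```

Since the threshold is computed per network, each country and topic combination keeps roughly the stronger half of its own edges, whatever its overall similarity level.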

We observe variations in the network structure depending on the country and topic under consideration. Indeed, France, Germany, and Italy tend to display a clearly identifiable cluster of questionable sources (orange triangles), indicating the presence of communities primarily consuming questionable content. In the UK, this distinction is less pronounced. Looking at topic-specific differences, we find that for all countries except the UK, the networks tend to be sparser, with a lower edge density, in the case of Brexit. For Coronavirus and Covid Vaccine discussions, the networks are more connected and exhibit higher edge density (see Table 2 of SI). This is reflected in the separation between questionable and reliable news sources: in the Brexit debate, the separation between the two types of news appears clearer, while in the other debates, they share a higher number of connections, as shown in Table 3 of SI. To quantify this behavior further, we compute the adjusted nominal assortativity of our networks (Karimi and Oliveira, 2022), showing that higher levels of assortativity are achieved in the context of the Brexit debate. However, the UK exhibits different behavior, possibly due to its direct involvement in the debate.
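The two quantities mentioned above can be illustrated on a toy network. Note the substitution: the sketch below computes edge density and the standard (Newman) nominal assortativity coefficient on an unweighted graph, not the adjusted variant of Karimi and Oliveira (2022) used in the paper. Node names, labels, and function names are hypothetical.

```python
from collections import defaultdict


def edge_density(n_nodes, edges):
    """Fraction of possible undirected edges that are present."""
    if n_nodes < 2:
        return 0.0
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def nominal_assortativity(edges, label):
    """Standard Newman assortativity for a categorical node attribute."""
    ends = defaultdict(float)  # (label_u, label_v) -> number of edge ends
    for u, v in edges:
        ends[(label[u], label[v])] += 1
        ends[(label[v], label[u])] += 1
    total = sum(ends.values())
    if total == 0:
        return float("nan")
    a = defaultdict(float)  # share of edge ends attached to each label
    for (lu, _), c in ends.items():
        a[lu] += c / total
    trace = sum(c / total for (lu, lv), c in ends.items() if lu == lv)
    expected = sum(x * x for x in a.values())
    if expected == 1:
        return float("nan")  # undefined when all nodes share one label
    return (trace - expected) / (1 - expected)


# Toy network: three reliable outlets, two questionable ones, one bridge edge.
label = {"R1": "reliable", "R2": "reliable", "R3": "reliable",
         "Q1": "questionable", "Q2": "questionable"}
edges = [("R1", "R2"), ("R2", "R3"), ("R1", "R3"), ("Q1", "Q2"), ("R3", "Q1")]

density = edge_density(len(label), edges)
r = nominal_assortativity(edges, label)
print(density, r)
```

Values of the coefficient near 1 indicate that outlets connect mostly to outlets of the same type, while values near 0 indicate mixing close to what random wiring would produce.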

Figure 3: Distribution of tweet interactions by country for reliable (blue) and questionable (orange) news sources around Brexit (top row), Coronavirus (middle row), and Covid Vaccine (bottom row). Tweet interactions are computed as the sum of likes, retweets, quotes, and comments received by each tweet.

Figure 4: Similarity network among news outlets, where each news source is represented as a node, and edges represent audiences’ similarity among news outlets. The color and shape of the nodes indicate the classification of the news source, and the thickness of the edges represents the level of similarity of retweeters between two news sources. We discarded edges with weights lower than the overall median of the edges. Each network represents the news outlets’ similarity on one topic for one country.

Our analysis also reveals that there is no absolute separation between questionable and reliable news outlets. This suggests that some users primarily or exclusively consume reliable or questionable content, while others have a mixed news diet, consuming both types in varying proportions. To delve deeper into this question, we analyze the fraction of questionable news consumed by each user and present the distribution in Figure 5. The results indicate that the majority of users in each debate primarily rely on reliable news sources (see also Table 4 of SI). However, in every debate, there is a small but noticeable fraction of users who exclusively endorse questionable news, although with varying degrees of prominence. Notably, the figure depicts a distinctive bimodal distribution, with very few users falling between the two extremes of the spectrum. These mixed-diet users play a crucial role in bridging the gap between questionable and reliable news within the similarity networks. Furthermore, reliable news sources tend to occupy the core of the network, while questionable sources are generally situated in more peripheral positions. Indeed, among the top 25 sources identified by the PageRank algorithm in each network (Bakshy et al., 2011), a substantial majority (at least 95.3%) are found to be reliable news sources (see SI for further details).

We conclude our analysis by examining the community structure of the similarity networks. We perform community detection using the Louvain clustering algorithm (Blondel et al., 2008) and report the results in Figure 6. Clusters are color-coded based on the proportion of questionable news outlets, with darker shades indicating a higher percentage of questionable sources.
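The per-user fraction of questionable news discussed above can be sketched as follows. The example assumes that consumption is approximated by retweets, which is an assumption made here for illustration (the operational definition is in Section 2). The function name `questionable_fraction` and the toy records are hypothetical.

```python
from collections import defaultdict


def questionable_fraction(retweets, is_questionable):
    """Map each user to the share of their retweets that target questionable outlets."""
    totals = defaultdict(int)
    questionable = defaultdict(int)
    for user, outlet in retweets:
        totals[user] += 1
        if is_questionable[outlet]:
            questionable[user] += 1
    return {user: questionable[user] / totals[user] for user in totals}


# Toy data: (user, outlet) retweet records and an outlet classification.
is_questionable = {"R1": False, "R2": False, "Q1": True}
retweets = [
    ("u1", "R1"), ("u1", "R2"),  # reliable-only diet
    ("u2", "Q1"),                # questionable-only diet
    ("u3", "R1"), ("u3", "Q1"),  # mixed diet
]
fractions = questionable_fraction(retweets, is_questionable)
print(fractions)
```

A histogram of these values with peaks near 0 and 1 and few users in between would correspond to the bimodal shape described for Figure 5.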

Across all countries and topics, the majority of clusters consist mainly of reliable news outlets, and within these clusters, we also find the most significant nodes according to the PageRank classification. However, our analysis also reveals the presence of small clusters with a high proportion of questionable news outlets. The number and size of these clusters vary depending on the country and topic. For instance, in Germany and Italy, there is one such cluster for each topic, while in the Brexit debate in France, there are two clusters. In the UK, the separation is less clear, with no clusters showing a high percentage of questionable news outlets. We also notice that reliable clusters tend to be smaller in size but more numerous, while questionable clusters tend to be larger and often unique in each network. This suggests that users who consume questionable content tend to endorse most of the questionable sources of the network, while reliable news consumers focus on fewer news outlets.
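Once a partition of outlets into clusters is available, the share of questionable outlets per cluster is a simple count. The sketch below takes the partition as given and does not run Louvain itself. The cluster assignments, outlet names, and the function `questionable_share_by_cluster` are hypothetical.

```python
def questionable_share_by_cluster(partition, is_questionable):
    """Return, for each cluster id, the fraction of its outlets that are questionable."""
    sizes, flagged = {}, {}
    for outlet, cluster in partition.items():
        sizes[cluster] = sizes.get(cluster, 0) + 1
        flagged[cluster] = flagged.get(cluster, 0) + int(is_questionable[outlet])
    return {cluster: flagged[cluster] / sizes[cluster] for cluster in sizes}


# Toy partition: outlet -> cluster id, as a community detection step might return.
partition = {"R1": 0, "R2": 0, "R3": 0, "Q1": 0, "Q2": 1, "Q3": 1}
is_questionable = {"R1": False, "R2": False, "R3": False,
                   "Q1": True, "Q2": True, "Q3": True}

shares = questionable_share_by_cluster(partition, is_questionable)
print(shares)
```

These shares are the kind of quantity that could drive a color scale such as the one described for Figure 6, with darker shades for values closer to 1.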

Overall, our analysis provides a longitudinal view of the online news consumption landscape in the selected countries, highlighting the predominance of reliable news sources while also revealing the presence of clusters with a higher proportion of questionable news sources in many countries and topics. The existence of such clusters suggests the presence of a group of users consuming content from various questionable sources while avoiding reliable ones. This behavior is consistent with the potential presence of echo chambers, a phenomenon widely observed in online debates (Cinelli et al., 2021; Falkenberg et al., 2022; Cota et al., 2019).

This paper is available on arXiv under CC 4.0 license.