Health Information: A Literature Review on YouTube as a Source for Healthcare Information


Universidad Complutense de Madrid, Spain

Abstract

The aim of this research is to critically review the peer-reviewed literature addressing YouTube as a source of healthcare information. A systematic search of the PubMed scientific database retrieved articles published between January 2017 and April 2020 that included content analyses of freely accessible YouTube videos. Results: 40% of the articles reviewed focused on illnesses or diseases; the median number of videos analyzed per article is 94; the most frequent scoring systems are DISCERN (n=16), followed by JAMA and GQS; and most studies also collect YouTube metrics and combine them with their own indicators to measure video popularity. Researchers tend to classify content as unreliable and of poor quality, particularly content uploaded by users or general-information channels. This review concludes that, in general, the quality, reliability and usefulness of the content need to improve, as does the presence of content produced by medical institutions and professionals on the platform.

HEALTH INFORMATION: A REVIEW OF THE EXISTING LITERATURE ON YOUTUBE AS A SOURCE OF HEALTH INFORMATION

The purpose of this research is to evaluate the existing academic literature on the use of YouTube as a source of medical information. To this end, a review of the literature available in the PubMed database was carried out. Articles published between January 2017 and April 2020 were selected if their research techniques included content analysis and reviews of open-access videos posted on YouTube. Results: 40% of the articles reviewed focus on information about a disease or condition; the median number of videos analysed per article is 94; the most frequently used scoring criteria are DISCERN (n=16), followed by JAMA and GQS scales; most studies analyse the descriptive variables intrinsic to any YouTube video, complemented with their own indicators to determine popularity; and content rated as unreliable and of low quality predominates in the review, particularly content uploaded by non-specialised channels. This review concludes that medical and health content on YouTube needs to be improved, especially with regard to the quality, reliability and usefulness of the videos, as well as the presence of professional sources and medical institutions on the platform.

Keywords

YouTube, Health information, Healthcare information on YouTube, Healthcare information on Internet, e-health.

INTRODUCTION

There are numerous social platforms that can be analysed as online information tools on health conditions, but several reasons led us to focus this research on YouTube, a free, open-access video-sharing website of American origin.

It ranks as the second most visited website, both in Spain and worldwide, behind only Google.com in both cases. In 2020, fifteen years after its creation, it ranked third in terms of the number of websites that generate traffic to it: more than one and a half million (Alexa Internet, 2020).

Globally, as shown in Figure 1, YouTube is the second most popular social network, with 2 billion active users (Statista, 2020). Notably, 40% of the platform's audience consumes educational products and services (Alexa Internet, 2020). This led us to investigate the usefulness, quality and popularity, among other aspects, of YouTube as an information tool for content related to human health.

Figure 1: Top social networks by number of users

Source: Statista (2020)

Interest in human health and healthy lifestyle habits has increased over time and this, together with the facilities offered by YouTube for disseminating information, has led to a considerable increase in the number of YouTube videos dealing with health-related topics in all areas.

The relevance of YouTube as a source of information for health-related topics is demonstrated by research such as that of Mustafa, Taha, Alshboul, Alsalem, and Malki (2020), which concluded that 91.2% of medical students surveyed reported using YouTube as a source of information for their studies.

Despite its advantages as an information medium, it is also important to bear in mind its limitations and the problems that can arise from the dissemination of videos with unreliable or low-quality content on health issues. In this regard, studies such as that of Aubrey, Speno, and Gamble (2020) show the negative effect that biased information can have on the self-esteem and mental health of adolescent women.

This situation has led, as Ruppert et al. (2017) state, to increased interest in the safety aspects of health information provided by social media. Much research, including that of Chalil, Rivera-Rodríguez, Greenstein, and Gramopadhye (2015), on which this paper aims to build, has examined the quality of health information in YouTube videos.

OBJECTIVES

General objective

To analyse the academic literature on the use of the social network YouTube as a source of information in the healthcare field.

Specific objectives

• Identify research that uses YouTube as a source of information.

• Evaluate the methodologies these studies employ.

• Compare the results achieved by the different studies.

METHODOLOGY

This paper adopts a systematic review methodology applied to the literature available in the PubMed database. This choice was made for the following reasons:

1. It is a repository specialising in medical journals and related sciences, which allows the sample of articles to be limited to this field.

2. It is a reputable database in the scientific field that, in turn, provides access to content indexed in MEDLINE (Drozd, Couvillon, & Suárez, 2018).

Inclusion and exclusion criteria

For the selection of articles, the following inclusion and exclusion criteria were defined:

• The article was published between 1 January 2017 and 15 April 2020;

• The object of study is detailed in the title and/or abstract and addresses "the evaluation of YouTube content as a source of medical-health information";

• The research techniques employed include content analysis and systematic reviews of videos available on the platform and accessible to the general public;

• The search criteria used, the methodological design and the main findings are described in the abstract and/or in the body of the article.

Identification of items

The initial search took place between 15 and 22 April 2020, using as search terms and logical operators 1) "YouTube" and 2) "YouTube AND source of information". Because PubMed is already restricted to medicine and related sciences, as described above, it was not necessary to add terms to narrow the field of study further.

The initial search, carried out by one of the authors, returned 1397 results. These were then filtered by publication date to reduce the analysis sample.

After screening by chronological criteria, 748 articles were obtained, to which the remaining inclusion criteria were applied, resulting in a total of 80 articles to be analysed.

Thirty-eight articles were discarded after a joint review by the three authors. Articles that approached the object of study from a purely theoretical perspective were excluded because they were not comparable with the rest of the selection, as were those that referred to the use of social networks in general, whose focus was broader than the one proposed for this review. The search and selection process by criteria is shown in Figure 2.

Figure 2. Search process and selection of articles for review

Source: Own elaboration

Thus, the final selection of articles is made up of 42 research studies, published between 2017 and 2020, in which the use of YouTube as a source of information in the medical-health field is addressed as an object of study.

RESULTS

Themes and objects of study

The systematic review of the literature revealed a wide diversity of research topics. Thus, 40% of the articles reviewed (n=17) focused on information (diagnosis, symptoms) about a disease or condition, such as oral leukoplakia (Kovalski et al., 2019). Next, 26% (n=10) of the studies analysed content on specific medical treatments, such as the treatment of glioblastoma multiforme (ReFaey et al., 2018). These are followed (n=8) by analyses of videos on specific surgical procedures, such as cataract surgery (Bae and Baxter, 2018). The last category comprises all studies whose object of study does not fit into any of the previous categories. These results are shown in Figure 3.

Figure 3: Most frequent themes in recent academic literature

Source: Own elaboration

Specifically, the most frequent research topics have been colorectal cancer, addressed by Brar, Ferdous, Abedin and Turin (2020) and by Sahin, Sahin, Schwenter and Sebajang (2019), and psoriasis treatment, addressed by Lenczowski and Dahiya (2018) and by Pithadia, Reynolds, Lee and Wu (2019).

Sample selection

From the review of the 42 articles, the median number of videos analysed per article is 94, with a standard deviation (σ) of 103.9. Particularly noteworthy is the research by Devendorf, Bender and Rottenberg (2020), which addresses the conceptualisation, images and information about depression in 327 YouTube videos, selecting up to 50 videos for each search keyword.

On the other hand, in 35% of the cases (n=15), researchers select videos from the platform's search results after fixing in advance the total number of videos to be analysed. This is the case of Basch, Wahrman, MacLean and Garcia (2019) on E. coli bacteria (n=100), Basch, Brown, et al. (2018) on skin whitening (n=100) and Ovenden and Brooks (2018) on the quality of videos about anterior cervical discectomy and fusion (n=50).

Other authors also define the sample on the basis of videos shown in the top 10 result pages on YouTube, such as Lenczowski and Dahiya (2018) or Jain, Abboudi, Kalic, Gill and Al-Hasani (2019) in their research on transrectal ultrasound-guided prostate biopsies (n=41).
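Because a few studies analyse several hundred videos while most analyse around 50 to 100, the sample sizes are skewed, which is why the median is the more informative centre and the standard deviation can exceed it. The sketch below illustrates the calculation using only the video counts quoted in this review for eleven studies (a subset of the 42, so it does not reproduce the reported median of 94). The function name is hypothetical, and whether the review's σ is the sample or population form is not stated, so that choice is an assumption.

```python
import statistics

def summarise_sample_sizes(videos_per_article):
    """Return the median and standard deviation of videos analysed per article."""
    median = statistics.median(videos_per_article)
    # Assumption: statistics.stdev is the sample (n - 1) form;
    # statistics.pstdev would give the population form instead.
    sd = statistics.stdev(videos_per_article)
    return median, sd

# Video counts quoted in this review for eleven of the 42 studies.
counts = [39, 41, 50, 50, 50, 80, 87, 100, 100, 327, 560]
median, sd = summarise_sample_sizes(counts)
print(f"median = {median}, SD = {sd:.1f}")
```

With this subset the two largest samples (327 and 560 videos) push the standard deviation above the median, the same pattern as the review's overall figures.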

Methodological design

Practically all of the research examined adopts content analysis as a research technique, using different measurement instruments adapted to the field of health studies.

More than one in three of the studies collected (n=16) use the DISCERN scoring system (Charnock, Shepperd, Needham and Gann, 1999), making it the most frequently used instrument. As outlined by Aydin and Aydin (2020), this instrument consists of sixteen questions distributed across three sections which, when applied to a given piece of content, allow an assessment of its reliability, of the quality of the information it provides about treatments and/or other medical options, and of the content overall. Each question is scored from 1 to 5, giving a maximum total of 80 points.
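The scoring arithmetic described above can be sketched as follows. This is an illustrative helper, not part of any published DISCERN tooling; it assumes the usual grouping of the sixteen questions (questions 1 to 8 on reliability, 9 to 15 on treatment information, and question 16 as the overall rating), and the function and key names are hypothetical.

```python
def discern_score(ratings):
    """Score one DISCERN assessment from 16 ratings, each between 1 and 5.

    Assumed grouping: questions 1-8 (reliability), 9-15 (quality of
    treatment information), 16 (overall rating).
    """
    if len(ratings) != 16:
        raise ValueError("DISCERN has exactly 16 questions")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("each question is rated from 1 to 5")
    return {
        "reliability": sum(ratings[:8]),
        "treatment_information": sum(ratings[8:15]),
        "overall": ratings[15],
        "total": sum(ratings),  # between 16 and the maximum of 80
    }

print(discern_score([3] * 16))
```

A video rated 3 on every question, for example, totals 48 of the possible 80 points.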

For instance, Esen, Aslan, Sonbahar and Kerimoğlu (2019) use an adaptation of this instrument to examine the content, quality and reliability of YouTube videos related to breast self-examination. In that study, the scale is reduced to values between 1 and 5 points and applied to 87 videos categorised as "useful information" (n=33) or "misinformation" (n=54). On average, videos categorised as useful scored 3.4 ± 0.9, while those categorised as misinformation scored 1.0 ± 0.7.

In contrast, Szmuda et al. (2020), applying the scale to measure the quality and reliability of YouTube videos related to narcolepsy, find that the mean DISCERN score (0-80) of the 80 videos analysed is 27.0 ± 8.0 points, and argue that the content available on this platform about narcolepsy is generally of low quality.

Other researchers such as Ferhatoglu, Kartal, Ekici and Gurkan (2019), Borno et al. (2020) and Tripathi et al. (2020) also include this instrument in their methodological design.

Additionally, much of the research complements the DISCERN scale with other content assessment methods, including the JAMA benchmark and the Global Quality Score (GQS) scale, along with other methods adapted from reputable sources, such as the National Institute on Deafness and Other Communication Disorders (Basch, Yin, et al., 2018) or the American Thoracic Society and American College of Chest Physicians (Lashari et al., 2019).

The JAMA (Journal of the American Medical Association) benchmark is a research tool that measures the quality of patient-facing information against four criteria: 1) authorship; 2) attribution; 3) disclosure; and 4) timeliness. It assesses whether the content meets each criterion, awarding 1 point if it does and 0 if it does not, for a maximum total score of 4 (Aydin and Aydin, 2020).
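The one-point-per-criterion rule lends itself to a very small sketch. The criterion names follow the text above; the function name and the convention that an unassessed criterion counts as not met are assumptions made for illustration.

```python
JAMA_CRITERIA = ("authorship", "attribution", "disclosure", "timeliness")

def jama_score(meets):
    """Return the JAMA benchmark score (0-4): one point per criterion met.

    `meets` maps criterion names to True/False; any criterion missing
    from the mapping is treated as not met (an assumed convention).
    """
    return sum(1 for criterion in JAMA_CRITERIA if meets.get(criterion, False))

video = {"authorship": True, "attribution": False,
         "disclosure": True, "timeliness": True}
print(jama_score(video))
```

Here the video meets three of the four criteria and therefore scores 3 out of 4.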

For example, Cassidy et al. (2018) combine the DISCERN scale with the JAMA benchmark and a proprietary metric to assess the quality of information on YouTube about anterior cruciate ligament rupture and reconstruction. Analysing YouTube content (n=39) retrieved with keywords such as "ACL" or "Anterior Cruciate Ligament", they found that no video scored more than 3 out of a maximum of 5 points on the modified DISCERN scale, while on the JAMA benchmark 33% (n=13) of the videos scored 3 out of a possible 4 points.

Another example of the use of the JAMA benchmark is the study by Ocak (2018) on the reliability, accuracy and quality of YouTube videos on intubation procedures, which obtained a mean score of 1.5 ± 0.8 for the 50 videos analysed, with significantly higher scores (p = 0.00055) for videos produced by medical professionals (1.9 ± 0.8).

Finally, the use of GQS scales and methodologies adapted from the guidelines of medical institutions should be highlighted. The former are used as a methodological complement in five investigations, while the latter are observed in works such as that of Ferhatoglu et al. (2019), in which they evaluate the quality and accuracy of videos related to vertical sleeve gastrectomy procedures.

GQS scales are tools designed for the evaluation of internet content and resources. As explained by Kocyigit, Nacitarhan, Koca and Berk (2019), the GQS is a five-point scale reflecting how useful the content is for the patient. For example, Kunze et al. (2019) combine this tool with the JAMA benchmark and a specific 20-variable system (PLCS) to assess the reliability and educational content of videos on posterior cruciate ligament injuries of the knee. The mean scores of the videos analysed (n=50) were 2.02 on the JAMA scale, 2.3 on the GQS scale and 2.9 out of 22 points on the PLCS scale.

Descriptive content analysis

Most of the research includes in its analysis the descriptive variables intrinsic to any YouTube video. The most frequent are the number of likes, number of dislikes, duration of the content, total number of views, number of days since upload, average viewing time, number of times shared and number of comments. Less frequent are the number of subscribers to the YouTube channel and the country of origin.

In their research on vaccine refusal, Donzelli et al. (2018) first measure the accessibility of this data in a selection of 560 videos, and then relate it to the tone of the video content. Moreover, research such as Ekram, Debiec, Pumper and Moreno (2019), Brar, Ferdous, Abedin and Turin (2020) and Tripathi et al. (2020), among others, explore correlations between these variables and content quality/reliability with mixed results.
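The review does not state which correlation coefficient each study used. One common choice for skewed counts such as views, and an assumption here, is Spearman's rank correlation, which is the Pearson correlation of the ranks (tied values receive the mean of their ranks). A minimal sketch, with hypothetical function names and invented example data:

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (sx * sy)

# Invented data: views and DISCERN totals for five hypothetical videos.
views = [1200, 45000, 300, 98000, 7600]
discern_totals = [52, 30, 61, 28, 40]
print(spearman_rho(views, discern_totals))
```

In this invented example the most viewed videos have the lowest quality scores, so the coefficient is (up to floating-point rounding) -1; real data of the kind reviewed here would give the "mixed results" noted above.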

On the other hand, studies by Nguyen and Allen (2018), Özdal Zincir, Bozkurt and Gaş (2019) and Fortuna, Schiavo, Aria, Mignogna and Klasser (2019), among others, include their own indicators that seek to quantify the popularity, momentum or interaction of the videos analysed, looking for correlations with other relevant variables.

As an example, Ocak (2018) uses an indicator called the Video Power Index to quantify the popularity of each video, combining its view and like ratios.
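The text does not give the exact formula. A formulation commonly reported in YouTube content studies, and assumed here rather than confirmed for Ocak (2018), defines the like ratio as likes × 100 / (likes + dislikes), the view ratio as views per day online, and the Video Power Index as like ratio × view ratio / 100. A minimal sketch under that assumption:

```python
def video_power_index(views, likes, dislikes, days_online):
    """Video Power Index under a commonly reported formulation (assumed):

    like ratio = likes * 100 / (likes + dislikes)
    view ratio = views / days since upload
    VPI        = like ratio * view ratio / 100
    """
    if days_online <= 0:
        raise ValueError("days_online must be positive")
    if likes + dislikes == 0:
        raise ValueError("like ratio is undefined with no likes or dislikes")
    like_ratio = likes * 100 / (likes + dislikes)
    view_ratio = views / days_online
    return like_ratio * view_ratio / 100

# Hypothetical video: 36,500 views over a year, 90 likes and 10 dislikes.
print(video_power_index(views=36500, likes=90, dislikes=10, days_online=365))
```

For this hypothetical video the like ratio is 90 and the view ratio is 100 views per day, giving an index of 90. Because the like ratio depends on dislike counts, which YouTube stopped displaying publicly in late 2021, the index as sketched here can only be computed where those counts are available.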

Sources and authorship of the videos analysed

The identification and categorisation of the source is another common element in the research analysed. Although authors such as ReFaey et al. (2018) or Fernández-Llatas, Traver, Borrás-Morell, Martínez-Millana and Karlsen (2017) limit the search to videos that come exclusively from academic sources, the vast majority (95%) of the research reviewed distinguishes different types of sources or content authors in their analyses.

In this regard, Ocak (2018) reports that the large majority (92%) of informative YouTube videos on intubation procedures were published by healthcare professionals. Other studies also find a predominance of professional authors, although content is spread more widely across author types: Bae and Baxter (2018) report that 71% (n=51) of the videos on cataract surgery methods they assessed as an educational resource were posted by physicians and healthcare professionals, and Şahin, Şahin and Türkcü (2019) reach a similar conclusion in their analysis of videos on retinopathy of prematurity.

In contrast, other research highlights the role of users as predominant sources. This is the case of Devendorf et al. (2020), who find that one third (n=118) of YouTube videos on depression come from non-professional sources, compared to 9% (n=32) from mental health organisations. Lenczowski and Dahiya (2018) report a consistent finding, noting that 71% (n=144) of the videos analysed on psoriasis treatment came from sources without a clear medical context, as do Di Stasio et al. (2018), who find a higher share of videos from generalist channels than from medical or professional channels.

DISCUSSION

The number of research studies published in recent years on the content of this platform, together with the large number of videos analysed (n=5124), highlights the growing relevance of the platform as a provider of health information for patients, professionals and academics alike. As previously discussed, the content addressed on the platform ranges from information on, and diagnosis of, relatively common conditions such as knee pain to complex surgical procedures.

Thus, a number of implications for users, medical professionals, health institutions and patients can be drawn from this review, particularly regarding the relationship between the quality, usefulness, reliability and accuracy of the information and platform-specific variables such as popularity and interaction with the content.

In terms of the reliability and quality of YouTube videos, there is a general tendency to rate content as unreliable and of low quality. These findings are consistent with the results of the reviews by Chalil et al. (2015) and Okagbue et al. (2020).

In particular, the lowest scores are observed for videos produced by non-specialised users and channels. One explanation is that users tend to upload testimonials and videos giving opinions on treatments or conditions, a type of information that tends to be more misleading than that published by experts, as Esen et al. (2019) report for videos on breast self-examination.

Yet these videos featuring patient experiences or testimonials achieve higher viewing rates and popularity, generally measured by the number of likes. Basch, Yin, et al. (2018) find higher viewing rates for videos uploaded by consumers and news sources, and Ferhatoglu et al. (2019) identify a negative correlation between the Video Power Index and the JAMA benchmark score.

However, as Loeb et al. (2019) point out, a higher number of views and approval on YouTube does not guarantee that the information is reliable, nor is it associated with its completeness (Sahin, Sahin, Schwenter, & Sebajang, 2019).

Another factor to consider is the advertising use of this type of content, which generally runs counter to its reliability and usefulness, especially in videos from non-professional channels. In this regard, Basch, Brown, et al. (2018) note that the likelihood of commercial interest in skin whitening videos from digital media sources was 17 times higher than in videos from 'anonymous' users. Similarly, Borno et al. (2020), examining YouTube as a source of information about prostate cancer clinical trials, found an advertising bias in 10% of the videos analysed.

This lack of reliability is particularly relevant for aspects of medicine that have recently been the subject of controversy, such as vaccination against viral diseases. This is the conclusion of Donzelli et al. (2018), who found that anti-vaccine content is up to three times more numerous and more viral than so-called "pro-vaccine" content. Pro-vaccine videos, in turn, are four times more likely to provide accurate information (Ekram et al., 2019).

However, research such as that of Tolu, Yurdakul, Basaran and Rezvani (2018) on self-administered subcutaneous injections shows that, in certain contexts and when authored by professional sources, reliability and usefulness can correlate positively with the number of views. Moreover, this type of content, which is often intended for patients, can be particularly valuable if it contains accurate and detailed information. Kovalski et al. (2019) support this hypothesis, finding a positive correlation between trustworthiness and interaction in videos about oral leukoplakia.

Under these premises, the source of the video becomes particularly important. A clear conclusion emerges from the review: videos published by medical professionals or institutions consistently score better across the different evaluation systems than those uploaded by users or other types of sources, as reported by Ferhatoglu et al. (2019), Şahin et al. (2019) and Pons-Fuster, Ruíz Roca, Tvarijonaviciute and López-Jornet (2020). Among other characteristics, these videos tend to be longer (Brar et al., 2020), contain more detailed information and generally have an educational purpose.

Along these lines, the research by Kocyigit et al. (2019) is noteworthy: it concludes that almost half of the videos analysed on exercises for the treatment of ankylosing spondylitis were of high quality according to the DISCERN scale.

All in all, this review highlights the need for further research on the reliability, accuracy and usefulness of medical-health content on YouTube, especially given the increasing digitisation of healthcare, the integration of mHealth devices and the now unavoidable digital transformation of almost every sphere of life.

To conclude, some limitations of this work derive from its very nature. Firstly, it is difficult to make an exhaustive comparison of findings across research, given the diversity of topics. In addition, it is equally difficult to extrapolate conclusions applicable to all YouTube videos with medical-health content, as the sample is vast in terms of subject matter and selection.

Similarly, the divergence in methodological designs, search criteria and measurement instruments limits the drawing of conclusions as the studies collected are not completely comparable.

CONCLUSIONS

General conclusions

• Content on controversial medical issues tends to be considerably less reliable, particularly anti-vaccine content, which is also more likely to go viral.

• Videos posted by medical institutions or professionals score better in the different rating systems than those uploaded by other sources.

• Limitations in analysing the findings have been noted, due to the breadth of topics covered and divergences in the methodological designs employed. However, this opens up new opportunities for future research.

Specific conclusions

• Content analysis is the technique used in practically all of the studies. DISCERN is the most frequently used scoring instrument, followed by the JAMA and GQS scales.

• The authorship of the videos is divided between media, health organisations and institutions, professional users and anonymous users.

• Overall, the content available on YouTube on medical and health topics is of low quality and unreliable.

• Videos posted by users tend to be testimonial or opinion-based, a type of information that is less accurate than that provided by experts. Accordingly, these videos and those posted by non-expert channels tend to score lower in terms of quality and reliability.

• There is no clear conclusion on the correlation between the number of YouTube views or approvals and reliable and comprehensive information; high popularity does not ensure the reliability of the content.

• Most of the research agrees in recommending that institutions and healthcare professionals increase their activity on YouTube, taking advantage of the potential of the social network and responding to the need for more reliable and higher quality information.

BIBLIOGRAPHICAL REFERENCES