Case study is often defined in different ways, reflecting evolving practice. What is important then is to define the concept for yourself, and explain to your audience how you are using the term.

Some definitions

Case study involves a detailed, in-depth analysis of an organisation, a person, a group or an event, allowing an understanding of complex phenomena, such as organisations. A case study generally involves looking at a single case (which already exists): an object of study that is easily identified and separated from other similar objects (a bounded system), e.g. an organisation, a place, or an illness in one patient. Case study is a useful methodology for focusing on relationships connecting everyday practices in natural settings, placing attention on a local situation (Stake, 2006).

The case study is useful for investigating an issue in depth and to ‘provide an explanation that can cope with the complexity and subtlety of real life situations’ (Denscombe, 2010, p. 55).

Research questions revolve around ‘How?’ or ‘Why?’ and may be explanatory, exploratory or descriptive in nature (Yin, 2003).

Case study can be used to develop theory. Yin (2003, p. 1) notes that a case study is a way to ‘contribute to our knowledge of individual, group, organisational, social, political and related phenomena’. Case study can also be used to test theory: what is the theory supposed to do, and does it do that? Case studies can be used to trace a process, developing an understanding and then testing it (Bennett, Andrew).

  • 1 Data collection
  • 2 Multiple case studies
  • 3 References and resources

Data collection

Case studies generally use a combination of data collection methods.

Multiple case studies

In multiple case study research, single cases are meaningful in relation to the other cases studied. Multiple case study research needs to use cases that are similar in some ways. The cases become "members of a group or examples of a phenomenon" (Stake, 2006, p. 6). This allows examination of what is similar and dissimilar about the cases. The researcher is looking for patterns and uniqueness, particulars and generalizations in the cases developed.

References and resources

Denscombe, Martyn (2010) (4th ed). The good research guide for small scale social research projects. Maidenhead: Open University Press, McGraw-Hill.

Dufour, S. & Foutin, V., ‘Annotated bibliography of case study method’, Current Sociology vol.40/1, 1992, pp.166-181.

Fidel, R. (1984). ‘The Case Study Method: A Case Study’, Library and Information Science Research vol.6/3, pp.273-288.

Garson, G.D. (2008). Case Studies, available from http://faculty.chass.ncsu.edu/garson/PA765/cases.htm

Gerring, J. (2007). Case Study Research: Principles and Practices, Cambridge University Press, Cambridge.

Gilbertson, D. W. & Stone, R. J. (1985) (2nd ed). Human resources management: cases and readings. Sydney: McGraw-Hill.

Given, L. M. (2008) (ed.), The SAGE Encyclopedia of Qualitative Research Methods, Los Angeles: Sage.

Hossain, Dewan Mahboob (2009). 'Case Study Research' Social Science Research Network http://ssrn.com/abstract=1444863

Marshall, C. & Rossman, G.B. (2006) (4th ed). Designing qualitative research, Thousand Oaks: Sage.

Merriam, S. (1998). Qualitative research and case study application in education. San Francisco: Jossey-Bass.

Ragin, C.C. & Becker, H.S. (1992), What is a Case? Exploring the foundations of social enquiry, Cambridge University Press, Cambridge.

Sadler, D. Royce (1985). ‘Evaluation, Policy Analysis and Multiple Case Studies: Aspects of focus and sampling’, Educational Evaluation and Policy Analysis, vol.7/2, pp.143-149.

Simons (2009). Case study research in practice. London: Sage.

Stake, R. E. (1995). The art of case study research. Thousand Oaks: Sage Publications.

Stake, R.E. (2006), Multiple Case Study Analysis, New York & London: The Guilford Press.

Soy, Susan K. (1997). The case study as a research method. Available from http://www.ischool.utexas.edu/~ssoy/usesusers/l391d1b.htm

Stoecker, R., ‘Evaluating and rethinking the case study’, The Sociological Review vol.39, no.1, February 1991, pp.88-112.

Yin, R.K. (1989). ‘Case study research design and method’. Applied Social Research Methods Series 5. Newbury Park: Sage.

Young, Raymond (2010). Case study research http://ise.canberra.edu.au/raymond/?s=case+study

Zach, L. (2006), ‘Using multiple case studies design to investigate the information-seeking behaviour of arts administrators’, Library Trends vol.55/1, pp.4-21.

See also

  • Topic:Business case studies
  • Portal:Social entrepreneurship/Case Studies/more
  • Wikiversity:Case studies
  • CisLunarFreighter/Scripts and case studies
  • Case studies in patent litigation
  • Portal:Social entrepreneurship/Case Studies
  • Evidence-based medicine/Case studies
  • Case study: Blended design and openness
  • Case study in psychology

What Is a Case Study? | Definition, Examples & Methods

Published on May 8, 2019 by Shona McCombes. Revised on November 20, 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating and understanding different aspects of a research problem.

Table of contents

  • When to do a case study
  • Step 1: Select a case
  • Step 2: Build a theoretical framework
  • Step 3: Collect your data
  • Step 4: Describe and analyze the case
  • Other interesting articles

When to do a case study

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation. They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.

Step 1: Select a case

Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Tip: If your research is more practical in nature and aims to simultaneously investigate an issue as you solve it, consider conducting action research instead.

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases which may shed new light on the research problem.

Example of an outlying case study: In the 1960s the town of Roseto, Pennsylvania was discovered to have extremely low rates of heart disease compared to the US average. It became an important case study for understanding previously neglected causes of heart disease.

However, you can also choose a more common or representative case to exemplify a particular category, experience or phenomenon.

Example of a representative case study: In the 1920s, two sociologists used Muncie, Indiana as a case study of a typical American city that supposedly exemplified the changing culture of the US at the time.

Step 2: Build a theoretical framework

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description, but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

Step 3: Collect your data

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

Example of a mixed methods case study: For a case study of a wind farm development in a rural area, you could collect quantitative data on employment rates and business revenue, collect qualitative data on local people’s perceptions and experiences, and analyze local and national media coverage of the development.

The aim is to gain as thorough an understanding as possible of the case and its context.

Step 4: Describe and analyze the case

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyze its meanings and implications (for example, by using textual analysis or discourse analysis).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Ecological validity

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Cite this Scribbr article

McCombes, S. (2023, November 20). What Is a Case Study? | Definition, Examples & Methods. Scribbr. Retrieved April 16, 2024, from https://www.scribbr.com/methodology/case-study/

Open Access

Peer-reviewed

Research Article

Wikipedia as a tool for contemporary history of science: A case study on CRISPR

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

¶ ‡ OB and RA are joint senior authors on this work.

Affiliations System Engineering and Evolution Dynamics, Inserm, Université Paris Cité, Paris, France, Learning Planet Institute, Paris, France

Roles Data curation

Roles Software

Affiliation Bezalel Academy of Arts and Design, Jerusalem, Israel

Roles Funding acquisition, Methodology, Project administration, Writing – original draft

  • Omer Benjakob, 
  • Olha Guley, 
  • Jean-Marc Sevin, 
  • Leo Blondel, 
  • Ariane Augustoni, 
  • Matthieu Collet, 
  • Louise Jouveshomme, 
  • Roy Amit, 
  • Ariel Linder, 
  • Rona Aviram

PLOS

  • Published: September 13, 2023
  • https://doi.org/10.1371/journal.pone.0290827

Rapid developments and methodological divides hinder the study of how scientific knowledge accumulates, consolidates and transfers to the public sphere. Our work proposes using Wikipedia, the online encyclopedia, as a historiographical source for contemporary science. We chose the high-profile field of gene editing as our test case, performing a historical analysis of the English-language Wikipedia articles on CRISPR. Using a mixed-method approach, we qualitatively and quantitatively analyzed the CRISPR article’s text, sections and references, alongside 50 affiliated articles. These, we found, documented the CRISPR field’s maturation from a fundamental scientific discovery to a biotechnological revolution with vast social and cultural implications. We developed automated tools to support such research and demonstrated its applicability to two other scientific fields: coronavirus and circadian clocks. Our method utilizes Wikipedia as a digital and free archive, showing it can document the incremental growth of knowledge and the manner in which scientific research accumulates and translates into public discourse. Using Wikipedia in this manner complements contemporary histories, overcomes some of their issues, and can also augment existing bibliometric research.

Citation: Benjakob O, Guley O, Sevin J-M, Blondel L, Augustoni A, Collet M, et al. (2023) Wikipedia as a tool for contemporary history of science: A case study on CRISPR. PLoS ONE 18(9): e0290827. https://doi.org/10.1371/journal.pone.0290827

Editor: Claire Seungeun Lee, University of Massachusetts Lowell, UNITED STATES

Received: March 5, 2023; Accepted: August 16, 2023; Published: September 13, 2023

Copyright: © 2023 Benjakob et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting Information files.

Funding: Thanks to the Bettencourt Schueller Foundation long term partnership, this work was partly supported by the LPI Research Fellowship, Université de Paris, INSERM U1284, to RAv and OB. RAv’s work was supported in part at the Technion by a fellowship of “The Israel Academy of Science and Humanities”. In either case, the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In recent years, the historically qualitative field of history of science has undergone a data revolution [ 1 ], with research increasingly making more use of big data and computational techniques for historical ends [ 2 ]. Alongside the rise of digital humanities, a divide has persisted between quantitative historical research and textually rich qualitative work. The result is a historiographic lacuna [ 3 ] and a debate regarding the materials and methods which can be researched [ 4 , 5 ]. Here, we suggest a new resource can be utilized for the context of history of contemporary science by systematizing existing research methods on an unlikely arena that is rich in both bibliometric data and historical text: Wikipedia.

Wikipedia’s volunteer-run editorial process was long lambasted as inherently unreliable. When in 2005 a study by Nature’s news site found Wikipedia to be as reliable as Encyclopedia Britannica, the news was met with surprise [ 6 , 7 ]. Issues of accuracy in the encyclopedia anyone could edit, along with “edit wars” between volatile unknown editors, all became a topic of research [ 7 – 9 ]. In recent years, however, this narrative has reversed: both academic research and the media have praised Wikipedia’s coverage and reliance on sources, which in some cases have been found to be in lock step with science [ 7 , 10 , 11 ], especially with regard to the COVID-19 pandemic [ 12 , 13 ].

If Wikipedia is indeed meeting its self-defined goal of representing the scientific consensus on scientific knowledge while making use of scientific sources, then we wanted to see if this was also true historically: Can Wikipedia articles document shifts in science over time? Using Wikipedia’s edit history, we hypothesized, could allow us to see changes in how a field was represented in the past and to track citations, with new ones being added and old ones removed as new knowledge accumulated, sparking reassessments of existing paradigms. Our first study showed how two Wikipedia articles on the Nobel Prize-winning field of circadian clocks managed to accurately reflect the field for over 15 years, even as it underwent scientific revolutions [ 14 ]. Others have used Wikipedia in a similar manner to focus on documentation of political events such as the 2011 Egyptian Revolution [ 15 ].

Wikipedia provides a rich, open, and accessible source of information, as past versions of all articles can be viewed through what is termed the changelog. This continuum of text throughout time complements the traditional historical practice of textual analysis, yet in this case the reading is of changing versions of the same text as opposed to comparing different scientific reviews and papers. For us, this feature raised the possibility of mapping the changes to specific parts of the article’s text, structure and references and easily tracking new additions and deletions.
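The idea of tracking additions and deletions between two versions of the same article can be illustrated with a minimal Python sketch. It is not the authors' tool: the function names are hypothetical, the two revisions are assumed to be already available as plain strings, and the DOI regular expression is a deliberate simplification used here as a stand-in for "a reference".

```python
import difflib
import re

# Simplified DOI pattern (an assumption for illustration, not a full DOI grammar).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s|}\]<>\"]+")


def text_changes(old_text, new_text):
    """Return (added_lines, removed_lines) between two revisions of an article."""
    added, removed = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return added, removed


def reference_changes(old_text, new_text):
    """Return (added_dois, removed_dois), comparing the DOIs found in each revision."""
    old_dois = set(DOI_PATTERN.findall(old_text))
    new_dois = set(DOI_PATTERN.findall(new_text))
    return sorted(new_dois - old_dois), sorted(old_dois - new_dois)
```

Comparing sets of identifiers rather than raw reference strings is a design choice: it avoids counting a reference as "removed and re-added" when only its formatting changed.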

The study of academic sources and publications is a mainstay of the history of science. Both qualitative and quantitative researchers make use of them, be it for bibliometric analysis or thick description [ 4 , 16 ].

The historical methods born with historian Derek J. de Solla Price that made use of publication data [ 17 ] joined the works of earlier historians like Robert K. Merton that laid the historiographic framework for research into the scientific revolution [ 18 ]. Later on, sociological works, written by historians like Robert Darnton on the history of books, offered a qualitative detail-rich chronicle of the rise of scientific media during the Enlightenment, substantiating the scientometrics of history with rich detail [ 19 , 20 ].

Unlike academic publications focused on the state of the art of the field, or review papers covering it, Wikipedia does not aim to publish original research; it only reflects the scientific consensus based on already published sources. In that, Wikipedia also provides a textually rich base for qualitative work regarding narrativization and the communication of science outwards from academia to the public. Importantly, policies enforced by Wikipedia’s editors require “verifiable” sources to back all factual claims [ 21 ], and every article has a reference list. A small but growing body of bibliometric research based on Wikipedia has also emerged [ 22 , 23 ] and even found that on medical [ 24 ] and science [ 8 ] topics English-language articles have an explicit bias towards using academic sources.

Wikipedia thus easily lends itself to such research, providing both data and text that can be used for historical analysis. In this work, we demonstrate this through a case study on the CRISPR field.

In a relatively short period of time, CRISPR-based gene-editing tools have been labeled the scientific “breakthrough” of the 21st century [ 25 ]. While CRISPRs were identified in the 1980s, and received their name in 2002 [ 26 ], their function remained unclear until 2005, when different labs deduced from in silico studies that CRISPR sequences were part of a bacterial adaptive immune system [ 27 – 29 ]. The academic studies that first performed CRISPR-based directed gene editing in vitro were famously published in 2012: first from the labs of Jennifer Doudna and Emmanuelle Charpentier [ 30 ] and shortly after in a paper from the Virginijus Šikšnys group [ 31 ]. These were rapidly followed by publications in February 2013 that performed genetic engineering in vivo in mammals, led by scientists Feng Zhang [ 32 ] and George Church [ 33 ]. Thus, the field matured from a basic science discovery into the ability to utilize CRISPR-associated proteins like Cas9 for genetic engineering, currently used by countless labs around the globe [ 34 ]. Doudna and Charpentier were awarded the 2020 Nobel Prize in Chemistry for their scientific contribution to genetic editing technologies, showcasing how the so-called CRISPR revolution played out over the past 20 years. Told as such, CRISPR’s history can seemingly be deduced through academic publications. However, the science itself does not tell the full scientific story.

In contrast to many other groundbreaking scientific discoveries which remain known only within scientific circles, gene editing has also been in the spotlight of much public debate. For example, many news outlets have dedicated reports to developments in the field and debated the ethical implications of “designer babies” [ 35 ]. Netflix has even broadcast a documentary on CRISPR, underscoring its iconic status in popular culture. By now, CRISPR is not a purely scientific phenomenon. Wikipedia, a popular source read and compiled by the general public, strives to document these facets as well.

The CRISPR field’s brief history has been riddled with controversies, and legal battles over credit and CRISPR patents were all covered extensively in the media [ 36 ]. Most famously, Eric Lander’s perspective in Cell, the “Heroes of CRISPR” [ 37 ], was met with fierce criticism [ 38 ]. Critics claimed that the text offered a biased version of the field’s history that minimized the roles of some scientists as part of the patent war raging between academic institutions [ 39 ]—going as far as to label Lander the “villain” of CRISPR [ 40 ]. This controversy underscores how scientific outlets, even those famous for publishing novel scientific research, may not necessarily serve as reliable historical sources on contemporary science itself.

The encyclopedia’s text and sources can thus be viewed as an inclusive medium, one that can potentially help track the interaction with additional fields and allow a better understanding of how scientific knowledge ramifies well outside the realm of academic publications.

CRISPR is a prime example of a scientific field that has undergone massive growth during Wikipedia’s lifespan. It is an ideal case study as its history is short (i.e., parallel to Wikipedia’s lifetime) and multi-faceted: a highly scientific topic with wide-ranging technological and social ramifications. These facets, we found, were documented on Wikipedia and its different articles, supported by scientific, public and popular sources alike. Together, our findings—based on an analysis of the CRISPR article and 50 others with related content—suggest that Wikipedia can indeed serve as a tool in the digital history of contemporary science. To that end, we put forward a methodology and provide automated tools utilizing Wikipedia’s data—its articles, their edit histories and their references. Our method relies on both quantitative and qualitative analyses that may help consolidate research into Wikipedia and help address the aforementioned conflict between data and content-dependent historical research.

Methods & results

Delineating the research scope.

Historical research requires a clear delineation of the field being studied, for instance gathering a collection of academic publications [ 5 ]. Similarly, the manner in which a scientific field is represented on Wikipedia requires clear delineation of scope and span—i.e., the articles that touch on it and the time frame being examined. While a single article can provide a rich source of textual and historical data, related articles may represent more nuanced facets of a field—like scientists’ biographies or related events and technologies. Identifying these requires sieving through Wikipedia’s massive body of articles—currently numbering well above 6 million in English alone.

To this end, we propose a stepwise strategy for defining a research corpus about a certain topic. The first step utilizes Wikipedia’s free-text search function to find all articles that contain the topic being researched ( Fig 1A ). In the present study, searching for “CRISPR” yielded 720 Wikipedia articles containing that term, as of June 2022 ( Fig 1B ). Reading these articles revealed that the majority made only minor or incidental use of CRISPR. Thus, to permit qualitative analyses on a more focused pool, we designed the second stage of the research funnel, which calls for retaining only those articles with the term in either their title or the title of one of their sections, as these will likely contain a substantial amount of information directly related to the subject. To facilitate these steps of search and filtration, we designed a tool that performs them automatically for any given term of interest ( WikiCorpusBuilder ). Continuing with the term “CRISPR”, this filtering yielded 51 articles ( S1 Table ). Of these, 10 had CRISPR in their title, and another 41 had it only in the title of one of their sections.
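The two-stage funnel described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the WikiCorpusBuilder implementation: the `Article` class and both function names are hypothetical, and the articles are assumed to be already downloaded into memory rather than fetched from Wikipedia.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical in-memory stand-in for a Wikipedia article."""
    title: str
    section_titles: list = field(default_factory=list)
    text: str = ""


def free_text_matches(articles, term):
    """Stage 1: keep every article that mentions the term anywhere."""
    t = term.lower()
    return [
        a for a in articles
        if t in a.title.lower()
        or t in a.text.lower()
        or any(t in s.lower() for s in a.section_titles)
    ]


def build_corpus(articles, term):
    """Stage 2: keep only articles with the term in the title or a section title.

    Returns (in_title, in_section_only) so the two groups can be counted separately.
    """
    t = term.lower()
    in_title, in_section_only = [], []
    for a in free_text_matches(articles, term):
        if t in a.title.lower():
            in_title.append(a)
        elif any(t in s.lower() for s in a.section_titles):
            in_section_only.append(a)
    return in_title, in_section_only
```

Returning the two groups separately mirrors the breakdown reported in the text (articles with the term in the title versus only in a section title); articles that mention the term only in running text are dropped at the second stage.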

Fig 1. A) Scheme of proposed research flow as supported by our tool: A free search of Wikipedia’s English-language articles is conducted to identify relevant articles; these are then filtered to include only those with the term in either their title or that of a section. Next, different analyses can be performed on the anchor article and corpus. Of the listed examples, in bold are the data provided by our tool, the rest are currently collected manually. B) Breakdown of flow scheme in the CRISPR case study, as of June 2022.

https://doi.org/10.1371/journal.pone.0290827.g001

Even among this list of corpus articles, a clear hierarchy arises between those which in subject, text and focus are fully aligned with the topic being researched (which we term the “anchor article/s”); and “auxiliary” articles, those that represent secondary aspects of the topic or instances in which it is embedded within other fields. The distinction is important as it allows us to focus the qualitative work described in the following sections. Here, the anchor article selected was “CRISPR”, which was determined semantically based on its title and content.

Within this CRISPR corpus, several auxiliary articles focused on scientific topics, for example the article for “CRISPR Activation”, “Cas9”, or “CRISPR gene editing”; while others had wider scientific topics, such as “Antibiotic”, “Gene knockdown”, and “Genome editing”. Also included were articles with broad topics, for example “Wheat” which had a section on CRISPR-edited strains of grain. Another group of articles were those dedicated to scientists, like the 2020 Nobel laureates Doudna and Charpentier, awarded for their groundbreaking work in the field; or Šikšnys, who also played a pivotal role in CRISPR’s history. Other science-adjacent articles touched on more social facets of CRISPR, e.g., “The CRISPR Journal” and “Designer baby”, showing how cultural aspects are also captured by this method. We therefore concluded that these articles provide a good sample of CRISPR related knowledge.

Another advantage Wikipedia provides is open access to these articles’ data, which we harness using our tool to further characterize the corpus. For example, “CRISPR” ranked amongst the top five articles in terms of size, number of references, and number of edits ( S1 Fig ).
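As a rough illustration of this kind of corpus characterization, the sketch below ranks articles by a metric and checks whether one article sits in the top five on every metric. The function names and the shape of the `stats` dictionary are assumptions made for the example; the actual values would come from the corpus data (S1 Fig), not from this code.

```python
def top_n_by_metric(stats, metric, n=5):
    """Return the n article titles with the highest value for one metric.

    `stats` maps title -> {metric_name: value}; this layout is hypothetical.
    """
    ranked = sorted(stats, key=lambda title: stats[title][metric], reverse=True)
    return ranked[:n]


def in_top_n_everywhere(stats, title, metrics, n=5):
    """True if the article ranks in the top n for every listed metric."""
    return all(title in top_n_by_metric(stats, m, n) for m in metrics)
```

A call such as `in_top_n_everywhere(stats, "CRISPR", ["size", "references", "edits"])` would then express the check described in the paragraph above.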

Thus, we have composed a clearly defined research corpus regarding our key term of interest. The corpus can be depicted through the titles (qualitative) or the data (quantitative), and further mixed analyses can take place as will be demonstrated below.

Mixed method analyses for understanding historical growth of knowledge

After having established the research scope, created our corpus and identified the anchor article(s) within it, we move to the analysis phase. We deployed three complementary analyses: (1) a qualitative reading of the anchor article’s text and structure, in both its current and past versions; (2) a quantitative analysis of the anchor article; and (3) a quantitative analysis of the corpus. All three are based on data and materials readily available on Wikipedia and many aspects of the quantitative analyses have been automated through our tool ( Fig 1 ). The following describes how these were deployed on CRISPR.

Mixed-methods research [ 41 ] combines quantitative and qualitative analyses and served as the basis for this research. We employed this in what can be termed Wikipedia-focused “thick big data” [ 42 ] (as opposed to content-agnostic big data approaches) which meshes the world of thick description and data analysis. In our case, Wikipedia articles, their edit histories and sources are treated as an initial dataset, which are then analyzed semantically as well as through quantitative methods and then interpreted in a detailed manner.

First, using Wikipedia’s “view history” tool, available at the top right of every article ( Fig 2A ), we can access the anchor article’s past versions and perform a comparative reading. Here, we used annual intervals to sample textual changes—at times narrowing the time frame to provide a more detailed account of the article’s historical textual growth.
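Sampling an edit history at annual intervals can be sketched as follows. The text does not say which revision within a year was read, so this example assumes "the last revision of each calendar year"; the function name is hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC such as `2005-06-30T12:00:00Z`.

```python
from datetime import datetime


def sample_annually(revisions):
    """Pick one revision per calendar year: the latest one made in that year.

    `revisions` is an iterable of (iso_timestamp, revision_id) pairs in any order.
    Returns {year: revision_id}; years with no edits are simply absent.
    """
    latest = {}
    for timestamp, revision_id in revisions:
        when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        current = latest.get(when.year)
        if current is None or when > current[0]:
            latest[when.year] = (when, revision_id)
    return {year: rev for year, (_, rev) in sorted(latest.items())}
```

Narrowing the time frame, as the paragraph above mentions, would amount to grouping by a finer key (for example year and month) instead of the year alone.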

Fig 2. A) An example of the top of a Wikipedia article; note the “View history” tab (frame added) that enables accessing older versions of the text. Snapshots from the Wikipedia archive of the CRISPR article: B) the full text of the article when it first opened on June 30th, 2005, and C) extract of the lead section’s opening paragraphs, as of July 6th, 2022.

https://doi.org/10.1371/journal.pone.0290827.g002

Comparing the article’s past versions provided rich historical context: The article for CRISPR was created in June 2005, as what is termed a “stub”—a short entry in need of further elaboration ( Fig 2B ). This first version included but a single paragraph elucidating the CRISPR acronym and describing the genetic locus. At the time, there was no mention of its relation to bacterial immunity or gene editing, two points which would be integral to the field, and as a result be highlighted in the article’s lead text in future versions ( Fig 2C ).

Next, we augmented this form of textual comparison with structural analyses of the CRISPR article’s architecture, i.e., its table of contents, a basic feature of all Wikipedia’s articles. This “table of contents” or “section” analysis is done as a mixed method: Quantitatively, we measured the overall number of sections and subsections ( Fig 3A ); qualitatively, we reviewed their titles and documented the changes they underwent to provide insight into the content of the article, with the section titles serving as a proxy for new units of CRISPR-related knowledge ( Fig 3B and S2 Table ). Due to the semantic variation between sections, we prefer to gather this data and perform the comparison manually.
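The quantitative half of this section analysis can be approximated with a short Python sketch. The authors state that they gathered this data manually, so the code below is only an illustration: it assumes the revision is available as raw wikitext, where headings are written as `== Title ==` (top-level section) or with more equals signs for subsections, and both function names are hypothetical.

```python
import re

# Matches wikitext heading lines such as "== Title ==" or "=== Subtitle ===".
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def section_outline(wikitext):
    """Return a list of (level, title) pairs in document order."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(wikitext)]


def count_sections(wikitext):
    """Return (number_of_sections, number_of_subsections) for one revision."""
    outline = section_outline(wikitext)
    sections = sum(1 for level, _ in outline if level == 2)
    subsections = sum(1 for level, _ in outline if level > 2)
    return sections, subsections
```

Running `count_sections` on each sampled revision gives the kind of time series plotted for sections and subsections, while `section_outline` supplies the titles for the qualitative reading.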

A) The number of sections and subsections in the CRISPR article since it opened in 2005. B) Titles of the article’s sections throughout 2010–2022, sampled biannually. Subsections and those listing sources were removed for clarity and can be found in S2 Table . Alignment and coloring were added manually to highlight sections repeating in consecutive revisions. C) Timeline of the number of the corpus’ articles opened each year since Wikipedia was launched (2001). The articles titles and DOB can be found in S1 Table . D) Changelog as of November 25th, 2013, documenting the section title change from “Possible applications” to “Applications” (Other changes that occurred as part of that edit were removed for visibility, and can be found in the archive ). All analyses shown occurred until June 2022.

https://doi.org/10.1371/journal.pone.0290827.g003

As we shall see, the section analysis is intertwined with shifts in the corpus. To understand the historical processes that took place in the CRISPR corpus, we can examine the articles based on their Date Of Birth (DOB), ( Fig 3C and S1 Table ). Even more so than the appearance of a new section—opening new articles on Wikipedia requires the topic at hand to have a certain level of “notability” [ 9 ], and we therefore considered the creation of a new article as an indication that a critical abundance of knowledge and editor interest has been reached. Here too, we combined a quantitative evaluation of the number of articles being created with a content-dependent reading of their titles ( Fig 3C and S1 Table ). Finally, a side-by-side view of these two timelines (sections and DOB) adds another layer of information, interpreted to provide a narrative to contextualize the findings, as described below.
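A DOB timeline of this kind amounts to counting article creations per year. The sketch below assumes the creation year of each article has already been collected into a dictionary. The helper name `dob_timeline` is hypothetical, and the five sample entries use only creation years stated in this article.

```python
from collections import Counter


def dob_timeline(dob_by_title):
    """Count how many articles were created in each year."""
    counts = Counter(dob_by_title.values())
    first, last = min(counts), max(counts)
    # Include empty years so that gaps in article creation stay visible.
    return {year: counts.get(year, 0) for year in range(first, last + 1)}


# Creation years as reported in the text (a small subset of the corpus).
sample_dobs = {
    "Wheat": 2001,
    "Antibiotic": 2001,
    "CRISPR": 2005,
    "Designer baby": 2005,
    "CRISPR interference": 2010,
}
timeline = dob_timeline(sample_dobs)
print(timeline)
```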

Qualitative reading of the section titles showed that the structural changes were directly linked to shifts in the article’s content, pertaining to either the accumulation of new knowledge or the restructuring of the growing field’s representation on Wikipedia. For example, the first sections added in 2010 were “CRISPR Mechanism”, “CRISPR Spacer and Repeats,” “CAS Genes” and the reference section ( Fig 3B and S2 Table ). These sections pertain to CRISPR’s genetic makeup, and can be collectively referred to as the basic science behind CRISPR.

In 2011, a “Discovery of CRISPR” section was added to the article, which was later renamed “History”. The addition of an explicitly historical section in the article indicated a new phase in the scientific narrative it put forward, perhaps the result of a new self-consciousness or understanding that the emerging field was now old enough to have a history of its own. After a few months, a section termed “Evolutionary significance and possible applications” was created. For the next three years it included three proposed applications:

  • “Artificial immunization against phage by introduction of engineered CRISPR loci in industrially important bacteria, including those used in food production and large-scale fermentations.
  • Knockdown of endogenous genes by transformation with a plasmid which contains a CRISPR area with a spacer, which inhibits a target gene.
  • Discrimination of different bacterial strains by comparison of CRISPR spacer sequences (spoligotyping)”

However, these would change in the following year. In April 2013, a user called Genomeengineering made what would be their sole yet extremely significant contribution to Wikipedia: adding the 2012 paper by Doudna and Charpentier, and the two 2013 publications by Zhang and Church. They also amended the list of possible applications so it now included “genome engineering at cellular or organismic level by reprogramming of a CRISPR-Cas system to achieve RNA-guided genome engineering”. In November of that year the section’s title changed from “Possible applications” to “Applications”. This serves as a prime example of how the article documented changes in the field as they took place, with Wikipedia’s native “View history” tool’s textual comparison function offering snapshots of the “revolution” ( Fig 3D ).

Alongside this section’s growth, a “Further reading” section was created and the section dedicated to “External links” was expanded, providing access to new utilities developed for CRISPR researchers. For example, a link to a “comprehensive software” for CRISPR guideRNA design was added as well as a link to a tool “for finding CRISPR targets.”

At the corpus level, this period also saw a spurt in article creation, with a number of CRISPR-related articles being created, like “CRISPR interference”. At this time, more articles directly based on or linked to CRISPR science and its applications were also created, for example “Genome editing” (2012) and “Cas9” (2013). It is also during this phase that the articles for scientists linked to its discovery were opened: an article about Doudna was created in 2012, coinciding with the publication of her landmark Science paper [ 30 ]. Soon thereafter, articles were created for “Epigenome editing” (2014) and “CRISPR/Cas tools” (2015). Thus, qualitatively, this period can be seen as covering the emergence and establishment of the applicative side of CRISPR.

On March 31, 2014, a few weeks after Doudna and Charpentier applied for a patent for their work, a “Patents” section was opened. In 2016, the section dealing with patents was expanded to include a “Patent and commercialization” subsection that detailed a list of patent holders that at the time were fighting in the courts over legal ownership and in academic media over credit ( S3 Table ). At the corpus level, we observed the creation of articles for Charpentier (2015) and Šikšnys (2016), in tandem with the credit and patent wars raging over their respective discoveries.

In February 2019, with the patent wars reaching their resolution in the courts, the section (then four paragraphs long) was completely removed from the article. However, it was not deleted, but rather migrated to a new article called “CRISPR gene editing,” opened that month in a large migration of text out of the anchor article. Also migrated was the section “Society and culture”, which described the ability to conduct human gene editing in terms of the wider social debate about it and the policy changes it sparked, alongside a subsection on “Recognition” that attempted to attribute the CRISPR discovery to specific persons. The migration of key sections into “CRISPR gene editing” is evident in the drop in the number of sections in 2019, alongside the uptick in the number of articles in the corpus like “genome-wide CRISPR-cas9 knockout screens”, “the CRISPR Journal” and “LEAPER gene editing” ( Fig 3 ).

This later phase also continued to document the growth of the biotech industry based on CRISPR, for example CRISPR Therapeutics, a company co-founded by Charpentier, received an article in 2021, further highlighting the field’s maturation and growth in technology.

Tellingly, 2020 also saw the creation of a “Pandemic prevention” article, which, in tandem with the COVID-19 pandemic, detailed all the medical and scientific attempts to preempt viral outbreaks—including those that could potentially make use of CRISPR. Articles like these raise an interesting question regarding the role of CRISPR in other bodies of knowledge and warrant an examination of the wider corpus.

Cross-pollination: CRISPR as a body of knowledge

Our analyses thus far show that knowledge on Wikipedia is rarely confined to a single article, but is rather stored in groups of articles that are constantly changing and cross-pollinate one another. On Wikipedia, this process can take on two distinct forms: new articles opening about the topic that directly address it, or existing articles changing to include new text, references or sections dedicated to the scientific topic’s intersection with other bodies of knowledge. Tracking the migration between articles can illuminate how knowledge diffuses.

To better understand the temporal aspect of CRISPR’s representation across articles on Wikipedia we next compared the DOB of the different articles in our CRISPR corpus and the date the term CRISPR was first mentioned in them.
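The comparison between an article's DOB and its first mention of a term can be illustrated as follows. This is a hedged sketch rather than the procedure actually used: it assumes each article's revision history is available as a list of (date, text) pairs, and the function names `first_mention` and `mention_lag_days` are hypothetical. In the sample, only the years (2012 and 2014 for “Genome editing”) come from this article; the days, months and revision texts are placeholders.

```python
from datetime import date


def first_mention(revisions, term="CRISPR"):
    """Return the date of the earliest revision whose text contains the term."""
    for when, text in sorted(revisions, key=lambda rev: rev[0]):
        if term.lower() in text.lower():
            return when
    return None


def mention_lag_days(revisions, term="CRISPR"):
    """Days between the first revision (the DOB) and the first mention."""
    if not revisions:
        return None
    dob = min(when for when, _ in revisions)
    mentioned = first_mention(revisions, term)
    return None if mentioned is None else (mentioned - dob).days


# Placeholder revisions: only the years are taken from the text.
genome_editing = [
    (date(2014, 1, 1), "Genome editing with CRISPR and other nucleases."),
    (date(2012, 1, 1), "Genome editing uses engineered nucleases."),
]
lag = mention_lag_days(genome_editing)
print(first_mention(genome_editing), lag)
```

A lag of zero corresponds to articles that mentioned the term in their first version, while a positive lag marks older articles that were recast later.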

Of the 51 articles in the CRISPR corpus, 26 already had the term “CRISPR” in their first version ( Fig 4A ). Among these were the articles for researchers like Charpentier, Šikšnys and Francis Mojica. This group also included articles for scientific topics discovered in later stages of the CRISPR field’s growth, like “Cas12”, and articles reflecting CRISPR in culture, like the aforementioned academic journal. With few exceptions, like “CRISPR” and “CRISPR interference”, opened in 2005 and 2010, respectively, articles that were created with CRISPR already mentioned in their first version were mostly opened post-2014 ( Fig 4B ).

Comparing an article’s creation date and CRISPR’s first mentions. A) An article’s date of birth (DOB, blue) compared to the year it first mentioned “CRISPR” (red), sorted by the former. B) The relation between each article’s DOB and the time it took for CRISPR to be first mentioned in it. Displayed is a linear trendline and R².

https://doi.org/10.1371/journal.pone.0290827.g004

The 24 articles that lacked “CRISPR” at their inception provide insight into the growth of the field over time. Importantly, this analysis shows how many concepts now associated with CRISPR actually existed before its discovery, or before its application in gene editing was known. Prime examples are “Gene knockout” and “Gene knockdown”, which in fact predate CRISPR. However, as we saw, in a later stage their content was recast to take CRISPR into account and the articles were retroactively affiliated with the CRISPR field (in 2017 and 2013, respectively). Similarly, “Genome editing” was opened in 2012 but mentioned CRISPR only in 2014. The article “Designer baby” opened in 2005 and depicted what was initially only a theoretical issue used in “popular scientific and bioethics literature.” However, this changed with CRISPR’s rise to prominence and since 2018 it has directly referenced CRISPR, with a lengthy debate in the wake of the “He Jiankui affair”, in which the Chinese scientist created in 2018 the world’s first so-called CRISPR babies in a widely reported incident.

We could also observe CRISPR’s interface with other scientific fields through articles related to wider topics. For example, the two oldest articles in the corpus, “Wheat” and “Antibiotic”, were opened in 2001, and were late to adopt “CRISPR” some twenty years later.

In sum, this analysis revealed a clear divide between articles that mentioned CRISPR from the onset and those that incorporated the term only in later stages. More generally, it underscores how CRISPR ramified across Wikipedia not just in the form of new articles, but also by recasting older ones.

From lab to public: Wikipedic bibliometrics map the diffusion of knowledge over time

All claims on Wikipedia need to be attributed to a verifiable source [ 21 ]. For our purposes, these references constitute a source rich with text, information and data for additional analyses: combining quantitative bibliometric analyses like citation count, with content-dependent evaluation of the actual sources, to better understand the types of references supporting the “anchor” article. Quantitatively, we have previously developed two bibliometric analyses for Wikipedia articles—the “SciScore”, which gauges the ratio of academic to non-academic sources (ranges 0–1) [ 12 ], and the “Latency”, which gauges the duration between an academic paper’s publication and when it was referenced in a Wikipedia article [ 14 ].

Our automated tool scrapes only the reference list of each article in the corpus, which is then further parsed to identify and characterize its different sources: “.org”, “.com” and those containing DOIs/PMIDs/PMCs (i.e., scientific papers). Thus, we can assign a SciScore at both the corpus level and that of an individual article.
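The reference parsing and scoring step can be sketched as below. This is a simplified illustration, not the automated tool itself. It assumes that the SciScore is the fraction of references carrying a DOI, PMID or PMC identifier out of all references, which is consistent with the 0–1 range given above but is not spelled out here. The regular expressions, the function names `classify` and `sci_score`, and the sample references (including the DOI) are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Simplified markers of a scientific paper: a DOI/PMID/PMC field or a doi.org link.
ACADEMIC = re.compile(r"\b(doi|pmid|pmc)\s*[:=]|doi\.org/", re.IGNORECASE)
URL = re.compile(r"https?://[^\s|\]}]+")


def classify(reference):
    """Label one reference string as 'academic', '.org', '.com' or 'other'."""
    if ACADEMIC.search(reference):
        return "academic"
    url = URL.search(reference)
    if url:
        host = urlparse(url.group(0)).hostname or ""
        if host.endswith(".org"):
            return ".org"
        if host.endswith(".com"):
            return ".com"
    return "other"


def sci_score(references):
    """Fraction of academic references (assumed definition); None if empty."""
    if not references:
        return None
    academic = sum(classify(ref) == "academic" for ref in references)
    return academic / len(references)


# Made-up reference strings for illustration only.
sample_refs = [
    "{{cite journal |doi=10.1000/example |title=A paper}}",
    "{{cite journal |pmid=12345 |title=Another paper}}",
    "{{cite web |url=https://www.example.org/page |title=A foundation}}",
    "{{cite news |url=https://news.example.com/story |title=A news story}}",
]
labels = [classify(ref) for ref in sample_refs]
score = sci_score(sample_refs)
print(labels, score)
```

Pooling the references of every article before calling `sci_score` would give a corpus-level value, while calling it per article gives the per-article distribution.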

We found that the CRISPR anchor article was supported by 208 external sources in its “References” and “Further reading” sections ( Fig 5A ). The article’s SciScore was 0.92, ranking 13/51 in the corpus ( Fig 5B and S2A Fig ). The top cited journal was Science (23 papers), followed by Nature and Cell (14 each), ( S2B and S2C Fig ). These results are consistent with previous analyses of Wikipedia articles focused on scientific topics that show that these make use of peer reviewed, high-impact factor academic publications [ 8 , 23 ].

A) The number of references in the CRISPR article’s reference section since it opened until December 2021. B) “CRISPR”s SciScore (shown until December 2021). C) The article’s references latency distribution (i.e., duration between a scientific paper’s publication and its integration into Wikipedia). D) A timeline comparing the date of selected publications (black frames, left) to their citation in the CRISPR article (blue frames, right). E) A snapshot comparing two versions of the CRISPR article from May 2007, showing how changes to the wording of the text were linked to the citation of Barrangou et al., 2007.

https://doi.org/10.1371/journal.pone.0290827.g005

To attain a historical perspective, we next analyzed the temporal aspect of the above-discussed bibliometric parameters, which were compared and contextualized against the changes in sections ( Fig 3A ). We found that these metrics, and overlapping trends between them, served as markers for important events in the history of the field. A prime example of this can be seen in the aforementioned “Patents” section: on March 6, 2014 Doudna’s and Charpentier’s patent application was published online and a few weeks later the “Patents” section was opened in the CRISPR article ( S3 Table ). It cited the US Patent Office website. By 2015, after the Broad Institute was awarded its own patent and the appeal against it was filed by the universities representing Doudna and Charpentier, the article’s text changed to indicate that, “As of December 2014, patent rights to CRISPR were still developing.” The text also noted that there was “a bitter fight over the patents for CRISPR”, a claim supported by a new type of citation which grew increasingly present in the CRISPR article: non-academic sources, in the form of both news articles about the legal cases and even the patents themselves. For example, the claim about the “bitter” legal battle was sourced to a story in MIT Technology Review, a popular science news site, while also referring directly to specific patents and/or formal application documents made public online. Overall, the section included a laundry list of patent holders and claimants with a hodgepodge of popular and legal sources as citations. Throughout its entire existence, all the sources in this section were non-academic.

The fact that non-academic sources were deployed in the article to support non-academic aspects of the CRISPR history shows how these types of sources can document non-scientific ramifications of scientific developments. However, the entrance of non-academic sources was not limited to patent debates and also touched on CRISPR’s growing social prominence. For example, the 2015 selection of CRISPR as “Breakthrough of the year” [ 43 ] was supported by links to popular media sources. Together with the patent links, these non-academic sources led to a decrease in the article’s SciScore during this phase ( Fig 5B ).

Collectively, these highlight how bibliometric shifts are reflective of substantive changes in the article’s texts, which in turn are reflective of real-world developments in the field, both in terms of the science and of the social debates it inspires.

We next conducted bibliometric analysis on the entire corpus. We found a number of articles with high SciScores (like “CRISPR interference” or “Cas9”) alongside those with low percentage of academic sources, like that for Mojica or the concept of designer babies ( S2A Fig ). This indicates a correlation between the scientificness of an article’s topic and its SciScore, with biographical articles for scientists for example, usually ranking lower than those on scientific concepts.

The “CRISPR” article ranked among the top ten in terms of scientificness. To further relate its current score to the state of the available research, we determined the latency of all the article’s references. This analysis revealed a distribution varying between a single day to over 30 years, with a median latency of 1.7 years ( Fig 5C ). This bibliometric data can be contextualized through the example of the integration dynamics of publications relating CRISPR to bacterial immunity ( Fig 5D ). Rodolphe Barrangou, the R&D director of genomics at the chemicals manufacturer DuPont, was the first to harness CRISPRs to provide immunity for the company’s industrial bacterial strains. The resulting study was published in 2007, and was integrated into Wikipedia that year, a mere two months after going online. In this edit the text changed from “it is proposed that these spacers … protect the cell from infection” to “it was proposed, and more recently demonstrated, that these [can…] help protect the cell from infection” (bold added), ( Fig 5E ).

Only after this experimental demonstration were three landmark, yet in silico , papers from 2005 added to the article. These three studies which computationally supported the bacterial immune system hypotheses were added with a relatively large latency: Pourcel et al., 2005 was added two years after its publication, while Mojica et al., and Bolotin et al., were added only in 2011—six years after publication. By this time, the text and the early references, as well as CRISPR’s function in bacterial immunity, now backed by experimental evidence, were all inserted into the article’s lead section, too.
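The latency calculation itself is simple date arithmetic, as the sketch below shows. It assumes that a publication date and a first-citation date are known for each reference, and it expresses latency in years using a 365.25-day year. The names `latency_years` and `median_latency` are hypothetical, and the three date pairs are placeholders that only mirror the rough intervals described above (about two months, two years and six years), not the actual dates.

```python
from datetime import date
from statistics import median


def latency_years(published, cited):
    """Years between a paper's publication and its citation on Wikipedia."""
    return (cited - published).days / 365.25


def median_latency(pairs):
    """Median latency, in years, over (published, cited) date pairs."""
    return median(latency_years(pub, cit) for pub, cit in pairs)


# Placeholder dates chosen to mirror the intervals in the text.
sample_pairs = [
    (date(2007, 3, 1), date(2007, 5, 1)),  # about two months
    (date(2005, 1, 1), date(2007, 1, 1)),  # about two years
    (date(2005, 1, 1), date(2011, 1, 1)),  # about six years
]
latencies = [latency_years(pub, cit) for pub, cit in sample_pairs]
mid = median_latency(sample_pairs)
print(latencies, mid)
```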

In sum, these quantitative shifts in bibliometrics, we found, were the result of textual changes in the article. This links together our different forms of analyses: the bibliometrics are linked to the historical shifts in the text which together reflected changes in the scientific field itself.

Quantitative comparison between fields on Wikipedia

We next aimed to examine whether the aforementioned methodology can provide insight into other scientific fields on Wikipedia. Therefore, we deployed our automated tool on two additional terms, “Circadian” (as in circadian clocks) and “Coronavirus”, which we have studied in different manners in earlier works [ 12 , 14 ] and thus serve as control groups to some degree. We hence created three corpuses side by side, at roughly the same time—June/July 2022, and demonstrated how some of the quantitative analyses described above can be utilized to create comparable yet distinct findings regarding different fields.

As we observed for the CRISPR field, a substantial number of articles can be easily identified and selected to be part of a research corpus—with 51, 138, and 306 articles for “CRISPR”, “Circadian”, and “Coronavirus”, respectively ( Fig 6 , S4 and S5 Tables). While varying in size, all corpuses are within a range that allows for reading and examination of their titles. Such examination validated that they indeed provide a diverse assortment of articles of different types that are relevant to each field—for example, articles for scientists alongside those for scientific terms or events. For example, the corpus for “Circadian” yielded the articles “Circadian rhythms” and “Sleep”, and the corpus for “Coronavirus” yielded articles both about the pandemic like “COVID-19 pandemic in Japan” and more generally for “Virus”.

Corpuses were generated and quantitative metrics automatically collected in June-July 2022, for the terms “CRISPR”, “Circadian” and “Coronavirus”. The following data are presented: A) the number of articles opened each year, B) the top 10 most cited journals, C) the top 10 most cited .org websites, D) the top 10 most cited references altogether, E) SciScore distribution, along with the total (sum of all references in all articles) and median scores of the articles’ distribution.

https://doi.org/10.1371/journal.pone.0290827.g006

After an initial corpus creation, the first automated analysis generates a timeline based on each article’s DOB. A side-by-side view of all three corpus timelines ( Fig 6A ) illustrates how different fields display different modes of growth. For example, the “Coronavirus” timeline reveals a clear divide between scientific articles like “Pandemic” (2001) and “Spike protein” (2006), created early on in Wikipedia’s history, and post-pandemic articles like “Wuhan Institute of Virology” (2020). This timeline clearly shows how, with the outbreak of the pandemic, articles about the virus ballooned, but also how these were supported by a network of preexisting articles [ 12 ]. Meanwhile, the “Circadian” timeline exhibits a seemingly random distribution of article creation, with anchor articles (“Circadian Clock” and “Circadian Rhythms”) and auxiliary articles opening regularly over time. Some DOBs appear to tell a compelling scientific story (e.g., Paul Hardin, first author of the landmark paper highlighted in the 2017 Nobel declaration [ 44 ], received an article in 2017), but these seem anecdotal. Interestingly, the biannual peaks are likely a product of American chronobiologist Eric Herzog’s university course [ 45 ], which has students contribute to articles of their choice linked to the field. This DOB pattern, or lack thereof, can be explained by the fact that, unlike the timely fields of CRISPR or coronavirus, circadian clocks constitute a more mature field. As such, its growth, as our previous work has shown, is reflected in a more subtle manner on Wikipedia, with a paradigmatic shift in the field being documented in minute nuanced textual detail [ 14 ]. Broadly, this suggests that article creation time is perhaps more applicable for contemporary and what can be termed “active” or even “emerging” fields.

One similarity between all three timelines is an increase in article creation centered around 2005–7, a period which has been shown to have held a massive surge in article creation in Wikipedia in general [ 46 ].

Our tool also supports automated scraping of bibliometric data. This analysis showed that the top ten journal references in all three corpuses were dominated with high impact-factor academic peer-reviewed publications ( Fig 6B ). Alongside prestigious scientific publications like Nature or PNAS, we can observe how each corpus refers to field-specific publications: For example, the Journal of Biological Rhythms in the circadian list, Nature Biotechnology for CRISPR, or The Journal of Virology for coronavirus.

Non-academic references (i.e., websites) were also quite field-specific. As researchers from both the circadian clocks and CRISPR fields were awarded a Nobel Prize, the website for the prestigious award was among the most cited in the respective corpuses ( Fig 6C ). In addition, the Sleep Foundation website was highly cited in the circadian corpus while three genome focused websites were highly cited in the CRISPR corpus. The International Committee on Taxonomy of Viruses (ICTV), which appears in Wikipedia articles for different variants, was among the top 10 .org sites cited in the coronavirus corpus.

In general, we observed that the CRISPR and circadian corpuses relied more on scientific literature, while “coronavirus” referenced mostly .com sources ( Fig 6D ), which is also reflected in the different corpuses’ SciScore ( Fig 6E ). It appears the more prominent a scientific field is societally, the lower its SciScore: for example, the non-scientifically focused CRISPR-corpus article about designer babies had a relatively low score, as did the circadian-corpus article “Start school later movement.” Meanwhile, the more clearly scientifically focused articles “Surveyor nuclease assay” and “CSNK1D” had high scores. The patterns of SciScore distribution show how different fields manifest differently and that comparing them can shed light, for example, on how much public, as opposed to purely scientific interest, a field has online.

In summary, these analyses show how the same research tools and methods yield very different results for different research fields, all of which can facilitate the initial steps needed towards the creation of future case studies into how scientific knowledge is represented on Wikipedia over time.

Here, we delved into Wikipedia’s archives to examine the way a prominent scientific field, CRISPR, was represented from the site’s launch in January 2001 until 2022. By reviewing the CRISPR article’s history, we saw that the article started off describing the “basic science” behind CRISPR, and was updated in the wake of the publication of canonical works in the field. Over time, the article grew, and with the emergence of gene editing technology it forked off into a number of affiliated articles with a more narrow focus, while the original CRISPR article offered a consolidated overview of the scientific narrative of CRISPR in bacterial systems. The article’s text and its different citations served as a rich record of the growth of academic knowledge, the legal battles CRISPR sparked and the academic credit wars over what the journal Science called the “CRISPR Craze” [ 47 ], as well as the popular interest in the field.

This case study allowed us to flesh out some essential metrics which can be used to conduct similar research, and we thus propose a method that can be deployed in the service of researching the history of contemporary science on other topics using Wikipedia. Automated tools were developed and are openly supplied to support this research and permit work on additional topics, though combining these with manual and semantic work is key to contextualizing findings and interpreting them to provide substantial historical insight.

Using Wikipedia for the history of science

Our findings join a small yet growing body of research dedicated to using Wikipedia for historical purposes. Previously, we analyzed the growth of two Wikipedia articles dedicated to the circadian clock field through their edit histories (“Circadian clocks” and “Circadian rhythms”), using them to ask whether the article’s text reflected changes taking place in understanding how biological clocks work [ 14 ]. Within that more focused case-study we observed the importance of following the academic references, and developed the Latency metric. Meanwhile, our study on COVID-19 used large-scale quantitative bibliometrics to understand how the pandemic affected large swathes of articles during its “first wave”, putting forward metrics such as the SciScore to qualify hundreds of articles based on their reference list [ 12 ]. Together, these underscore the key role academic sources play on Wikipedia and serve as a wider proof-of-concept for the quantitative and qualitative underpinnings of this present study.

Wyatt suggested in a theoretical paper that Wikipedia could be used as a primary source in historical research [ 48 ]. From the edit history of articles, to metadata for traffic and even talk pages, he envisaged treating the open-source encyclopedia as an “endless palimpsest”. This is an idea that has also previously (2010) been expressed as an artwork: “The Iraq War: A Historiography of Wikipedia Changelogs” by artist James Bridle was a 12-volume book comprising all the versions of the article dedicated to the war in Iraq, with the online edit wars serving as a proxy for the real-world conflict. The aforementioned study on the Egyptian protest movement attributed historical significance to the addition of the word “revolution” to Wikipedia articles’ titles, taken to be reflective of the real revolution playing out in the streets [ 15 ]. This is a captivating demonstration showing the value of attributing historical significance to semantic shifts in Wikipedia articles, in line with our usage of sections and titles.

From the perspective of digital humanities and big data, an algorithmic approach was previously deployed to mine the text of tens of thousands of Wikipedia articles to try to map the history of knowledge since the dawn of human history, using network science and semantic analysis to “put the ideas of Kuhn to the test”. The study, currently a preprint [ 49 ], makes interesting findings, while highlighting the lack of a unification in methods in current Wikipedia-based historical research. To our knowledge, no academic demonstration nor a clear method has previously been put forward as to how researchers can actually use Wikipedia to utilize its historiographic potential to serve as this “endless palimpsest”.

Numerous studies have examined Wikipedia and bibliometrics [ 23 ], including some that focus on science [ 8 , 50 ], but none clearly link scientometrics to historical methods [ 17 ]. Others from the more humanistic side of academia have worked to connect the digital arena to contemporary fields like discourse analysis, based on the works of Michel Foucault [ 51 ]. However, these too are all theoretical works and as of yet no programmatic paper has outlined how Wikipedia can actually be used for historical research, especially not in the interest of following shifts in contemporary science.

Mapping out additional fields through our suggested methods can eventually support theories and models of scientific growth in a resolution never before possible. An initial method for selecting such future case studies could be to focus on the topics selected by Science and others as “Breakthrough of the Year”—these and their relevant Wikipedia articles are documented in a special list on Wikipedia [ 52 ] that could serve as the origin of many corpuses. Scientific developments that have garnered public interest over the past two decades, from the human genome project to Alpha Fold, could also serve as lucrative case studies, each providing a unique and rich dataset of text and information that could then be compared.

The advantages of Wikipedia

Wikipedia easily lends itself to research of this type. A digital and open website that is easily searchable, it also allows open use of its API for more complex queries and even provides a full dump of the entirety of Wikipedia in each language, including articles’ full edit history.
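As an illustration of such a query, the sketch below builds a request for the timestamp of an article's first revision (its DOB) and parses the reply. It uses standard MediaWiki Action API parameters (`action=query`, `prop=revisions`, `rvdir=newer`, `rvlimit=1`), but it is not the tool used in this study. The helper names are hypothetical, the network call is deliberately left out, and the canned response only mimics the default JSON layout: the page id and time of day in it are placeholders, and only the date follows the creation date reported above.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_dob_query(title):
    """Build a URL asking for the timestamp of a page's oldest revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvdir": "newer",  # oldest revision first
        "rvprop": "timestamp",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def parse_dob(response):
    """Extract the YYYY-MM-DD creation date from a decoded JSON reply."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))
    return page["revisions"][0]["timestamp"][:10]


# Canned reply in the default response layout; page id and time are placeholders.
canned = {
    "query": {
        "pages": {
            "0000": {
                "title": "CRISPR",
                "revisions": [{"timestamp": "2005-06-30T00:00:00Z"}],
            }
        }
    }
}
url = build_dob_query("CRISPR")
dob = parse_dob(canned)
print(url, dob)
```

In practice the URL would be fetched with an HTTP client and the decoded JSON passed to `parse_dob`; for large corpuses, the full database dumps mentioned above avoid repeated API calls.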

Importantly, English Wikipedia’s fixed article structure and uniform style allow comparable historical work across different fields, primarily since all articles are structured in a similar way: a lead text, table of contents, sections and then a reference list. This feature, in combination with the convenience of the “View history” function, facilitates in-depth analysis of the same line or section over time (for example, as was done here for “Patents”) in a manner difficult to imagine for comparing texts of academic literature. Moreover, cross-analyses of different subjects can yield results comparable through standardized metrics, like the DOB timelines, and the Latency or SciScore. The structural similarity creates a sort of internal control that lays the groundwork for a rigid research system that can be utilized by others and applied to additional fields.

Past versions that did not survive Wikipedia’s mob review process or that included facts that were considered true at the time but have since been rendered obsolete prove especially interesting from the perspective of the history of science. For example, with CRISPR, a December 2005 version of the article described Cas1 as the “most important” of the Cas genes, and one that is “present in almost every CRISPR/Cas system.” This was more cautiously reworded in July 2010 so that, “The most important of the Cas proteins appears to be Cas1, which is ubiquitous” in CRISPR systems. In March 2011, Cas1’s ubiquity was no longer said to be linked to its importance, and for the past decade the article has made do with noting in a subsection dedicated to CRISPR locus that “[m]ost CRISPR-Cas systems have a Cas1 protein.” These changes were the result of new knowledge forcing a reevaluation of the preexisting scientific narrative regarding CRISPR: Cas1 was not falsified per se, rather its importance in CRISPR’s story was reassessed. Another example from the CRISPR article can be seen in the shift in section title from “Possible applications” to “Applications” regarding gene editing ( Fig 3D ). These are examples of what can be termed “negative” knowledge: knowledge whose relevance was negated by new “positive” discoveries that outweighed it in significance. However, as such, its degradation of scientific status in CRISPR’s narrative has much value from the historical perspective. Wikipedia, we suggest, is an inclusive medium that documents both positive and negative knowledge, that is, the accumulation and the rejection of scientific facts, through its edit history.

Moreover, unlike social media websites that collect user data for financial reasons, posing a privacy threat and creating ethical dilemmas for researchers, Wikipedia collects no such information as it has no such business model. This makes it not only attractive to volunteers willing to donate hours to writing and editing the site, but also makes Wikipedia and its data ideal material for social research. Wikipedia’s texts are not single-handedly written and are edited collectively in a form of what is termed peer-production [ 53 ]. Though this system is not without its flaws, in the context of the contemporary history of science it proves a valuable resource: documenting the consensus regarding certain facts and fields’ growth in real-time and in potentially minute details.

Additional advantages that Wikipedia offers in respect to bibliometrics are numerous and deserve their own section.

Wikipedic bibliometrics

Various studies analyzing Wikipedia’s references, even those that focus on science [ 8 , 23 , 50 ], exemplify the use of Wikipedia for bibliometric research and, to a degree, support the view that Wikipedia is much more inclusive than academic publications, making use of non-academic sources usually excluded from scientific papers. Here we implicitly study this using our SciScore, and contextualize its trends through the historical thick description. On “CRISPR”, for example, legal sources or popular media were added to support the “patent war”, which was also expressed in a drop in the article’s SciScore. The expansion and then contraction of the “Patents” section ( S3 Table ), in tandem with the patent wars and their resolution in the courts, show how this historical inclusivity touches both the text and the sources.

The SciScore reveals a different historical perspective when comparing the CRISPR and Coronavirus corpuses. We previously discovered a decrease in the SciScore as the pandemic grew to public prominence and more articles about it were opened [ 12 ]. This was because many of the new articles opened post-pandemic depicted its social ramifications and outcomes, while the pre-pandemic articles focused on the science behind the virus. In the CRISPR anchor article, the SciScore revealed a completely different process: As CRISPR began as a purely scientific discovery, the decrease in SciScore (~2013–2018, Fig 5A ) was found to be the result of its growing public prominence outside scientific circles and the appearance of the first non-academic sources about the looming “CRISPR Craze” [ 47 ], followed by the much-publicized patent and credit wars, and finally the wider social, ethical and policy debates it sparked—backed by popular yet respectable sources.

Our latency analyses revealed that CRISPR, a nascent field, was making use of extremely up-to-date papers; in some cases references were added within days of their publication. Meanwhile, the circadian clock article had a median latency of five years [ 14 ]. This coincides with the respective histories of the fields: CRISPR is a high-profile and emerging field, with advances being mirrored almost instantaneously on Wikipedia. On the other hand, the circadian clock article, which covers a more mature field that has been around for decades, was found to be based on contemporary but also older research which predated Wikipedia. Meanwhile, Coronavirus had a major 17-year peak in latency, exactly in line with the SARS pandemic of 2003, showing how research from a preceding viral pandemic provided the backbone of the sourcing for the 2020 pandemic [ 12 ]. Together these show how the character of each field is reflected in its bibliometrics.
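To make the latency measure concrete, the following is a minimal sketch, assuming latency is the gap between a paper's publication date and the date its reference first appeared in the Wikipedia article. The function names and the input layout (pairs of dates) are hypothetical and are not taken from the study's own code.

```python
from datetime import date
from statistics import median


def latency_days(published, added):
    """Days between a paper's publication and its addition to the article."""
    return (added - published).days


def median_latency_years(pairs):
    """Median latency in years over (published, added) date pairs.

    Assumption: a year is approximated as 365.25 days.
    """
    return median(latency_days(p, a) for p, a in pairs) / 365.25
```

Under these assumptions, a nascent field would show a median close to zero, while a mature field would show a median of several years.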

One hypothesis regarding the potential of the SciScore and Latency is that this dynamic may also be taking place in other articles that began as purely scientific but are increasingly taking on social significance. Tracking articles that have short latencies and a high SciScore which then begins to decrease could serve as a method for identifying new fields only now starting to make waves in terms of public interest. In light of emerging attempts to harness Wikipedia for trend detection [ 54 , 55 ], this idea remains to be examined as more case studies are created.

Using Wikipedia bibliometrics also has value from the scientometric perspective. Measuring the impact of scientific research is a well-established field that has in recent years expanded the metrics it works with—no longer just impact factor and citation counting, but also more inclusive metrics like altmetrics. In this sense, Wikipedia, too, can prove a valuable addition in the form of alternative metrics. Asking which papers are cited on Wikipedia and in which context, may provide insight into what parts of academic research are actually reaching the public [ 7 , 56 , 57 ]. As such, our tools can join and enrich existing studies on the history of contemporary science, augmenting work in the field of bibliometrics or even altmetrics, with Wikipedia.

Limitations

For all its benefits, this method also has its shortcomings. To begin with, corpus delineation can exclude possibly valuable articles—for example, the article for George Church was absent from our corpus despite his seemingly important role in the history of CRISPR. Being a prominent scientist with a broad scope, his Wikipedia article is devoid of the term in any section title, and succinctly mentions his contribution to the CRISPR field under a section titled “Synthetic biology and genome engineering” that despite its topic does not use the key term itself.

From a scientometric perspective, Wikipedia also poses some unique problems: Unlike bibliometric datasets created especially for such purposes, Wikipedia’s footnotes are not all properly formatted, and issues with their templates make scraping them consistently hard [ 58 ], especially with older articles. Initially, all footnotes on Wikipedia were added manually by editors working directly in wiki-code, the markup language the website uses. Over time, bots and tools were put into place to help with this menial task and unify footnote formatting; in some cases, older articles with older footnotes that did not benefit from this unified new formatting will not be scraped properly if one uses only Wikipedia’s native bibliometric data. To overcome this issue in the present study, we scraped the references from the articles as simple text, regardless of how they were formatted by Wikipedia’s volunteer editors. This list of references was then analyzed in search of DOIs/PMIDs/PMCs which were taken as a proxy for academic publications. Nonetheless, other issues exist, for example duplicate DOIs or DOIs included in an article’s text and not just as footnotes. A manual validation of our method in random articles revealed this approach had a margin of error that was lower than 5 percent.
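The plain-text approach described above can be illustrated with a short sketch. It is not the study's code: the regular expressions, the function names and the choice to lower-case DOIs for de-duplication are all assumptions made for illustration, and real reference text will contain formats these patterns miss.

```python
import re

# Illustrative patterns only (assumptions, not the study's actual code).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>|}\]]+')
PMID_RE = re.compile(r'\bpmid\s*[:=]?\s*(\d{1,8})\b', re.IGNORECASE)
PMC_RE = re.compile(r'\bpmc\s*[:=]?\s*(\d+)\b', re.IGNORECASE)


def extract_ids(reference_text):
    """Collect de-duplicated DOI, PMID and PMC identifiers from plain text."""
    # DOIs are case-insensitive, so lower-casing lets the set drop duplicates;
    # trailing punctuation is stripped because it often follows a DOI in prose.
    dois = {d.rstrip('.,;)').lower() for d in DOI_RE.findall(reference_text)}
    pmids = set(PMID_RE.findall(reference_text))
    pmcs = set(PMC_RE.findall(reference_text))
    return {"doi": dois, "pmid": pmids, "pmc": pmcs}


def looks_academic(reference_text):
    """Treat a reference as academic if any identifier was found (a proxy)."""
    return any(extract_ids(reference_text).values())
```

Using sets addresses the duplicate-DOI issue mentioned above, but it cannot tell whether an identifier came from a footnote or from the article body; that distinction would need the markup, not just the text.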

Moreover, our method also does not yet address all of Wikipedia’s content: Firstly, we only examined English Wikipedia. While it is the largest Wikipedia, and most if not all scientific papers are published in English, language asymmetry has been previously reported across different topics [ 59 , 60 ]. This is but one of many biases Wikipedia suffers from and including other language editions in future work may reveal different perspectives and richer narratives that are absent from our methods and findings. Even within the English Wikipedia, the talk page, a key arena that is rich in textual data, was not systematically included in this study, though debates about the patent war were found, and these included discussions of which type of sources (legal as opposed to scientific) should be cited on the article in this context.

Another yet untapped facet of Wikipedia touches on visual elements. Wikipedia’s sister project, WikiCommons, supports multimedia, usually in the form of copyright-free images, and in this respect we also saw a growth: The first infographic explaining the CRISPR system was introduced to the article in 2009 and the file itself was updated in 2010 to show a more complex understanding of the “CRISPR prokaryotic antiviral defense mechanism”, supported by a then-newly published review article [ 61 ]. Over time, additional more complex images were added to the article, for example those showing how CRISPR interference could be used for gene editing ( S3 Fig ). This multimedia aspect can serve in the future as another vector for like-minded research, for example by focusing on how infographics and scientific illustrations document growth of scientific knowledge over time in visual terms.

We hope our proposed method will encourage use of Wikipedia’s ever-changing text as a rich historical source to augment existing work being done in the history of science and contribute to our understanding of the growth of scientific knowledge and its transference to the general public.

Supporting information

S1 Fig. The CRISPR corpus in numbers.

The articles included in the corpus, sorted by number of references, size in kilobytes (kB) and number of edits. “CRISPR”, highlighted, was among the top 5 articles of each category.

https://doi.org/10.1371/journal.pone.0290827.s001

S2 Fig. CRISPR article’s references.

A) The corpus’ articles SciScore distribution. B) Peer-reviewed journals cited as references in the article as of June 2022, sorted by the number of references per publication. C) A list of the top cited journals (from B) with ≥5 appearances.

https://doi.org/10.1371/journal.pone.0290827.s002

S3 Fig. Illustrations of the CRISPR model.

Shown are a selection of screen grabs from the CRISPR article, reflecting the evolution of Wikicommons graphics of CRISPR’s mechanism of action and key players. These are of different versions of the same illustration (A and B) and of a third illustration added later to the article.

https://doi.org/10.1371/journal.pone.0290827.s003

S1 Table. The CRISPR corpus 2022.

The output of the automated tool for the term "CRISPR", as of 2022-06-27.

https://doi.org/10.1371/journal.pone.0290827.s004

S2 Table. CRISPR sections.

The "CRISPR" article’s table of content was examined at the indicated dates. The number of sections/subsections were counted and appear at the top of each column.

https://doi.org/10.1371/journal.pone.0290827.s005

S3 Table. Patents section history.

The "CRISPR" article’s "Patent" section is displayed for the indicated dates. Separation into different rows in the table were manually done for visibility.

https://doi.org/10.1371/journal.pone.0290827.s006

S4 Table. The circadian corpus 2022.

The output of the automated tool for the term "circadian", as of 2022-07-14.

https://doi.org/10.1371/journal.pone.0290827.s007

S5 Table. The coronavirus corpus 2022.

The output of the automated tool for the term "coronavirus", as of 2022-07-14.

https://doi.org/10.1371/journal.pone.0290827.s008

Acknowledgments

We want to thank Dusan Misevic, Bastian Greshake Tzovaras, Marc Santolini, Mad Price Ball, Alex Webb, Gal Manella and all those who provided feedback.

Code accessibility

Our code for the corpus builder can be found at: https://github.com/RonaTheBrave/WikiCorpusBuilder .

  • 3. Maree DJF. The Methodological Division: Quantitative and Qualitative Methods. Realism Psychol. Sci., Cham: Springer International Publishing; 2020, p. 13–42. https://doi.org/10.1007/978-3-030-45143-1_2 .
  • 9. Wikipedia contributors. Notability in the English Wikipedia—Wikipedia, The Free Encyclopedia 2022.
  • 10. Reagle J, Koerner J, editors. Wikipedia @ 20: Stories of an Incomplete Revolution. The MIT Press; 2020. https://doi.org/10.7551/mitpress/12366.001.0001 .
  • 15. Ford H. Writing the Revolution Wikipedia and the Survival of Facts in the Digital Age. MIT Press; 2022.
  • 16. Darnton R. The great cat massacre and other episodes in French cultural history. New York : Vintage Books; 1985.
  • 17. Price DJDS. Little Science, Big Science. Columbia University Press; 1963. https://doi.org/10.7312/pric91844 .
  • 21. Wikipedia:Verifiability. Wikipedia 2022.
  • 39. “Heroes of CRISPR” Disputed. Sci Mag n.d. https://www.the-scientist.com/news-opinion/heroes-of-crispr-disputed-34188 (accessed October 20, 2022).
  • 40. The Villain of CRISPR n.d. https://www.michaeleisen.org/blog/?p=1825 (accessed October 20, 2022).
  • 41. Gold MK, Klein LF, editors. Debates in the Digital Humanities 2019. University of Minnesota Press; 2019. https://doi.org/10.5749/j.ctvg251hk .
  • 42. Jemielniak D. Thick big data: doing digital social sciences. New product. New York: Oxford University Press; 2020.
  • 43. And Science’s 2015 Breakthrough of the Year is… | Science | AAAS n.d. https://www.science.org/content/article/and-science-s-2015-breakthrough-year (accessed October 20, 2022).
  • 44. The Nobel Prize in Physiology or Medicine 2017. NobelPrizeOrg n.d. https://www.nobelprize.org/prizes/medicine/2017/press-release/ (accessed October 20, 2022).
  • 46. Suh B, Convertino G, Chi EH, Pirolli P. The singularity is not near: slowing growth of Wikipedia. Proc. 5th Int. Symp. Wikis Open Collab.—WikiSym 09, Orlando, Florida: ACM Press; 2009, p. 1. https://doi.org/10.1145/1641309.1641322 .
  • 52. Breakthrough of the Year. Wikipedia 2022.
  • 53. Benkler Y. The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press; 2006.
  • 59. Roy D, Bhatia S, Jain P. A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. Proc. Twelfth Lang. Resour. Eval. Conf., Marseille, France: European Language Resources Association; 2020, p. 2373–80.
  • 60. Lewoniewski W, Węcel K, Abramowicz W. Quality and Importance of Wikipedia Articles in Different Languages. In: Dregvaite G, Damasevicius R, editors. Inf. Softw. Technol., vol. 639, Cham: Springer International Publishing; 2016, p. 613–24. https://doi.org/10.1007/978-3-319-46254-7_50 .

From Wikipedia, the free encyclopedia

A case study is one of several ways of doing research whether it is social science related or even socially related. It is an intensive study of a single group, incident, or community. [ 1 ] Other ways include experiments , surveys , multiple histories , and analysis of archival information [ 2 ] .

Rather than using samples and following a rigid protocol to examine a limited number of variables, case study methods involve an in-depth, longitudinal examination of a single instance or event: a case . They provide a systematic way of looking at events, collecting data , analyzing information , and reporting the results. As a result the researcher may gain a sharpened understanding of why the instance happened as it did, and what might become important to look at more extensively in future research. Case studies lend themselves to both generating and testing hypotheses [ 3 ] .

Another suggestion is that case study should be defined as a research strategy , an empirical inquiry that investigates a phenomenon within its real-life context. Case study research means single and multiple case studies, can include quantitative evidence, relies on multiple sources of evidence and benefits from the prior development of theoretical propositions. Case studies should not be confused with qualitative research and they can be based on any mix of quantitative and qualitative evidence. Single-subject research provides the statistical framework for making inferences from quantitative case-study data. [ 2 ] [ 4 ] This is also supported and well-formulated in (Lamnek, 2005): "The case study is a research approach, situated between concrete data taking techniques and methodologic paradigms."

Case selection

When selecting a case for a case study, researchers often use information-oriented sampling, as opposed to random sampling [ 3 ] . This is because the typical or average case is often not the richest in information. Extreme or atypical cases reveal more information because they activate more basic mechanisms and more actors in the situation studied. In addition, from both an understanding-oriented and an action-oriented perspective, it is often more important to clarify the deeper causes behind a given problem and its consequences than to describe the symptoms of the problem and how frequently they occur. Random samples emphasizing representativeness will seldom be able to produce this kind of insight; it is more appropriate to select some few cases chosen for their validity.

Three types of information-oriented cases may be distinguished:

  • Extreme or deviant cases
  • Critical cases
  • Paradigmatic cases.

Extreme case

The extreme case can be well-suited for getting a point across in an especially dramatic way, which often occurs for well-known case studies such as Freud’s ‘Wolf-Man.’

Critical case

A critical case can be defined as having strategic importance in relation to the general problem. For example, an occupational medicine clinic wanted to investigate whether people working with organic solvents suffered brain damage. Instead of choosing a representative sample among all those enterprises in the clinic’s area that used organic solvents, the clinic strategically located , ‘If it is valid for this case, it is valid for all (or many) cases.’ In its negative form, the generalization would be, ‘If it is not valid for this case, then it is not valid for any (or only few) cases.’

For more on case selection, see [1]

Generalizing from case studies

The case study is effective for generalizing using the type of test that Karl Popper called falsification , which forms part of critical reflexivity [ 3 ] . Falsification is one of the most rigorous tests to which a scientific proposition can be subjected: if just one observation does not fit with the proposition it is considered not valid generally and must therefore be either revised or rejected. Popper himself used the now famous example of, "All swans are white," and proposed that just one observation of a single black swan would falsify this proposition and in this way have general significance and stimulate further investigations and theory-building. The case study is well suited for identifying "black swans" because of its in-depth approach: what appears to be "white" often turns out on closer examination to be "black."

For instance, Galileo Galilei ’s rejection of Aristotle ’s law of gravity was based on a case study selected by information-oriented sampling and not random sampling. The rejection consisted primarily of a conceptual experiment and later on of a practical one. These experiments, with the benefit of hindsight, are self-evident. Nevertheless, Aristotle’s incorrect view of gravity dominated scientific inquiry for nearly two thousand years before it was falsified. In his experimental thinking, Galileo reasoned as follows: if two objects with the same weight are released from the same height at the same time, they will hit the ground simultaneously, having fallen at the same speed. If the two objects are then stuck together into one, this object will have double the weight and will according to the Aristotelian view therefore fall faster than the two individual objects. This conclusion seemed contradictory to Galileo. The only way to avoid the contradiction was to eliminate weight as a determinant factor for acceleration in free fall. Galileo’s experimentalism did not involve a large random sample of trials of objects falling from a wide range of randomly selected heights under varying wind conditions, and so on. Rather, it was a matter of a single experiment, that is, a case study.(Flyvbjerg, 2006, p.225-6) [2]

Galileo’s view continued to be subjected to doubt, however, and the Aristotelian view was not finally rejected until half a century later, with the invention of the air pump. The air pump made it possible to conduct the ultimate experiment, known by every pupil, whereby a coin or a piece of lead inside a vacuum tube falls with the same speed as a feather. After this experiment, Aristotle’s view could be maintained no longer. What is especially worth noting, however, is that the matter was settled by an individual case due to the clever choice of the extremes of metal and feather. One might call it a critical case , for if Galileo’s thesis held for these materials, it could be expected to be valid for all or a large range of materials. Random and large samples were at no time part of the picture. Until then, it was Galileo’s view that remained in doubt, as it was not yet considered convincing enough to displace the Aristotelian view. By selecting cases strategically in this manner one may arrive at case studies that allow generalization. (Flyvbjerg, 2006, p. 225-6) For more on generalizing from case studies, see [3]

Assumptions

1. Cases selected based on dimensions of a theory (pattern-matching) or on diversity on a dependent phenomenon (explanation-building).

2. No generalization to a population beyond cases similar to those studied.

3. Conclusions should be phrased in terms of model elimination, not model validation. Numerous alternative theories may be consistent with data gathered from a case study.

4. Case study approaches have difficulty in terms of evaluation of low-probability causal paths in a model as any given case selected for study may fail to display such a path, even when it exists in the larger population of potential cases.

History of the case study

As a distinct approach to research, use of the case study originated only in the early 20th century. The Oxford English Dictionary traces the phrase case study or case-study back as far as 1934, after the establishment of the concept of a case history in medicine. [ citation needed ]

The use of case studies for the creation of new theory in social sciences has been further developed by the sociologists Barney Glaser and Anselm Strauss who presented their research method, Grounded theory , in 1967.

The popularity of case studies in testing hypotheses has developed only in recent decades. One of the areas in which case studies have been gaining popularity is education and in particular educational evaluation. [ 5 ]

Case studies have also been used as a teaching method and as part of professional development, especially in business and legal education. The problem-based learning (PBL) movement is such an example. When used in (non-business) education and professional development, case studies are often referred to as critical incidents .

History of business cases

When the Harvard Business School was started, the faculty quickly realized that there were no textbooks suitable to a graduate program in business. Their first solution to this problem was to interview leading practitioners of business and to write detailed accounts of what these managers were doing. Of course the professors could not present these cases as practices to be emulated because there were no criteria available for determining what would succeed and what would not succeed. So the professors instructed their students to read the cases and to come to class prepared to discuss the cases and to offer recommendations for appropriate courses of action.

Business case studies recount real-life business situations that present business executives with a dilemma. The case puts the scenario into the context of the factors that influence it. Cases are generally written by business school faculty with particular learning objectives in mind and are refined in the classroom before publication. Relevant documentation or AV items and a carefully crafted teaching note often accompany cases.

See also

  • Casebook method
  • Case method

References

  • ^ Shepard, Jon ; Robert W. Greene (2003). Sociology and You . Ohio: Glencoe McGraw-Hill. pp. A-22. ISBN 0078285763 . http://www.glencoe.com/catalog/index.php/program?c=1675&s=21309&p=4213&parent=4526 .  
  • ^ a b Robert K. Yin. Case Study Research. Design and Methods . Third Edition. Applied social research method series Volume 5. Sage Publications. California, 2002. ISBN 0-7619-2553-8
  • ^ a b c Bent Flyvbjerg , "Five Misunderstandings About Case Study Research." Qualitative Inquiry , vol. 12, no. 2, April 2006, pp. 219-245.
  • ^ Siegfried Lamnek. Qualitative Sozialforschung . Lehrbuch. 4. Auflage. Beltz Verlag. Weinheim, Basel, 2005
  • ^ Robert E. Stake, The Art of Case Study Research (Thousand Oaks: Sage, 1995). ISBN 080395767X

Useful Sources

  • Baxter, P. and Jack, S. (2008) Qualitative Case Study Methodology: Study design and implementation for novice researchers, in The Qualitative Report , 13(4): 544-559. Available from http://www.nova.edu/ssss/QR/QR-4/baxter.pdf
  • Dul, J. and Hak, T (2008). Case Study Methodology in Business Research. Oxford: Butterworth-Heinemann. ISBN 978-0-7506-8196-4 .
  • Eisenhardt, K. M. (1989). Building theories from case study research. The Academy of Management Review , 14 (4), Oct, 532-550. doi : 10.2307/258557
  • Flyvbjerg, Bent. (2006). Five Misunderstandings About Case-Study Research, in Qualitative Inquiry , 12(2): 219-245. Available: [4]
  • Bent Flyvbjerg , Making Social Science Matter: Why Social Inquiry Fails and How It Can Succeed Again (Cambridge: Cambridge University Press , 2001). ISBN 052177568X
  • George, Alexander L. and Bennett, Andrew. (2005). Case studies and theory development in the social sciences . London, MIT Press 2005. ISBN 0-262-57222-2
  • Gerring, John. (2005) Case Study Research . New York: Cambridge University Press. ISBN 978-0-521-67656-4
  • Lijphart, Arend. (1971) Comparative Politics and the Comparative Method, in The American Political Science Review , 65(3): 682-693. Available from [5]
  • Ragin, Charles C. and Becker, Howard S. eds. (1992) What is a Case? Exploring the Foundations of Social Inquiry Cambridge: Cambridge University Press. ISBN 0521421888
  • Scholz, Roland W. and Tietje, Olaf. (2002) Embedded Case Study Methods. Integrating Quantitative and Qualitative Knowledge . Sage Publications. Thousand Oaks 2002, Sage. ISBN 0761919465
  • Straits, Bruce C. and Singleton, Royce A. (2004) Approaches to Social Research , 4th ed. Oxford University Press . ISBN 0195147944 Available from: [6]

External links

  • Introduction to Case Study
  • The Case Study as a Research Method
  • Case Studies
  • Darden Business Case Studies
  • ETH Zurich: Case studies in Environmental Sciences
  • 'Case Health' website - free downloadable book - Community Research - Case Studies in Health

What is a Case Study? [+6 Types of Case Studies]

By Ronita Mohan , Sep 20, 2021


Case studies have become powerful business tools. But what is a case study? What are the benefits of creating one? Are there limitations to the format?

If you’ve asked yourself these questions, our helpful guide will clear things up. Learn how to use a case study for business. Find out how case analysis works in psychology and research.

We’ve also got examples of case studies to inspire you.

Haven’t made a case study before? You can easily  create a case study  with Venngage’s customizable templates.


Click to jump ahead:

  • What is a case study?
  • What is the case study method?
  • Benefits of case studies
  • Limitations of case studies
  • Types of case studies
  • FAQs about case studies

Case studies are research methodologies. They examine subjects, projects, or organizations to tell a story.

Case Study Definition LinkedIn Post


Numerous sectors use case analyses. The social sciences, social work, and psychology create studies regularly.

Healthcare industries write reports on patients and diagnoses. Marketing case study examples , like the one below, highlight the benefits of a business product.

Bold Social Media Business Case Study Template


Now that you know what a case study is, we explain how case reports are used in three different industries.

What is a business case study?

A business or marketing case study aims at showcasing a successful partnership. This can be between a brand and a client. Or the case study can examine a brand’s project.

There is a perception that case studies are used to advertise a brand. But effective reports, like the one below, can show clients how a brand can support them.

Light Simple Business Case Study Template

Hubspot created a case study on a customer that successfully scaled its business. The report outlines the various Hubspot tools used to achieve these results.

Hubspot case study

Hubspot also added a video with testimonials from the client company’s employees.

So, what is the purpose of a case study for businesses? There is a lot of competition in the corporate world. Companies are run by people. They can be on the fence about which brand to work with.

Business reports stand out aesthetically as well. They use brand colors and brand fonts, usually a combination of the client’s and the brand’s.

With the Venngage  My Brand Kit  feature, businesses can automatically apply their brand to designs.

A business case study, like the one below, acts as social proof. This helps customers decide between your brand and your competitors.

Modern lead Generation Business Case Study Template

Don’t know how to design a report? You can learn  how to write a case study  with Venngage’s guide. We also share design tips and examples that will help you convert.


What is a case study in psychology?

In the field of psychology, case studies focus on a particular subject. Psychology case histories also examine human behaviors.

Case reports search for commonalities between humans. They are also used to prescribe further research. Or these studies can elaborate on a solution for a behavioral ailment.

The American Psychological Association has a number of case studies on real-life clients. Note how the reports are more text-heavy than a business case study.

What is a case study in psychology? Behavior therapy example

Sigmund Freud’s case histories and the well-known case of Anna O., a patient of Josef Breuer, popularised the use of case studies in the field. These studies were built on regular interviews with subjects. The detailed observations they produced helped build the field of psychology.

It is important to note that psychological studies must be conducted by professionals. Psychologists, psychiatrists and therapists should be the researchers in these cases.


What is a case study in research?

Research is a necessary part of every case study. But specific research fields are required to create studies. These fields include user research, healthcare, education, or social work.

For example, this UX Design  report examined the public perception of a client. The brand researched and implemented new visuals to improve it. The study breaks down this research through lessons learned.

What is a case study in research? UX Design case study example

Clinical reports are a necessity in the medical field. These documents are used to share knowledge with other professionals. They also help examine new or unusual diseases or symptoms.

The pandemic has led to a significant increase in research. For example,  Spectrum Health  studied the value of health systems in the pandemic. They created the study by examining community outreach.

What is a case study in research? Spectrum healthcare example

The pandemic has significantly impacted the field of education. This has led to numerous examinations on remote studying. There have also been studies on how students react to decreased peer communication.

Social work case reports often have a community focus. They can also examine public health responses. In certain regions, social workers study disaster responses.

You now know what case studies in various fields are. In the next step of our guide, we explain the case study method.

A case analysis is a deep dive into a subject. To facilitate this, case studies are built on interviews and observations. The example below would have been created after numerous interviews.

Case studies are largely qualitative. They analyze and describe phenomena. While some data is included, a case analysis is not quantitative.

There are a few steps in the case method. You have to start by identifying the subject of your study. Then determine what kind of research is required.

In natural sciences, case studies can take years to complete. Business reports, like this one, don’t take that long. A few weeks of interviews should be enough.

Blue Simple Business Case Study Template

The case method will vary depending on the industry. Reports will also look different once produced.

As you will have seen, business reports are more colorful. The design is also more accessible. Healthcare and psychology reports are more text-heavy.

Designing case reports takes time and energy. So, is it worth taking the time to write them? Here are the benefits of creating case studies.

  • Collects large amounts of information
  • Helps formulate hypotheses
  • Builds the case for further research
  • Discovers new insights into a subject
  • Builds brand trust and loyalty
  • Engages customers through stories

For example, the business study below creates a story around a brand partnership. It makes for engaging reading. The study also shows evidence backing up the information.

Blue Content Marketing Case Study Template

We’ve shared the benefits of case studies. Next, we look at the limitations of creating them.

There are a few disadvantages to conducting a case analysis. The limitations will vary according to the industry.

  • Responses from interviews are subjective
  • Subjects may tailor responses to the researcher
  • Studies can’t always be replicated
  • In certain industries, analyses can take time and be expensive
  • Risk of overgeneralizing the results to a larger population

These are some of the common weaknesses of creating case reports. If you’re on the fence, look at the competition in your industry.

If other brands or professionals are building reports like this example, you may want to do the same.

Coral content marketing case study template

There are six common types of case reports. Depending on your industry, you might use one of these types.

  • Descriptive case studies
  • Explanatory case studies
  • Exploratory case reports
  • Intrinsic case studies
  • Instrumental case studies
  • Collective case reports

6 Types Of Case Studies List

We go into more detail about each type of study in the guide below.

When you have an existing hypothesis, you can design a descriptive study. This type of report starts with a description. The aim is to find connections between the subject being studied and a theory.

Once these connections are found, the study can conclude. The results of this type of study will usually suggest how to develop a theory further.

A study like the one below has concrete results. A descriptive report would use the quantitative data as a suggestion for researching the subject deeply.

Lead generation business case study template

When an incident occurs in a field, an explanation is required. An explanatory report investigates the cause of the event. It will include explanations for that cause.

The study will also share details about the impact of the event. In most cases, this report will use evidence to predict future occurrences. The results of explanatory reports are definitive.

Note that there is no room for interpretation here. The results are absolute.

The study below is a good example. It explains how one brand used the services of another. It concludes by showing definitive proof that the collaboration was successful.

Bold Content Marketing Case Study Template

Another example of this study would be in the automotive industry. If a vehicle fails a test, an explanatory study will examine why. The results could show that the failure was because of a particular part.

An explanatory report is a self-contained document. An exploratory one is only the beginning of an investigation.

Exploratory cases act as the starting point of studies. This is usually conducted as a precursor to large-scale investigations. The research is used to suggest why further investigations are needed.

An exploratory study can also be used to suggest methods for further examination.

For example, the below analysis could have found inconclusive results. In that situation, it would be the basis for an in-depth study.

Teal Social Media Business Case Study Template

Intrinsic studies are more common in the field of psychology. These reports can also be conducted in healthcare or social work.

These types of studies focus on a unique subject, such as a patient. They can sometimes study groups close to the researcher.

The aim of such studies is to understand the subject better. This requires learning their history. The researcher will also examine how they interact with their environment.

For instance, if the case study below was about a unique brand, it could be an intrinsic study.

Vibrant Content Marketing Case Study Template

Once the study is complete, the researcher will have developed a better understanding of a phenomenon. This phenomenon will likely not have been studied or theorized about before.

Examples of intrinsic case analysis can be found across psychology. One example is Jean Piaget’s theory of cognitive development, which he established from intrinsic studies of his own children.

This is another type of study seen in medical and psychology fields. Instrumental reports are created to examine more than just the primary subject.

When research is conducted for an instrumental study, it is to provide the basis for a larger phenomenon. The subject matter is usually the best example of the phenomenon. This is why it is being studied.

Purple SAAS Business Case Study Template

Assume the study above is examining lead generation strategies. It may want to show that visual marketing is the definitive lead generation tool. The brand can conduct an instrumental case study to examine this phenomenon.

Collective studies are based on instrumental case reports. These types of studies examine multiple reports.

There are a number of reasons why collective reports are created:

  • To provide evidence for starting a new study
  • To find patterns between multiple instrumental reports
  • To find differences in similar types of cases
  • To gain a deeper understanding of a complex phenomenon
  • To understand a phenomenon from diverse contexts

A researcher could use multiple reports, like the one below, to build a collective case report.

Social Media Business Case Study template

What makes a case study a case study?

Case studies follow a particular research methodology. They are in-depth studies of a person or a group of individuals. They can also study a community or an organization. Case reports examine real-world phenomena within a set context.

How long should a case study be?

The length of studies depends on the industry. It also depends on the story you’re telling. Most case studies should be between 500 and 1,500 words long. But you can increase the length if you have more details to share.

What should you ask in a case study?

The one thing you shouldn’t ask is ‘yes’ or ‘no’ questions. Case studies are qualitative. These questions won’t give you the information you need.

Ask your client about the problems they faced. Ask them about solutions they found. Or what they think is the ideal solution. Leave room to ask them follow-up questions. This will help build out the study.

How to present a case study?

When you’re ready to present a case study, begin by providing a summary of the problem or challenge you were addressing. Follow this with an outline of the solution you implemented, and support this with the results you achieved, backed by relevant data. Incorporate visual aids like slides, graphs, and images to make your case study presentation more engaging and impactful.

Now you know what a case study means, you can begin creating one. These reports are a great tool for analyzing brands. They are also useful in a variety of other fields.

Use a visual communication platform like Venngage to design case studies. With Venngage’s templates, you can design easily. Create branded, engaging reports, all without design experience.

Wikipedia as a tool for contemporary history of science: A case study on CRISPR

Omer Benjakob

1 System Engineering and Evolution Dynamics, Inserm, Université Paris Cité, Paris, France

2 Learning Planet Institute, Paris, France

Jean-Marc Sevin

Leo Blondel, Ariane Augustoni, Matthieu Collet, Louise Jouveshomme

3 Bezalel Academy of Arts and Design, Jerusalem, Israel

Ariel Linder

Rona Aviram

Associated Data

All relevant data are within the manuscript and its Supporting Information files.

Rapid developments and methodological divides hinder the study of how scientific knowledge accumulates, consolidates and transfers to the public sphere. Our work proposes using Wikipedia, the online encyclopedia, as a historiographical source for contemporary science. We chose the high-profile field of gene editing as our test case, performing a historical analysis of the English-language Wikipedia articles on CRISPR. Using a mixed-method approach, we qualitatively and quantitatively analyzed the CRISPR article’s text, sections and references, alongside 50 affiliated articles. These, we found, documented the CRISPR field’s maturation from a fundamental scientific discovery to a biotechnological revolution with vast social and cultural implications. We developed automated tools to support such research and demonstrated its applicability to two other scientific fields: coronavirus and circadian clocks. Our method utilizes Wikipedia as a digital and free archive, showing it can document the incremental growth of knowledge and the manner in which scientific research accumulates and translates into public discourse. Using Wikipedia in this manner complements and overcomes some issues with contemporary histories and can also augment existing bibliometric research.

Introduction

In recent years, the historically qualitative field of history of science has undergone a data revolution [ 1 ], with research increasingly making more use of big data and computational techniques for historical ends [ 2 ]. Alongside the rise of digital humanities, a divide has persisted between quantitative historical research and textually rich qualitative work. The result is a historiographic lacuna [ 3 ] and a debate regarding the materials and methods which can be researched [ 4 , 5 ]. Here, we suggest a new resource can be utilized for the context of history of contemporary science by systematizing existing research methods on an unlikely arena that is rich in both bibliometric data and historical text: Wikipedia.

Wikipedia’s volunteer-run editorial process was long lambasted as inherently unreliable. When in 2005 a study by Nature’s news site found Wikipedia to be as reliable as Encyclopedia Britannica, the news was met with surprise [ 6 , 7 ]. Issues of accuracy in the encyclopedia anyone could edit, along with “edit wars” between volatile unknown editors, all became a topic of research [ 7 – 9 ]. In recent years, however, this narrative has reversed: both academic research and the media have praised Wikipedia’s coverage and reliance on sources, which in some cases have been found to be in lock step with science [ 7 , 10 , 11 ], especially with regard to the COVID-19 pandemic [ 12 , 13 ].

If Wikipedia is indeed meeting its self-defined goal of representing the scientific consensus on scientific knowledge while making use of scientific sources, then we wanted to see if this was also true historically: Can Wikipedia articles document shifts in science over time? Using Wikipedia’s edit history, we hypothesized, could allow us to see changes in how a field was represented in the past and to track citations, with new ones being added and old ones removed as new knowledge accumulated, sparking reassessments of existing paradigms. Our first study showed how two Wikipedia articles on the Nobel Prize-winning field of circadian clocks managed to accurately reflect the field for over 15 years, even as it underwent scientific revolutions [ 14 ]. Others have used Wikipedia in a similar manner to focus on documentation of political events such as the 2011 Egyptian Revolution [ 15 ].

Wikipedia provides a rich, open, and accessible source of information, as past versions of all articles can be viewed through what is termed the changelog. This continuum of text throughout time complements the traditional historical practice of textual analysis, yet in this case the reading is of changing versions of the same text as opposed to comparing different scientific reviews and papers. For us, this feature raised the possibility of mapping the changes to specific parts of the article’s text, structure and references and easily tracking new additions and deletions.

The study of academic sources and publications is a mainstay of the history of science. Both qualitative and quantitative researchers make use of them, be it for bibliometric analysis or thick description [ 4 , 16 ].

The historical methods born with historian Derek J. de Solla Price that made use of publication data [ 17 ] joined the works of earlier historians like Robert K. Merton that laid the historiographic framework for research into the scientific revolution [ 18 ]. Later on, sociological works, written by historians like Robert Darnton on the history of books, offered a qualitative detail-rich chronicle of the rise of scientific media during the Enlightenment, substantiating the scientometrics of history with rich detail [ 19 , 20 ].

Unlike academic publications focused on the state of the art of the field, or review papers’ coverage of the aforementioned, Wikipedia does not aim to publish original research; it only reflects the scientific consensus based on already published sources. In that, Wikipedia also provides a textually rich base for qualitative work regarding narrativization and the communication of science outwards from academia to the public. Importantly, policies enforced by Wikipedia’s editors require “verifiable” sources to back all factual claims [ 21 ], and every article has a reference list. A small but growing body of bibliometric research based on Wikipedia has also emerged [ 22 , 23 ] and even found that on medical [ 24 ] and science [ 8 ] topics English-language articles have an explicit bias towards using academic sources.

Wikipedia thus easily lends itself to such research, providing both data and text that can be used for historical analysis. In this work, we demonstrate this through a case study on the CRISPR field.

In a relatively short period of time, CRISPR-based gene-editing tools have been labeled the scientific “breakthrough” of the 21st century [ 25 ]. While CRISPRs were identified in the 1980s, and received their name in 2002 [ 26 ], their function remained unclear until 2005, when different labs deduced from in silico studies that CRISPR sequences were part of a bacterial adaptive immune system [ 27 – 29 ]. The academic studies that first performed CRISPR-based directed gene editing in vitro were famously published in 2012: first from the labs of Jennifer Doudna and Emmanuelle Charpentier [ 30 ] and shortly after in a paper from the Virginijus Šikšnys group [ 31 ]. These were rapidly followed by publications in February 2013 that performed genetic engineering in vivo in mammals, led by scientists Feng Zhang [ 32 ] and George Church [ 33 ]. Thus, the field matured from a basic science discovery into the ability to utilize CRISPR-associated proteins like Cas9 for genetic engineering, currently used by countless labs around the globe [ 34 ]. Doudna and Charpentier were awarded the 2020 Nobel Prize in Chemistry for their scientific contribution to genetic editing technologies, showcasing how the so-called CRISPR revolution played out over the past 20 years. Told as such, CRISPR’s history can seemingly be deduced through academic publications. However, the science itself does not tell the full scientific story.

In contrast to many other groundbreaking scientific discoveries which remain known only within scientific circles, gene editing has also been in the spotlight of much public debate. For example, many news outlets have dedicated reports to developments in the field and debated the ethical implications of “designer babies” [ 35 ]. Netflix has even broadcast a documentary on CRISPR, underscoring its iconic status in popular culture. By now, CRISPR is not a purely scientific phenomenon. Wikipedia, a popular source read and compiled by the general public, strives to document these facets as well.

The CRISPR field’s brief history has been riddled with controversies, and legal battles over credit and CRISPR patents were all covered extensively in the media [ 36 ]. Most famously, Eric Lander’s perspective in Cell, the “Heroes of CRISPR” [ 37 ], was met with fierce criticism [ 38 ]. Critics claimed that the text offered a biased version of the field’s history that minimized the roles of some scientists as part of the patent war raging between academic institutions [ 39 ]—going as far as to label Lander the “villain” of CRISPR [ 40 ]. This controversy underscores how scientific outlets, even those famous for publishing novel scientific research, may not necessarily serve as reliable historical sources on contemporary science itself.

The encyclopedia’s text and sources can thus be viewed as an inclusive medium, one that can potentially help track the interaction with additional fields and allow a better understanding of how scientific knowledge ramifies well outside the realm of academic publications.

CRISPR is a prime example of a scientific field that has undergone massive growth during Wikipedia’s lifespan. It is an ideal case study as its history is short (i.e., parallel to Wikipedia’s lifetime) and multi-faceted: a highly scientific topic with wide-ranging technological and social ramifications. These facets, we found, were documented on Wikipedia and its different articles, supported by scientific, public and popular sources alike. Together, our findings—based on an analysis of the CRISPR article and 50 others with related content—suggest that Wikipedia can indeed serve as a tool in the digital history of contemporary science. To that end, we put forward a methodology and provide automated tools utilizing Wikipedia’s data—its articles, their edit histories and their references. Our method relies on both quantitative and qualitative analyses that may help consolidate research into Wikipedia and help address the aforementioned conflict between data and content-dependent historical research.

Methods & results

Delineating the research scope.

Historical research requires a clear delineation of the field being studied, for instance gathering a collection of academic publications [ 5 ]. Similarly, the manner in which a scientific field is represented on Wikipedia requires clear delineation of scope and span—i.e., the articles that touch on it and the time frame being examined. While a single article can provide a rich source of textual and historical data, related articles may represent more nuanced facets of a field—like scientists’ biographies or related events and technologies. Identifying these requires sieving through Wikipedia’s massive body of articles—currently numbering well above 6 million in English alone.

For this aim, we propose a stepwise strategy for defining a research corpus about a certain topic. The first step utilizes Wikipedia’s free-text search function to find all articles that contain the topic being researched ( Fig 1A ). In the present study, searching for “CRISPR” yielded 720 Wikipedia articles containing that term, as of June 2022 ( Fig 1B ). Reading of these articles revealed that the majority made only minor or incidental use of CRISPR. Thus, to permit qualitative analyses on a more focused pool, we designed the second stage of the research funnel, which calls for retaining only those articles with the term in either their title or one of their sections, as these will likely contain a substantial amount of information directly related to the subject. To facilitate these steps of search and filtration we designed a tool to do this automatically for any given term of interest ( WikiCorpusBuilder ). Continuing with the term “CRISPR”, this filtering yielded 51 articles ( S1 Table ). Out of these, 10 had CRISPR in their title, and another 41 had it only in the title of one of their sections.
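The two-stage funnel described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' WikiCorpusBuilder tool: the `Article` class, both function names, and the four sample articles are hypothetical, and the sample works on in-memory records rather than querying Wikipedia.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical in-memory stand-in for a Wikipedia article."""
    title: str
    section_titles: list[str] = field(default_factory=list)
    text: str = ""


def free_text_matches(articles, term):
    """Stage 1: keep every article that mentions the term anywhere."""
    needle = term.lower()
    return [
        a for a in articles
        if needle in a.title.lower()
        or needle in a.text.lower()
        or any(needle in s.lower() for s in a.section_titles)
    ]


def corpus_filter(articles, term):
    """Stage 2: split articles into title matches and section-only matches."""
    needle = term.lower()
    in_title, in_section_only = [], []
    for a in articles:
        if needle in a.title.lower():
            in_title.append(a)
        elif any(needle in s.lower() for s in a.section_titles):
            in_section_only.append(a)
    return in_title, in_section_only


# Toy data, invented for illustration only.
sample = [
    Article("CRISPR", ["History"], "CRISPR is a family of DNA sequences."),
    Article("Wheat", ["Genetics", "CRISPR-edited strains"], "Wheat is a grass."),
    Article("Yogurt", ["Production"], "Some starter cultures carry CRISPR loci."),
    Article("Bread", ["History"], "Bread is a staple food."),
]
matches = free_text_matches(sample, "CRISPR")
in_title, in_section_only = corpus_filter(matches, "CRISPR")
print([a.title for a in in_title], [a.title for a in in_section_only])
```

In this toy run the incidental mention ("Yogurt") survives the free-text stage but is dropped by the second stage, which mirrors the reduction from 720 to 51 articles reported in the text.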

Fig 1

A) Scheme of proposed research flow as supported by our tool: A free search of Wikipedia’s English-language articles is conducted to identify relevant articles; these are then filtered to include only those with the term in either their title or that of a section. Next, different analyses can be performed on the anchor article and corpus. Of the listed examples, in bold are the data provided by our tool, the rest are currently collected manually. B) Breakdown of flow scheme in the CRISPR case study, as of June 2022.

Even among this list of corpus articles, a clear hierarchy arises between those which in subject, text and focus are fully aligned with the topic being researched (which we term the “anchor article/s”); and “auxiliary” articles, those that represent secondary aspects of the topic or instances in which it is embedded within other fields. The distinction is important as it allows us to focus the qualitative work described in the following sections. Here, the anchor article selected was “CRISPR”, which was determined semantically based on its title and content.

Within this CRISPR corpus, several auxiliary articles focused on scientific topics, for example the article for “CRISPR Activation”, “Cas9”, or “CRISPR gene editing”; while others had wider scientific topics, such as “Antibiotic”, “Gene knockdown”, and “Genome editing”. Also included were articles with broad topics, for example “Wheat” which had a section on CRISPR-edited strains of grain. Another group of articles were those dedicated to scientists, like the 2020 Nobel laureates Doudna and Charpentier, awarded for their groundbreaking work in the field; or Šikšnys, who also played a pivotal role in CRISPR’s history. Other science-adjacent articles touched on more social facets of CRISPR, e.g., “The CRISPR Journal” and “Designer baby”, showing how cultural aspects are also captured by this method. We therefore concluded that these articles provide a good sample of CRISPR related knowledge.

Another advantage Wikipedia provides is open access to these articles’ data, which we harness using our tool to further characterize the corpus. For example, “CRISPR” ranked amongst the top five articles in terms of size, number of references, and number of edits ( S1 Fig ).

Thus, we have composed a clearly defined research corpus regarding our key term of interest. The corpus can be depicted through the titles (qualitative) or the data (quantitative), and further mixed analyses can take place as will be demonstrated below.

Mixed method analyses for understanding historical growth of knowledge

After having established the research scope, created our corpus and identified the anchor article(s) within it, we move to the analysis phase. We deployed three complementary analyses: (1) a qualitative reading of the anchor article(s)’ text and structure, in both current and past versions; (2) a quantitative analysis of the anchor article; and (3) a quantitative analysis of the corpus. All three are based on data and materials readily available on Wikipedia and many aspects of the quantitative analyses have been automated through our tool ( Fig 1 ). The following describes how these were deployed on CRISPR.

Mixed-methods research [ 41 ] combines quantitative and qualitative analyses and served as the basis for this research. We employed this in what can be termed Wikipedia-focused “thick big data” [ 42 ] (as opposed to content-agnostic big data approaches) which meshes the world of thick description and data analysis. In our case, Wikipedia articles, their edit histories and sources are treated as an initial dataset, which are then analyzed semantically as well as through quantitative methods and then interpreted in a detailed manner.

First, using Wikipedia’s “view history” tool, available at the top right of every article ( Fig 2A ), we can access the anchor article’s past versions and perform a comparative reading. Here, we used annual intervals to sample textual changes—at times narrowing the time frame to provide a more detailed account of the article’s historical textual growth.
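One way to pick revisions at annual intervals is sketched below. The text does not say which revision within each year was read, so this sketch assumes the last revision of each calendar year; the function name and the revision ids are hypothetical, and only the 30 June 2005 creation date comes from the article.

```python
from datetime import datetime


def sample_annually(revisions):
    """Return the last revision of each calendar year, oldest first.

    `revisions` is an iterable of (timestamp, revision_id) pairs.
    Assumption: "annual interval" means one snapshot per calendar year.
    """
    latest_per_year = {}
    for timestamp, rev_id in sorted(revisions):
        # Later revisions in the same year overwrite earlier ones.
        latest_per_year[timestamp.year] = (timestamp, rev_id)
    return [latest_per_year[year] for year in sorted(latest_per_year)]


# Made-up revision ids for illustration.
history = [
    (datetime(2005, 6, 30), 101),
    (datetime(2005, 11, 2), 102),
    (datetime(2006, 3, 14), 103),
    (datetime(2008, 1, 9), 104),
    (datetime(2008, 12, 25), 105),
]
snapshots = sample_annually(history)
print([rev_id for _, rev_id in snapshots])
```

Years with no edits (2007 in the toy data) simply produce no snapshot. Narrowing the time frame, as the authors sometimes did, would mean grouping by month rather than by year.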

Fig 2

A) An example of the top of a Wikipedia article; note the “View history” tab (frame added) that enables accessing older versions of the text. Snapshots from the Wikipedia archive of the CRISPR article: B) the full text of the article when it first opened on June 30th, 2005, and C) extract of the lead section’s opening paragraphs, as of July 6th, 2022.

Comparing the article’s past versions provided rich historical context: The article for CRISPR was created in June 2005, as what is termed a “stub”—a short entry in need of further elaboration ( Fig 2B ). This first version included but a single paragraph elucidating the CRISPR acronym and describing the genetic locus. At the time, there was no mention of its relation to bacterial immunity or gene editing, two points which would be integral to the field, and as a result be highlighted in the article’s lead text in future versions ( Fig 2C ).

Next, we augmented this form of textual comparison with structural analyses of the CRISPR article’s architecture, i.e., its table of contents, a basic feature of all Wikipedia’s articles. This “table of contents” or “section” analysis is done as a mixed method: Quantitatively, we measured the overall number of sections and subsections ( Fig 3A ); qualitatively, we reviewed their titles and documented the changes they underwent to provide insight into the content of the article, with the section titles serving as a proxy for new units of CRISPR-related knowledge ( Fig 3B and S2 Table ). Due to the semantic variation between sections, we prefer to gather this data and perform the comparison manually.
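The quantitative half of the section analysis can be approximated by counting headings in a revision's wikitext, where MediaWiki marks a section as `== Title ==` and deeper levels with more equals signs. The sketch below is an assumption-laden illustration rather than the authors' procedure (they report gathering this data manually); the function names and the toy revision are hypothetical.

```python
import re

# A heading line: 2 to 6 equals signs, a title, and the same run of equals signs.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def section_outline(wikitext):
    """Return (level, title) pairs for every heading in a wikitext string."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(wikitext)]


def count_sections(wikitext):
    """Count top-level sections (level 2) and subsections (level 3 and deeper)."""
    levels = [level for level, _ in section_outline(wikitext)]
    sections = sum(1 for level in levels if level == 2)
    return sections, len(levels) - sections


# Toy revision text, invented for illustration.
toy_revision = """Lead paragraph.
== History ==
Text.
=== Discovery ===
Text.
== Applications ==
Text.
== References ==
"""
print(section_outline(toy_revision))
print(count_sections(toy_revision))
```

The outline gives the titles needed for the qualitative reading, while the counts feed a timeline like the one in Fig 3A. Deciding whether two differently worded titles refer to the same unit of knowledge still calls for human judgment.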

Fig 3

A) The number of sections and subsections in the CRISPR article since it opened in 2005. B) Titles of the article’s sections throughout 2010–2022, sampled biannually. Subsections and those listing sources were removed for clarity and can be found in S2 Table. Alignment and coloring were added manually to highlight sections repeating in consecutive revisions. C) Timeline of the number of the corpus’ articles opened each year since Wikipedia was launched (2001). The articles’ titles and DOB can be found in S1 Table. D) Changelog as of November 25th, 2013, documenting the section title change from “Possible applications” to “Applications” (other changes that occurred as part of that edit were removed for visibility, and can be found in the archive). All analyses shown cover the period until June 2022.

As we shall see, the section analysis is intertwined with shifts in the corpus. To understand the historical processes that took place in the CRISPR corpus, we can examine the articles based on their Date Of Birth (DOB) ( Fig 3C and S1 Table ). Even more than the appearance of a new section, the opening of a new article on Wikipedia requires the topic at hand to have a certain level of “notability” [ 9 ], and we therefore considered the creation of a new article as an indication that a critical abundance of knowledge and editor interest has been reached. Here too, we combined a quantitative evaluation of the number of articles being created with a content-dependent reading of their titles ( Fig 3C and S1 Table ). Finally, a side-by-side view of these two timelines (sections and DOB) adds another layer of information, interpreted to provide a narrative to contextualize the findings, as described below.
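A DOB timeline amounts to counting article creation years. The following sketch assumes the creation years are already known; the eight entries use the years stated in this article for a handful of corpus members (the full list is in S1 Table), and the function name is hypothetical.

```python
from collections import Counter

# Creation years as reported in this article for a subset of the corpus.
creation_years = {
    "CRISPR": 2005,
    "Genome editing": 2012,
    "Jennifer Doudna": 2012,
    "Cas9": 2013,
    "Epigenome editing": 2014,
    "CRISPR/Cas tools": 2015,
    "Emmanuelle Charpentier": 2015,
    "Virginijus Šikšnys": 2016,
}


def articles_per_year(dob_by_title, start, end):
    """Count how many articles were created in each year from start to end."""
    counts = Counter(dob_by_title.values())
    # Years with no new articles are reported as 0 so the timeline has no gaps.
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


timeline = articles_per_year(creation_years, 2005, 2016)
for year, n in timeline.items():
    print(year, "#" * n)
```

Even on this small subset the printed bars show the pattern the authors describe: a lone article in 2005, then a cluster of new articles from 2012 onward.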

Qualitative reading of the section titles showed that the structural changes were directly linked to shifts in the article’s content, pertaining to either the accumulation of new knowledge or the restructuring of the growing field’s representation on Wikipedia. For example, the first sections added in 2010 were “CRISPR Mechanism”, “CRISPR Spacer and Repeats,” “CAS Genes” and the reference section ( Fig 3B and S2 Table ). These sections pertain to CRISPR’s genetic makeup, and can be collectively referred to as the basic science behind CRISPR.

In 2011, a “Discovery of CRISPR” section was added to the article, which was later renamed “History”. The addition of an explicitly historical section in the article indicated a new phase in the scientific narrative it put forward, perhaps the result of a new self-consciousness or understanding that the emerging field was now old enough to have a history of its own. After a few months, a section termed “Evolutionary significance and possible applications” was created. For the next three years it included three proposed applications:

  • “Artificial immunization against phage by introduction of engineered CRISPR loci in industrially important bacteria, including those used in food production and large-scale fermentations.
  • Knockdown of endogenous genes by transformation with a plasmid which contains a CRISPR area with a spacer, which inhibits a target gene.
  • Discrimination of different bacterial strains by comparison of CRISPR spacer sequences (spoligotyping)”

However, these would change in the following year. In April 2013, a user called Genomeengineering made what would be their sole yet extremely significant contribution to Wikipedia: Adding the 2012 paper by Doudna and Charpentier, and the two 2013 publications by Zhang and Church. They also amended the list of possible applications so it now included “genome engineering at cellular or organismic level by reprogramming of a CRISPR-Cas system to achieve RNA-guided genome engineering”. In November of that year the section’s title changed from “Possible applications” to “Applications”. This serves as a prime example of how the article documented changes in the field as they took place, with Wikipedia’s native “View history” tool’s textual comparison function offering snapshots of the “revolution” ( Fig 3D ).
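A title change like the one described above can be flagged automatically by comparing the ordered section titles of two revisions. The sketch below uses Python's standard `difflib` module. It is not the comparison function built into Wikipedia's “View history” tool, and apart from the “Possible applications” to “Applications” rename the title lists are toy data.

```python
from difflib import SequenceMatcher


def title_changes(old_titles, new_titles):
    """List the non-matching spans between two ordered lists of section titles."""
    changes = []
    matcher = SequenceMatcher(a=old_titles, b=new_titles, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged run of titles
        changes.append((tag, old_titles[i1:i2], new_titles[j1:j2]))
    return changes


# Toy title lists; only the rename itself is taken from the text.
before = ["History", "Possible applications", "References"]
after = ["History", "Applications", "References", "Further reading"]
for change in title_changes(before, after):
    print(change)
```

A `replace` entry is only a candidate rename. Whether the new title reflects a real shift in the field, as with the dropping of “Possible”, remains a qualitative call.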

Alongside this section’s growth, which also saw the birth of the “Further reading” section, the section dedicated to “External links” was expanded, providing access to new utilities developed for CRISPR researchers. For example, a link to a “comprehensive software” for CRISPR guideRNA design was added as well as a link to a tool “for finding CRISPR targets.”

At the corpus level, this period also saw a spurt in article creation, with a number of CRISPR-related articles being created, like “CRISPR interference”. At this time, more articles directly based on or linked to CRISPR science and its applications were also created, for example “Genome editing” (2012) and “Cas9” (2013). It is also during this phase that the articles for scientists linked to its discovery were opened: an article about Doudna was created in 2012, coinciding with the publication of her landmark Science paper [ 30 ]. Soon thereafter, articles were created for “Epigenome editing” (2014) and “CRISPR/Cas tools” (2015). Thus, qualitatively, this period can be seen as covering the emergence and establishment of the applicative side of CRISPR.

On March 31, 2014, a few weeks after Doudna and Charpentier applied for a patent for their work, a “Patents” section was opened. In 2016, the section dealing with patents was expanded to include a “Patent and commercialization” subsection that detailed a list of patent holders that at the time were fighting in the courts over legal ownership and in academic media over credit ( S3 Table ). At the corpus level, we observed the creation of articles for Charpentier (2015) and Šikšnys (2016), in tandem with the credit and patent wars raging over their respective discoveries.

In February 2019, with the patent wars reaching their resolution in the courts, the section (then four paragraphs long) was completely removed from the article. However, it was not deleted, but rather migrated to a new article called “CRISPR gene editing,” opened that month in a large migration of text out of the anchor article. Also migrated was the section “Society and culture”, which described the ability to conduct human gene editing in terms of the wider social debate about it and the policy changes it sparked, alongside a subsection on “Recognition” that attempted to attribute the CRISPR discovery to specific persons. The migration of key sections into “CRISPR gene editing” is evident in the drop in the number of sections in 2019, alongside the uptick in the number of articles in the corpus like “genome-wide CRISPR-cas9 knockout screens”, “the CRISPR Journal” and “LEAPER gene editing” ( Fig 3 ).

This later phase also continued to document the growth of the biotech industry based on CRISPR. For example, CRISPR Therapeutics, a company co-founded by Charpentier, received an article in 2021, further highlighting the field’s maturation and growth in technology.

Tellingly, 2020 also saw the creation of a “Pandemic prevention” article, which, in tandem with the COVID-19 pandemic, detailed all the medical and scientific attempts to preempt viral outbreaks—including those that could potentially make use of CRISPR. Articles like these raise an interesting question regarding the role of CRISPR in other bodies of knowledge and warrant an examination of the wider corpus.

Cross-pollination: CRISPR as a body of knowledge

Our analyses thus far show that knowledge on Wikipedia is rarely confined to a single article, but is rather stored in groups of articles that are constantly changing and cross-pollinate one another. On Wikipedia, this process can take on two distinct forms: new articles opening that directly address the topic, or existing articles changing to include new text, references or sections dedicated to the scientific topic’s intersection with other bodies of knowledge. Tracking the migration between articles can illuminate how knowledge diffuses.

To better understand the temporal aspect of CRISPR’s representation across articles on Wikipedia we next compared the DOB of the different articles in our CRISPR corpus and the date the term CRISPR was first mentioned in them.

Of the 51 articles in the CRISPR corpus, 26 already had the term “CRISPR” in their first version ( Fig 4A ). Among these were the articles for researchers like Charpentier, Šikšnys and Francis Mojica. This group also included articles for scientific topics discovered in later stages of the CRISPR field’s growth, like “Cas12”, and articles reflecting CRISPR in culture, like the aforementioned academic journal. With few exceptions, like “CRISPR” and “CRISPR interference”, opened in 2005 and 2010, respectively, articles that were created with CRISPR already mentioned in their first version were mostly opened post-2014 ( Fig 4B ).

Fig 4. Comparing an article’s creation date and CRISPR’s first mentions. A) An article’s date of birth (DOB, blue) compared to the year it first mentioned “CRISPR” (red), sorted by the former. B) The relation between each article’s DOB and the time it took for its first mention of CRISPR. Displayed is a linear trendline and R².
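
The comparison between an article's DOB and its first mention of a term can be sketched as follows. This is only a minimal illustration and not the tool used in the study: it assumes that an article's revision history has already been retrieved as a list of (timestamp, text) pairs sorted from oldest to newest, and the function names `first_mention` and `mention_delay_years` are hypothetical.

```python
from datetime import datetime

def first_mention(revisions, term="CRISPR"):
    """Return (dob, first_mention_date) for one article.

    `revisions` is an assumed input format: a list of (datetime, text)
    pairs sorted from oldest to newest. The first pair is taken as the
    article's date of birth (DOB).
    """
    if not revisions:
        raise ValueError("an article needs at least one revision")
    dob = revisions[0][0]
    needle = term.lower()
    for timestamp, text in revisions:
        if needle in text.lower():
            return dob, timestamp
    return dob, None  # the term never appears in any revision

def mention_delay_years(revisions, term="CRISPR"):
    """Years between the DOB and the first mention (0.0 if present at birth)."""
    dob, mentioned = first_mention(revisions, term)
    if mentioned is None:
        return None
    return (mentioned - dob).days / 365.25

# Toy data only: the exact dates and texts are invented for illustration.
toy = [
    (datetime(2012, 1, 1), "Genome editing uses engineered nucleases."),
    (datetime(2014, 1, 1), "Genome editing now includes CRISPR methods."),
]
delay = mention_delay_years(toy)  # roughly two years for this toy history
```

Articles whose delay is zero correspond to those that mentioned the term in their first version, while positive delays correspond to older articles that were recast later.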

The 24 articles that lacked “CRISPR” at their inception provide insight into the growth of the field over time. Importantly, this analysis shows how many concepts now associated with CRISPR actually existed before its discovery or before its application in gene editing was known. Prime examples are “Gene knockout” and “Gene knockdown”, which in fact predate CRISPR. However, as we saw, at a later stage their content was recast to take CRISPR into account and the articles were retroactively affiliated with the CRISPR field (in 2017 and 2013, respectively). Similarly, “Genome editing” was opened in 2012 but mentioned CRISPR only in 2014. The article “Designer baby” opened in 2005 and depicted what was initially only a theoretical issue used in “popular scientific and bioethics literature.” However, this changed with CRISPR’s rise to prominence and since 2018 it has directly referenced CRISPR, with a lengthy debate in the wake of the “He Jiankui affair”, the widely reported incident in which the Chinese scientist created the world’s first so-called CRISPR babies in 2018.

We could also observe CRISPR’s interface with other scientific fields through articles related to wider topics. For example, the two oldest articles in the corpus, “Wheat” and “Antibiotic”, were opened in 2001 and came to mention “CRISPR” only some twenty years later.

In sum, this analysis revealed a clear divide between articles that mentioned CRISPR from the outset and those that incorporated the term only in later stages. More generally, it underscores how CRISPR ramified across Wikipedia not just in the form of new articles, but also by recasting older ones.

From lab to public: Wikipedic bibliometrics map the diffusion of knowledge over time

All claims on Wikipedia need to be attributed to a verifiable source [ 21 ]. For our purposes, these references constitute a source rich with text, information and data for additional analyses: combining quantitative bibliometric analyses like citation count, with content-dependent evaluation of the actual sources, to better understand the types of references supporting the “anchor” article. Quantitatively, we have previously developed two bibliometric analyses for Wikipedia articles—the “SciScore”, which gauges the ratio of academic to non-academic sources (ranges 0–1) [ 12 ], and the “Latency”, which gauges the duration between an academic paper’s publication and when it was referenced in a Wikipedia article [ 14 ].

Our automated tool scrapes only the reference list of each article in the corpus, which is then further parsed to identify and characterize its different sources: “.org”, “.com” and those containing DOIs/PMIDs/PMCs (i.e., scientific papers). Thus, we can assign a SciScore at both the corpus level and that of an individual article.
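
The classification step described above can be illustrated with a short sketch. It is not the authors' implementation: the regular expression, the function names `classify` and `sci_score`, and the toy references are all assumptions made for illustration, and the SciScore is interpreted here as the share of academic references among all references, which is one reading of a ratio bounded between 0 and 1.

```python
import re

# Patterns for academic identifiers. These regexes are illustrative
# assumptions, not the parsing rules of the tool used in the study.
ACADEMIC_ID = re.compile(
    r"10\.\d{4,9}/\S+"   # DOI
    r"|PMID:?\s*\d+"     # PubMed ID
    r"|PMC\s*\d+",       # PubMed Central ID
    re.IGNORECASE,
)

def classify(reference):
    """Label one reference string as 'academic', '.org', '.com' or 'other'."""
    # Identifiers are checked first so that a doi.org link counts as academic.
    if ACADEMIC_ID.search(reference):
        return "academic"
    lowered = reference.lower()
    if ".org" in lowered:
        return ".org"
    if ".com" in lowered:
        return ".com"
    return "other"

def sci_score(references):
    """Fraction of references classified as academic, between 0 and 1."""
    if not references:
        return 0.0
    academic = sum(1 for ref in references if classify(ref) == "academic")
    return academic / len(references)

# Invented references, for illustration only.
toy_refs = [
    "Doe J. Example paper. doi:10.1000/example.1",
    "Another paper. PMID: 12345678",
    "Example news story, https://www.example.com/story",
    "Example foundation page, https://www.example.org/page",
]
score = sci_score(toy_refs)  # two of the four toy references are academic
```

Applying `sci_score` to the pooled references of all articles would give a corpus-level value, while applying it per article gives the individual scores.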

We found that the CRISPR anchor article was supported by 208 external sources in its “References” and “Further reading” sections ( Fig 5A ). The article’s SciScore was 0.92, ranking 13/51 in the corpus ( Fig 5B and S2A Fig ). The top cited journal was Science (23 papers), followed by Nature and Cell (14 each) ( S2B and S2C Fig ). These results are consistent with previous analyses of Wikipedia articles focused on scientific topics that show that these make use of peer reviewed, high-impact factor academic publications [ 8 , 23 ].

Fig 5. A) The number of references in the CRISPR article’s reference section since it opened until December 2021. B) The SciScore of “CRISPR” (shown until December 2021). C) The article’s references latency distribution (i.e., duration between a scientific paper’s publication and its integration into Wikipedia). D) A timeline comparing the date of selected publications (black frames, left) to their citation in the CRISPR article (blue frames, right). E) A snapshot comparing two versions of the CRISPR article from May 2007, showing how changes to the wording of the text were linked to the citation of Barrangou et al., 2007.

To attain a historical perspective, we next analyzed the temporal aspect of the above-discussed bibliometric parameters, which were compared with and contextualized against the changes in sections ( Fig 3A ). We found that these metrics, and overlapping trends between them, served as markers for important events in the history of the field. A prime example of this can be seen in the aforementioned “Patents” section: on March 6, 2014 Doudna’s and Charpentier’s patent application was published online and a few weeks later the “Patents” section was opened in the CRISPR article ( S3 Table ). It cited the US Patent Office website. By 2015, after the Broad Institute was awarded its own patent and the appeal against it was filed by the universities representing Doudna and Charpentier, the article’s text changed to indicate that, “As of December 2014, patent rights to CRISPR were still developing.” The text also noted that there was “a bitter fight over the patents for CRISPR”, a claim supported by a new type of citation which grew increasingly present in the CRISPR article: non-academic sources, in the form of both news articles about the legal cases and even the patents themselves. For example, the claim about the “bitter” legal battle was sourced to a story in MIT Technology Review, a popular science news site, while also referring directly to specific patents and/or formal application documents made public online. Overall, the section included a laundry list of patent holders and claimants with a hodgepodge of popular and legal sources as citations. Throughout its entire existence, all the sources in this section were non-academic.

The fact that non-academic sources were deployed in the article to support non-academic aspects of the CRISPR history shows how these types of sources can document non-scientific ramifications of scientific developments. However, the entrance of non-academic sources was not limited to patent debates and also touched on CRISPR’s growing social prominence. For example, the 2015 selection of CRISPR as “Breakthrough of the year” [ 43 ] was supported by links to popular media sources. Together with the patent links, these non-academic sources led to a decrease in the article’s SciScore during this phase ( Fig 5B ).

Collectively, these findings highlight how bibliometric shifts reflect substantive changes in the article’s text, which in turn reflect real-world developments in the field, both in terms of the science and of the social debates it inspires.

We next conducted bibliometric analysis on the entire corpus. We found a number of articles with high SciScores (like “CRISPR interference” or “Cas9”) alongside those with a low percentage of academic sources, like that for Mojica or the concept of designer babies ( S2A Fig ). This indicates a correlation between the scientificness of an article’s topic and its SciScore, with biographical articles for scientists, for example, usually ranking lower than those on scientific concepts.

The “CRISPR” article ranked among the top ten in terms of scientificness. To further gauge how current the article is relative to the state of the available research, we determined the latency of all the article’s references. This analysis revealed a distribution ranging from a single day to over 30 years, with a median latency of 1.7 years ( Fig 5C ). This bibliometric data can be contextualized through the example of the integration dynamics of publications relating CRISPR to bacterial immunity ( Fig 5D ). Rodolphe Barrangou was the R&D director of genomics at the chemicals manufacturer DuPont, which was the first to harness CRISPRs to provide immunity for its industrial bacterial strains. The resulting study was published in 2007, and was integrated into Wikipedia that year, a mere two months after going online. In this edit the text changed from “it is proposed that these spacers … protect the cell from infection” to “it was proposed, and more recently demonstrated, that these [can…] help protect the cell from infection” (bold added) ( Fig 5E ).

Only after this experimental demonstration were three landmark, yet in silico, papers from 2005 added to the article. These three studies, which computationally supported the bacterial immune system hypotheses, were added with a relatively large latency: Pourcel et al., 2005 was added two years after its publication, while Mojica et al. and Bolotin et al. were added only in 2011, six years after publication. By this time, the text and the early references, as well as CRISPR’s function in bacterial immunity, now backed by experimental evidence, were all inserted into the article’s lead section, too.
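
The Latency metric discussed above reduces to a date difference per reference and a summary statistic over all references. A minimal sketch follows; the function names are hypothetical, the inputs are assumed to be (publication date, date of first citation on Wikipedia) pairs, and the toy dates are invented to loosely echo the two-month, two-year and six-year cases described in the text rather than taken from the study's data.

```python
from datetime import date
from statistics import median

def latency_years(published, cited):
    """Years between a paper's publication and its citation on Wikipedia."""
    return (cited - published).days / 365.25

def median_latency(pairs):
    """Median latency, in years, over (published, cited) date pairs."""
    return median(latency_years(p, c) for p, c in pairs)

# Invented dates for illustration only.
toy_pairs = [
    (date(2007, 3, 1), date(2007, 5, 1)),  # cited about two months later
    (date(2005, 1, 1), date(2007, 1, 1)),  # cited about two years later
    (date(2005, 1, 1), date(2011, 1, 1)),  # cited about six years later
]
typical = median_latency(toy_pairs)  # close to two years for this toy set
```

The median is used here because, as the reported range from a single day to over 30 years suggests, latency distributions can be heavily skewed by a few very old sources.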

In sum, we found that these quantitative shifts in bibliometrics were the result of textual changes in the article. This links together our different forms of analysis: the bibliometrics are linked to the historical shifts in the text, which together reflect changes in the scientific field itself.

Quantitative comparison between fields on Wikipedia

We next aimed to examine whether the aforementioned methodology can provide insight into other scientific fields on Wikipedia. Therefore, we deployed our automated tool on two additional terms, “Circadian” (as in circadian clocks) and “Coronavirus”, which we have studied in different manners in earlier works [ 12 , 14 ] and thus serve as control groups to some degree. We hence created three corpuses side by side, at roughly the same time—June/July 2022, and demonstrated how some of the quantitative analyses described above can be utilized to create comparable yet distinct findings regarding different fields.

As we observed for the CRISPR field, a substantial number of articles can be easily identified and selected to be part of a research corpus—with 51, 138, and 306 articles for “CRISPR”, “Circadian”, and “Coronavirus”, respectively ( Fig 6 , S4 and S5 Tables). While varying in size, all corpuses are within a range that allows for reading and examination of their titles. Such examination validated that they indeed provide a diverse assortment of articles of different types that are relevant to each field—for example, articles for scientists alongside those for scientific terms or events. For example, the corpus for “Circadian” yielded the articles “Circadian rhythms” and “Sleep”, and the corpus for “Coronavirus” yielded articles both about the pandemic like “COVID-19 pandemic in Japan” and more generally for “Virus”.

Fig 6. Corpuses were generated and quantitative metrics automatically collected in June-July 2022, for the terms “CRISPR”, “Circadian” and “Coronavirus”. The following data are presented: A) the number of articles opened each year, B) the top 10 most cited journals, C) the top 10 most cited .org websites, D) the top 10 most cited references altogether, E) SciScore distribution, along with the total (sum of all references in all articles) and median scores of the articles’ distribution.

After an initial corpus creation, the first automated analysis generates a timeline based on each article’s DOB. A side-by-side view of all three corpus timelines ( Fig 6A ) illustrates how different fields display different modes of growth. For example, the “Coronavirus” timeline reveals a clear divide between scientific articles like “Pandemic” (2001) and “Spike protein” (2006), created early on in Wikipedia’s history, and post-pandemic articles like “Wuhan Institute of Virology” (2020). This timeline clearly shows how, with the outbreak of the pandemic, articles about the virus ballooned, but also how these were supported by a network of preexisting articles [ 12 ]. Meanwhile, the “Circadian” timeline exhibits a seemingly random distribution of article creation, with anchor articles (“Circadian Clock” and “Circadian Rhythms”) and auxiliary articles opening regularly over time. Some DOBs appear to tell a compelling scientific story (e.g., Paul Hardin, first author of the landmark paper highlighted in the 2017 Nobel declaration [ 44 ], received an article in 2017), but these seem anecdotal. Interestingly, the biannual peaks are likely a product of American chronobiologist Eric Herzog’s university course [ 45 ], which has students contribute to articles of their choice linked to the field. This DOB pattern, or lack thereof, can be explained by the fact that, unlike the timely fields of CRISPR or coronavirus, circadian clocks is a more mature field. As such, its growth, as our previous work has shown, is reflected in a more subtle manner on Wikipedia, with a paradigmatic shift in the field being documented in minute, nuanced textual detail [ 14 ]. Broadly, this suggests that article creation time is perhaps more applicable for contemporary and what can be termed “active” or even “emerging” fields.

One similarity between all three timelines is an increase in article creation centered around 2005–7, a period which has been shown to have held a massive surge in article creation in Wikipedia in general [ 46 ].
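
A DOB timeline of the kind compared above is, at its core, a count of articles per creation year. The following sketch shows one way to build it; the function name `creation_timeline` is hypothetical, the input format (a mapping from article title to creation year) is an assumption, and the example uses only creation years mentioned in this text for articles in the CRISPR corpus.

```python
from collections import Counter

def creation_timeline(dobs):
    """Count articles created per year from a {title: creation year} mapping."""
    counts = Counter(dobs.values())
    return dict(sorted(counts.items()))

# Creation years as reported in the text for a handful of corpus articles.
crispr_sample = {
    "Wheat": 2001,
    "Antibiotic": 2001,
    "CRISPR": 2005,
    "Designer baby": 2005,
    "CRISPR interference": 2010,
    "Genome editing": 2012,
}
timeline = creation_timeline(crispr_sample)
# timeline == {2001: 2, 2005: 2, 2010: 1, 2012: 1}
```

Building one such dictionary per corpus would allow the timelines to be plotted side by side on a shared year axis.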

Our tool also supports automated scraping of bibliometric data. This analysis showed that the top ten journal references in all three corpuses were dominated by high impact-factor, peer-reviewed academic publications ( Fig 6B ). Alongside prestigious scientific publications like Nature or PNAS, we can observe how each corpus refers to field-specific publications: for example, the Journal of Biological Rhythms in the circadian list, Nature Biotechnology for CRISPR, or The Journal of Virology for coronavirus.

Non-academic references (i.e., websites) were also quite field-specific. As researchers from both the circadian clocks and CRISPR fields were awarded a Nobel Prize, the website for the prestigious award was among the most cited in the respective corpuses ( Fig 6C ). In addition, the Sleep Foundation website was highly cited in the circadian corpus while three genome-focused websites were highly cited in the CRISPR corpus. The International Committee on Taxonomy of Viruses (ICTV), which appears in Wikipedia articles for different variants, was among the top 10 .org sites cited in the coronavirus corpus.

In general, we observed that the CRISPR and circadian corpuses relied more on scientific literature, while “coronavirus” referenced mostly .com sources ( Fig 6D ), which is also reflected in the different corpuses’ SciScore ( Fig 6E ). It appears that the more prominent a scientific field is societally, the lower its SciScore: for example, the non-scientifically focused CRISPR-corpus article about designer babies had a relatively low score, as did the circadian-corpus article “Start school later movement.” Meanwhile, the more clearly scientifically focused articles “Surveyor nuclease assay” and “CSNK1D” had high scores. The patterns of SciScore distribution show how different fields manifest differently and that comparing them can shed light, for example, on how much public, as opposed to purely scientific, interest a field has online.

In summary, these analyses show how the same research tools and methods yield very different results for different research fields, all of which can facilitate the initial steps needed towards the creation of future case studies into how scientific knowledge is represented on Wikipedia over time.

Here, we delved into Wikipedia’s archives to examine the way a prominent scientific field, CRISPR, was represented from the site’s launch in January 2001 until 2022. By reviewing the CRISPR article’s history, we saw that the article started off describing the “basic science” behind CRISPR, and was updated in the wake of the publication of canonical works in the field. Over time, the article grew, and with the emergence of gene editing technology it forked off into a number of affiliated articles with a more narrow focus, while the original CRISPR article offered a consolidated overview of the scientific narrative of CRISPR in bacterial systems. The article’s text and its different citations served as a rich record of the growth of academic knowledge, the legal battles CRISPR sparked and the academic credit wars over what the journal Science called the “CRISPR Craze” [ 47 ], as well as the popular interest in the field.

This case study allowed us to flesh out some essential metrics which can be used to conduct similar research, and we thus propose a method that can be deployed in the service of researching the history of contemporary science on other topics using Wikipedia. Automated tools were developed and are openly supplied to support this research and permit work on additional topics, though combining these with manual and semantic work is key to contextualizing findings and interpreting them to provide substantial historical insight.

Using Wikipedia for the history of science

Our findings join a small yet growing body of research dedicated to using Wikipedia for historical purposes. Previously, we analyzed the growth of two Wikipedia articles dedicated to the circadian clock field through their edit histories (“Circadian clocks” and “Circadian rhythms”), using them to ask whether the article’s text reflected changes taking place in understanding how biological clocks work [ 14 ]. Within that more focused case-study we observed the importance of following the academic references, and developed the Latency metric. Meanwhile, our study on COVID-19 used large-scale quantitative bibliometrics to understand how the pandemic affected large swathes of articles during its “first wave”, putting forward metrics such as the SciScore to qualify hundreds of articles based on their reference list [ 12 ]. Together, these underscore the key role academic sources play on Wikipedia and serve as a wider proof-of-concept for the quantitative and qualitative underpinnings of this present study.

Wyatt suggested in a theoretical paper that Wikipedia could be used as a primary source in historical research [ 48 ]. From the edit history of articles, to metadata for traffic and even talk pages, he envisaged treating the open-source encyclopedia as an “endless palimpsest”. This is an idea that was also expressed earlier (2010) as an artwork: “The Iraq War: A Historiography of Wikipedia Changelogs” by artist James Bridle was a 12-volume book comprising all the versions of the article dedicated to the war in Iraq, with the online edit wars serving as a proxy for the real-world conflict. The aforementioned study on the Egyptian protest movement attributed historical significance to the addition of the word “revolution” to Wikipedia articles’ titles, taken to be reflective of the real revolution playing out in the streets [ 15 ]. This is a captivating demonstration of the value of attributing historical significance to semantic shifts in Wikipedia articles, in line with our usage of sections and titles.

From the perspective of digital humanities and big data, an algorithmic approach was previously deployed to mine the text of tens of thousands of Wikipedia articles to try to map the history of knowledge since the dawn of human history, using network science and semantic analysis to “put the ideas of Kuhn to the test”. The study, currently a preprint [ 49 ], makes interesting findings, while highlighting the lack of unified methods in current Wikipedia-based historical research. To our knowledge, neither an academic demonstration nor a clear method has previously been put forward as to how researchers can actually use Wikipedia and realize its historiographic potential to serve as this “endless palimpsest”.

Numerous studies have examined Wikipedia and bibliometrics [ 23 ], including those that focus on science [ 8 , 50 ], but none clearly link scientometrics to historical methods [ 17 ]. Others from the more humanistic side of academia have worked to connect the digital arena to contemporary fields like discourse analysis, based on the works of Michel Foucault [ 51 ]. However, these too are all theoretical works and as of yet no programmatic paper has outlined how Wikipedia can actually be used for historical research, especially not in the interest of following shifts in contemporary science.

Mapping out additional fields through our suggested methods can eventually support theories and models of scientific growth at a resolution never before possible. An initial method for selecting such future case studies could be to focus on the topics selected by Science and others as “Breakthrough of the Year”; these and their relevant Wikipedia articles are documented in a special list on Wikipedia [ 52 ] that could serve as the origin of many corpuses. Scientific developments that have garnered public interest over the past two decades, from the human genome project to AlphaFold, could also serve as fruitful case studies, each providing a unique and rich dataset of text and information that could then be compared.

The advantages of Wikipedia

Wikipedia easily lends itself to research of this type. A digital and open website that is easily searchable, it also allows open use of its API for more complex queries and even provides a full dump of the entirety of Wikipedia in each language, including articles’ full edit history.
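
As an illustration of the kind of API query mentioned above, the following sketch builds a request URL for an article's earliest revisions using the MediaWiki Action API (`action=query` with `prop=revisions`). It only constructs the URL and does not send a request; the function name `revision_query_url` and the default limit are hypothetical choices, and this is not presented as the query used in the study.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def revision_query_url(title, limit=50):
    """Build a MediaWiki API URL listing an article's oldest revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",  # revision IDs and their timestamps
        "rvdir": "newer",           # list the oldest revision first
        "rvlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"

url = revision_query_url("CRISPR", limit=1)
```

With `rvdir=newer` and a limit of one, the single revision returned is the article's first, whose timestamp can serve as its DOB. A full edit history would require following the API's continuation mechanism across several requests, which this sketch omits.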

Importantly, English Wikipedia’s fixed article structure and uniform style allow comparable historical work across different fields, primarily since all articles are structured in a similar way: a lead text, table of contents, sections and then a reference list. This feature, in combination with the convenience of the “View history” function, facilitates in-depth analysis of the same line or section over time (for example, as was done here for “Patents”) in a manner difficult to imagine when comparing texts from the academic literature. Moreover, cross-analyses of different subjects can yield results comparable through standardized metrics, like the DOB timelines, and the Latency or SciScore. The structural similarity creates a sort of internal control that lays the groundwork for a rigorous research system that can be utilized by others and applied to additional fields.

Past versions that did not survive Wikipedia’s mob review process or that included facts that were considered true at the time but have since been rendered obsolete prove especially interesting from the perspective of the history of science. For example, with CRISPR, a December 2005 version of the article described Cas1 as the “most important” of the Cas genes, and one that is “present in almost every CRISPR/Cas system.” This was more cautiously reworded in July 2010 so that, “The most important of the Cas proteins appears to be Cas1, which is ubiquitous” in CRISPR systems. In March 2011, Cas1’s ubiquity was no longer said to be linked to its importance, and for the past decade the article has made do with noting in a subsection dedicated to the CRISPR locus that “[m]ost CRISPR-Cas systems have a Cas1 protein.” These changes were the result of new knowledge forcing a reevaluation of the preexisting scientific narrative regarding CRISPR: Cas1 was not falsified per se, rather its importance in CRISPR’s story was reassessed. Another example from the CRISPR article can be seen in the shift in section title from “Possible applications” to “Applications” regarding gene editing ( Fig 3D ). These are examples of what can be termed “negative” knowledge, that is, knowledge whose relevance was negated by new “positive” discoveries that outweighed it in significance. Even so, its loss of scientific status in CRISPR’s narrative has much value from the historical perspective. Wikipedia, we suggest, is an inclusive medium that documents both positive and negative knowledge, the accumulation and the rejection of scientific facts, through its edit history.

Moreover, unlike social media websites that collect user data for financial reasons, posing a privacy threat and creating ethical dilemmas for researchers, Wikipedia collects no such information as it has no such business model. This makes it not only attractive to volunteers willing to donate hours to writing and editing the site, but also makes Wikipedia and its data ideal material for social research. Wikipedia’s texts are not single-handedly written and are edited collectively in a form of what is termed peer-production [ 53 ]. Though this system is not without its flaws, in the context of the contemporary history of science it proves a valuable resource: documenting the consensus regarding certain facts and fields’ growth in real-time and in potentially minute details.

Additional advantages that Wikipedia offers in respect to bibliometrics are numerous and deserve their own section.

Wikipedic bibliometrics

Various studies analyzing Wikipedia’s references, including those that focus on science [ 8 , 23 , 50 ], exemplify the use of Wikipedia for bibliometric research, and to a degree support the view that Wikipedia is much more inclusive than academic publications, making use of non-academic sources usually excluded from scientific papers. Here we implicitly study this using our SciScore, and contextualize its trends through the historical thick description. On “CRISPR”, for example, legal sources or popular media were added to support the “patent war”, which was also expressed in a drop in the article’s SciScore. The expansion and then contraction of the “Patents” section ( S3 Table ), in tandem with the patent wars and their resolution in the courts, show how this historical inclusivity extends to both the text and the sources.

The SciScore reveals a different historical perspective when comparing the CRISPR and Coronavirus corpuses. We previously discovered a decrease in the SciScore as the pandemic grew to public prominence and more articles about it were opened [ 12 ]. This was because many of the new articles opened post-pandemic depicted its social ramifications and outcomes, while the pre-pandemic articles focused on the science behind the virus. In the CRISPR anchor article, the SciScore revealed a completely different process: as CRISPR began as a purely scientific discovery, the decrease in SciScore (~2013–2018, Fig 5B ) was found to be the result of its growing public prominence outside scientific circles and the appearance of the first non-academic sources about the looming “CRISPR Craze” [ 47 ], followed by the much-publicized patent and credit wars, and finally the wider social, ethical and policy debates it sparked, backed by popular yet respectable sources.

Our latency analyses revealed that CRISPR, a nascent field, was making use of extremely up-to-date papers; in some cases references were added within days of their publication. Meanwhile, the circadian clock article had a median latency of five years [ 14 ]. This coincides with the respective histories of the fields: CRISPR is a high-profile and emerging field, with advances being mirrored almost instantaneously on Wikipedia. On the other hand, the article on clocks, a more mature field that has been around for decades, was found to be based on contemporary but also older research which predated Wikipedia. Meanwhile, Coronavirus had a major 17-year peak in latency, exactly in line with the SARS pandemic of 2003, showing how research from a preceding viral pandemic provided the backbone of the sourcing for the 2020 pandemic [ 12 ]. Together these show how the character of each field is reflected in its bibliometrics.

One hypothesis regarding the potential of the SciScore and latency is that this dynamic may also be taking place in other articles that began as purely scientific but are increasingly taking on social significance. Tracking articles that have short latencies and a high SciScore which then begins to decrease could serve as a method for identifying new fields only now starting to make waves in terms of public interest. In light of emerging attempts to harness Wikipedia for trend detection [ 54 , 55 ], this idea remains to be examined as more case studies are created in the future.
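The hypothesis above could be operationalized along the following lines. This Python sketch flags an article whose SciScore started high, has since dropped, and whose references have a short median latency. It assumes a SciScore expressed on a 0 to 1 scale and sampled once per period in chronological order; the function name and all three thresholds are hypothetical placeholders, not values proposed or validated in the study.

```python
def flags_rising_public_interest(sciscores, median_latency_years,
                                 high=0.8, max_latency=1.0, drop=0.15):
    """Return True if an article matches the hypothesized pattern.

    sciscores: chronological SciScore values (assumed to lie in [0, 1]).
    median_latency_years: median reference latency of the article.
    high, max_latency, drop: hypothetical thresholds for illustration only.
    """
    if len(sciscores) < 2:
        return False  # a trend needs at least two observations
    started_scientific = sciscores[0] >= high
    declined = max(sciscores) - sciscores[-1] >= drop
    short_latency = median_latency_years <= max_latency
    return started_scientific and declined and short_latency


# Hypothetical series: starts highly scientific, then declines.
print(flags_rising_public_interest([0.90, 0.92, 0.85, 0.70], 0.5))
```

Any real use would need the thresholds calibrated against known cases, since no such calibration is reported here.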

Using Wikipedia bibliometrics also has value from the scientometric perspective. Measuring the impact of scientific research is a well-established field that has in recent years expanded the metrics it works with: no longer just impact factor and citation counting, but also more inclusive metrics like altmetrics. In this sense, Wikipedia, too, can prove a valuable addition in the form of alternative metrics. Asking which papers are cited on Wikipedia, and in which context, may provide insight into what parts of academic research are actually reaching the public [ 7 , 56 , 57 ]. As such, our tools can join and enrich existing studies on the history of contemporary science, augmenting work in the field of bibliometrics or even altmetrics with Wikipedia.

Limitations

For all its benefits, this method also has its shortcomings. To begin with, corpus delineation can exclude possibly valuable articles; for example, the article for George Church was absent from our corpus despite his seemingly important role in the history of CRISPR. Because Church is a prominent scientist with a broad scope, his Wikipedia article does not use the term in any section title, and only briefly mentions his contribution to the CRISPR field under a section titled “Synthetic biology and genome engineering”, which despite its topic does not use the key term itself.

From a scientometric perspective, Wikipedia also poses some unique problems: Unlike bibliometric datasets created especially for such purposes, Wikipedia’s footnotes are not all properly formatted, and issues with their templates make scraping them consistently hard [ 58 ], especially with older articles. Initially, all footnotes on Wikipedia were added manually by editors working directly in wiki-code, the markup language the website uses. Over time, bots and tools were put into place to help with this menial task and unify footnote formatting; in some cases, older articles with older footnotes that did not benefit from this unified new formatting will not be scraped properly if one uses only Wikipedia’s native bibliometric data. To overcome this issue in the present study, we scraped the references from the articles as simple text, regardless of how they were formatted by Wikipedia’s volunteer editors. This list of references was then analyzed in search of DOIs/PMIDs/PMCs, which were taken as a proxy for academic publications. Nonetheless, other issues exist, for example duplicate DOIs or DOIs included in an article’s text and not just in footnotes. A manual validation of our method in random articles revealed this approach had a margin of error that was lower than 5 percent.
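The identifier search described above can be sketched in Python with regular expressions applied to plain reference text. The patterns and function names below are assumptions made for illustration (the DOI pattern is a commonly used approximation, not a complete grammar) and do not reproduce the code in the study's repository. Lowercasing DOIs and collecting them in a set is one simple way to handle the duplicate DOIs mentioned above.

```python
import re

# Approximate patterns; hypothetical choices, not the study's own.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
PMID_RE = re.compile(r"\bPMID[:\s]*(\d+)", re.IGNORECASE)
PMC_RE = re.compile(r"\bPMC\s*(\d+)", re.IGNORECASE)


def extract_ids(reference_text):
    """Return the DOIs, PMIDs and PMC IDs found in a plain-text reference."""
    # Strip trailing punctuation and lowercase so duplicates collapse.
    dois = {m.rstrip(".,;)").lower() for m in DOI_RE.findall(reference_text)}
    pmids = set(PMID_RE.findall(reference_text))
    pmcs = {"PMC" + n for n in PMC_RE.findall(reference_text)}
    return {"doi": dois, "pmid": pmids, "pmc": pmcs}


def looks_academic(reference_text):
    """Treat any reference carrying an identifier as an academic source."""
    return any(extract_ids(reference_text).values())


sample = ("Lewoniewski W, et al. https://doi.org/10.1007/978-3-319-46254-7_50 . "
          "doi:10.1007/978-3-319-46254-7_50. PMID: 12345; PMC6789")
print(extract_ids(sample))
```

Working on plain text rather than on citation templates trades precision for coverage: it catches identifiers in old, hand-written footnotes, but it will also pick up identifiers that appear in running text.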

Moreover, our method also does not yet address all of Wikipedia’s content: Firstly, we only examined English Wikipedia. While it is the largest Wikipedia, and most if not all scientific papers are published in English, language asymmetry has been previously reported across different topics [ 59 , 60 ]. This is but one of many biases Wikipedia suffers from and including other language editions in future work may reveal different perspectives and richer narratives that are absent from our methods and findings. Even within the English Wikipedia, the talk page, a key arena that is rich in textual data, was not systematically included in this study, though debates about the patent war were found, and these included discussions of which type of sources (legal as opposed to scientific) should be cited on the article in this context.

Another yet untapped facet of Wikipedia touches on visual elements. Wikipedia’s sister project, WikiCommons, supports multimedia, usually in the form of copyright-free images, and in this respect we also saw a growth: The first infographic explaining the CRISPR system was introduced to the article in 2009 and the file itself was updated in 2010 to show a more complex understanding of the “CRISPR prokaryotic antiviral defense mechanism”, supported by a then-newly published review article [ 61 ]. Over time, additional, more complex images were added to the article, for example those showing how CRISPR interference could be used for gene editing ( S3 Fig ). This multimedia aspect can serve in the future as another vector for like-minded research, for example by focusing on how infographics and scientific illustrations document the growth of scientific knowledge over time in visual terms.

We hope our proposed method will encourage use of Wikipedia’s ever-changing text as a rich historical source to augment existing work being done in the history of science and contribute to our understanding of the growth of scientific knowledge and its transference to the general public.

Supporting information

The articles included in the corpus, sorted by number of references, size in kilobytes (kB) and number of edits. “CRISPR”, highlighted, was among the top 5 articles of each category.

A) The corpus’ articles SciScore distribution. B) Peer-reviewed journals cited as references in the article as of June 2022, sorted by the number of references per publication. C) A list of the top cited journals (from B) with ≥5 appearances.

Shown are a selection of screen grabs from the CRISPR article, reflecting the evolution of Wikicommons graphics of CRISPR’s mechanism of action and key players. These are of different versions of the same illustration (A and B) and of a third illustration added later to the article.

The output of the automated tool for the term "CRISPR", as of 2022-06-27.

The "CRISPR" article’s table of content was examined at the indicated dates. The number of sections/subsections were counted and appear at the top of each column.

The "CRISPR" article’s "Patent" section is displayed for the indicated dates. Separation into different rows in the table were manually done for visibility.

The output of the automated tool for the term "circadian", as of 2022-07-14.

The output of the automated tool for the term "coronavirus", as of 2022-07-14.

Acknowledgments

We want to thank Dusan Misevic, Bastian Greshake Tzovaras, Marc Santolini, Mad Price Ball, Alex Webb, Gal Manella and all those who provided feedback.

Code accessibility

Our code for the corpus builder can be found at: https://github.com/RonaTheBrave/WikiCorpusBuilder .

Funding Statement

Thanks to the Bettencourt Schueller Foundation long term partnership, this work was partly supported by the LPI Research Fellowship, Université de Paris, INSERM U1284, to RAv and OB. RAv’s work was supported in part at the Technion by a fellowship of “The Israel Academy of Science and Humanities”. In either case, the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability

  • PLoS One. 2023; 18(9): e0290827.

Decision Letter 0

18 Jul 2023

PONE-D-23-06451

Wikipedia as a tool for contemporary history of science: A case study on CRISPR

PLOS ONE

Dear Dr. Aviram,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 01 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Claire Seungeun Lee

Academic Editor

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure:

“Thanks to the Bettencourt Schueller Foundation long term partnership, this work was partly supported by the LPI Research Fellowship, Université de Paris, INSERM U1284, to RAv and OB.”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information .

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is an interesting paper, with a solid, mixed-method based approach.

For some reason, the references are tangled up (each is in its own brackets, instead of integrated) - I'm assuming your reference manager software got a hiccup? I suggest fixing it in your next revision.

Other than this minor comment, I don't really have much to suggest. One small thing, perhaps: very often people write about Wikipedia as if it was one entity, but there are about 300 different language editions. In fact, if you check studies on cultural diversity of quality of information on different Wikipedias, it turns out that the standards for knowledge quality, presentation, as well as procedures differs significantly. For that reason it may be useful to add a sentence or two mentioning this caveat.

6. PLOS authors have the option to publish the peer review history of their article ( what does this mean? ). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

Author response to Decision Letter 0

26 Jul 2023

Response to reviewers

We humbly resubmit for publication our manuscript, entitled “Wikipedia as a tool for contemporary history of science: A case study on CRISPR”.

Please see our point-by-point response:

Reviewer #1:

This is an interesting paper, with a solid, mixed-method based approach.

We apologize for this bug and have corrected the above-mentioned references, as well as reformatted the entire reference section based on PLOS guidelines.

We thank the reviewer for this suggestion, and have amended our discussion with respect to this caveat. The revised text appears in Track Changes, and includes the following lines (and references below):

“Moreover, our method also does not yet address all of Wikipedia’s content: Firstly, we only examined English Wikipedia. While it is the largest Wikipedia, and most if not all scientific papers are published in English, language asymmetry has been previously reported across different topics [59,60]. This is but one of many biases Wikipedia suffers from and including other language editions in future work may reveal different perspectives and richer narratives that are absent from our methods and findings.”

Respectfully yours,

Dr. Rona Aviram and the team

New references:

[59] Roy D, Bhatia S, Jain P. A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. Proc. Twelfth Lang. Resour. Eval. Conf., Marseille, France: European Language Resources Association; 2020, p. 2373–80.

[60] Lewoniewski W, Węcel K, Abramowicz W. Quality and Importance of Wikipedia Articles in Different Languages. In: Dregvaite G, Damasevicius R, editors. Inf. Softw. Technol., vol. 639, Cham: Springer International Publishing; 2016, p. 613–24. https://doi.org/10.1007/978-3-319-46254-7_50 .

Decision Letter 1

17 Aug 2023

PONE-D-23-06451R1

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/ , click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org .

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org .

Additional Editor Comments (optional):

Please make sure all the suggested changes are made in your final manuscript.

Acceptance letter

22 Aug 2023

Dear Dr. Aviram:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org .

If we can help with anything else, please email us at plosone@plos.org .

Thank you for submitting your work to PLOS ONE and supporting open access.

PLOS ONE Editorial Office Staff

on behalf of

Dr. Claire Seungeun Lee

Education/Case Studies

Attention: the community programs team needs new case studies. Submit your case study and get featured in a new brochure.

Since Wikipedia began in 2001, educators around the world have integrated the free encyclopedia that anyone can edit into their curriculum. In 2010, the Wikimedia Foundation started the Wikipedia Education Program to provide more support for educators who are interested in using Wikipedia as a teaching tool.

In the following pages, educators will explain Wikipedia assignments they’ve used to meet learning objectives for their courses. They will also explain how they assessed or graded these assignments. These case studies can help you form a plan for how you can use Wikipedia as a teaching tool in your class.

Published case studies

  • Saul Hoffmann, Wikipedia in Teaching: Improving Autonomy in Research, Critical Sense, and Collaborative Abilities in Students, Making Them Contribute to the Free Encyclopedia , Università Ca' Foscari Venezia, 2016: «students enjoyed the assignment and found it very interesting and useful for their future academic careers [...] a beneficial activity for both improving specific subject learning and Wikipedia literacy».

Case Studies: Assignments

  • Copyediting , Adrianne Wadewitz, United States
  • Definitions , James M. Lipuma and Davida Scharf, United States
  • Write an article , Juliana Bastos Marques, Brazil
  • Write a Featured Article , Jon Beasley-Murray, Canada
  • Translation (Spanish) , Leigh Thelmadatter, Mexico
  • Translation (Arabic) , Dalia Mohamed El Toukhy, Egypt
  • Photos , Jiří Reif, Czech Republic
  • Illustrations , Bruce Sharky, United States
  • Videos , Jennifer Geigel Mikulay, United States
  • Extend a Stub , Edis Kittrell, United States
  • Research and edit , Lila Pagola and Cristina Siragusa, Argentina
  • Creating local content , Lila Pagola and Pablo Género, Argentina
  • Internship , Edward Galloway and William Daw, United States
  • Writing DYK/Good Articles , Piotr Konieczny, Korea

Case Studies: Grading

  • Content , Anne McNeil, United States
  • Milestones , Robert Cummings, United States
  • Reflective papers , Michael Mandiberg, United States
  • Five criteria , Rochelle Davis, United States
  • Peer reviews , Shamira Gelbman, United States

Write your own case study! Help us learn from your experience!

To get started, fill out the form box below with a descriptive title. Once you have chosen that title, click the "Draft a case study!" button, and you will be taken to a page where you can start drafting your case study.

External links

  • https://wikiedu.org/blog/category/testimonials/

How to Do a Case Study

Last Updated: March 22, 2024

This article was co-authored by Sarah Evans . Sarah Evans is a Public Relations & Social Media Expert based in Las Vegas, Nevada. With over 14 years of industry experience, Sarah is the Founder & CEO of Sevans PR. Her team offers strategic communications services to help clients across industries including tech, finance, medical, real estate, law, and startups. The agency is renowned for its development of the "reputation+" methodology, a data-driven and AI-powered approach designed to elevate brand credibility, trust, awareness, and authority in a competitive marketplace. Sarah’s thought leadership has led to regular appearances on The Doctors TV show, CBS Las Vegas Now, and as an Adobe influencer. She is a respected contributor at Entrepreneur magazine, Hackernoon, Grit Daily, and KLAS Las Vegas. Sarah has been featured in PR Daily and PR Newswire and is a member of the Forbes Agency Council. She received her B.A. in Communications and Public Relations from Millikin University. There are 9 references cited in this article, which can be found at the bottom of the page. This article has been fact-checked, ensuring the accuracy of any cited facts and confirming the authority of its sources. This article has been viewed 749,877 times.

Many fields require their own form of case study, but they are most widely used in academic and business contexts. An academic case study focuses on an individual or a small group, producing a detailed but non-generalized report based on months of research. In the business world, marketing case studies describe a success story presented to promote a company.

Planning an Academic Case Study

Step 1 Define the subject of study.

  • For example, a medical case study might study how a single patient is affected by an injury. A psychology case study might study a small group of people in an experimental form of therapy.
  • Case studies are not designed for large group studies or statistical analysis.

Step 2 Decide between prospective and retrospective research.

  • A case study may or may not include both types of research.

Step 3 Narrow down your research goal.

  • Illustrative case studies describe an unfamiliar situation in order to help people understand it. For instance, a case study of a person with depression, designed to help communicate the subjective experience of depression to therapist trainees.
  • Exploratory case studies are preliminary projects to help guide a future, larger-scale project. They aim to identify research questions and possible research approaches. For example, a case study of three school tutoring programs would describe the pros and cons of each approach, and give tentative recommendations on how a new tutoring program could be organized.
  • Critical instance case studies focus on unique cases, without a generalized purpose. Examples include a descriptive study of a patient with a rare condition, or a study of a specific case to determine whether a broadly applied "universal" theory is actually applicable or useful in all cases.

Step 4 Apply for ethical approval.

  • Follow this step even if you are conducting a retrospective case study. In some cases, publishing a new interpretation can cause harm to the participants in the original study.

Step 5 Plan for a long-term study.

  • Create four or five bullet points that you intend to answer, if possible, in the study. Consider perspectives on approaching the question and the related bullet points.
  • Choose at least two, and preferably more, of these data sources: report collection, internet research, library research, interviewing research subjects, interviewing experts, other fieldwork, and mapping concepts or typologies.
  • Design interview questions that will lead to in-depth answers and continued conversations related to your research goals.

Step 7 Recruit participants if necessary.

  • Since you aren't conducting a statistical analysis, you do not need to recruit a diverse cross-section of society. You should be aware of any biases in your small sample, and make them clear in your report, but they do not invalidate your research.

Conducting Academic Case Study Research

Step 1 Perform background research.

  • Any case study, but especially case studies with a retrospective component, will benefit from basic academic research strategies.

Step 2 Learn how to conduct obtrusive observation.

  • Establishing trust with participants can result in less inhibited behavior. Observing people in their home, workplaces, or other "natural" environments may be more effective than bringing them to a laboratory or office.
  • Having subjects fill out a questionnaire is a common example of obtrusive research. Subjects know they are being studied, so their behavior will change, but this is a quick and sometimes the only way to gain certain information.

Step 3 Take notes.

  • Describe experience — ask the participant what it's like to go through the experience you're studying, or be a part of the system you're studying.
  • Describe meaning — ask the participant what the experience means to them, or what "life lessons" they take from it. Ask what mental and emotional associations they have with the subject of your study, whether it's a medical condition, an event, or another topic.
  • Focus – in later interviews, prepare questions that fill gaps in your knowledge, or that are particularly relevant to the development of your research questions and theories over the course of the study.

Step 5 Stay rigorous.

  • If you are working with more than one person you will want to assign sections for completion together to make sure your case study will flow. For example, one person may be in charge of making charts of the data you gathered, while other people will each write an analysis of one of your bullet points you are trying to answer.

Step 7 Write your final case study report.

  • If writing a case study for a non-academic audience, consider using a narrative form, describing the events that occurred during your case study in chronological order. Minimize your use of jargon.

Writing a Marketing Case Study

Step 1 Ask permission from a client.

  • Request high-level involvement from the client's side for best results. [4] Even if the client only wants to vet the materials you send them, make sure the person involved is high up in the organization, and knowledgeable about the company–client relationship.

Step 2 Outline the story.

  • Collaboration with the client is especially helpful here, so you make sure to include the points that left the most impact and biggest impression.
  • If your target audience wouldn't immediately identify with your client's problem, start with a more general intro describing that type of problem in the industry. [6]

Step 3 Keep the study readable and powerful.

  • Charts and graphs can be great visual tools, but label these with large letters that make the positive meaning obvious to people who aren't used to reading raw data. [8]

Step 5 Solicit quotes or write them yourself.

  • These are typically brief quotes just one or two sentences long, describing your service in a positive light.

Step 6 Add images.

Expert Q&A

Sarah Evans

  • Remember that a case study does not aim to answer the research question definitively. Its aim is to develop one or more hypotheses about the answer.
  • Other fields use the term "case study" to mean a short, less intense process. Most notably, in law and programming, a case study is a real or hypothetical situation (legal case or programming problem), accompanied by an oral or written discussion of possible conclusions or solutions.

  • ↑ https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1001&context=nursing_faculty_pubs
  • ↑ https://www.apa.org/monitor/jan03/principles.html
  • ↑ https://www.cdc.gov/nceh/clearwriting/docs/Case_Study_Guide-H.pdf
  • ↑ https://libguides.usc.edu/writingguide/outline
  • ↑ https://extension.usu.edu/apec/files/uploads/Target_Market_Identification.pdf
  • ↑ https://files.eric.ed.gov/fulltext/EJ1105535.pdf
  • ↑ https://files.eric.ed.gov/fulltext/EJ1079541.pdf
  • ↑ https://med.ucf.edu/media/2012/05/Writing-Letters-of-Recommendation.pdf
  • ↑ https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1012&context=cgttheses

About this article

Sarah Evans

To do a case study, start by defining the subject and goal of your study and then getting ethical approval from the institution or department you're working under. Once you've received approval, design your research strategy and recruit any participants you'll be using. Prepare to work on your case study for 3-6 months by scheduling routine interviews with participants and setting aside time each day to do research and take notes. When you're finished, compile all of your research and write your final case study report. To learn how to do a marketing case study, scroll down!


International Conference on Collaboration Technologies and Social Computing

CollabTech 2023: Collaboration Technologies and Social Computing, pp. 84–100

Fairness in Socio-Technical Systems: A Case Study of Wikipedia

  • Mir Saeed Damadi &
  • Alan Davoust
  • Conference paper
  • First Online: 22 August 2023


Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14199))

Wikipedia content is produced by a complex socio-technical system (STS) and exhibits numerous biases, such as gender and cultural biases. We investigate how these biases relate to the concepts of algorithmic bias and fairness defined in the context of algorithmic systems. We systematically review 75 papers describing different types of bias in Wikipedia, which we classify and relate to established notions of harm and normative expectations of fairness as defined for machine learning-driven algorithmic systems. In addition, by analysing causal relationships between the observed phenomena, we demonstrate the complexity of the socio-technical processes causing harm.



Author information

Authors and affiliations.

Université du Québec en Outaouais, Gatineau, Québec, Canada

Mir Saeed Damadi & Alan Davoust


Corresponding author

Correspondence to Mir Saeed Damadi .

Editor information

Editors and affiliations.

Ritsumeikan University, Shiga, Japan

Hideyuki Takada

Kyoto University of Advanced Science, Kyoto, Japan

D. Moritz Marutschke

Universidad de Los Andes, Las Condes, RM - Santiago, Chile

Claudio Alvarez

University of Tsukuba, Tsukuba, Japan

Tomoo Inoue

Ritsumeikan University, Ibaraki, Japan

Yugo Hayashi

Pompeu Fabra University, Barcelona, Spain

Davinia Hernandez-Leo


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Damadi, M.S., Davoust, A. (2023). Fairness in Socio-Technical Systems: A Case Study of Wikipedia. In: Takada, H., Marutschke, D.M., Alvarez, C., Inoue, T., Hayashi, Y., Hernandez-Leo, D. (eds) Collaboration Technologies and Social Computing. CollabTech 2023. Lecture Notes in Computer Science, vol 14199. Springer, Cham. https://doi.org/10.1007/978-3-031-42141-9_6


DOI : https://doi.org/10.1007/978-3-031-42141-9_6

Published : 22 August 2023

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-42140-2

Online ISBN : 978-3-031-42141-9

eBook Packages : Computer Science Computer Science (R0)


How to Write a Case Study

Last Updated: April 1, 2024

This article was co-authored by Annaliese Dunne. Annaliese Dunne is a Middle School English Teacher. With over 10 years of teaching experience, her areas of expertise include writing and grammar instruction, as well as teaching reading comprehension. She is also an experienced freelance writer. She received her Bachelor's degree in English. wikiHow marks an article as reader-approved once it receives enough positive feedback. In this case, 82% of readers who voted found the article helpful, earning it our reader-approved status. This article has been viewed 580,525 times.

There are many different kinds of case studies, and they serve various purposes, from academic research to corporate proof points. Four types are commonly distinguished: illustrative (descriptive of events), exploratory (investigative), cumulative (collective information comparisons) and critical (examining a particular subject with cause-and-effect outcomes). Once you are familiar with the different types and styles of case study and how each applies to your purposes, a few steps will make the writing flow smoothly and help you develop and deliver a uniform case study that can be used to prove a point or illustrate accomplishments.

Getting Started

Step 1 Determine which case study type, design or style is most suitable to your intended audience.

  • Whatever case study method you're employing, your purpose is to thoroughly analyze a situation (or "case") that could reveal factors or information otherwise ignored or unknown. Case studies can be written about companies, whole countries, or even individuals. What's more, they can be written on more abstract things, like programs or practices. Really, if you can dream it, you can write a case study about it. [1]

Step 2 Determine the topic of your case study.

  • Start your research at the library and/or on the Internet to begin delving into a specific problem. Once you've narrowed down your search to a specific problem, find as much about it as you can from a variety of different sources. Look up information in books, journals, DVDs, websites, magazines, newspapers, etc. As you go through each one, take adequate notes so you can find the info later on! [1]

Step 3 Search for case studies that have been published on the same or similar subject matter.

  • Find out what has been written before, and read the important articles about your case's situation. When you do this, you may find there is an existing problem that needs a solution, or you may find that you have to come up with an interesting idea that might or might not work in your case situation.
  • Review sample case studies that are similar in style and scope to get an idea of composition and format, too.

Preparing the Interview

Step 1 Select participants that you will interview for inclusion in your case study.

  • Find knowledgeable people to interview. They don't necessarily have to be on your site, but they must be, actively or in the past, directly involved.
  • Determine whether you will interview an individual or group of individuals to serve as examples in your case study. It may be beneficial for participants to gather as a group and provide insight collectively. If the study focuses on personal subject matter or medical issues, it may be better to conduct personal interviews.
  • Gather as much information as possible about your subjects to ensure that you develop interviews and activities that will result in obtaining the most advantageous information to your study.

Step 2 Draft a list of interview questions and decide upon how you will conduct your study.

  • When you are interviewing people, ask them questions that will help you understand their opinions. For example: How do you feel about the situation? What can you tell me about how the site (or the situation) developed? What do you think should be different, if anything? You also need to ask questions that will give you facts that might not be available from an article, so that your work is different and purposeful.

Step 3 Set up interviews...

  • Make sure all your informants are aware of what you're doing. They need to be fully informed (and, in certain cases, to sign waivers), and your questions need to be appropriate and not controversial.

Obtaining Data

Step 1 Conduct interviews.

  • When you ask a question that doesn't let someone answer with a "yes" or a "no", you usually get more information. What you are trying to do is get the person to tell you whatever it is that he or she knows and thinks, even though you don't always know just what that is going to be before you ask the question. Keep your questions open-ended.
  • Request data and materials from subjects as applicable to add credibility to your findings and future presentations of your case study. Clients can provide statistics about usage of a new tool or product and participants can provide photos and quotes that show evidence of findings that may support the case.

Step 2 Collect and analyze all applicable data, including documents, archival records, observations and artifacts.

  • You can't include it all. So, you need to think about how to sort through it, take out the excess, and arrange it so that the situation at the case site will be understandable to your readers. Before you can do this, you have to put all the information together where you can see it and analyze what is going on.

Step 3 Formulate the problem in one or two sentences.

  • This will allow you to concentrate on what material is the most important. You're bound to receive information from participants that should be included, but solely on the periphery. Organize your material to mirror this.

Writing Your Piece

Step 1 Develop and write your case study using the data collected throughout the research, interviewing and analysis processes.

  • The introduction should very clearly set the stage. In a detective story, the crime happens right at the beginning and the detective has to put together the information to solve it for the rest of the story. In a case, you can start by raising a question. You could quote someone you interviewed.
  • After you've clearly stated the problem at hand, include background information on your study site, explain why your interviewees are a good sample, and show what makes your problem pressing, so that your audience gets a panoramic view of the issue. [2] [1] Include photos or a video if they would make your work more persuasive and personal.
  • After the reader has all the knowledge needed to understand the problem, present your data. Include customer quotes and data (percentages, awards and findings) if possible to add a personal touch and more credibility to the case presented. Describe for the reader what you learned in your interviews about the problem at this site, how it developed, what solutions have already been proposed and/or tried, and feelings and thoughts of those working or visiting there. You may have to do calculations or extra research yourself to back up any claims.
  • At the end of your analysis, you should offer possible solutions, but don't worry about solving the case itself. You may find that referring to some interviewees' statements will do the alluding for you. Let the reader leave with a full grasp of the problem and a desire of their own to change it. [1] Feel free to leave the reader with a question, forcing them to think for themselves. If you have written a good case, they will have enough information to understand the situation and have a lively class discussion.

Step 2 Add references and appendices (if any).

  • You may have terms that would be hard for other cultures to understand. If this is the case, include them in the appendix or in a Note for the Instructor.

Step 3 Make additions and deletions.

  • Go over your study section by section, but also as a whole. Each data point needs to fit into both its place and the entirety of the work. If you can't find an appropriate place for something, stick it in the appendix.

Step 4 Edit and proofread your work.

  • Have someone else proofread, too. Your mind may have become oblivious to the errors it has seen 100 times. Another set of eyes may also notice content that has been left open-ended or is otherwise confusing.

Expert Q&A

Annaliese Dunne

  • If you are developing many case studies for the same purpose using the same general subjects, use a uniform template and/or design.
  • Be sure to ask open-ended questions while conducting interviews to foster a discussion.
  • Ask for permission to contact case study participants as you develop the written case study. You may discover that you need additional information as you analyze all data.


  • ↑ http://www.essayforum.com/grammar-usage-13/to-write-case-study-366/
  • ↑ https://www.universalclass.com/articles/business/the-process-of-writing-a-case-study.htm
  • http://writing.colostate.edu/guides/research/casestudy/pop2a.cfm Colorado State University Case Study writing guides
  • http://www.hoffmanmarcom.com/casestudy/howtowrite.php Hoffman Marketing and Communications case study overview

About This Article

Annaliese Dunne

To write a case study, start with an introduction that defines key terms, outlines the problem your case study addresses, and gives necessary background information. You can also include photos or a video if they will help your work to be more persuasive. Then, present your findings from the case study and explain your methodology, including how you used your data to come to your conclusions. In your conclusion, offer possible solutions or next steps for research, based on your results. To learn how to select participants for your case study, keep reading.


‘Baby Reindeer’: What Happened to Richard Gadd's Stalker and Abuser?

The acclaimed Netflix show is based on a series of horrifying real-life ordeals


Spoilers for Baby Reindeer ahead

Baby Reindeer might just be one of the most brilliant yet devastating things you’ll see on TV all year.

What makes it even more affecting is that it’s a true story, based on the life of the comedian Richard Gadd. Much like Michaela Coel in I May Destroy You, Gadd plays himself, reenacting and reliving his own real-life sexual assault, and the depths of PTSD that follow. While Gadd plays a fictional version of himself, a mid-Fringe comedian called Donny Dunn, and he renames his real-life stalker, most of what viewers see on screen happened to Gadd in real life.

The show provides a somewhat ambiguous ending. But what happened in reality to his stalker, represented in the show by the character Martha Scott (played by Jessica Gunning) and his industry abuser, the so-called Darrien (played by Tom Goodman-Hill)?

The backstory

In the series, Gadd details how a chance encounter with a crying woman at the pub he worked in led to a years-long obsession and campaign of stalking and harassment.

Over a three-year period from 2015, she sent him more than 40,000 emails and 350 hours worth of voicemails. She also turned up repeatedly to his work, when he was on stage in comedy clubs and also harassed his parents, falsely telling police that Gadd’s dad was a paedophile.

The police dismissed Gadd’s ordeal with his stalker – as happened in real life, as he told The Guardian in 2019 during the original theatre show of Baby Reindeer: “I was getting told off for harassing the police about being harassed… I’ve been through two police investigations in my life and they’ve both been hilarious, fly-on-the-wall terrible. Honestly my advice to someone who ever thought of pressing charges would be: it’s a fucking nightmare process, and it takes years.”

In the show, Martha eventually gets arrested and ends up in court for leaving a threatening voicemail on Donny’s phone. She is charged with three counts of stalking and harassment, to which she pleads guilty, and receives nine months in prison and a five-year restraining order.

Gadd, however, is at pains to point out that his stalker isn’t an evil monster, but a vulnerable person with extreme mental health issues. Speaking with The Independent in 2019, he said: “I can’t emphasise enough how much of a victim she is in all this. When we think of stalkers, we always think of films like Misery and Fatal Attraction, where the stalker is a monstrous figure in the night down an alleyway. But usually, it’s a prior relationship or someone you know or a work colleague. Stalking and harassment is a form of mental illness. It would have been wrong to paint her as a monster, because she’s unwell, and the system’s failed her.”

In reality, he hasn’t revealed the fate of his own stalker, other than that he managed to get a restraining order out against her. As he told The Times in an interview: “It is resolved. I had mixed feelings about it; I didn’t want to throw someone who was that level of mentally unwell in prison.”

Perhaps the most shocking incident in the series doesn’t feature Martha the stalker though, and instead happens in episode four, where Donny is groomed, loaded with mind-altering drugs and then raped by an older male TV industry mentor.

This also happened to Gadd in real life, and he previously attempted to process the sexual violation through another theatre show, improbably a comedy, Monkey See Monkey Do, which won the Edinburgh Comedy award in 2016.

In 2012, as he described in the theatre show, he met a man at a party who drugged and then sexually attacked him. In the series, Darrien – a writer on the fictional show Cotton Mouth – draws him in by offering him industry contacts, and says he’ll help him get his own TV show commissioned. The pair meet frequently at Darrien’s flat, where he plies Donny with harder and harder drugs, including GHB and acid, and in a graphic scene, rapes him.

In the series, Donny eventually goes back, seemingly to confront Darrien, but when Darrien acts like nothing’s wrong – he makes him a cup of tea and offers him a writing position on his TV show – Donny leaves, without saying anything.

It’s unknown what happened to Gadd’s real abuser. But in an interview with The Guardian he said that he worked to channel his experiences into his creative theatre and TV show: “Keeping this in was really hard. And I knew the only way I’d be free of it is if I start to tell people. I don’t think anyone knows how bad it is until it happens. What doesn’t kill you makes you stronger, I think.”

Commenting on Instagram on the release of Baby Reindeer – which charted at number one on Netflix’s most watched in the UK and US – Gadd added: “I am okay, for those asking. I promise.”

Rape Crisis offers support for those affected by rape and sexual abuse; you can call them on 0808 802 9999 in England and Wales, 0808 801 0302 in Scotland, and 0800 0246 991 in Northern Ireland, or visit their website at www.rapecrisis.org.uk.

Laura Martin is a freelance journalist specializing in pop culture.

