

The Challenge of Repeating Methods While Avoiding Plagiarism


Even when doing original research, authors will inevitably find themselves repeating steps that they or others have taken before. Whether they are using an established technique to study something new or simply replicating an earlier study, it can be frustrating to try to find new ways to accurately describe something that has already been covered countless times before.

This has led many authors, especially those repeating methods they’ve used before, to simply copy and paste their descriptions, often without attribution to the original source.

Editors, however, have generally found this behavior to be unacceptable. Though a recent survey conducted by Yue-hong Helen Zhang and Xiao-yan Jia found that 20% of journal editors in the biosciences took no issue with up to 40% of duplicated content in the methods section, most editors took a much harder line on the issue.

So how do researchers handle the problem of duplicative methodologies? Another recent survey, this one conducted by Xiaoyan Jia, Xufei Tan and Yuehong Zhang for the journal Scientometrics, asked 178 researchers, mostly veterans with more than 20 published papers, to look at their last three papers and describe how they approached the issue.

According to their responses, they used a variety of methods to handle the problem, including using different approaches in the same paper.

The most common solution was to simply rewrite the potentially duplicative content in their own (new) words (294 of 829 cases). However, some form of reuse was collectively more common, totaling 535 cases. In those cases, researchers were much more likely to reuse their own words than someone else’s, and they were much more likely to attribute the content than not, whether it was reused verbatim or with rewording.

Still, 93 of the cases involved reusing previous content without attribution and 25 of those involved repeating methods, either verbatim or with rewording, from others without attribution.

Two other potential solutions were also explored. The first involved reusing no content at all and simply providing a citation to the original methods. This approach was employed by the surveyed authors 115 times but, according to an analysis of major journals performed by the surveyors, it was rarely employed in those journals.

The likely reason is that simply providing a citation leaves the methods section extremely thin and can leave out key details needed for understanding the research, making them available only in other papers.

The second solution was to include the duplicative methods as an attachment to the paper. This helped ensure both proper citation and clarity while keeping the necessary information with the work. However, this method was even less common, with only one surveyed journal, Science, making widespread use of it.

The end result of this is that, while it’s clear editors are not tolerant of unattributed duplicative text in the methods section of a paper, there’s no clear way for researchers to address this problem.

For paper authors, the best advice likely comes from an editorial and case study written for the journal Biomedicine & Biotechnology by Yue-hong Helen Zhang, Xiao-yan Jia, Han-feng Lin and Xu-fei Tan. They proposed that authors should always cite the previous methods being used and, when descriptions of methods are reused verbatim, indicate as much through quotation marks or blockquotes as appropriate.

Though the approach might seem inefficient, especially when dealing with standard practices that should be widely understood, it is always better to provide too much citation in your work than too little.

See the list of editors’ papers focused on scholarly publishing and academic ethics on the Journal of Zhejiang University website. The editors of the Zhejiang University-SCIENCE journals (A/B/C) are users of CrossCheck, powered by iThenticate.

PLOS Biology, 18(3), March 2020

What is replication?

Brian A. Nosek and Timothy M. Errington

1 Center for Open Science, Charlottesville, Virginia, United States of America

2 University of Virginia, Charlottesville, Virginia, United States of America

Credibility of scientific claims is established with evidence for their replicability using new data. According to common understanding, replication is repeating a study’s procedure and observing whether the prior finding recurs. This definition is intuitive, easy to apply, and incorrect. We propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. This definition reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes. The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Successful replication provides evidence of generalizability across the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously. Defining replication as a confrontation of current theoretical expectations clarifies its important, exciting, and generative role in scientific progress.

What is replication? This Perspective article proposes that the answer shifts the conception of replication from a boring, uncreative, housekeeping activity to an exciting, generative, vital contributor to research progress.

Introduction

Credibility of scientific claims is established with evidence for their replicability using new data [ 1 ]. This is distinct from retesting a claim using the same analyses and same data (usually referred to as reproducibility or computational reproducibility ) and using the same data with different analyses (usually referred to as robustness ). Recent attempts to systematically replicate published claims indicate surprisingly low success rates. For example, across 6 recent replication efforts of 190 claims in the social and behavioral sciences, 90 (47%) replicated successfully according to each study’s primary success criterion [ 2 ]. Likewise, a large-sample review of 18 candidate gene or candidate gene-by-interaction hypotheses for depression found no support for any of them [ 3 ], a particularly stunning result considering that more than 1,000 articles have investigated their effects. Replication challenges have spawned initiatives to improve research rigor and transparency such as preregistration and open data, materials, and code [ 4 – 6 ]. Simultaneously, failures-to-replicate have spurred debate about the meaning of replication and its implications for research credibility. Replications are inevitably different from the original studies. How do we decide whether something is a replication? The answer shifts the conception of replication from a boring, uncreative, housekeeping activity to an exciting, generative, vital contributor to research progress.

Replication reconsidered

According to common understanding, replication is repeating a study’s procedure and observing whether the prior finding recurs [ 7 ]. This definition of replication is intuitive, easy to apply, and incorrect.

The problem is this definition’s emphasis on repetition of the technical methods—the procedure, protocol, or manipulated and measured events. Why is that a problem? Imagine an original behavioral study was conducted in the United States in English. What if the replication is to be done in the Philippines with a Tagalog-speaking sample? To be a replication, must the materials be administered in English? With no revisions for the cultural context? If minor changes are allowed, then what counts as minor to still qualify as repeating the procedure? More broadly, it is not possible to recreate an earthquake, a supernova, the Pleistocene, or an election. If replication requires repeating the manipulated or measured events of the study, then it is not possible to conduct replications in observational research or research on past events.

The repetition of the study procedures is an appealing definition of replication because it often corresponds to what researchers do when conducting a replication—i.e., faithfully follow the original methods and procedures as closely as possible. But the reason for doing so is not because repeating procedures defines replication. Replications often repeat procedures because theories are too vague and methods too poorly understood to productively conduct replications and advance theoretical understanding otherwise [ 8 ].

Prior commentators have drawn distinctions between types of replication such as “direct” versus “conceptual” replication and argue in favor of valuing one over the other (e.g., [ 9 , 10 ]). By contrast, we argue that distinctions between “direct” and “conceptual” are at least irrelevant and possibly counterproductive for understanding replication and its role in advancing knowledge. Procedural definitions of replication are masks for underdeveloped theoretical expectations, and “conceptual replications” as they are identified in practice often fail to meet the criteria we develop here and deem essential for a test to qualify as a replication.

Replication redux

We propose an alternative definition for replication that is more inclusive of all research and more relevant for the role of replication in advancing knowledge. Replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. This definition reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes.

To be a replication, 2 things must be true: outcomes consistent with a prior claim would increase confidence in the claim, and outcomes inconsistent with a prior claim would decrease confidence in the claim. The symmetry promotes replication as a mechanism for confronting prior claims with new evidence. Therefore, declaring that a study is a replication is a theoretical commitment. Replication provides the opportunity to test whether existing theories, hypotheses, or models are able to predict outcomes that have not yet been observed. Successful replications increase confidence in those models; unsuccessful replications decrease confidence and spur theoretical innovation to improve or discard the model. This does not imply that the magnitude of belief change is symmetrical for “successes” and “failures.” Prior and existing evidence inform the extent to which replication outcomes alter beliefs. However, as a theoretical commitment, replication does imply precommitment to taking all outcomes seriously.

Because replication is defined based on theoretical expectations, not everyone will agree that one study is a replication of another. Moreover, it is not always possible to make precommitments to the diagnosticity of a study as a replication, often for the simple reason that study outcomes are already known. Deciding whether studies are replications after observing the outcomes can leverage post hoc reasoning biases to dismiss “failures” as nonreplications and “successes” as diagnostic tests of the claims, or the reverse if the observer wishes to discredit the claims. This can unproductively retard research progress by dismissing replication counterevidence. Simultaneously, replications can fail to meet their intended diagnostic aims because of error or malfunction in the procedure that is only identifiable after the fact. When there is uncertainty about the status of claims and the quality of methods, there is no easy solution to distinguishing between motivated and principled reasoning about evidence. Science’s most effective solution is to replicate, again.

At its best, science minimizes the impact of ideological commitments and reasoning biases by being an open, social enterprise. To achieve that, researchers should be rewarded for articulating their theories clearly and a priori so that they can be productively confronted with evidence [ 4 , 6 ]. Better theories are those that make it clear how they can be supported and challenged by replication. Repeated replication is often necessary to resolve confidence in a claim, and, invariably, researchers will have plenty to argue about even when replication and precommitment are normative practices.

Replication resolved

The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Theory advances in fits and starts with conceptual leaps, unexpected observations, and a patchwork of evidence. That is okay; it is fuzzy at the frontiers of knowledge. The dialogue between theory and evidence facilitates identification of contours, constraints, and expectations about the phenomena under study. Replicable evidence provides anchors for that iterative process. If evidence is replicable, then theory must eventually account for it, even if only to dismiss it as irrelevant because of invalidity of the methods. For example, the claims that there are more obese people in wealthier countries compared with poorer countries on average and that people in wealthier countries live longer than people in poorer countries on average could both be highly replicable. All theoretical perspectives about the relations between wealth, obesity, and longevity would have to account for those replicable claims.

There is no such thing as exact replication. We cannot reproduce an earthquake, era, or election, but replication is not about repeating historical events. Replication is about identifying the conditions sufficient for assessing prior claims. Replication can occur in observational research when the conditions presumed essential for observing the evidence recur, such as when a new seismic event has the characteristics deemed necessary and sufficient to observe an outcome predicted by a prior theory or when a new method for reassessing a fossil offers an independent test of existing claims about that fossil. Even in experimental research, original and replication studies inevitably differ in some aspects of the sample—or units—from which data are collected, the treatments that are administered, the outcomes that are measured, and the settings in which the studies are conducted [ 11 ].

Individual studies do not provide comprehensive or definitive evidence about all conditions for observing evidence about claims. The gaps are filled with theory. A single study examines only a subset of units, treatments, outcomes, and settings. The study was conducted in a particular climate, at particular times of day, at a particular point in history, with a particular measurement method, using particular assessments, with a particular sample. Rarely do researchers limit their inference to precisely those conditions. If they did, scientific claims would be historical claims because those precise conditions will never recur. If a claim is thought to reveal a regularity about the world, then it is inevitably generalizing to situations that have not yet been observed. The fundamental question is: of the innumerable variations in units, treatments, outcomes, and settings, which ones matter? Time-of-day for data collection may be expected to be irrelevant for a claim about personality and parenting or critical for a claim about circadian rhythms and inhibition.

When theories are too immature to make clear predictions, repetition of original procedures becomes very useful. Using the same procedures is an interim solution for not having clear theoretical specification of what is needed to produce evidence about a claim. And, using the same procedures reduces uncertainty about what qualifies as evidence “consistent with” earlier claims. Replication is not about the procedures per se, but using similar procedures reduces uncertainty in the universe of possible units, treatments, outcomes, and settings that could be important for the claim.

Because there is no exact replication, every replication test assesses generalizability to the new study’s unique conditions. However, every generalizability test is not a replication. Fig 1’s left panel illustrates a discovery and conditions around it to which it is potentially generalizable. The generalizability space is large because of theoretical immaturity; there are many conditions in which the claim might be supported, but failures would not discredit the original claim. Fig 1’s right panel illustrates a maturing understanding of the claim. The generalizability space has shrunk because some tests identified boundary conditions (gray tests), and the replicability space has increased because successful replications and generalizations (colored tests) have improved theoretical specification for when replicability is expected.

Fig 1. For underspecified theories, there is a larger space for which the claim may or may not be supported—the theory does not provide clear expectations. These are generalizability tests. Testing replicability is a subset of testing generalizability. As theory specification improves (moving from left panel to right panel), usually interactively with repeated testing, the generalizability and replicability space converge. Failures-to-replicate or generalize shrink the space (dotted circle shows original plausible space). Successful replications and generalizations expand the replicability space—i.e., broadening and strengthening commitments to replicability across units, treatments, outcomes, and settings.

Successful replication provides evidence of generalizability across the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously. Repeatedly testing replicability and generalizability across units, treatments, outcomes, and settings facilitates improvement in theoretical specificity and future prediction.

Theoretical maturation is illustrated in Fig 2 . A progressive research program (the left path) succeeds in replicating findings across conditions presumed to be irrelevant and also matures the theoretical account to more clearly distinguish conditions for which the phenomenon is expected to be observed or not observed. This is illustrated by a shrinking generalizability space in which the theory does not make clear predictions. A degenerative research program (the right path) persistently fails to replicate the findings and progressively narrows the universe of conditions to which the claim could apply. This is illustrated by shrinking generalizability and replicability space because the theory must be constrained to ever narrowing conditions [ 12 ].

Fig 2. With progressive success (left path) theoretical expectations mature, clarifying when replicability is expected. Also, boundary conditions become clearer, reducing the potential generalizability space. A complete theoretical account eliminates generalizability space because the theoretical expectations are so clear and precise that all tests are replication tests. With repeated failures (right path) the generalizability and replicability space both shrink, eventually to a theory so weak that it makes no commitments to replicability.

This exposes an inevitable ambiguity in failures-to-replicate. Was the original evidence a false positive or the replication a false negative, or does the replication identify a boundary condition of the claim? We can never know for certain that earlier evidence was a false positive. It is always possible that it was “real,” and we cannot identify or recreate the conditions necessary to replicate successfully. But that does not mean that all claims are true, and science cannot be self-correcting. Accumulating failures-to-replicate could result in a much narrower but more precise set of circumstances in which evidence for the claim is replicable, or it may result in failure to ever establish conditions for replicability and relegate the claim to irrelevance.

The ambiguity between disconfirming an original claim or identifying a boundary condition also means that understanding whether or not a study is a replication can change due to accumulation of knowledge. For example, the famous experiment by Otto Loewi (1936 Nobel Prize in Physiology or Medicine) showed that the inhibitory factor “vagusstoff,” subsequently determined to be acetylcholine, was released from the vagus nerve of frogs, suggesting that neurotransmission was a chemical process. Much later, after his and others’ failures-to-replicate his original claim, a crucial theoretical insight identified that the time of year at which Loewi performed his experiment was critical to its success [ 13 ]. The original study was performed with so-called winter frogs. The replication attempts performed with summer frogs failed because of seasonal sensitivity of the frog heart to the unrecognized acetylcholine, making the effects of vagal stimulation far more difficult to demonstrate. With subsequent tests providing supporting evidence, the understanding of the claim improved. What had been perceived as replications were not anymore because new evidence demonstrated that they were not studying the same thing. The theoretical understanding evolved, and subsequent replications supported the revised claims. That is not a problem, that is progress.

Replication is rare

The term “conceptual replication” has been applied to studies that use different methods to test the same question as a prior study. This is a useful research activity for advancing understanding, but many studies with this label are not replications by our definition. Recall that “to be a replication, 2 things must be true: outcomes consistent with a prior claim would increase confidence in the claim, and outcomes inconsistent with a prior claim would decrease confidence in the claim." Many "conceptual replications" meet the first criterion and fail the second. That is, they are not designed such that a failure to replicate would revise confidence in the original claim. Instead, “conceptual replications” are often generalizability tests. Failures are interpreted, at most, as identifying boundary conditions. A self-assessment of whether one is testing replicability or generalizability is answering—would an outcome inconsistent with prior findings cause me to lose confidence in the theoretical claims? If no, then it is a generalizability test.

Designing a replication with a different methodology requires understanding of the theory and methods so that any outcome is considered diagnostic evidence about the prior claim. In practice, this means that replication is often limited to relatively close adherence to original methods for topics in which theory and methodology are immature—a circumstance commonly called “direct” or “close” replication—because the similarity of methods serves as a stand-in for theoretical and measurement precision. In fact, conducting a replication of a prior claim with a different methodology can be considered a milestone for theoretical and methodological maturity.

Replication is characterized as the boring, rote, clean-up work of science. This misperception makes funders reluctant to fund it, journals reluctant to publish it, and institutions reluctant to reward it. The disincentives for replication are a likely contributor to existing challenges of credibility and replicability of published claims [ 14 ].

Defining replication as a confrontation of current theoretical expectations clarifies its important, exciting, and generative role in scientific progress. Single studies, whether they pursue novel ends or confront existing expectations, never definitively confirm or disconfirm theories. Theories make predictions; replications test those predictions. Outcomes from replications are fodder for refining, altering, or extending theory to generate new predictions. Replication is a central part of the iterative maturing cycle of description, prediction, and explanation. A shift in attitude that includes replication in funding, publication, and career opportunities will accelerate research progress.

Acknowledgments

We thank Alex Holcombe, Laura Scherer, Leonhard Held, and Don van Ravenzwaaij for comments on earlier versions of this paper, and we thank Anne Chestnut for graphic design support.

Funding Statement

This work was supported by grants from Arnold Ventures, John Templeton Foundation, Templeton World Charity Foundation, and Templeton Religion Trust. The funders had no role in the preparation of the manuscript or the decision to publish.

Provenance: Commissioned; not externally peer reviewed.

National Academies Press: OpenBook

Reproducibility and Replicability in Science (2019)

Chapter 5: Replicability

Replication is one of the key ways scientists build confidence in the scientific merit of results. When the result from one study is found to be consistent with that of another study, it is more likely to represent a reliable claim to new knowledge. As Popper (2005, p. 23) wrote (using “reproducibility” in its generic sense):

We do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them. Only by such repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence,’ but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable.

However, a successful replication does not guarantee that the original scientific results of a study were correct, nor does a single failed replication conclusively refute the original claims. A failure to replicate previous results can be due to any number of factors, including the discovery of an unknown effect, inherent variability in the system, inability to control complex variables, substandard research practices, and, quite simply, chance. The nature of the problem under study and the prior likelihoods of possible results in the study, the type of measurement instruments and research design selected, and the novelty of the area of study and therefore lack of established methods of inquiry can also contribute to non-replicability. Because of the complicated relationship between replicability and its variety of sources, the validity of scientific results should be considered in the context of an entire body of evidence, rather than an individual study or an individual replication. Moreover, replication may be a matter of degree, rather than a binary result of “success” or “failure.” 1 We explain in Chapter 7 how research synthesis, especially meta-analysis, can be used to evaluate the evidence on a given question.

ASSESSING REPLICABILITY

How does one determine the extent to which a replication attempt has been successful? When researchers investigate the same scientific question using the same methods and similar tools, the results are not likely to be identical—unlike in computational reproducibility in which bitwise agreement between two results can be expected (see Chapter 4 ). We repeat our definition of replicability, with emphasis added: obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

Determining consistency between two different results or inferences can be approached in a number of ways ( Simonsohn, 2015 ; Verhagen and Wagenmakers, 2014 ). Even if one considers only quantitative criteria for determining whether two results qualify as consistent, there is variability across disciplines ( Zwaan et al., 2018 ; Plant and Hanisch, 2018 ). The Royal Netherlands Academy of Arts and Sciences (2018 , p. 20) concluded that “it is impossible to identify a single, universal approach to determining [replicability].” As noted in Chapter 2 , different scientific disciplines are distinguished in part by the types of tools, methods, and techniques used to answer questions specific to the discipline, and these differences include how replicability is assessed.


1 See, for example, the cancer biology project in Table 5-1 in this chapter.

Acknowledging the different approaches to assessing replicability across scientific disciplines, however, we emphasize eight core characteristics and principles:

  • Attempts at replication of previous results are conducted following the methods and using similar equipment and analyses as described in the original study or under sufficiently similar conditions ( Cova et al., 2018 ). 2 Yet regardless of how similar the replication study is, no second event can exactly repeat a previous event.
  • The concept of replication between two results is inseparable from uncertainty, as is also the case for reproducibility (as discussed in Chapter 4 ).
  • Any determination of replication (between two results) needs to take account of both proximity (i.e., the closeness of one result to the other, such as the closeness of the mean values) and uncertainty (i.e., variability in the measures of the results).
  • To assess replicability, one must first specify exactly what attribute of a previous result is of interest. For example, is only the direction of a possible effect of interest? Is the magnitude of effect of interest? Is surpassing a specified threshold of magnitude of interest? With the attribute of interest specified, one can then ask whether two results fall within or outside the bounds of “proximity-uncertainty” that would qualify as replicated results.
  • Depending on the selected criteria (e.g., measure, attribute), assessments of a set of attempted replications could appear quite divergent. 3
  • A judgment that “Result A is replicated by Result B” must be identical to the judgment that “Result B is replicated by Result A.” There must be a symmetry in the judgment of replication; otherwise, internal contradictions are inevitable.
  • There could be advantages to inverting the question from, “Does Result A replicate Result B (given their proximity and uncertainty)?” to “Are Results A and B sufficiently divergent (given their proximity and uncertainty) so as to qualify as a non-replication?” It may be advantageous, in assessing degrees of replicability, to define a relatively high threshold of similarity that qualifies as “replication,” a relatively low threshold of similarity that qualifies as “non-replication,” and the intermediate zone between the two thresholds that is considered “indeterminate.” If a second study has low power and wide uncertainties, it may be unable to produce any but indeterminate results.
  • While a number of different standards for replicability/non-replicability may be justifiable, depending on the attributes of interest, a standard of “repeated statistical significance” has many limitations because the level of statistical significance is an arbitrary threshold (Amrhein et al., 2019a; Boos and Stefanski, 2011; Goodman, 1992; Lazzeroni et al., 2016). For example, one study may yield a p-value of 0.049 (declared significant at the p ≤ 0.05 level) and a second study a p-value of 0.051 (declared nonsignificant by the same threshold), and therefore the studies are said not to have replicated. However, if the second study had yielded a p-value of 0.03, the reviewer would say it had successfully replicated the first study, even though its result could diverge more sharply (by proximity and uncertainty) from the original study than in the first comparison. Rather than focus on an arbitrary threshold such as statistical significance, it would be more revealing to consider the distributions of observations and to examine how similar these distributions are. This examination would include summary measures, such as proportions, means, standard deviations (or uncertainties), and additional metrics tailored to the subject matter.

2 Cova et al. (2018, fn. 3) discuss the challenge of defining sufficiently similar as well as the interpretation of the results:

In practice, it can be hard to determine whether the ‘sufficiently similar’ criterion has actually been fulfilled by the replication attempt, whether in its methods or in its results (Nakagawa and Parker, 2015). It can therefore be challenging to interpret the results of replication studies, no matter which way these results turn out (Collins, 1975; Earp and Trafimow, 2015; Maxwell et al., 2015).

3 See Table 5-1 for an example of this in the reviews of a psychology replication study by Open Science Collaboration (2015) and Patil et al. (2016).

The final point above is reinforced by a recent special edition of the American Statistician in which the use of a statistical significance threshold in reporting is strongly discouraged due to overuse and wide misinterpretation (Wasserstein et al., 2019). A figure from Amrhein et al. (2019b) also demonstrates this point, as shown in Figure 5-1.

One concern voiced by some researchers about using a proximity-uncertainty attribute to assess replicability is that such an assessment favors studies with large uncertainties; the potential consequence is that many researchers would choose to perform low-power studies to increase the replicability chances ( Cova et al., 2018 ). While two results with large uncertainties and within proximity, such that the uncertainties overlap with each other, may be consistent with replication, the large uncertainties indicate that not much confidence can be placed in that conclusion.
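
To make the contrast concrete, the sketch below (a minimal illustration written for this discussion, not taken from the report) compares three hypothetical studies. One replication has an effect estimate nearly identical to the original yet lands just on the “wrong” side of a p ≤ 0.05 threshold, while another passes the threshold despite diverging far more from the original once proximity is judged relative to uncertainty. All numbers and helper functions are invented for illustration and assume normally distributed effect estimates.

# Illustrative sketch (not from the report): "repeated statistical significance"
# versus a proximity-uncertainty comparison of two results.
from math import sqrt, erf

def normal_cdf(x):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

def p_value(estimate, se):
    # Two-sided p-value for an effect estimate against a null effect of zero.
    return 2 * (1 - normal_cdf(abs(estimate / se)))

def divergence(est_a, se_a, est_b, se_b):
    # Standardized difference between two estimates: proximity judged relative
    # to the combined uncertainty of both studies.
    return abs(est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)

original = (0.30, 0.152)       # p ≈ 0.048: declared "significant"
replication_b = (0.28, 0.144)  # p ≈ 0.052: fails the threshold rule despite a nearly identical estimate
replication_c = (0.60, 0.270)  # p ≈ 0.026: passes the threshold rule despite a much larger estimate

for label, (est, se) in [("original", original), ("B", replication_b), ("C", replication_c)]:
    print(f"{label}: estimate = {est:.2f}, p = {p_value(est, se):.3f}")

print("divergence, original vs. B:", round(divergence(*original, *replication_b), 2))  # ≈ 0.10
print("divergence, original vs. C:", round(divergence(*original, *replication_c), 2))  # ≈ 0.97

Under the significance-threshold rule, study B would be scored as a failure to replicate and study C as a success, even though B’s estimate is far closer to the original; comparing the estimates and their uncertainties directly avoids that inversion.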


CONCLUSION 5-1: Different types of scientific studies lead to different or multiple criteria for determining a successful replication. The choice of criteria can affect the apparent rate of non-replication, and that choice calls for judgment and explanation.

CONCLUSION 5-2: A number of parametric and nonparametric methods may be suitable for assessing replication across studies. However, a restrictive and unreliable approach would accept replication only when the results in both studies have attained “statistical significance,” that is, when the p-values in both studies have fallen below a selected threshold. Rather, in determining replication, it is important to consider the distributions of observations and to examine how similar these distributions are. This examination would include summary measures, such as proportions, means, standard deviations (uncertainties), and additional metrics tailored to the subject matter.

THE EXTENT OF NON-REPLICABILITY

The committee was asked to assess what is known and, if necessary, identify areas that may need more information to ascertain the extent of non-replicability in scientific and engineering research. The committee examined current efforts to assess the extent of non-replicability within several fields, reviewed literature on the topic, and heard from expert panels during its public meetings. We also drew on the previous work of committee members and other experts in the field of replicability of research.

Some efforts to assess the extent of non-replicability in scientific research directly measure rates of replication, while others examine indirect measures to infer the extent of non-replication. Approaches to assessing non-replicability rates include

  • direct and indirect assessments of replicability;
  • perspectives of researchers who have studied replicability;
  • surveys of researchers; and
  • retraction trends.

This section discusses each of these lines of evidence.

Assessments of Replicability

The most direct method to assess replicability is to perform a study following the original methods of a previous study and to compare the new results to the original ones. Some high-profile replication efforts in recent years include studies by Amgen, which showed low replication rates in biomedical research (Begley and Ellis, 2012), and work by the Center for Open Science on psychology (Open Science Collaboration, 2015), cancer research (Nosek and Errington, 2017), and social science (Camerer et al., 2018). In these examples, a set of studies was selected and a single replication attempt was made to confirm the results of each previous study (one-to-one comparisons). In other replication studies, teams of researchers performed multiple replication attempts on a single original result (many-to-one comparisons; see, e.g., Klein et al., 2014; Hagger et al., 2016; and Cova et al., 2018 in Table 5-1).

Other measures of replicability include assessments that can provide indicators of bias, errors, and outliers, including, for example, computational data checks of reported numbers and comparison of reported values against a database of previously reported values. Such assessments can identify data that are outliers to previous measurements and may signal the need for additional investigation to understand the discrepancy. 4 Table 5-1 summarizes the direct and indirect replication studies assembled by the committee. Other sources of non-replicability are discussed later in this chapter in the Sources of Non-Replicability section.

4 There is risk of missing a new discovery by rejecting data outliers without further investigation.

Many direct replication studies are not reported as such. Replication—especially of surprising results or those that could have a major impact—occurs in science often without being labelled as a replication. Many scientific fields conduct reviews of articles on a specific topic—especially on new topics or topics likely to have a major impact—to assess the available data and determine which measurements and results are rigorous (see Chapter 7 ). Therefore, replicability studies included as part of the scientific literature but not cited as such add to the difficulty in assessing the extent of replication and non-replication.

One example of this phenomenon relates to research on hydrogen storage capacity. The U.S. Department of Energy (DOE) issued a target storage capacity in the mid-1990s. One group using carbon nanotubes reported surprisingly high values that met DOE’s target ( Hynek et al., 1997 ); other researchers who attempted to replicate these results could not do so. At the same time, other researchers were also reporting high values of hydrogen capacity in other experiments. In 2003, an article reviewed previous studies of hydrogen storage values and reported new research results, which were later replicated ( Broom and Hirscher, 2016 ). None of these studies was explicitly called an attempt at replication.

Based on the content of the collected studies in Table 5-1 , one can observe that the

  • majority of the studies are in the social and behavioral sciences (including economics) or in biomedical fields, and
  • methods of assessing replicability are inconsistent and the replicability percentages depend strongly on the methods used.

The replication studies such as those shown in Table 5-1 are not necessarily indicative of the actual rate of non-replicability across science for a number of reasons: the studies to be replicated were not randomly chosen, the replications had methodological shortcomings, many replication studies are not reported as such, and the reported replication studies found widely varying rates of non-replication ( Gilbert et al., 2016 ). At the same time, replication studies often provide more and better-quality evidence than most original studies alone, and they highlight such methodological features as high precision or statistical power, preregistration, and multi-site collaboration ( Nosek, 2016 ). Some would argue that focusing on replication of a single study as a way to improve the efficiency of science is ill-placed. Rather, reviews of cumulative evidence on a subject, to gauge both the overall effect size and generalizability, may be more useful ( Goodman, 2018 ; and see Chapter 7 ).

Apart from specific efforts to replicate others’ studies, investigators will typically confirm their own results, as in a laboratory experiment, prior to publication. More generally, independent investigators may replicate prior results of others before conducting, or in the course of conducting, a study to extend the original work. These types of replications are not usually published as separate replication studies.

TABLE 5-1 Examples of Replication Studies

NOTES: Some of the studies in this table also appear in Table 4-1 as they evaluated both reproducibility and replicability. N/A = not applicable.

a From Cova et al. (2018, p. 14): “For studies reporting statistically significant results, we treated as successful replications for which the replication 95 percent CI [confidence interval] was not lower than the original effect size. For studies reporting null results, we treated as successful replications for which original effect sizes fell inside the bounds of the 95 percent CI.”

b From Soto (2019, p. 7, fn. 1): “Previous large-scale replication projects have typically treated the individual study as the primary unit of analysis. Because personality-outcome studies often examine multiple trait-outcome associations, we selected the individual association as the most appropriate unit of analysis for estimating replicability in this literature.”

Perspectives of Researchers Who Have Studied Replicability

Several experts who have studied replicability within and across fields of science and engineering provided their perspectives to the committee. Brian Nosek, cofounder and director of the Center for Open Science, said there was “not enough information to provide an estimate with any certainty across fields and even within individual fields.” In a recent paper discussing scientific progress and problems, Richard Shiffrin, professor of psychology and brain sciences at Indiana University, and colleagues argued that there are “no feasible methods to produce a quantitative metric, either across science or within the field” to measure the progress of science (Shiffrin et al., 2018, p. 2632). Skip Lupia, now serving as head of the Directorate for Social, Behavioral, and Economic Sciences at the National Science Foundation, said that there is not sufficient information to be able to definitively answer the extent of non-reproducibility and non-replicability, but there is evidence of p-hacking and publication bias (see below), which are problems. Steven Goodman, the codirector of the Meta-Research Innovation Center at Stanford University (METRICS), suggested that the focus ought not be on the rate of non-replication of individual studies, but rather on cumulative evidence provided by all studies and convergence to the truth. He suggested the proper question is “How efficient is the scientific enterprise in generating reliable knowledge, what affects that reliability, and how can we improve it?”

Surveys of scientists about issues of replicability or on scientific methods are indirect measures of non-replicability. For example, Nature published the results of a survey in 2016 in an article titled “1,500 Scientists Lift the Lid on Reproducibility” (Baker, 2016). 5 This article reported that a large percentage of researchers who responded to an online survey believe that replicability is a problem, and it has been widely cited by researchers studying subjects ranging from cardiovascular disease to crystal structures (Warner et al., 2018; Ziletti et al., 2018). Surveys and studies have also assessed the prevalence of specific problematic research practices, such as a 2018 survey about questionable research practices in ecology and evolution (Fraser et al., 2018). However, many of these surveys rely on poorly defined sampling frames to identify populations of scientists and do not use probability sampling techniques. The fact that nonprobability samples “rely mostly on people . . . whose selection probabilities are unknown [makes it] difficult to estimate how representative they are of the [target] population” (Dillman, Smyth, and Christian, 2014, pp. 70, 92). In fact, we know that people with a particular interest in or concern about a topic, such as replicability and reproducibility, are more likely to respond to surveys on the topic (Brehm, 1993). As a result, we caution against using surveys based on nonprobability samples as the basis of any conclusion about the extent of non-replicability in science.

5 Nature uses the word “reproducibility” to refer to what we call “replicability.”

High-quality researcher surveys are expensive and pose significant challenges, including constructing exhaustive sampling frames, reaching adequate response rates, and minimizing other nonresponse biases that might differentially affect respondents at different career stages or in different professional environments or fields of study ( Corley et al., 2011 ; Peters et al., 2008 ; Scheufele et al., 2009 ). As a result, the attempts to date to gather input on topics related to replicability and reproducibility from larger numbers of scientists ( Baker, 2016 ; Boulbes et al., 2018 ) have relied on convenience samples and other methodological choices that limit the conclusions that can be made about attitudes among the larger scientific community or even for specific subfields based on the data from such surveys. More methodologically sound surveys following guidelines on adoption of open science practices and other replicability-related issues are beginning to emerge. 6 See Appendix E for a discussion of conducting reliable surveys of scientists.

6 See https://cega.berkeley.edu/resource/the-state-of-social-science-betsy-levy-paluck-bitssannual-meeting-2018 .

Retraction Trends

Retractions of published articles may be related to their non-replicability. As noted in a recent study on retraction trends (Brainard, 2018, p. 392), “Overall, nearly 40% of retraction notices did not mention fraud or other kinds of misconduct. Instead, the papers were retracted because of errors, problems with reproducibility [or replicability], and other issues.” Overall, about one-half of all retractions appear to involve fabrication, falsification, or plagiarism. Journal article retractions in biomedicine increased from 50-60 per year in the mid-2000s, to 600-700 per year by the mid-2010s (National Library of Medicine, 2018), and this increase attracted much commentary and analysis (see, e.g., Grieneisen and Zhang, 2012). A recent comprehensive review of an extensive database of 18,000 retracted papers dating back to the 1970s found that while the number of retractions has grown, the rate of increase has slowed; approximately 4 of every 10,000 papers are now retracted (Brainard, 2018). Overall, the number of journals that report retractions has grown from 44 journals in 1997 to 488 journals in 2016; however, the average number of retractions per journal has remained essentially flat since 1997.

These data suggest that more journals are attending to the problem of articles that need to be retracted rather than a growing problem in any one discipline of science. Fewer than 2 percent of authors in the database account for more than one-quarter of the retracted articles, and the retractions of these frequent offenders are usually based on fraud rather than errors that lead to non-replicability. The Institute of Electrical and Electronics Engineers alone has retracted more than 7,000 abstracts from conferences that took place between 2009 and 2011, most of which had authors based in China ( McCook, 2018 ).

The body of evidence on the extent of non-replicability gathered by the committee is not a comprehensive assessment across all fields of science nor even within any given field of study. Such a comprehensive effort would be daunting due to the vast amount of research published each year and the diversity of scientific and engineering fields. Among studies of replication that are available, there is no uniform approach across scientific fields to gauge replication between two studies. The experts who contributed their perspectives to the committee all question the feasibility of such a science-wide assessment of non-replicability.

While the evidence base assessed by the committee may not be sufficient to permit a firm quantitative answer on the scope of non-replicability, it does support several findings and a conclusion.

FINDING 5-1: There is an uneven level of awareness of issues related to replicability across fields and even within fields of science and engineering.

FINDING 5-2: Efforts to replicate studies aimed at discerning the effect of an intervention in a study population may find a similar direction of effect, but a different (often smaller) size of effect.

FINDING 5-3: Studies that directly measure replicability take substantial time and resources.

FINDING 5-4: Comparing results across replication studies may be compromised because different replication studies may test different study attributes and rely on different standards and measures for a successful replication.

FINDING 5-5: Replication studies in the natural and clinical sciences (general biology, genetics, oncology, chemistry) and social sciences (including economics and psychology) report frequencies of replication ranging from fewer than one out of five studies to more than three out of four studies.

CONCLUSION 5-3: Because many scientists routinely conduct replication tests as part of follow-on work and do not report replication results separately, the evidence base of non-replicability across all science and engineering research is incomplete.

SOURCES OF NON-REPLICABILITY

Non-replicability can arise from a number of sources. In some cases, non-replicability arises from the inherent characteristics of the systems under study. In others, decisions made by a researcher or researchers in study execution that reasonably differ from the original study, such as judgment calls on data cleaning or the selection of parameter values within a model, may also result in non-replication. Other sources of non-replicability arise from conscious or unconscious bias in reporting, mistakes and errors (including misuse of statistical methods), and problems in study design, execution, or interpretation in either the original study or the replication attempt. In many instances, non-replication between two results could be due to a combination of multiple sources, but it is not generally possible to identify the source without careful examination of the two studies. Below, we review these sources of non-replicability and discuss how researchers’ choices can affect each. Unless otherwise noted, the discussion below focuses on the non-replicability between two results (i.e., a one-to-one comparison) when assessed using proximity and uncertainty of both results.

Non-Replicability That Is Potentially Helpful to Science

Non-replicability is a normal part of the scientific process and can be due to the intrinsic variation and complexity of nature, the scope of current scientific knowledge, and the limits of current technologies. Highly surprising and unexpected results are often not replicated by other researchers. In other instances, a second researcher or research team may purposefully make decisions that lead to differences in parts of the study. As long as these differences are reported with the final results, these may be reasonable actions to take that nonetheless result in non-replication. In scientific reporting, uncertainties within the study (such as the uncertainty within measurements, the potential interactions between parameters, and the variability of the system under study) are estimated, assessed, characterized, and accounted for through uncertainty and probability analysis. When uncertainties are unknown and not accounted for, this can also lead to non-replicability. In these instances, non-replicability of results is a normal consequence of studying complex systems with imperfect knowledge and tools. When non-replication of results due to sources such as those listed above is investigated and resolved, it can lead to new insights, better uncertainty characterization, and increased knowledge about the systems under study and the methods used to study them. See Box 5-1 for examples of how investigations of non-replication have been helpful to increasing knowledge.

The susceptibility of any line of scientific inquiry to sources of non-replicability depends on many factors, including factors inherent to the system under study, such as the

  • complexity of the system under study;
  • understanding of the number and relations among variables within the system under study;
  • ability to control the variables;
  • levels of noise within the system (or signal to noise ratios);
  • mismatch between the scale of the phenomenon and the scale at which it can be measured;
  • stability across time and space of the underlying principles;
  • fidelity of the available measures to the underlying system under study (e.g., direct or indirect measurements); and
  • prior probability (pre-experimental plausibility) of the scientific hypothesis.

Studies that pursue lines of inquiry that are able to better estimate and analyze the uncertainties associated with the variables in the system and control the methods that will be used to conduct the experiment are more replicable. On the other end of the spectrum, studies that are more prone to non-replication often involve indirect measurement of very complex systems (e.g., human behavior) and require statistical analysis to draw conclusions. To illustrate how these characteristics can lead to results that are more or less likely to replicate, consider the attributes of complexity and controllability. The complexity and controllability of a system contribute to the underlying variance of the distribution of expected results and thus the likelihood of non-replication. 7

7 Complexity and controllability in an experimental system affect its susceptibility to non-replicability independently from the way prior odds, power, or p- values associated with hypothesis testing affect the likelihood that an experimental result represents the true state of the world.

The systems that scientists study vary in their complexity. Although all systems have some degree of intrinsic or random variability, some systems are less well understood, and their intrinsic variability is more difficult to assess or estimate. Complex systems tend to have numerous interacting components (e.g., cell biology, disease outbreaks, friction coefficient between two unknown surfaces, urban environments, complex organizations and populations, and human health). Interrelations and interactions among multiple components cannot always be predicted and neither can the resulting effects on the experimental outcomes, so an initial estimate of uncertainty may be an educated guess.

Systems under study also vary in their controllability. If the variables within a system can be known, characterized, and controlled, research on such a system tends to produce more replicable results. For example, in social sciences, a person’s response to a stimulus (e.g., a person’s behavior when placed in a specific situation) depends on a large number of variables—including social context, biological and psychological traits, verbal and nonverbal cues from researchers—all of which are difficult or impossible to control completely. In contrast, a physical object’s response to a physical stimulus (e.g., a liquid’s response to a rise in temperature) depends almost entirely on variables that can either be controlled or adjusted for, such as temperature, air pressure, and elevation. Because of these differences, one expects that studies that are conducted in the relatively more controllable systems will replicate with greater frequency than those that are in less controllable systems. Scientists seek to control the variables relevant to the system under study and the nature of the inquiry, but when these variables are more difficult to control, the likelihood of non-replicability will be higher. Figure 5-2 illustrates the combinations of complexity and controllability.

Many scientific fields have studies that span these quadrants, as demonstrated by the following examples from engineering, physics, and psychology. Veronique Kiermer, PLOS executive editor, noted in her briefing to the committee: “There is a clear correlation between the complexity of the design, the complexity of measurement tools, and the signal to noise ratio that we are trying to measure.” (See also Goodman et al., 2016 , on the complexity of statistical and inferential methods.)

FIGURE 5-2 Combinations of complexity and controllability of systems under study.

Engineering. Aluminum-lithium alloys were developed by engineers because of their strength-to-weight ratio, primarily for use in aerospace engineering. The process of developing these alloys spans the four quadrants. Early generation of binary alloys was a simple system that showed high replicability (Quadrant A). Second-generation alloys had higher amounts of lithium and resulted in lower replicability that appeared as failures in manufacturing operations because the interactions of the elements were not understood (Quadrant C). The third-generation alloys contained less lithium and higher relative amounts of other alloying elements, which made it a more complex system but better controlled (Quadrant B), with improved replicability. The development of any alloy is subject to a highly controlled environment. Unknown aspects of the system, such as interactions among the components, cannot be controlled initially and can lead to failures. Once these are understood, conditions can be modified (e.g., heat treatment) to bring about higher replicability.

Physics. In physics, measurement of the electronic band gap of semiconducting and conducting materials using scanning tunneling microscopy is a highly controlled, simple system (Quadrant A). The searches for the Higgs boson and gravitational waves were separate efforts, and each required the development of large, complex experimental apparatus and careful characterization of the measurement and data analysis systems (Quadrant B). Some systems, such as radiation portal monitors, require setting thresholds for alarms without knowledge of when or if a threat will ever pass through them; the variety of potential signatures is high and there is little controllability of the system during operation (Quadrant C). Finally, a simple system with little controllability is that of precisely predicting the path of a feather dropped from a given height (Quadrant D).

Psychology. In psychology, Quadrant A includes studies of basic sensory and perceptual processes that are common to all human beings, such as the Purkinje shift (i.e., a change in sensitivity of the human eye under different levels of illumination). Quadrant D includes studies of complex social behaviors that are influenced by culture and context; for example, a study of the effects of a father’s absence on children’s ability to delay gratification revealed stronger effects among younger children ( Mischel, 1961 ).

Inherent sources of non-replicability arise in every field of science, but they can vary widely depending on the specific system undergoing study. When the sources are knowable, or arise from experimental design choices, researchers need to identify and assess these sources of uncertainty insofar as they can be estimated. Researchers also need to report the steps they took to reduce uncertainties inherent in the study, as well as choices that differ from the original study (e.g., data cleaning decisions that resulted in a different final dataset). The committee agrees with those who argue that the testing of assumptions and the characterization of the components of a study are as important to report as are the ultimate results of the study ( Plant and Hanisch, 2018 ), including studies using statistical inference and reporting p-values ( Boos and Stefanski, 2011 ). Every scientific inquiry encounters an irreducible level of uncertainty, whether this is due to random processes in the system under study, limits to researchers’ understanding of or ability to control that system, or limitations of the ability to measure. If researchers do not adequately consider and report these uncertainties and limitations, this can contribute to non-replicability.

RECOMMENDATION 5-1: Researchers should, as applicable to the specific study, provide an accurate and appropriate characterization of relevant uncertainties when they report or publish their research. Researchers should thoughtfully communicate all recognized uncertainties and estimate or acknowledge other potential sources of uncertainty that bear on their results, including stochastic uncertainties and uncertainties in measurement, computation, knowledge, modeling, and methods of analysis.

Unhelpful Sources of Non-Replicability

Non-replicability can also be the result of human error or poor researcher choices. Shortcomings in the design, conduct, and communication of a study may all contribute to non-replicability.

These defects may arise at any point along the process of conducting research, from design and conduct to analysis and reporting, and errors may be made because the researcher was ignorant of best practices, was sloppy in carrying out research, made a simple error, or had unconscious bias toward a specific outcome. Whether arising from lack of knowledge, perverse incentives, sloppiness, or bias, these sources of non-replicability warrant continued attention because they reduce the efficiency with which science progresses, and time spent resolving non-replicability issues that are caused by these sources does not add to scientific understanding. That is, they are unhelpful in making scientific progress. We consider here a selected set of such avoidable sources of non-replication:

  • publication bias
  • misaligned incentives
  • inappropriate statistical inference
  • poor study design
  • incomplete reporting of a study

We will discuss each source in turn.

Publication Bias

Both researchers and journals want to publish new, innovative, ground-breaking research. The publication preference for statistically significant, positive results produces a biased literature through the exclusion of statistically nonsignificant results (i.e., those that do not show an effect that is sufficiently unlikely if the null hypothesis is true). As noted in Chapter 2 , there is great pressure to publish in high-impact journals and for researchers to make new discoveries. Furthermore, it may be difficult for researchers to publish even robust nonsignificant results, except in circumstances where the results contradict what has come to be an accepted positive effect. Replication studies and studies with valuable data but inconclusive results may be similarly difficult to publish. This publication bias results in a published literature that does not reflect the full range of evidence about a research topic.

One powerful example is a set of clinical studies performed on the effectiveness of tamoxifen, a drug used to treat breast cancer. In a systematic review (see Chapter 7 ) of the drug’s effectiveness, 23 clinical trials were reviewed; the statistical significance of 22 of the 23 studies did not reach the criterion of p < 0.05, yet the cumulative review of the set of studies showed a large effect (a reduction of 16% [±3] in the odds of death among women of all ages assigned to tamoxifen treatment [ Peto et al., 1988 , p. 1684]).
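
A minimal simulation can show how a set of individually underpowered trials, few of which reach p < 0.05 on their own, can still yield a clearly significant pooled estimate. The sketch below uses made-up data and a simple fixed-effect (inverse-variance) meta-analysis; it is not the tamoxifen data or the analysis of Peto et al.

```python
# Illustrative sketch with simulated data (not the tamoxifen trials): many
# small, underpowered trials combined with a fixed-effect (inverse-variance)
# meta-analysis. Individually, few trials reach p < 0.05; pooled, they do.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n_trials, n_per_arm = 0.15, 23, 80

effects, ses = [], []
for _ in range(n_trials):
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    control = rng.normal(0.0, 1.0, n_per_arm)
    effects.append(treated.mean() - control.mean())
    ses.append(np.sqrt(treated.var(ddof=1) / n_per_arm
                       + control.var(ddof=1) / n_per_arm))

effects, ses = np.array(effects), np.array(ses)
p_each = 2 * stats.norm.sf(np.abs(effects / ses))
print(f"trials significant at p < 0.05: {(p_each < 0.05).sum()} of {n_trials}")

# Fixed-effect pooling: weight each trial by the inverse of its variance.
w = 1 / ses**2
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
p_pooled = 2 * stats.norm.sf(abs(pooled / pooled_se))
print(f"pooled effect = {pooled:.3f}, p = {p_pooled:.5f}")
```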

Another approach to quantifying the extent of non-replicability is to model the false discovery rate—that is, the number of research results that are expected to be “false.” Ioannidis (2005) developed a simulation model to do so for studies that rely on statistical hypothesis testing, incorporating the pre-study (i.e., prior) odds, the statistical tests of significance, investigator bias, and other factors. Ioannidis concluded, and used as the title of his paper, that “most published research findings are false.” Some researchers have criticized Ioannidis’s assumptions and mathematical argument ( Goodman and Greenland, 2007 ); others have pointed out that the takeaway message is that any initial results that are statistically significant need further confirmation and validation.
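
The arithmetic behind this kind of false discovery modeling can be sketched directly. The function below is a simplified version of the calculation (it ignores investigator bias and multiple competing teams, which Ioannidis also models): given the prior odds that a tested effect is real, the significance level, and the statistical power, it returns the probability that a statistically significant finding reflects a true effect.

```python
# Illustrative sketch of a simplified false discovery calculation (ignores
# investigator bias and multiple teams, which Ioannidis's model includes).
def positive_predictive_value(prior_odds, alpha=0.05, power=0.80):
    """P(effect is real | result is statistically significant)."""
    true_positives = power * prior_odds   # true effects that reach significance
    false_positives = alpha * 1.0         # null effects that reach significance
    return true_positives / (true_positives + false_positives)

# Prior odds of 1:1, 1:4, and 1:50 that the tested effect is real.
for r in (1.0, 0.25, 0.02):
    print(f"prior odds {r:>5}: PPV = {positive_predictive_value(r):.2f}")
```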

Analyzing the distribution of published results for a particular line of inquiry can offer insights into potential bias, which can relate to the rate of non-replicability. Several tools are being developed to compare a distribution of results to what that distribution would look like if all claimed effects were representative of the true distribution of effects. Figure 5-3 shows how publication bias can result in a skewed view of the body of evidence when only positive results that meet the statistical significance threshold are reported. When a new study fails to replicate the previously published results—for example, if a study finds no relationship between variables when such a relationship had been shown in previously published studies—it appears to be a case of non-replication. However, if the published literature is not an accurate reflection of the state of the evidence because only positive results are regularly published, the new study could actually have replicated previous but unpublished negative results. 8

Several techniques are available to detect and potentially adjust for publication bias, all of which are based on the examination of a body of research as a whole (i.e., cumulative evidence), rather than individual replication studies (i.e., one-on-one comparison between studies). These techniques cannot determine which of the individual studies are affected by bias (i.e., which results are false positives) or identify the particular type of bias, but they arguably allow one to identify bodies of literature that are likely to be more or less accurate representations of the evidence. The techniques, discussed below, are funnel plots, p-curve analysis, the test of excess significance, and assessment of unpublished literature.

Funnel Plots. One of the most common approaches to detecting publication bias involves constructing a funnel plot that displays each effect size against its precision (e.g., sample size of study). Asymmetry in the plotted values can reveal the absence of studies with small effect sizes, especially in studies with small sample sizes—a pattern that could suggest publication/selection bias for statistically significant effects (see Figure 5-3). There are criticisms of funnel plots, however; some argue that the shape of a funnel plot is largely determined by the choice of method ( Tang and Liu, 2000 ), and others maintain that funnel plot asymmetry may not accurately reflect publication bias ( Lau et al., 2006 ).

FIGURE 5-3 How selective publication of statistically significant, positive results skews the apparent body of evidence.

8 Earlier in this chapter, we discuss an indirect method for assessing non-replicability in which a result is compared to previously published values; results that do not agree with the published literature are identified as outliers. If the published literature is biased, this method would inappropriately reject valid results. This is another reason for investigating outliers before rejecting them.
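
Constructing a funnel plot requires only the effect estimates and their standard errors. The sketch below simulates a literature in which, crudely, only statistically significant results are published, and plots effect size against precision; all study counts and effect sizes are assumptions made for illustration.

```python
# Illustrative sketch with simulated studies: a funnel plot of effect size
# against precision. A crude publication filter (only significant results are
# "published") produces the asymmetry that funnel plots are used to detect.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
n_studies, true_effect = 300, 0.2
n_per_study = rng.integers(20, 400, n_studies)
se = np.sqrt(2 / n_per_study)                 # standard error of each estimate
observed = rng.normal(true_effect, se)        # each study's estimated effect
p = 2 * stats.norm.sf(np.abs(observed) / se)

published = p < 0.05  # crude model of publication bias

fig, ax = plt.subplots()
ax.scatter(observed[published], 1 / se[published], s=10, label="published")
ax.scatter(observed[~published], 1 / se[~published], s=10, alpha=0.3,
           label="unpublished (file drawer)")
ax.axvline(true_effect, linestyle="--")
ax.set_xlabel("estimated effect size")
ax.set_ylabel("precision (1 / standard error)")
ax.legend()
plt.show()
```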

P-Curve. One fairly new approach is to compare the distribution of results (e.g., p-values) to the expected distributions (see Simonsohn et al., 2014a , 2014b ). P-curve analysis tests whether the distribution of statistically significant p-values shows a pronounced right-skew, 9 as would be expected when the results are true effects (i.e., the null hypothesis is false), or whether the distribution is not as right-skewed (or is even flat, or, in the most extreme cases, left-skewed), as would be expected when the original results do not reflect the proportion of real effects ( Gadbury and Allison, 2012 ; Nelson et al., 2018 ; Simonsohn et al., 2014a ).
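
A short simulation illustrates the intuition behind p-curve analysis: when a true effect exists, significant p-values pile up near zero (right-skew), whereas under the null hypothesis they are roughly evenly spread over the interval (0, 0.05]. The effect sizes and sample sizes below are arbitrary choices for illustration.

```python
# Illustrative sketch with simulated studies: the distribution of statistically
# significant p-values is right-skewed when a true effect exists and roughly
# flat when the null hypothesis is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def significant_p_distribution(true_effect, n=50, n_studies=20_000):
    data = rng.normal(true_effect, 1.0, size=(n_studies, n))
    _, p = stats.ttest_1samp(data, 0.0, axis=1)
    p = p[p < 0.05]
    # Share of significant p-values in five equal-width bins on (0, 0.05].
    counts, _ = np.histogram(p, bins=np.linspace(0, 0.05, 6))
    return np.round(counts / counts.sum(), 2)

print("true effect:", significant_p_distribution(0.4))  # right-skewed
print("null effect:", significant_p_distribution(0.0))  # roughly flat
```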

Test of Excess Significance. A closely related statistical idea for checking publication bias is the test of excess significance. This test evaluates whether the number of statistically significant results in a set of studies is improbably high given the size of the effect and the power to test it in the set of studies ( Ioannidis and Trikalinos, 2007 ), which would imply that the set of results is biased and may include exaggerated results or false positives. When there is a true effect, one expects the proportion of statistically significant results to be equal to the statistical power of the studies. If a researcher designs her studies to have 80 percent power against a given effect, then, at most, 80 percent of her studies would produce statistically significant results if the effect is at least that large (fewer if the null hypothesis is sometimes true). Schimmack (2012) has demonstrated that the proportion of statistically significant results across a set of psychology studies often far exceeds the estimated statistical power of those studies; this pattern of results that is “too good to be true” suggests that the results were not obtained following the rules of statistical inference (i.e., conducting a single statistical test that was chosen a priori), that not all studies attempted were reported (i.e., there is a “file drawer” of statistically nonsignificant studies that do not get published), or possibly that the results were p-hacked or cherry picked (see Chapter 2 ).
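
The core logic of the test of excess significance can be illustrated with a simple binomial check, although the published test is more involved. Given the studies' estimated average power, one can ask how improbable the observed count of significant results would be; the numbers below are hypothetical.

```python
# Illustrative sketch of the logic of the test of excess significance (the
# published test is more involved): is the observed count of significant
# results improbably high given the studies' estimated power?
from scipy import stats

n_studies = 20
n_significant = 19      # hypothetical observed count of significant results
estimated_power = 0.5   # hypothetical average power against the claimed effect

# P(at least n_significant successes) if each study succeeds with
# probability equal to its power.
p_excess = stats.binom.sf(n_significant - 1, n_studies, estimated_power)
print(f"P(>= {n_significant} significant out of {n_studies}) = {p_excess:.6f}")
```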

In many fields, the proportion of published papers that report a positive (i.e., statistically significant) result is around 90 percent ( Fanelli, 2012 ). This raises concerns when combined with the observation that most studies have far less than 90 percent statistical power (i.e., would only successfully detect an effect, assuming an effect exists, far less than 90 percent of the time) ( Button et al., 2013 ; Fraley and Vazire, 2014 ; Szucs and Ioannidis, 2017 ; Yarkoni, 2009 ; Stanley et al., 2018 ). Some researchers believe that the publication of false positives is common and that reforms are needed to reduce this. Others believe that there has been an excessive focus on Type I errors (i.e., false positives) in hypothesis testing at the possible expense of an increase in Type II errors (i.e., false negatives, or failing to confirm true hypotheses) ( Fiedler et al., 2012 ; Finkel et al., 2015 ; LeBel et al., 2017 ).

9 Distributions that have more p-values of low value than high are referred to as “right-skewed.” Similarly, “left-skewed” distributions have more p-values of high than low value.

Assessing Unpublished Literature. One approach to countering publication bias is to search for and include unpublished papers and results when conducting a systematic review of the literature. Such comprehensive searches are not standard practice. For medical reviews, one estimate is that only 6 percent of reviews included unpublished work ( Hartling et al., 2017 ), although another found that 50 percent of reviews did so ( Ziai et al., 2017 ). In economics, there is a large and active group of researchers collecting and sharing “grey” literature, research results outside of peer reviewed publications ( Vilhuber, 2018 ). In psychology, an estimated 75 percent of reviews included unpublished research ( Rothstein, 2006 ). Unpublished but recorded studies (such as dissertation abstracts, conference programs, and research aggregation websites) may become easier for reviewers to access with computerized databases and with the availability of preprint servers. When a review includes unpublished studies, researchers can directly compare their results with those from the published literature, thereby estimating file-drawer effects.

Misaligned Incentives

Academic incentives—such as tenure, grant money, and status—may influence scientists to compromise on good research practices ( Freeman, 2018 ). Faculty hiring, promotion, and tenure decisions are often based in large part on the “productivity” of a researcher, such as the number of publications, number of citations, and amount of grant money received ( Edwards and Roy, 2017 ). Some have suggested that these incentives can lead researchers to ignore standards of scientific conduct, rush to publish, and overemphasize positive results ( Edwards and Roy, 2017 ). Formal models have shown how these incentives can lead to high rates of non-replicable results ( Smaldino and McElreath, 2016 ). Many of these incentives may be well intentioned, but they could have the unintended consequence of reducing the quality of the science produced, and poorer quality science is less likely to be replicable.

Although it is difficult to assess how widespread these unhelpful sources of non-replicability are, factors such as publication bias toward results qualifying as “statistically significant” and misaligned incentives for academic scientists create conditions that favor publication of non-replicable results and inferences.

Inappropriate Statistical Inference

Confirmatory research is research that starts with a well-defined research question and a priori hypotheses before collecting data; confirmatory research can also be called hypothesis testing research. In contrast, researchers pursuing exploratory research collect data and then examine the data for potential variables of interest and relationships among variables, forming a posteriori hypotheses; as such, exploratory research can be considered hypothesis generating research. Exploratory and confirmatory analyses are often described as two different stages of the research process. Some have distinguished between the “context of discovery” and the “context of justification” ( Reichenbach, 1938 ), while others have argued that the distinction is on a spectrum rather than categorical. Regardless of the precise line between exploratory and confirmatory research, researchers’ choices between the two affect how they and others interpret the results.

A fundamental principle of hypothesis testing is that the same data that were used to generate a hypothesis cannot be used to test that hypothesis ( de Groot, 2014 ). In confirmatory research, the details of how a statistical hypothesis test will be conducted must be decided before looking at the data on which it is to be tested. When this principle is violated, significance testing, confidence intervals, and error control are compromised. Thus, it cannot be assured that false positives are controlled at a fixed rate. In short, when exploratory research is interpreted as if it were confirmatory research, there can be no legitimate statistically significant result.

Researchers often learn from their data, and some of the most important discoveries in the annals of science have come from unexpected results that did not fit any prior theory. For example, Arno Allan Penzias and Robert Woodrow Wilson found unexpected noise in data collected in the course of their work on microwave receivers for radio astronomy observations. After attempts to explain the noise failed, the “noise” was eventually determined to be cosmic microwave background radiation, and these results helped scientists to refine and confirm theories about the “big bang.” While exploratory research generates new hypotheses, confirmatory research is equally important because it tests the hypotheses generated and can give valid answers as to whether these hypotheses have any merit. Exploratory and confirmatory research are essential parts of science, but they need to be understood and communicated as two separate types of inquiry, with two different interpretations.

A well-conducted exploratory analysis can help illuminate possible hypotheses to be examined in subsequent confirmatory analyses. Even a stark result in an exploratory analysis has to be interpreted cautiously, pending further work to test the hypothesis using a new or expanded dataset. It is often unclear from publications whether the results came from an exploratory or a confirmatory analysis. This lack of clarity can misrepresent the reliability and broad applicability of the reported results.

In Chapter 2 , we discussed the meaning, overreliance, and frequent misunderstanding of statistical significance, including misinterpreting the meaning and overstating the utility of a particular threshold, such as p < 0.05. More generally, a number of flaws in design and reporting can reduce the reliability of a study’s results.

Misuse of statistical testing often involves post hoc analyses of data already collected, making it seem as though statistically significant results provide evidence against the null hypothesis, when in fact they may have a high probability of being false positives ( John et al., 2012 ; Munafo et al., 2017 ). A study from the late 1980s gives a striking example of how such post hoc analysis can be misleading. The International Study of Infarct Survival was a large-scale, international, randomized trial that examined the potential benefit of aspirin for patients who had had a heart attack. After data collection and analysis were complete, the publishing journal asked the researchers to do additional analysis to see if certain subgroups of patients benefited more or less from aspirin. Richard Peto, one of the researchers, refused to do so because of the risk of finding invalid but seemingly significant associations. In the end, Peto relented and performed the analysis, but with a twist: he also included a post hoc analysis that divided the patients into the twelve astrological signs, and found that Geminis and Libras did not benefit from aspirin, while Capricorns benefited the most ( Peto, 2011 ). This obviously spurious relationship illustrates the dangers of analyzing data with hypotheses and subgroups that were not prespecified.
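
A small simulation shows why unplanned subgroup analyses are so prone to spurious findings: with a dozen subgroups and no true treatment effect, the chance that at least one subgroup crosses p < 0.05 is close to one in two. The data below are simulated and are not from the aspirin trial.

```python
# Illustrative sketch with simulated data (not the aspirin trial): testing many
# post hoc subgroups when there is no true treatment effect makes spurious
# "significant" findings likely.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_patients = 12_000
treated = rng.integers(0, 2, n_patients).astype(bool)    # random assignment
outcome = rng.normal(0.0, 1.0, n_patients)                # no real effect
subgroup = rng.integers(0, 12, n_patients)                # e.g., 12 "signs"

p_values = []
for g in range(12):
    in_group = subgroup == g
    _, p = stats.ttest_ind(outcome[in_group & treated],
                           outcome[in_group & ~treated])
    p_values.append(p)

print("smallest subgroup p-value:", round(min(p_values), 4))
print("chance of at least one p < 0.05 across 12 null subgroups:",
      round(1 - 0.95**12, 2))
```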

Little information is available about the prevalence of such inappropriate statistical practices as p- hacking, cherry picking, and hypothesizing after results are known (HARKing), discussed below. While surveys of researchers raise the issue—often using convenience samples—methodological shortcomings mean that they are not necessarily a reliable source for a quantitative assessment. 10

P-hacking and Cherry Picking. P-hacking is the practice of collecting, selecting, or analyzing data until a result of statistical significance is found. Different ways to p-hack include stopping data collection once p ≤ 0.05 is reached, analyzing many different relationships and only reporting those for which p ≤ 0.05, varying the exclusion and inclusion rules for data so that p ≤ 0.05, and analyzing different subgroups in order to get p ≤ 0.05. Researchers may p-hack without knowing or without understanding the consequences ( Head et al., 2015 ). This is related to the practice of cherry picking, in which researchers may (unconsciously or deliberately) pick through their data and results and selectively report those that meet criteria such as meeting a threshold of statistical significance or supporting a positive result, rather than reporting all of the results from their research.

10 For an example of one study of this issue, see Fraser et al. (2018) .
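
One form of p-hacking, optional stopping, is easy to demonstrate by simulation: if a researcher tests after every batch of new observations and stops as soon as p < 0.05, the false positive rate rises well above the nominal 5 percent even though the null hypothesis is true throughout. The batch sizes and limits below are arbitrary.

```python
# Illustrative sketch: optional stopping. Test after every batch of new
# observations and stop as soon as p < 0.05. The null hypothesis is true
# throughout, yet the false positive rate far exceeds the nominal 5 percent.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def significant_with_optional_stopping(max_n=200, start_n=10, step=10):
    data = rng.normal(0.0, 1.0, max_n)  # null is true: population mean is 0
    for n in range(start_n, max_n + 1, step):
        _, p = stats.ttest_1samp(data[:n], 0.0)
        if p < 0.05:
            return True  # stop and "report" the significant result
    return False

n_sim = 2_000
rate = np.mean([significant_with_optional_stopping() for _ in range(n_sim)])
print(f"false positive rate with optional stopping: {rate:.1%}")
```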

HARKing. Confirmatory research begins with identifying a hypothesis based on observations, exploratory analysis, or building on previous research. Data are collected and analyzed to see if they support the hypothesis. HARKing applies to confirmatory research that incorrectly bases the hypothesis on the data collected and then uses that same data as evidence to support the hypothesis. It is unknown to what extent inappropriate HARKing occurs in various disciplines, but some have attempted to quantify the consequences of HARKing. For example, a 2015 article compared hypothesized effect sizes against non-hypothesized effect sizes and found that effects were significantly larger when the relationships had been hypothesized, a finding consistent with the presence of HARKing ( Bosco et al., 2015 ).

Poor Study Design

Before conducting an experiment, a researcher must make a number of decisions about study design. These decisions—which vary depending on type of study—could include the research question, the hypotheses, the variables to be studied, avoiding potential sources of bias, and the methods for collecting, classifying, and analyzing data. Researchers’ decisions at various points along this path can contribute to non-replicability. Poor study design can include not recognizing or adjusting for known biases, not following best practices in terms of randomization, poorly designing materials and tools (ranging from physical equipment to questionnaires to biological reagents), confounding in data manipulation, using poor measures, or failing to characterize and account for known uncertainties.

In 2010, economists Carmen Reinhart and Kenneth Rogoff published an article that showed that if a country’s debt exceeds 90 percent of the country’s gross domestic product, economic growth slows and declines slightly (–0.1 percent). These results were widely publicized and used to support austerity measures around the world ( Herndon et al., 2013 ). However, in 2013, with access to Reinhart and Rogoff’s original spreadsheet of data and analysis (which the authors had saved and made available for the replication effort), researchers reanalyzing the original studies found several errors in the analysis and data selection. One error was an incomplete set of countries used in the analysis that established the relationship between debt and economic growth. When data from Australia, Austria, Belgium, Canada, and Denmark were correctly included, and other errors were corrected, the economic growth in the countries with debt above 90 percent of gross domestic product was actually +2.2 percent, rather than –0.1 percent. In response, Reinhart and Rogoff acknowledged the errors, calling it “sobering that such an error slipped into one of our papers despite our best efforts to be consistently careful.” Reinhart and Rogoff said that while the error led to a “notable change” in the calculation of growth in one category, they did not believe it “affects in any significant way the central message of the paper.” 11

11 See https://archive.nytimes.com/www.nytimes.com/interactive/2013/04/17/business/17economixresponse.html .
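
The kind of spreadsheet mistake at issue here is easy to reproduce in miniature. The sketch below uses entirely made-up growth figures (not the Reinhart and Rogoff dataset) to show how silently dropping a few rows from an average can flip the sign of the result.

```python
# Illustrative sketch with made-up numbers (not the Reinhart-Rogoff data):
# silently dropping rows from a spreadsheet-style average can flip the sign
# of the reported growth figure for a group of high-debt countries.
import numpy as np

growth_high_debt = {   # hypothetical growth rates (%) for high-debt countries
    "Australia": 3.8, "Austria": 3.0, "Belgium": 2.6, "Canada": 3.0,
    "Denmark": 2.4, "CountryF": -1.5, "CountryG": -2.0, "CountryH": -0.5,
}

all_values = np.array(list(growth_high_debt.values()))
dropped = all_values[5:]  # the first five rows are accidentally excluded

print(f"mean growth with all countries:      {all_values.mean():+.1f}%")
print(f"mean growth with five rows dropped:  {dropped.mean():+.1f}%")
```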

The Reinhart and Rogoff error was fairly high profile, and a quick Internet search would let any interested reader know that the original paper contained errors. Many errors go undetected or are acknowledged only through a brief correction in the publishing journal. A 2015 study looked at a sample of more than 250,000 p-values reported in eight major psychology journals over a period of 28 years. The study found that many of the p-values reported in papers were inconsistent with a recalculation of the p-value and that in one out of eight papers, this inconsistency was large enough to affect the statistical conclusion ( Nuijten et al., 2016 ).
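
The consistency check used in that study can be sketched in a few lines, in the spirit of tools such as statcheck (this is not their code): recompute the p-value from a reported test statistic and its degrees of freedom, then compare it with the p-value stated in the paper. The reported values below are hypothetical.

```python
# Illustrative sketch in the spirit of consistency-checking tools such as
# statcheck (not their code): recompute a p-value from a reported t statistic
# and degrees of freedom, then compare it with the reported p-value.
from scipy import stats

reported = {"t": 1.70, "df": 28, "p": 0.04}  # hypothetical reported values

p_recomputed = 2 * stats.t.sf(abs(reported["t"]), reported["df"])
conclusion_changes = (p_recomputed < 0.05) != (reported["p"] < 0.05)

print(f"recomputed p = {p_recomputed:.3f} (reported p = {reported['p']})")
print("inconsistency large enough to affect the conclusion:", conclusion_changes)
```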

Errors can occur at any point in the research process: measurements can be recorded inaccurately, typographical errors can occur when inputting data, and calculations can contain mistakes. If these errors affect the final results and are not caught prior to publication, the research may be non-replicable. Unfortunately, these types of errors can be difficult to detect. In the case of computational errors, transparency in data and computation may make it more likely that the errors can be caught and corrected. For other errors, such as mistakes in measurement, errors might not be detected until and unless a failed replication that does not make the same mistake indicates that something was amiss in the original study. Errors may also be made by researchers despite their best intentions (see Box 5-2 ).

Incomplete Reporting of a Study

During the course of research, researchers make numerous choices about their studies. When a study is published, some of these choices are reported in the methods section. A methods section often covers what materials were used, how participants or samples were chosen, what data collection procedures were followed, and how data were analyzed. The failure to report some aspect of the study—or to do so in sufficient detail—may make it difficult for another researcher to replicate the result. For example, if a researcher only reports that she “adjusted for comorbidities” within the study population, this does not provide sufficient information about how exactly the comorbidities were adjusted for, and it does not give enough guidance for future researchers to follow the protocol. Similarly, if a researcher does not give adequate information about the biological reagents used in an experiment, a second researcher may have difficulty replicating the experiment. Even if a researcher reports all of the critical information about the conduct of a study, other seemingly inconsequential details that have an effect on the outcome could remain unreported.

Just as reproducibility requires transparent sharing of data, code, and analysis, replicability requires transparent sharing of how an experiment was conducted and the choices that were made. This allows future researchers, if they wish, to attempt replication as close to the original conditions as possible.

Fraud and Misconduct

At the extreme, sources of non-replicability that do not advance scientific knowledge—and do much to harm science—include misconduct and fraud in scientific research. Instances of fraud are uncommon, but can be sensational. Despite fraud’s infrequent occurrence, and regardless of how highly publicized cases may be, the fact that it is uniformly bad for science means that it is worthy of attention within this study.

Researchers who knowingly use questionable research practices with the intent to deceive are committing misconduct or fraud. It can be difficult in practice to differentiate between honest mistakes and deliberate misconduct because the underlying action may be the same while the intent is not.

Reproducibility and replicability emerged as general concerns in science around the same time as research misconduct and detrimental research practices were receiving renewed attention. Interest in both reproducibility and replicability as well as misconduct was spurred by some of the same trends and a small number of widely publicized cases in which discovery of fabricated or falsified data was delayed, and the practices of journals, research institutions, and individual labs were implicated in enabling such delays ( National Academies of Sciences, Engineering, and Medicine, 2017 ; Levelt Committee et al., 2012 ).

In one case, Anil Potti, a Duke University researcher using genomic analysis on cancer patients, was later found to have falsified data. This experience prompted the study and the report, Evolution of Translational Omics: Lessons Learned and the Way Forward ( Institute of Medicine, 2012 ), which in turn led to new guidelines for omics research at the National Cancer Institute. Around the same time, in a case that came to light in the Netherlands, social psychologist Diederik Stapel had gone from manipulating to fabricating data over the course of a career, with dozens of fraudulent publications. Similarly, highly publicized concerns about misconduct by Cornell University professor Brian Wansink highlight how consistent failure to adhere to best practices for collecting, analyzing, and reporting data—intentional or not—can blur the line between helpful and unhelpful sources of non-replicability. In this case, a Cornell faculty committee ascribed to Wansink “academic misconduct in his research and scholarship, including misreporting of research data, problematic statistical techniques, failure to properly document and preserve research results, and inappropriate authorship.” 12

A subsequent report, Fostering Integrity in Research ( National Academies of Sciences, Engineering, and Medicine, 2017 ), emerged in this context, and several of its central themes are relevant to questions posed in this report.

According to the definition adopted by the U.S. federal government in 2000, research misconduct is fabrication of data, falsification of data, or plagiarism “in proposing, performing, or reviewing research, or in reporting research results” ( Office of Science and Technology Policy, 2000 , p. 76262). The federal policy requires that research institutions report all allegations of misconduct in research projects supported by federal funding that have advanced from the inquiry stage to a full investigation, and that they report on the results of those investigations.

12 See http://statements.cornell.edu/2018/20180920-statement-provost-michael-kotlikoff.cfm .

Other detrimental research practices (see National Academies of Sciences, Engineering, and Medicine, 2017 ) include failing to follow sponsor requirements or disciplinary standards for retaining data, authorship misrepresentation other than plagiarism, refusing to share data or methods, and misleading statistical analysis that falls short of falsification. In addition to the behaviors of individual researchers, detrimental research practices also include actions taken by organizations, such as failure on the part of research institutions to maintain adequate policies, procedures, or capacity to foster research integrity and assess research misconduct allegations, and abusive or irresponsible publication practices by journal editors and peer reviewers.

Just as information on rates of non-reproducibility and non-replicability in research is limited, knowledge about research misconduct and detrimental research practices is scarce. Reports of research misconduct allegations and findings are released by the National Science Foundation Office of Inspector General and the Department of Health and Human Services Office of Research Integrity (see National Science Foundation, 2018d ). As discussed above, new analyses of retraction trends have shed some light on the frequency of occurrence of fraud and misconduct. Allegations and findings of misconduct increased from the mid-2000s to the mid-2010s but may have leveled off in the past few years.

Analysis of retractions of scientific articles in journals may also shed some light on the problem ( Steen et al., 2013 ). One analysis of biomedical articles found that misconduct was responsible for more than two-thirds of retractions ( Fang et al., 2012 ). As mentioned earlier, a wider analysis of all retractions of scientific papers found about one-half attributable to misconduct or fraud ( Brainard, 2018 ). Others have found some differences according to discipline ( Grieneisen and Zhang, 2012 ).

One theme of Fostering Integrity in Research is that research misconduct and detrimental research practices are a continuum of behaviors ( National Academies of Sciences, Engineering, and Medicine, 2017 ). While current policies and institutions aimed at preventing and dealing with research misconduct are certainly necessary, detrimental research practices likely arise from some of the same causes and may cost the research enterprise more than misconduct does in terms of resources wasted on the fabricated or falsified work, resources wasted on following up this work, harm to public health due to treatments based on acceptance of incorrect clinical results, reputational harm to collaborators and institutions, and others.

No branch of science is immune to research misconduct, and the committee did not find any basis to differentiate the relative level of occurrence in various branches of science. Some but not all researcher misconduct has been uncovered through reproducibility and replication attempts, which are the self-correcting mechanisms of science. From the available evidence, documented cases of researcher misconduct are relatively rare, as suggested by a rate of retractions in scientific papers of approximately 4 in 10,000 ( Brainard, 2018 ).

CONCLUSION 5-4: The occurrence of non-replicability is due to multiple sources, some of which impede and others of which promote progress in science. The overall extent of non-replicability is an inadequate indicator of the health of science.


One of the pathways by which the scientific community confirms the validity of a new scientific discovery is by repeating the research that produced it. When a scientific effort fails to independently confirm the computations or results of a previous study, some fear that it may be a symptom of a lack of rigor in science, while others argue that such an observed inconsistency can be an important precursor to new discovery.

Concerns about reproducibility and replicability have been expressed in both scientific and popular media. As these concerns came to light, Congress requested that the National Academies of Sciences, Engineering, and Medicine conduct a study to assess the extent of issues related to reproducibility and replicability and to offer recommendations for improving rigor and transparency in scientific research.

Reproducibility and Replicability in Science defines reproducibility and replicability and examines the factors that may lead to non-reproducibility and non-replicability in research. Unlike the typical expectation of reproducibility between two computations, expectations about replicability are more nuanced, and in some cases a lack of replicability can aid the process of scientific discovery. This report provides recommendations to researchers, academic institutions, journals, and funders on steps they can take to improve reproducibility and replicability in science.


Reassessing Academic Plagiarism

  • Published: 24 May 2023


  • James Stacey Taylor   ORCID: orcid.org/0000-0001-8020-8941 1  


I argue that the wrong of plagiarism does not primarily stem from the plagiarist’s illicit misappropriation of academic credit from the person she plagiarized. Instead, plagiarism is wrongful to the degree to which it runs counter to the purpose of academic work. Given that this purpose is to increase knowledge and further understanding, plagiarism will be wrongful to the extent that it impedes the achievement of these ends. This account of the wrong of plagiarism has two surprising (and related) implications. First, it follows from this account that replication plagiarism might not be an academic wrong at all. (Replication plagiarism consists of the direct quotation or paraphrase of another’s work without attribution. The replication plagiarist thus plagiarizes primary sources, purloining for her own benefit the ideas of their authors.) Second, even if replication plagiarism is still held to be an academic wrong, it will be a lesser wrong than bypass plagiarism. (Bypass plagiarism occurs when one quotes from, or provides a paraphrase of, a primary source, but although one cites the primary source one did not identify the quotation or provide the initial paraphrase oneself. Instead, one took the quotation, or drew upon an existing paraphrase, from a secondary source, and one did so without citing the secondary source to credit it as the source of one’s information about the primary source.) Holding that bypass plagiarism is worse than replication plagiarism reverses the usual assessment of the relative wrongness of these two types of plagiarism.


For an outline of the various types of plagiarism see the taxonomy developed by Dougherty ( 2020 ), 2-3.

Replication plagiarism can thus be either literal plagiarism or disguised plagiarism, as these are defined by Dougherty ( 2020 ), 2.

Compression plagiarism (where a plagiarizer distills “a lengthy scholarly text into a short one”; Dougherty ( 2020 ), 3) could thus be either replication plagiarism or bypass source plagiarism, depending on whether the plagiarized passage is an exegesis of another source or the expression of ideas original to it.

See, for example, John Finnis’ comments on the allegation that the (then) Supreme Court nominee Neil Gorsuch committed bypass plagiarism. Finnis is quoted by Whelan ( 2017 ).

How academic institutions make decisions concerning hiring, tenure, and promotion in Alternative Academia is mysterious, but competitive committee avoidance is said to be involved.

For an example of a recent work that was published anonymously, see Anonymous ( 2001 ).

This does not imply that plagiarism is always wrong.

In David Lodge’s novella Home Truths , Fanny Tarrant asked “‘Which writers are you thinking of?’” to which the response was given “‘The same ones that you’re thinking of’” (Lodge,  1999 , 44).

This response also indicates that there is a tension in standard thinking about plagiarism. While the credit-based account is widely accepted, the standard response to a revelation of plagiarism (i.e., that it is an offense against the academic community) is not that which would be supported by the credit-based account.

More examples of replication plagiarism are outlined in Part V.

This section is illustrative, rather than accusatory. These examples have all been previously documented. Of the available examples of this type of plagiarism, I discuss Kruse’s case because it has not yet been discussed in the academic literature.

White Flight was based on Kruse’s 2000 PhD thesis. After Magness’ allegations were made public an ad hoc committee of Princeton’s faculty was convened to investigate the charges. They determined that Kruse was not guilty of research misconduct, as his plagiarism was “the result of careless cutting and pasting; there was no attempt made to conceal an intellectual debt”. (Extracted from a confidential report; this quoted section was shared by Kruse with permission of the committee. See Bailey,  2022 ).

Magness’ claim that Kruse plagiarized this list from Bayor is strengthened by his (Magness’) observation that both this list and the sentences that preceded it in Bayor’s Race and the Shaping of Twentieth-Century Atlanta appeared in Kruse’s Cornell dissertation. (Magness,  2022 ).

Magness supports his claim that Kruse plagiarized this anecdote from Fowler by noting that other elements of Kruse’s One Nation Under God “showed similarities to Fowler's prose, with only minor changes; in total, the textual commonalities continue for more than a page” (Magness,  2022 ).

Radin’s concern was with protecting an “ideal of personhood” that “includes the ideal of sexual interaction as equal nonmonetized sharing” (Radin,  1986 , 1921) rather than one on which intimacy was a preferred attribute of sexual interaction. (“[E]qual nonmonetarized sharing” is distinct from intimacy; the former is compatible with casual sex in a way that the latter is not).

Those parts of the original that Brennan and Jaworski misquoted are in bold; words and punctuation that they inserted are in brackets in bold. Note that this passage is excerpted from a much longer one; the first elision is of 13 lines, while the second is of 9.

Almost but not exactly; they introduced a typographical error.

By contrast, while Yew-Kwang Ng ( 2019 , 30) also replicated Brennan and Jaworski’s misquotation of Hayek he cited them as his source of it. He thus avoided committing bypass quotation plagiarism.

Dougherty identifies Schulz only as “N” in his article because, at the time of its writing, Schulz’s work was only suspected rather than confirmed plagiarism. Subsequent to the publication of Dougherty’s article, Schulz’s article was retracted owing to its translation plagiarism. See Weinberg ( 2019 ).

The same is not true of a successful bypass plagiarist. This comparison between a forger and a plagiarist occurs frequently in discussions of plagiarism (see e.g., Ritter, 2007 , 734).

For a discussion of this in the context of comparing the wrong of an academic publishing a journal article that contains plagiarized sections and an Op Ed that contains plagiarism, see Hiller and Peters ( 2005 ).

Amatrudo, A. (2012). An intentional basis for corporate personality. International Journal of Law in Context, 8 (3), 413–430. [RETRACTED]


Anonymous. (2001). A funny thing happened on the way to the web: A cautionary tale of plagiarism. Law Library Journal, 93 (2), 525–528.

Auslander, P. (2007). Theory for performance studies: A student's guide. New York: Routledge. [RETRACTED]

Bailey, J. (2022). Kruse Cleared of Plagiarism Though Questions Remain. Plagiarism Today . Retrieved October 18, 2022, from  https://www.plagiarismtoday.com/2022/10/18/kevin-kruse-cleared-of-plagiarism-though-questions-remain/

Bayor, R. H. (1996). Race and the Shaping of Twentieth-Century Atlanta . University of North Carolina Press.


Bieliauskaitė, J. (2021). Solidarity in Academia and its Relationship to Academic Integrity. Journal of Academic Ethics, 19 , 309–322.

Brennan, J, & Jaworski, P. M. (2016). Markets without limits: Moral virtues and commercial interests . New York: Routledge.

Carter, A. M. (2019). The Case for Plagiarism. UC Irvine Law Review, 93 (3), 531–555.


Ciceu, A., & Rupertus, A. (2022). Princeton professor Kevin Kruse accused of plagiarism in Cornell dissertation, ‘surprised’ by lack of citation. The Daily Princetonian . Retrieved August 2, 2022, from  https://www.dailyprincetonian.com/article/2022/08/kevin-kruse-plagiarism-allegations-academic-princeton-investigation-phillip-magness-history

Corlett, J. A. (2014). The role of philosophy in academic ethics. Journal of Academic Ethics, 12 , 1–14.

Deal, W. E., & Beal, T. K. (2004). Theory for religious studies. New York: Routledge.

Dougherty, M. V. (2017). Correcting the Scholarly Record in the Aftermath of Plagiarism: A Snapshot of Current-Day Publishing Practices in Philosophy. Metaphilosophy, 48 (3), 258–283.

Dougherty, M. V. (2018). Correcting the scholarly record for research integrity in the aftermath of plagiarism. Cham, Switzerland.

Dougherty, M. V. (2019). The Corruption of Philosophical Communication by Translation Plagiarism. Theoria, 85 , 219–246.

Dougherty, M. V. (2020). Disguised academic plagiarism: A typology and case studies for researchers and editors. Dordrecht: Springer.

Dougherty, M. V., Harsting, P., & Friedman, R. L. (2009). 40 Cases of Plagiarism. Bulletin de Philosophie médiévale, 51 , 350–391.

Dougherty, M. V., & Hochschild, J. P. (2021). Magisterial authority and theological authorship: The harm of plagiarism in the practice of theology. Horizons, 48 (2), 404–455.

Enders, W., Hoover, G. A. (2004). Whose Line Is It? Plagiarism in Economics. Journal of Economic Literature LVII , 487–493.

Fowler, E. M. (1956). 'In God we trust’; Biography of an old American motto. New York Times .

Hannon, M., & Nguyen, J. (2022). Understanding philosophy. Inquiry: An Interdisciplinary Journal of Philosophy .  https://doi.org/10.1080/0020174X.2022.2146186

Hayek, F. A. (1960). The Constitution of Liberty . University of Chicago Press.

Hiller, M. D., & Peters, T. D. (2005). The ethics of opinion in academe: Questions for an ethical and administrative dilemma. Journal of Academic Ethics, 3 , 183–203.

Huh, S. (2010). Plagiarism. Journal of the Korean Medical Association, 53 (12), 1128–1129.

Karabag, S. F., & Berggren, C. (2012). Retraction, dishonesty and plagiarism: Analysis of a crucial issue for academic publishing, and the inadequate responses from leading journals in economics and management disciplines. Journal of Applied Economics and Business Research, 2 (3), 172–183.

Kribbe, H. (2003.) Corporate personality: A political theory of association . PhD thesis, London School of Economics and Political Science.

Kruse, K. M. (2000). White flight: Resistance to desegregation of neighborhoods, schools and businesses in Atlanta, 1946–1966 (PhD thesis, 2000). Ithaca, New York: Cornell University.

Kruse, K. M. (2005). White flight: Atlanta and the making of modern Conservatism. Princeton: Princeton University Press.

Kruse, K. M. (2015) One nation under God: How corporate America invented Christian America. New York: Basic Books.

Kumar, M. N. (2008). A Review of the Types of Scientific Misconduct in Biomedical Research. Journal of Academic Ethics, 6 , 211–228.

Lodge, D. (1999). Home Truths . Penguin.

Magness, P. W. (2022). Is Twitter-famous Princeton historian Kevin Kruse a plagiarist? Reason . Retrieved June 14, 2022, from https://reason.com/2022/06/14/twitter-famous-princeton-historian-kevin-kruse-plagiarist/

Martin, B. (1984). Plagiarism and Responsibility. Journal of Tertiary Education Administration, 6 (2), 183–190.

Mattar, M. Y. (2021). Combating Academic Corruption and Enhancing Academic Integrity through International Accreditation Standards: The Model of Qatar University. Journal of Academic Ethics . https://doi.org/10.1007/s10805-021-09392-7

Michalos, A. C. (2010). Observations on Unacknowledged Authorship from Homer to Now. Journal of Academic Ethics, 8 , 253–258.

Ng, Y. K. (2019). Markets and Morals: Justifying kidney sales and legalizing prostitution . Cambridge University Press.

Paldan, K., Sauer, H., & Wagner, N. F. (2018). Promoting inequality? Self-monitoring applications and the problem of social justice. AI & Society . Retrieved May 5, 2023, from https://doi.org/10.1007/s00146-018-0835-7

Poff, D. (2009). Reflections on Ethics in Journal Publication. Journal of Academic Ethics, 7 , 51–55.

Radin, M. J. (1986). Market-Inalienability. Harvard Law Review, 100 (8), 1849–1937.

Ritter, K. (2007). Yours, Mine, and Ours: Triangulating Plagiarism, Forgery, and Identity. JAC 27, 3/4, 731–743.

Ross, W. D. (1930/2002). The right and the good , ed. Philip Stratton-Lake. Oxford: Clarendon Press.

Satz, D. (2010). Why Some Things Should Not Be for Sale: The Moral Limits of Markets . Oxford University Press.

Schechner, R. (2009). Plagiarism, Greed, and the Dumbing Down of Performance Studies. TDR: The Drama Review, 53 (1), 7–21.

Schulz, P. (2001). Rationality as a Condition for Intercultural Understanding. Studies in Communication Sciences, 1 (2), 81–99 [RETRACTED].

Silver, Ike, & Shaw, Alex. (2018). No Harm, Still Foul: Concerns About Reputation Drive Dislike of Harmless Plagiarizers. Cognitive Science, 42 , 213–240.

Stemmer, P. (1992). Platons Dialektik: Die fruhen und mittleren Dialoge . Walter de Gruyter.

Taylor, J. S. (2022). Markets with limits: How the commodification of academia derails debate . New York: Routledge.

Teixeira, A. A. C, Fontes da Costa, M. (2010). Who Rules the Ruler? On the Misconduct of Journal Editors. Journal of Academic Ethics, 8, 111–128.

The Editors of Vivarium. (2020). Notice: The Retraction of Articles Due to Plagiarism. Vivarium, 5 , 256–274.

US House of Representatives, Committee on the Judiciary. (1964). Report on hearings on Becker amendment first week: Prayer amendment ideas aired by congressmen.

von Platz, J. (2022). Fable of the Deans: The Use of Market Norms in Academia. Reason Papers, 42 (2), 19–32.

Weinberg, J. (2019). Translation Plagiarism in Philosophy. Daily Nous . Available at: https://dailynous.com/2019/10/01/translation-plagiarism-philosophy/

Whelan, E. D. (2017). Bastardized charges. National Review . Retrieved April 5, 2023, from https://www.nationalreview.com/bench-memos/phony-plagiarism-charges-gorsuch/


Author information

Authors and Affiliations

Department of Philosophy, The College of New Jersey, 2000 Pennington Rd, Ewing, NJ, 08628, USA

James Stacey Taylor


Corresponding author

Correspondence to James Stacey Taylor .

Ethics declarations

Conflict of interest

The author has no conflict of interest. No funding was received to assist with the preparation of this manuscript. The author has no relevant financial or non-financial interests to disclose.


About this article

Taylor, J.S. Reassessing Academic Plagiarism. J Acad Ethics (2023). https://doi.org/10.1007/s10805-023-09478-4


Accepted : 14 April 2023

Published : 24 May 2023

DOI : https://doi.org/10.1007/s10805-023-09478-4


  • Bypass plagiarism
  • Margaret Jane Radin
  • Prostitution 
  • Open access
  • Published: 09 January 2019

Replicability and replication in the humanities

  • Rik Peels   ORCID: orcid.org/0000-0001-8107-5992 1  

Research Integrity and Peer Review, volume 4, Article number: 2 (2019)


A large number of scientists and several news platforms have, over the last few years, been speaking of a replication crisis in various academic disciplines, especially the biomedical and social sciences. This paper answers the novel question of whether we should also pursue replication in the humanities. First, I create more conceptual clarity by defining, in addition to the term “humanities,” various key terms in the debate on replication, such as “reproduction” and “replicability.” In doing so, I pay attention to what is supposed to be the object of replication: certain studies, particular inferences, or specific results. After that, I spell out three reasons for thinking that replication in the humanities is not possible and argue that they are unconvincing. Subsequently, I give a more detailed case for thinking that replication in the humanities is possible. Finally, I explain why such replication in the humanities is not only possible, but also desirable.

Introduction

Scientists and various news platforms have, over the last few years, increasingly been speaking of a replication crisis in various academic disciplines, especially the biomedical Footnote 1 and social sciences. Footnote 2 The main reason for this is that it turns out that large numbers of studies cannot be replicated, that is (roughly), they yield results that appear not to support, or to count against, the validity of the original finding. Footnote 3 This has been and still is an important impulse for composing and adapting various codes of research integrity. Moreover, in December 2017, the US National Academies of Sciences, Engineering, and Medicine convened the first meeting of a new study committee that will, for a period of 18 months, study “Reproducibility and Replicability in Science,” a project funded by the National Science Foundation. Footnote 4 Finally, over the last few years, various official reports on replication have been published. At least five of them come to mind:

The 2015 report by the National Science Foundation: Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science

The 2015 symposium report by the Academy of Medical Sciences: Reproducibility and Reliability of Biomedical Research

The 2016 workshop report by the National Academies of Sciences: Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results [ 1 ]

The 2016 report by the Interacademy Partnership for Health: A Call for Action to Improve the Reproducibility of Biomedical Research

The 2018 advisory report by the Royal Netherlands Academy of Arts and Sciences Replication Studies Footnote 5

These documents state what the problem regarding replication is, they explain how we should think of the nature and value of replication, and they make various recommendations as to how to improve upon replicability. There are many causes for lack of replicability and failure to successfully replicate upon attempting to do so. Among them are (i) fraud, falsification, and plagiarism, (ii) questionable research practices, partly due to unhealthy research systems with perverse publication incentives, (iii) human error, (iv) changes in conditions and circumstances, (v) lack of effective peer review, and (vi) lack of rigor. Footnote 6 Thus, we also need a wide variety of measures to improve on replicability. In this article, I will take each of the reports mentioned above into consideration, but pay special attention to the KNAW report, since it is the most recent one and it has taken the findings of the other reports into account.

The issue of replicability and replication in academic research is important for various reasons. Let me mention four of them: (i) results that are consistently replicated are likely to be true, all else being equal, that is, controlling for such phenomena as publication bias and assuming that the other assumptions in the relevant theory or model are valid, (ii) replicability prevents the waste of (financial, time, etc.) resources, since studies that cannot be consistently replicated are less likely to be true, (iii) results that are not replicable are, if they are applied, more likely to cause harm to individuals, animals, and society (e.g., by leading to mistaken economic measures or medicine that is detrimental to people’s health), and (iv) if too many results turn out not to be replicable, upon attempting to replicate them, that will gradually erode public trust in science. Footnote 7

Now, reports about replication focus on various quantitative empirical sciences. Footnote 8 The KNAW Advisory Report, for instance, makes explicit that it is confined to the medical sciences, life sciences, and psychology. Footnote 9 These reports, though, invite researchers from other disciplines to consider the relevance of these documents and recommendations for their own fields. That is precisely the purpose of this paper: to explore to what extent replication is possible and desirable in another important field of scholarly activity, namely the humanities. After all, many humanistic disciplines, such as history, archeology, linguistics, and art theory, are thoroughly empirical: they are based on the collection of data (as opposed to the deductive lines of reasoning that we find in mathematics, logic, parts of ethics, and metaphysics). This naturally leads to the question of whether replication is also possible in the humanities.

How we should think of replication in the humanities is something that has not received any attention so far, except for a couple of articles that I co-authored with Lex Bouter. Footnote 10 Maybe this is because it is questionable whether replication is even possible in the humanities. There are various reasons for this. First, the study objects in the humanities are often unique phenomena, such as historical events, so that it is not clear in what sense one could replicate a study. Second, one might think that various methods in the humanities, such as the hermeneutical method in studying a text, do not lend themselves well to replication—at least not as well as certain methods in the quantitative empirical sciences, where one can carry out an experiment with similar data under similar circumstances. Third, the objects of humanistic research are often objects with meaning and value, such as paintings, texts, statues, and buildings—in contrast to, say, the atoms and viruses studied in the natural sciences. One might think that the inevitably normative nature of these humanistic objects makes replication impossible. It remains to be seen, though, whether these objections hold water. I return to each of them below.


In order to answer the question of whether replication is possible and, if so, desirable in the humanities, I first create more conceptual clarity by defining, in addition to the term “humanities,” various key terms in the debate on replication, such as “reproduction” and “replicability.” In doing so, I pay attention to what is supposed to be the object of replication: certain studies, particular inferences, or specific results. After that, I lay out three reasons for thinking that replication in the humanities is not possible and argue that they are unconvincing. Subsequently, I give a more detailed case for thinking that replication in the humanities is possible. Finally, I explain why such replication in the humanities is not only possible, but also desirable.

Defining the key terms

We can be rather brief about the term “humanities.” There is a debate on what should count as a humanistic discipline and what not. Rather than entering that debate here, I will simply stipulate that, for the sake of argument, I take the following disciplines to belong to the humanities: anthropology; archeology; classics; history; linguistics and languages; law and politics; literature; the study of the performing arts, such as music, theater, and dance; the study of the visual arts, such as drawing, painting, and film; philosophy; theology; and religious studies. This captures what most people take to fall under the umbrella of “humanities” and that will do for the purposes of this paper. Footnote 11

Let us now move on to replication. There are at least two complicating factors when it comes to the issue of replication in the humanities: there is a wide variety of terms and many of these terms have no definition that is widely agreed upon. I have the following eight terms in mind: “replication study,” “replicability,” “replication,” “reproduction,” “reproducibility,” “robustness,” “reliability,” and “verifiability.” Here, I will put the final three terms, namely “robustness,” “reliability,” and “verifiability” aside, since the points I want to make about replication in the humanities do not depend on them. Footnote 12 Also, I take “replication” and “reproduction” to be synonyms, as I do “replicability” and “reproducibility.” Footnote 13 I will, therefore, focus on the three remaining terms, to wit “replication studies,” “replicability,” and “replication.”

Let us define “replication study” as follows:

Replication study

A replication study is a study that is an independent repetition of an earlier, published study, using sufficiently similar methods (along the appropriate dimensions) and conducted under sufficiently similar circumstances . Footnote 14

Clearly, this definition requires some explanation. First, it counts both studies that are meant as close or exact replication and studies designed as conceptual replication as replication studies. There are, of course, crucial differences between these kinds of replication, but they both count as replication studies and that is exactly what the above definition is meant to capture. A recent call for replication studies by the Netherlands Organization for Scientific Research (NWO), for instance, distinguishes three kinds of replication Footnote 15 :

Replication with existing data and the same research protocol and the same research question: repeated analysis of the datasets from the original study with the original research question (sometimes more narrowly referred to as a “reproduction”).

Replication with a new data collection and with the same research protocol and the same research question as the original study (often referred to as a “direct replication”).

Replication with new data and with a new or revised research protocol Footnote 16 : new data collection with a different design from the original study in which the research question remains unchanged compared to that of the original study (often referred to as a “conceptual replication”). Footnote 17

An advantage of the above definition of “replication study” is that it captures these three varieties of replication studies. It is, of course, perfectly compatible with my definition to make these further distinctions among varieties of replication studies.

Second, the definition states that the new study should in some sense be independent from the original study. Unfortunately, reports on replication usually do not define what it is for a study to be independent from an earlier one. Footnote 18 It seems to me that the right way to understand “independence” here is that the new study should not in any way depend on the results of the original study .

However, can we be more precise about how the results of the new study should not depend on those of the original study? The most obvious meaning of this phrase is that the new study should not take all the original results for granted —that is, it should not assume their truth or correctness in its line of reasoning (even though, it can of course do so merely for the sake of argument ). Dependence, however, is a matter of degree: one can, for instance, assume certain results or certain aspects of certain results in order to replicate other results or other aspects of results. Below, we return to the issue of degrees when we consider in what sense results of the new study should agree with the results of the original study.

This means that various other kinds of dependence are perfectly legitimate for a replication study. For example, the new study can depend on the same instruments as those used in the original study, on the same research protocol (e.g., in a repetition of an earlier study), and, in some cases, even on the original researchers, or at least partly so, as when a collaborative team includes both the original researchers and new ones. It can perfectly well depend on these things in the sense that it is no problem if the original study and the new study use the same instruments, follow the same research protocol, and are carried out by the same group of researchers—at least for some kinds of replication.

Third and finally, the definition states that the methods used and the circumstances in which the study is carried out should be “sufficiently similar.” That means that they need not be identical—that may be the case (or something very close to that), but that is not required for a replication study. It also means that they should not be completely different—that is excluded by its being a replication study. But exactly when are they “sufficiently similar?”

This is a complex issue that others have addressed in detail. For instance, Etienne LeBel and others provide a replication taxonomy that understands replication as a graded phenomenon: it ranges from an exact replication (all facets that are under the researchers’ control are the same) to a very far replication (independent variables (IV) or dependent variables (DV) constructs are different), with, respectively, very close replication, close replication, and far replication in-between. The design facets that their taxonomy pays attention to are such things as effect or hypothesis, IV construct, DV construct, operationalization, population (e.g., size), IV stimuli, DV stimuli, procedural details, such as task instructions and font size, physical setting, and contextual variables (they indicate that the list can be extended). Footnote 19 What this goes to show is that replication is a matter of degree and that in assessing the epistemic status of a replication, one should try to locate it on a replication continuum.
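
To make the idea of a replication continuum a little more tangible, here is a minimal sketch in Python. It is only an illustration: the facet names are a simplified subset of those mentioned above, and the mapping from the number of differing facets to a label is my own assumption, not the scoring procedure of LeBel and colleagues.

```python
# Illustrative sketch of a replication continuum. The facet list and the
# thresholds below are simplifying assumptions, not LeBel et al.'s taxonomy.

FACETS = ["IV construct", "DV construct", "operationalization",
          "population", "procedural details", "physical setting"]

def replication_closeness(original: dict, new: dict) -> str:
    """Return a rough label based on how many design facets differ."""
    differing = sum(1 for facet in FACETS if original.get(facet) != new.get(facet))
    if differing == 0:
        return "exact replication"
    if differing == 1:
        return "very close replication"
    if differing == 2:
        return "close replication"
    if differing <= 4:
        return "far replication"
    return "very far replication"

# Hypothetical original study and a new study that changes two facets.
original_study = {
    "IV construct": "priming", "DV construct": "walking speed",
    "operationalization": "scrambled-sentence task", "population": "students",
    "procedural details": "lab protocol A", "physical setting": "lab",
}
new_study = dict(original_study)
new_study["population"] = "general public"
new_study["physical setting"] = "online"

print(replication_closeness(original_study, new_study))  # close replication
```

The point of the sketch is simply that closeness of replication can be assessed facet by facet and then located on a continuum, rather than treated as a yes-or-no matter.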

This brings us to the second key term, “replicability.” It seems to me that this term is used in two crucially different ways, in the KNAW Advisory Report as well as in the broader literature on replication studies. In order to keep things clear, I would like to distinguish the two and will refer to the former as “replicability” and to the latter as “replication.” I define them as follows:

  • Replicability

A study’s having certain features such that a replication study of it could be carried out.

  • Replication

A study’s being such that a repetition of it has successfully been carried out, producing results that agree with those of the original study. Footnote 20

Some philosophers of science and scholars in research integrity use the term “transparency” for what I dub “replicability” here. Footnote 21 Clearly, replicability, as I understand it here, has much to do with transparency: a study can be replicated only if the researchers are sufficiently transparent about the data, the method, the inferences, and so on. Still, I prefer to use the term “replicability” rather than “transparency,” given the purposes of this paper. This is because some humanistic scholars, as we shall see below, think that studies can be perfectly transparent and yet such that they cannot be replicated. If so, they are not replicable, but not because of any scholarly shortcoming. Rather, it would be the nature of the beast (a humanistic study, or a particular kind of humanistic study, such as one about value or meaning) that prevents the possibility of replication.

Thus, replicability is a desideratum for at least many studies in the quantitative empirical sciences (I return to the humanities below): we want them to be set-up and described in such a way that, in principle, we could carry out a replication study. Precise definitions, a clear description of the methodology (in the research protocol), a clear overview of the raw data, a lucid analysis of the data, and so on, all contribute to the replicability of a study. One of the things the replication crisis has made clear is that many studies in the empirical sciences fail to meet the criterion of replicability: we cannot carry out a replication study of them, since the key terms are not even sufficiently clearly defined, the method is underdescribed, the discussion is not transparent, the raw data are not presented in a lucid way, or the analysis of the data is not clearly described.

Replicability should be clearly distinguished from replication . Replication entails replicability (you cannot replicate what is not replicable), but requires significantly more, namely that a successful replication has actually taken place, producing results that agree with the results of the original study. Thus, in a way this distinction is similar to Karl Popper’s famous distinction between falsifiability and falsification. Footnote 22 Falsifiability is a desideratum for any scientific theory: very roughly, a theory should be such that it is in principle falsifiable. Falsification entails falsifiability, but goes a step further, because a falsified theory is a theory that is not only falsifiable, but that has in fact also been falsified. I said “roughly,” because, as Brian Earp has argued in more detail, things are never so simple when it comes to falsification: even if an attempt at falsification has taken place and the new data seem to count against the original hypothesis, one might often just as well, say, question an auxiliary assumption, consider whether a mistake was made in the original study, or wonder whether perhaps the original effect is a genuine effect but one that can only be obtained under specific conditions. Footnote 23 Nevertheless, falsification is often still considered as a useful heuristic in judging the strength of a hypothesis. Footnote 24 Now, the obvious difference with the issue at hand is that, even though both falsifiability and replicability are desiderata, replication is a good thing, because it makes it, all else being equal, likely that results are true, whereas falsification is in a sense a bad thing, because it makes it likely that a theory is false. Footnote 25

A replication study, then, is a study that aims at replication. Such replication may fail either because the original study turns out not to be replicable in the first place or because, even though it is replicable, a successful replication does not occur. A successful replication occurs if the results of the new study agree with those of the original study or, slightly more precisely, if the results of the two studies are commensurate. Exactly what is it, though, for results to be commensurate? As several reports on replication point out Footnote 26 it is not required that the results are identical—that would be too demanding in, say, many biomedical sciences. Again, it seems that “agreeing” is a property of results that comes in degrees . More precisely, we can distinguish at least the following senses, in order of increasing strength:

The studies’ conclusions have the same direction (e.g., both studies show a positive correlation between X and Y);

The studies’ conclusions have the same direction and the studies have a similar effect size (e.g., in both studies, Y is three times as large with X as it is with non-X; in some disciplines: the relative risk is three (RR = 3));

The studies’ conclusions have the same direction , and the studies have a similar effect size and a similar p value , confidence interval , or Bayes factor (e.g., for both studies, RR = 3 (1.5–5.0)). Footnote 27

The stronger the criterion for the sense in which studies’ results “agree,” the lower—ceteris paribus—the percentage of successful replications will be, at least when it comes to quantitative empirical research.
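
To make the three grades of agreement concrete, here is a minimal sketch in Python with made-up relative-risk figures. The numbers, the tolerance for "similar effect size," and the use of overlapping confidence intervals as a stand-in for the third, strictest criterion are illustrative assumptions on my part, not thresholds endorsed by any of the reports cited above.

```python
# Minimal sketch: checking three increasingly strict senses in which the results
# of an original study and a replication study "agree." All numbers are made up.

def same_direction(rr_original: float, rr_new: float) -> bool:
    """Weakest sense: both relative risks point the same way relative to RR = 1."""
    return (rr_original - 1) * (rr_new - 1) > 0

def similar_effect_size(rr_original: float, rr_new: float, tolerance: float = 0.5) -> bool:
    """Middle sense: effect sizes differ by less than an (arbitrary) tolerance."""
    return abs(rr_original - rr_new) < tolerance

def intervals_overlap(ci_original: tuple, ci_new: tuple) -> bool:
    """Strictest sense here (a crude proxy): the confidence intervals overlap."""
    return ci_original[0] <= ci_new[1] and ci_new[0] <= ci_original[1]

original = {"rr": 3.0, "ci": (1.5, 5.0)}     # e.g., RR = 3 (1.5-5.0)
replication = {"rr": 2.6, "ci": (1.4, 4.8)}  # hypothetical replication result

print("same direction:     ", same_direction(original["rr"], replication["rr"]))
print("similar effect size:", similar_effect_size(original["rr"], replication["rr"]))
print("intervals overlap:  ", intervals_overlap(original["ci"], replication["ci"]))
```

On these made-up figures all three checks pass; a replication with a clearly weaker or reversed effect would satisfy fewer of them, or none.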

Now, what does a typical replication study look like? The aforementioned KNAW Advisory Report sketches four characteristics: it “(a) is carried out by a team of independent investigators; (b) generates new data; (c) follows the original protocol closely and justifies any deviations; and (d) attempts to explain the resulting degree of reproducibility.” Footnote 28 Thus, even though, as I pointed out above, independence does not require that the replication study be carried out by different researchers than the original study, this is nonetheless often the case. Below, we will explore to what extent we encounter the combination of these characteristics in the humanities.

Before we move on to replicability and replication in the humanities, I would like to make two preliminary points. First, we should note that it follows from the definitions of “replicability” and “replication” given in this section that both replicability and replication are a matter of degree . Footnote 29 Replication studies can be pretty much identical to the original study, but very often there are slight or even somewhat larger alterations in samples, instruments, conditions, researcher skills, the body of researchers, and sometimes even changes in the method. One can change the method, for instance, in order to explore whether a similar finding can be obtained by way of a rather different method, or a finding that would similarly support one of the relevant underlying hypotheses, at least if the auxiliary assumptions are also met. Every replication study can be located on a continuum that goes from being a replication almost identical to the original study to hardly being a replication at all. The closer the replication study topic is to the topic of the original study, the more it counts as a replication study, and, similarly, for method, samples, conditions, and so on. How we ought to balance these various factors in assessing how much of a replication a particular study is, is a complicated matter that we need not settle here; all we need to realize is that replication is something that comes in degrees. As I briefly spelled out above, in laying out Etienne LeBel’s replication taxonomy, a study can be more or less of a replication of an original study. Footnote 30

Second, exactly what is it that should be replicable in a good replication study? There are at least three candidates here: the study as a whole, the inferences involved in the study, and the results of the study. Footnote 31 I will focus on the replicability of a study’s results . After all, as suggested in our discussion above, we want to leave room for the possibility of a direct replication (which uses new data, so that the study as a whole is not replicated), and a conceptual replication (which uses new data and a new research protocol, so that neither the study as a whole nor its specific inferences are replicated). This means that a study is replicable if a new study can be carried out, producing results that might agree with those of the original study in the sense specified above.

Potential obstacles to replication in the humanities

Now, one might think that, in opposition to the quantitative empirical sciences, such as the biomedical sciences, the humanities are not really suited for the phenomenon of replication. In this section, I discuss three arguments in support of this claim.

1. The first objection to the idea that replication is possible in the humanities is that, frequently, the study object in the humanities is unique Footnote 32 : there was one French Revolution in 1789–1799, there is one novel by Virginia Woolf entitled To the Lighthouse (1927), pieces of architecture, such as Magdalen College’s library in Oxford, are unique, and so on. Viruses, atoms, leg fractures, Borneo’s rhinos, economic measures, and many other study objects in the empirical sciences have multiple instances. In a replication study one can investigate a different instance or token than the one studied in the original study: an instance or token of the same type.

However, this objection fails for two reasons. First, the contrast is overdrawn: many study objects in the humanities do have multiple instances, and quite a few study objects in the empirical sciences are unique. As to the former: Virginia Woolf’s To the Lighthouse is unique, but it is also one of many instances of novels using a stream-of-consciousness narrative technique; the French Revolution is unique, but it is an instance of a social revolution, of which the American Revolution in 1775–1783 and the Russian Revolution in 1917 are other examples. Magdalen College library can be compared to other college libraries in Oxford, to other libraries across the country, and to other buildings from the late fifteenth century. And so on. Parts of linguistics study grammatical structures that, by definition, have many instances, as will be clear from any introduction to morphosyntax. Footnote 33 As to the quantitative empirical sciences: the big bang, the coming into existence of life on earth, space-time itself, and many other phenomena studied in the empirical sciences are unique phenomena: there is only one instance of them. Thus, the idea that the empirical sciences study phenomena that have multiple instances, whereas the humanities study unique phenomena, is, as a general claim, untenable.

Second and more importantly, whether or not the object of study is unique is immaterial to the issue of the replicability of a study on that object. After all, one may study an object several times and studying it several times may even generate new data (a typical property of many replication studies, as we noted in the previous section). For example, even though the French Revolution was a unique historical event (or a unique series of events), that event comprises so many data, laid down in artifacts, literary accounts, paintings, and so on, that it is possible to repeat a particular method—say, studying a text—and even discover new things about that unique event. Footnote 34

2. A second argument against the idea that replication is possible in the humanities is that many methodologies that are employed in the humanities do not lend themselves well to replication. By replicating an empirical study, say, on whether or not patients with incident migraine, in comparison with the general population, have higher absolute risks of suffering from myocardial infarction, stroke, peripheral artery disease, atrial fibrillation, and heart failure Footnote 35 one can, in principle, apply the same method or a similar method to new patients (say, a population from a different country). One can generate new data , thus making it likely—if replications consistently deliver sufficiently similar results—that the original results are true. One might think that no such thing takes place when one employs the methods of the humanities.

In response to this objection, I think it is important to note that there is a wide variety of methods used in the humanities. Among them are: more or less formal logic (in philosophy, theology, and law), literary analysis (in literary studies, philosophy, and theology), historical analysis (in historical studies, philosophy, and theology) and various narrative approaches Footnote 36 (in historical studies), constructivism (in art theory, for instance), Socratic questioning (in philosophy), methods involving empathy (in literary studies and art studies), conceptual analysis (in philosophy and theology), the hermeneutical method (in any humanistic discipline that involves careful reading of texts, such as law, history, and theology), interviews (e.g., in anthropology), and phenomenology (in philosophy). This is important to note, because, as I pointed out above, I only want to argue that replication is possible in the humanities to the extent that they are empirical . Replication may not be possible in disciplines that primarily use a deductive method and that do not collect and analyze data, such as logic, mathematics, certain parts of ethics, and metaphysics. This leaves plenty of room for replication in disciplines that are empirical, such as literary studies, linguistics, history, and the study of the arts.

Take the hermeneutical method. Does reading a text again make it, all else being equal, more likely that one’s interpretation is correct? It seems to me the answer here has to be positive. There are at least two reasons for that. First, one may have made certain mistakes in one’s original reading and interpretation: faulty reading, sloppy analysis, forgetting relevant passages, and so on, may have played a role on the first occasion. If one’s second interpretation differs from the first, one will normally realize that and revisit the relevant passage, comparing which of the two interpretations is more plausible. This will generally increase the likelihood that one comes to a correct interpretation of, say, the relevant passage in Ovid. Second, if one re-reads certain passages, one will do so with new background beliefs, given that humanistic scholars gradually acquire more knowledge in the course of their lives. That may lead to a new interpretation. Unless one thinks that new beliefs are as likely to be false as true—which seems implausible—carefully re-reading a passage with relevant new background beliefs and coming to the same result increases the likelihood of truth of one’s interpretation. These two points apply a fortiori when other humanistic scholars, rather than the same ones, apply the same method of interpretation (the hermeneutical approach or a historical-critical methodology) to the same text. They will come to an interpretation and compare it with the original one; if it differs, they are likely to revisit relevant passages and, thereby, filter out forgetting, sloppiness, and mistakes. Footnote 37 And, of course, they bring new background knowledge to a text. That, too, makes it the case that when a study is consistently replicated, then, all else being equal, the original study results are likely to be true.

3. A third objection to the idea that replication is possible in the humanities, is that many of the study objects in the humanities are normative in the sense that they are objects of value and meaning, whereas this is not the case in many of the natural and biomedical sciences. René van Woudenberg, for instance, has argued in a recent paper that the objects of the humanities are such meaningful and/or valuable things as words, sentences, perlocutionary acts, buildings and paintings, music, and all sorts of artifacts. Molecules, laws of nature, diseases, and the like lack that specific sort of meaning and value. Footnote 38

In reply, let me say that I will grant the assumption that the humanities are concerned with objects of value and meaning, whereas the sciences are not (or at least not with those aspects of those objects). I think this is not entirely true: some humanistic disciplines, such as metaphysics, are also concerned with objects that do not have meaning or value, such as numbers or the nature of space-time. It will still be true for most humanistic disciplines, though.

However, this point is not relevant for the issue of replication. This can be seen by considering, on the one hand, a scenario in which knowledge about value and meaning is not possible and, on the other, a scenario in which knowledge about value and meaning is possible. First, imagine that it is impossible to uncover knowledge about objects with value and meaning and specifically about those aspects of those objects that concern value and meaning. One may think, for instance, that there are no such facts about value and meaning Footnote 39 or that they are all socially constructed, so that it would not be right to say that the humanities can uncover them. Footnote 40 This is, of course, a controversial issue. Here, I will not delve into this complex issue, which would merit a paper or more of its own. Rather, I would like to point out that if it is indeed impossible to uncover knowledge about value and meaning, then that is a problem for the humanities in general , and not specifically for the issue of replication in the humanities. For, if there is no value and meaning, or if all value and meaning is socially constructed and the humanities can, therefore, not truly uncover value and meaning, one may rightly wonder to what extent humanistic scholarship as an academic discipline is still possible.

Now, imagine, on the other hand, that it is possible to uncover knowledge about objects with value and meaning and even about those aspects of those objects that specifically concern value and meaning. Then, it seems possible to uncover such knowledge and understanding about the aspects that involve value and meaning multiple times for the same or similar objects. And that would mean that in that case, it would very well be possible to carry out a replication study that involves conclusions about value and meaning. Of course, given the fact that the objects have value and meaning, it might sometimes be harder to reach agreement among scholars. After all, background assumptions bear heavily on issues concerning value and meaning. However, as several examples below show, agreement about issues concerning value and meaning is still quite often possible in the humanities.

I conclude that the three main reasons for thinking that replication is not possible in the humanities do not hold water.

A positive case for the possibility of replication in the humanities

So far, I have primarily deflected three objections to the possibility of replication in the humanities. Is there actually also a more detailed, positive case to be made for the possibility of replication in the humanities? Yes. In this section, I shall provide such a case.

My positive, more substantive case is an inductive one: there are many cases of replication studies in the humanities in the sense stipulated above: a study’s being such that a replication of it has successfully been carried out, producing results that agree with the original study. Moreover, they often have the four stereotypical properties mentioned above: (a) they are carried out by a team of independent investigators; (b) they generate new data; (c) they follow the original protocol (or, at least, method description) closely and justify any deviations; and (d) they attempt to explain the resulting degree of reproducibility.

Here is an example: re-interpreting Aurelius Augustine’s (354–430 AD) writings in order to see to what extent he continued to embrace or rejected Gnosticism. Using the hermeneutical method Footnote 41 —with such principles as that one should generally opt for interpretations of passages that make the text internally coherent, that one should, in interpreting a text, take its genre into account, and so on—and relevant historical background knowledge, it has time and again been confirmed that Augustine came to reject the basic tenets of Gnosticism, such as the Manichaean idea that good and evil are two equally powerful forces in the world, but that Gnosticism continued to exercise influence upon his thought—for instance, when it comes to his assessment of the extent to which we can enjoy things in themselves ( frui ) or merely for the sake of some higher good, namely God ( uti ). Footnote 42 Various independent researchers have argued this; in doing so, they came up with new data (new passages or new historical background knowledge), they used the same hermeneutical or historical-critical method, and they explained the consonance with the original results (and thus the successful replication, even though they would not have used that word) by sketching a larger picture of Augustine’s thought that made sense of his relation to Gnosticism.

Here is another example of a study that employs the hermeneutical method. The crucial difference with the previous example is that this is still a hotly debated issue and that it is not clear exactly what counts as a replication, since it is not clear that advocates and opponents share enough background beliefs in order to properly execute a replication study; only the future will tell us whether that is indeed the case. What I have in mind is the so-called New Perspective on Paul in New Testament theology. Since the 1960s, Protestant scholars have begun to interpret the New Testament letters of Paul differently from how they had been understood by Protestants until then. Historically, Lutheran and Reformed theologians had understood Paul as arguing that the good works of faith do not factor into one’s salvation—only faith itself does (in a slogan: sola fide ). The New Perspective, advocated by Ed Parish Sanders and Tom Wright, Footnote 43 however, has it that Paul was not so much addressing good works in general, but specific Jewish laws regarding circumcision, dietary laws, Sabbath laws, and other laws the observance of which set Jews apart from other nations. The New Perspective has been embraced by most Roman Catholic and Orthodox theologians and a substantial number of Protestant theologians, but is still very much under debate. Thus, we should not conclude from the fact that some studies that employ the hermeneutic method are replicable that all of them are: some of them may involve too many controversial background assumptions for a fairly straightforward replication to be possible.

However, it is easy to add examples of studies from other humanistic fields that meet the criterion of replicability. Here are two of them that use a different method than the hermeneutical one:

The granodiorite stele that was named the Rosetta Stone and that was found in 1799 bears the same text in Ancient Egyptian, in both hieroglyphic and Demotic script, and in Ancient Greek. The differences in the content of these three versions are minor. The stone has turned out to be the key to deciphering Egyptian hieroglyphs. A large number of scholars have studied the stone in detail and the most important results have been replicated multiple times. Footnote 44

It was established in 2013 by way of various methods—such as study of the materials, chemical composition, painting style, and a study of his letters—that the painting Sunset at Montmajour is a true Van Gogh. It was painted on July 4, 1888. If one has the right background knowledge and skills, one can fairly easily study the same data or collect further data in order to replicate this study. Footnote 45

I take the examples given so far to be representative and, therefore, to provide an inductive argument for the possibility of replication in the humanities: it turns out that replication is possible in a variety of humanistic fields that employ different methods.

Now, the KNAW Advisory Report Replication Studies mentions three things to pay attention to in carrying out a replication study: (i) look at the raw data, the final outcomes (results/conclusions) and/or everything in between, (ii) take a rigorous statistical approach or a more qualitative approach in making the comparison between the original study and the replication study, and (iii) define how much similarity is required for a successful replication. This is important, for it means that even the specific way in which a replication study is supposed to be carried out can be copied in a replication study in the humanities. After all, it is possible (i) to compare the original data (say, certain texts, archeological findings, the occurrence of certain verbs, and so on), the conclusions of the original study and the replication study, and everything in between, (ii) to take a qualitative approach and sometimes even, if not a rigorous statistical approach, at least a more quantitative approach, e.g., by counting the number of verbs in Shakespeare’s plays that end in “th” or “st,” and (iii) to define how much similarity between the original results and the results in the replication study is required for something’s being a successful replication, even though this will be harder or impossible to quantify , in opposition to many studies in, say, psychology and economics.
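
As an illustration of the more quantitative end of point (ii), a crude first pass at the Shakespeare example might look like the following sketch in Python. The file name is a placeholder, and counting word tokens ending in "th" or "st" is only a rough surface-form proxy; actually counting verbs would require part-of-speech tagging or hand annotation.

```python
# Crude quantitative check that two scholars could each run independently on the
# same text. "shakespeare_plays.txt" is a placeholder; token endings are only a
# surface-form proxy for archaic verb forms, not genuine verb identification.
import re
from collections import Counter

def count_archaic_endings(path: str) -> Counter:
    # Tokenize crudely into lowercase alphabetic strings and keep those ending
    # in "th" or "st".
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z]+", f.read().lower())
    return Counter(token for token in tokens if token.endswith(("th", "st")))

print(count_archaic_endings("shakespeare_plays.txt").most_common(10))
```

If an original study and a replication study both report such counts, comparing them is precisely the kind of (semi-)quantitative comparison of outcomes that the KNAW list of attention points asks for.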

The desirability of replicability and replication in the humanities

It is widely agreed that replicability is a desideratum and replication an epistemically positive feature of a study in the quantitative empirical sciences. Given that, as we have seen in the preceding sections, replication is possible in the humanities, is it something we should pursue? Should we desire that studies be replicable and that a significant number of them be replicated, that is, that they are indeed replicated with a positive, confirming outcome?

The answer has to be: Yes. After all, if, as I argued, replication is possible in the humanities and consistent replication makes it likely that the results of the original study are true, then carrying out such replication studies contributes to such core epistemic aims of the academic enterprise as knowledge, insight, and understanding—which all require truth. Of course, one will have to find the right balance between carrying out new research—with the possibility, or even likelihood, of stumbling upon new truths never found before—and replicating a study and thereby making it likely that the original study results are true. However, there is nothing special about the humanities when it comes to the fact that we need to find the right balance between various intellectual goals: we need to find the right balance in any discipline—medicine, psychology, and economics included. This is not to deny that there may be important differences between various fields. Research indicates that as much as 70% of studies in social psychology turn out not to be replicated upon attempting to replicate them. Footnote 46 This gives us both an epistemic reason—it decreases the likelihood of truth of the original studies—and a pragmatic reason—it erodes public trust in science as a source of knowledge—to carry out more replication studies. Thus, how much replication is needed depends on the epistemic state a particular discipline is in.

Certainly, it is not at all common to speak of a “replication crisis” in the case of the humanities, in contrast to some of the quantitative empirical sciences. As various philosophers, such as Martha Nussbaum, Footnote 47 have argued, though, there is at least a crisis in the humanities in the sense that they are relatively widely thought of as having a low epistemic status. They are thought to be not nearly as reliable as the sciences and not to provide any robust knowledge. To give just one example, according to American philosopher of science Alex Rosenberg:

When it comes to real understanding, the humanities are nothing we have to take seriously, except as symptoms. But they are everything we need to take seriously when it comes to entertainment, enjoyment, and psychological satisfaction. Just don’t treat them as knowledge or wisdom. Footnote 48

Another well-known example is the recent so-called grievance studies affair (or hoax). This was an attempt in 2017–2018 by three scholars—James Lindsay, Peter Boghossian, and Helen Pluckrose—to test the editorial and peer review process of various fields in the humanities. They did so by trying to get bogus papers published in influential academic journals in fields such as feminism studies, gender studies, race studies, and sexuality studies. They managed to publish a significant number of papers (which were all retracted after the hoax was revealed), and got an even larger number accepted (without yet being published). However, it is rather controversial exactly what this hoax shows about the epistemic status of these fields in the humanities. Footnote 49 Some have argued that the results would have been similar in pretty much any other empirical discipline, Footnote 50 and still others that we cannot conclude anything from this hoax, since there was no control group. Footnote 51

In any case, there may well be a crisis in how the humanities are perceived. Yet, there does not seem to be a replication crisis —at least, it is usually not framed as such. There may, therefore, be somewhat less of a social and epistemic urge to carry out replication studies in the humanities. However, given the epistemic and pragmatic reasons to do so, carrying out at least some replication studies would be good for the humanities and for how they are publicly perceived.

We should also realize that one of the reasons that people started to talk about a replication crisis in certain empirical sciences in the first place was that, apart from problems with replicability (some studies did not even meet that desideratum), for some studies an attempt at replication took place but was unsuccessful, so they met replicability as a desideratum, but not the positive property of replication. That showed the need for more replication studies. Thus, one way to discover the need for replication studies is, paradoxically, to carry out such replication studies. This means that, in order to establish the extent to which replication studies are needed in various fields in the humanities, we should simply carry them out.

Before we move on, I would like to discuss an objection to the desirability of replication in the humanities. The objection is that even though replication may well be possible in the humanities, it is not particularly desirable—not something to aim at or invest research money in—because there is simply too much disagreement in the humanities for there to be a successful replication sufficiently often. Thus, even though many humanistic studies would be replicable, carrying out a replication study would in the majority of cases lead to different results. In philosophy, for instance, there is a rather radical divide between scholars in the analytic tradition and scholars in the continental tradition. One might think it likely that a replication of any study by members of the one group would lead to substantially different results if carried out by members of the other group.

We should not forget, though, that we find radically different sorts of schools within, say, economics or physics. In economics, for instance, we find the economics of the Saltwater school, the economics of the Freshwater school, and, more rarely, institutional economics, Austrian economics, feminist economics, Marxian economics, and ecological economics. In quantum mechanics, we find a wide variety of different interpretations with different ideas about randomness and determinacy, the nature of measurement, and which elements in quantum mechanics can be considered real: the Standard or Copenhagen interpretation, the consistent histories interpretation, the many worlds interpretation, the transactional interpretation, and so on.

The problem that this objection draws our attention to, then, is a general one: if a study from one school of thought is replicated by members of a different school of thought, it is much more likely that relevant background assumptions will be different and various auxiliary hypotheses will play an important role. This may make it easier for the researchers of the original study to reject the results of the new study if they differ from those of the original one: they may point to different background assumptions and different auxiliary hypotheses. That does not necessarily undermine the value of those replication studies, though: a revision in background assumptions or a change in auxiliary hypotheses may widely be considered an improvement on the original study, or a legitimate change for other reasons. Also, even if the study’s background assumptions are different and various auxiliary hypotheses differ, the study may still be successfully replicated.

The most important point to note here, though, is that, to the extent that this is a problem (and we have seen that it is not necessarily a problem at all), it is a general problem and not one that is unique to the humanities.

This is not to deny that there may be situations in which there is too much divergence on background assumptions, method, relevant auxiliary hypotheses, and so on, to carry out a replication study. This will be the case for some humanistic studies and research groups, as it will be the case for some scientific studies and research groups. What this means is that in some humanistic disciplines, replicability is still a desideratum and surely still a positive property, but the absence of replicability, because of severe limits on the possibility of replication, is not necessarily a reason to discard a study. In other words, in balancing the theoretical virtues of various hypotheses and studies in the humanities, replicability will sometimes not be weighed as heavily as, say, consistency with background knowledge, simplicity, internal coherence, and other intellectual virtues. That is, of course, as such not a problem at all, as the weight of various intellectual virtues differs from discipline to discipline anyway; predictive power, for instance, is crucial in much of physics, but carries much less weight in economics and evolutionary biology.

Conclusions

I conclude that replication is possible in the humanities. By that, I mean that empirical studies in the humanities are often such that an independent repetition of them, using similar or different methods and conducted under similar circumstances, can be carried out. I also conclude that replicability is desirable in the humanities: by that, I mean that many empirical studies in the humanities should indeed be such that an independent repetition of them, using similar or different methods and conducted under similar circumstances, can be carried out. And I conclude that carrying out replication studies in the humanities is desirable: we should actually frequently carry out such independent repetitions of published studies. Exactly how desirable replication in the humanities is remains to be seen; paradoxically, carrying out replication studies in the humanities may tell us more about exactly how desirable doing so is.

Notes

See Begley [ 2 ].

See Open Science Collaboration [ 3 ].

See Baker [ 4 ]; Ioannidis [ 5 ]; Nuzzo [ 6 ]; Munafò and Smith [ 7 ].

See http://www8.nationalacademies.org/cp/projectview.aspx?key=49906 , last visited May 1, 2018.

For full bibliographical details, see the list in the “References” section.

For overviews of such causes, see AMS [ 8 ], 5, 16–21; IAP [ 9 ], 1; KNAW, 23–24 [ 10 ]; Munafò et al. [ 11 ], 2. Bouter [ 12 ] further analyzes the causes for various kinds of questionable research practices. For the issue of lack of effective peer review, see, for instance, [ 13 ]. In this paper, Smith argues that there is actually no systematically acquired evidence for thinking that peer review is a good quality assurance mechanism and that we do have good evidence for thinking that peer review has a large number of downsides.

For these points, see also KNAW [ 10 ], 4, 20–22.

Note the KNAW Advisory Report’s subtitle: Improving Reproducibility in the Empirical Sciences .

KNAW [ 10 ], 16.

See [ 14 , 15 ], and a recent co-authored blog: [ 16 ]. The paper at hand provides a much more in-depth exploration of the ideas about replication in the humanities advocated in these three pieces.

The humanities are to be distinguished from the sciences, where I take the sciences to include the applied sciences, such as medicine, engineering, computer science, and applied physics; the formal sciences, such as decision theory, statistics, systems theory, theoretical computer science, and mathematics; the natural sciences, such as physics, chemistry, earth science, ecology, oceanography, geology, meteorology, astronomy, life science, biology, zoology, and botany; and the social sciences, such as criminology, economics, and psychology.

I would be happy, though, to embrace the definitions given of these terms in the KNAW Advisory Report, viz. for “robustness”: the extent to which the conclusions depend on minor changes in the procedures and assumptions, for “reliability of measurements”: the measurement error due to variation, and for “verifiability of results”: the extent to which the study documentation provides enough information on how results have been attained to assess compliance with relevant standards (cf. [ 10 ], 19). As will become clear from what follows in this section, the phenomena of robustness, reliability, and verifiability, thus understood, are in interesting ways related to , but nevertheless clearly conceptually distinct from replication, replicability, reproduction, and reproducibility.

Some people use the word “reproduction” somewhat more narrowly, namely merely for a study that re-analyzes the same data of the original study and scrutinizes whether they lead to the same results. I will use a broader definition here.

For a similar definition, see KNAW [ 10 ], 18; NSF [ 17 ], 4–5. IAP [ 9 ], unfortunately, provides no definition.

A fourth option, not mentioned in the report, is to carry out a replication with the same data and a new or revised research protocol .

For the purposes of the paper, I take a “research protocol” to be primarily a description of the study design: a description of which data are taken to be relevant and which method is used.

Italics are mine. See https://www.nwo.nl/en/funding/our-funding-instruments/sgw/replication-studies/replication-studies.html , last visited August 30, 2018. For the different kinds of replication, see also [ 18 ].

The KNAW [ 10 ] report, for instance, does not.

See LeBel et al. [ 19 ], 9; see also Earp and Trafimow [ 20 ].

Thus, for instance, KNAW [ 10 ], 18: “reproducibility concerns the extent to which the results of a replication study agree with those of the earlier study.”

For example, LeBel et al. [ 19 ].

See, for instance, Popper [ 21 ].

See [ 22 ]. As to the role of auxiliary assumptions, such as ones about the role of language, he also gives a particular example that illustrates this claim—one about walking speed in response to being primed with the elderly stereotype (the original study being [ 23 ]). For further illustrations of the fact that direct falsification of a theory is virtually impossible, see [ 20 , 24 ].

See Earp and Trafimow [ 20 ].

This is not to deny that Popper himself thought falsification to be a good thing, since he believed scientific progress to consist of instances of falsification (see [ 25 ], 215–250).

For example, AMS [ 8 ], 9.

This approach to agreement on results squares well with that of [ 26 ], and that of [ 27 ]. I thank Lex Bouter for helpful suggestions on this point.

KNAW [ 10 ], 33.

This is also noted in the KNAW [ 10 ] Report, 18, 25, and spelled out in more detail in [ 19 ], 14. For more on the nature of degrees, that is, for what I take it to be for something to come in degrees, see [ 28 ].

See LeBel et al. [ 19 ].

Thus, also KNAW [ 10 ], 4, 19.

This worry is also found in the KNAW Advisory Report: KNAW [ 10 ], 17, 29, even though it is pointed out that this worry might not be decisive. We find the same idea among certain neo-Kantians; see, for instance, [ 29 ].

See, for instance, Payne [ 30 ].

Moreover, one may wonder whether there are such things as unique historical events studied by historians. One might think, for instance, that the French Revolution is not a unique historical event, but just a series of (virtually) infinitely many smaller events, and that history always studies a combination of those events rather than a single, unique event.

This was concluded by a recent study; see [ 31 ].

For example, Lorenz [ 32 ].

In a way, then, replication—including replication in the humanities—is like what mathematicians do in checking a proof and lay people in checking a particular calculation (say, splitting the bill in a restaurant); if a large number of competent people come to the same result, then, all else being equal, the result is likely to be true.

See [ 33 ], 112–122. This is not to deny that they may have meaning or significance in some sense; the double-helix structure of DNA may be of special significance to, say, James Watson, Francis Crick, and Rosalind Franklin.

For a defense of this position, see [ 34 ].

For an exploration and discussion, see, for instance, [ 35 ].

For a more detailed exposition and discussion of the hermeneutical method, see [ 36 , 37 ].

For this interpretation, see various essays in Van [ 38 ]; and [ 39 ].

See Sanders [ 40 ] and Wright [ 41 ].

For an overview of much research on the Rosetta stone, see [ 42 ].

See Van Tilborgh, Meedendorp, Van Maanen [ 43 ].

See, for instance, Klein [ 44 ].

See Nussbaum [ 45 ], chapter 1.

Rosenberg [ 34 ], 307.

See Lindsay et al. [ 46 ]. For a more positive assessment, see Mounk 2018 [ 47 ].

See Engber 2018 [ 48 ].

See [ 49 ]. For a defense of the hoax on these two points, see Essig, Moorti 2018 [ 50 ].

References

NAS: National Academies of Sciences, Engineering, and Medicine. Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results: Summary of a Workshop. National Academies Press; 2016. https://www.nap.edu/catalog/21915/statistical-challenges-in-assessing-and-fostering-the-reproducibility-of-scientific-results , last visited May 1, 2018.

Begley E. Raise Standards for Preclinical Cancer Research. Nature. 2012;483:531–3.


Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716. https://doi.org/10.1126/science.aac4716.

Baker M. Is there a replicability crisis? Nature. 2016;533:452–4.

Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2(8):e124.

Nuzzo R. Fooling ourselves. Nature. 2015;526:182–5.

Munafò MR, Smith D. Repeating experiments is not enough. Nature. 2018;553:399–401.

AMS: The Academy of Medical Sciences. Reproducibility and reliability of biomedical research: improving research practice. Symposium report. 2015. https://acmedsci.ac.uk/file-download/38189-56531416e2949.pdf, last visited May 1st 2018.

IAP: Interacademy Partnership for Health. A call for action to improve the reproducibility of biomedical research. 2016. http://www.interacademies.org/39535/Improving-the-reproducibility-of-biomedical-research-a-call-for-action , last visited May 1st 2018.

KNAW: Royal Dutch Academy of Arts and Sciences. Replication studies: improving reproducibility in the empirical sciences. Amsterdam; 2018. https://knaw.nl/en/news/publications/replication-studies , last visited May 1st 2018.

Munafò MR, et al. A Manifesto for Reproducible Science. Nat Hum Behav. 2017;1(art. 0021):1–9. https://doi.org/10.1038/s41562-016-0021 .

Bouter LM. Fostering responsible research practices is a shared responsibility of multiple stakeholders. J Clin Epidemiol. 2018;96:143–6.

Smith R. Classical peer review: an empty gun. Breast Cancer Res. 2010;12(4):S13.

Peels R, Bouter L. Replication drive for humanities. Nature. 2018a;558:372.

Peels R, Bouter L. The possibility and desirability for replication in the humanities. Palgrave Commun. 2018b;4:95. https://doi.org/10.1057/s41599-018-0149-x .

Peels R, Bouter L. Replication is both possible and desirable in the humanities, just as it is in the sciences. London School of Economics and Political Science Impact Blog, 10 October 2018c. http://blogs.lse.ac.uk/impactofsocialsciences/2018/10/01/replication-is-both-possible-and-desirable-in-the-humanities-just-as-it-is-in-the-sciences/ .

NSF: National Science Foundation. Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science: Report of the Subcommittee on Replicability in Science, Advisory Committee to the National Science Foundation Directorate for Social, Behavioral, and Economic Sciences. 2015. https://www.nsf.gov/sbe/AC_Materials/SBE_Robust_and_Reliable_Research_Report.pd , last visited May 1st 2018.


Radder H. The material realization of science: from Habermas to experimentation and referential realism. Dordrecht: Springer; 2012.


LeBel EP, McCarthy RJ, Earp BD, Elson M, Vanpaemel W. A unified framework to quantify the credibility of scientific findings. Adv Methods Pract Psychol Sci. 2018 forthcoming. https://doi.org/10.1177/2515245918787489 .

Earp BD, Trafimow D. Replication, falsification, and the crisis of confidence in social psychology. Front Psychol. 2015;6:621.

Popper KR. Zwei Bedeutungen von Falsifizierbarkeit [Two Meanings of Falsifiability]. In: Seiffert H, Radnitzky G, editors. Handlexikon der Wissenschaftstheorie. München: Deutscher Taschenbuch Verlag; 1994. p. 82–5.

Earp BD. Falsification: How Does It Relate to Reproducibility? In: Morin J-F, Olsson C, Atikcan EO, editors. Key Concepts in Research Methods. Abingdon, New York: Routledge; 2018. Available online ahead of print at https://www.academia.edu/36659820/Falsification_How_does_it_relate_to_reproducibility/ .

Bargh JA, Chen M, Burrows L. Automaticity of social behavior: direct effects of trait construct and stereotype activation on action. J Pers Soc Psychol. 1996;71(2):230–44.

Trafimow D, Earp BD. Badly specified theories are not responsible for the replication crisis in social psychology: comment on Klein. Theory Psychol. 2016;26(4):540–8.

Popper KR. Conjectures and Refutations. New York: Harper; 1965.

Goodman SN, Fanelli D, Ioannidis JPA. What does reproducibility really mean? Sci Transl Med. 2016;8(341):ps12.

Nosek BA, Errington TM. Making sense of replications. eLIFE. 2017;6:e23383.

Van Woudenberg R, Peels R. The metaphysics of degrees. Eur J Philos. 2018;26(1):46–65.

Windelband W, Oakes G. History and Natural Science. History and Theory. 1980;19(2):165–8 (originally published in 1924).

Payne T. Describing Morphosyntax: a guide for field linguists. Cambridge: Cambridge University Press; 1997.

Adelborg K, et al. Migraine and risk of cardiovascular diseases: Danish population based matched cohort study. Br Med J. 2018;360:k96. https://doi.org/10.1136/bmj.k96 , published January 31st.

Lorenz C. Constructing the past. Princeton: Princeton University Press; 2008.

Van Woudenberg R. The nature of the humanities. Philosophy. 2017;93(1):109–40.

Rosenberg A. The Atheist’s guide to reality. New York: Norton; 2012.

Kukla A. Social Constructivism and the Philosophy of Science. Oxford: Routledge; 2000.

Malpas J, Gander H-H. The Routledge Companion to Hermeneutics. New York: Routledge; 2015.

Keane N, Lawn C, editors. The Blackwell companion to hermeneutics. Oxford: Blackwell; 2016.

Van den Berg JA, Kotzé A, Nicklas T, Scopello M, editors. In Search of Truth: Augustine, Manichaeism and other Gnosticism: Studies for Johannes van Oort at Sixty. Nag Hammadi and Manichaean Studies 74. Leiden: Brill; 2010.

Meconi DV, Stump E, editors. The Cambridge Companion to Augustine. Cambridge: Cambridge University Press; 2014.

Sanders EP. Paul and Palestinian Judaism: a comparison of patterns of religion. Philadelphia: Fortress Press; 1977.

Wright NT. Paul and his recent interpreters. Minneapolis: Augsburg Fortress; 2014.

Ray JD. The Rosetta Stone and the Rebirth of Ancient Egypt. Cambridge, Mass: Harvard University Press; 2007.

Van Tilborgh L, Meedendorp T, Van Maanen O. 'Sunset at Montmajour': a newly discovered painting by Vincent van Gogh. Burlington Mag. 2013;155(1327).

Klein RA, Ratliff KA, Vianello M, Adams RB Jr, Bahnik S, Bernstein MJ, Bocian K, Barry Kappes H, Nosek BA. Investigating variation in replicability. Soc Psychol. 2014;45:142–52.

Nussbaum M. Not for profit: why democracy needs the humanities. Princeton: Princeton University Press; 2010.

Lindsay JA, Boghossian P, Pluckrose H. Academic grievance studies and the corruption of scholarship. Areo Magazine, 2 October 2018. https://areomagazine.com/2018/10/02/academic-grievance-studies-and-the-corruption-of-scholarship/ .

Mounk Y. The circling of the academic wagons. The Chronicle of Higher Education, 9 October 2018. https://web.archive.org/web/20181010122828/ ; https://www.chronicle.com/article/What-the-Grievance/244753 .

Engber D. What the "Grievance Studies" hoax actually reveals. Slate. 2018. https://slate.com/technology/2018/10/grievance-studieshoax-not-academic-scandal.html .

Hughes V, Aldhous P. Here's what critics say about that big new hoax on gender studies. Buzzfeed News, 10-09-2018. https://www.buzzfeednews.com/article/virginiahughes/grievance-studies-sokal-hoax .

Essig L, Moorti S. Only a Rube Would Believe Gender Studies Has Produced Nothing of Value: The Chronicle of Higher Education; 2018.


Acknowledgements

For their helpful comments on an earlier version of this paper, I would like to thank Lieke Asma, Valentin Arts, Wout Bisschop, Lex Bouter, Jeroen de Ridder, Tamarinde Haven, Thirza Lagewaard, Chris Ranalli, Joeri Tijdink, and René van Woudenberg. I also thank various audience members for their constructive suggestions at the KNAW Royal Netherlands Academy of Sciences meeting on replicability ( Reproduceerbaarheid van wetenschappelijk onderzoek: Wetenschapsbreed van belang? ) on March 5, 2018. Finally, I thank Brian Nosek and an anonymous referee for their constructive review of the paper for this journal.

This publication was made possible through the support of a grant from the Templeton World Charity Foundation: “The Epistemic Responsibilities of the University” (2016–2019). The opinions expressed in this publication are those of the author and do not necessarily reflect the views of the Templeton World Charity Foundation.

Availability of data and materials

Not applicable

Author information

Authors and affiliations

Rik Peels, Philosophy Department, Faculty of Humanities, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands


Contributions

The author read and approved the final manuscript.

Corresponding author

Correspondence to Rik Peels .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The author declares that he has no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Peels, R. Replicability and replication in the humanities. Res Integr Peer Rev 4 , 2 (2019). https://doi.org/10.1186/s41073-018-0060-4


Received : 03 October 2018

Accepted : 13 December 2018

Published : 09 January 2019

DOI : https://doi.org/10.1186/s41073-018-0060-4


Keywords

  • Normativity
  • Replication crisis



Enago Academy

Research Reproducibility and Credibility — Replication Studies


A Self-Correcting Process!

Scientific research has never been a perfect process of discovery. We have never expected to get the results right the first time, and so we have developed multiple methodologies to test different hypotheses to sequentially build a more comprehensive understanding of the world around us.

One of the foundational assumptions of that process has been the knowledge that science will always self-correct any errors in inductive or deductive reasoning with an established process of validation through reproduction of results. If the results of a study prove to be irreproducible, we take that as a sign that the prior study should be re-examined.

A Credibility Gap

Re-examination after a failure to replicate a study's results still starts with the assumption that the initial study is solid. Scientists will double- and triple-check the protocol of their replication study first, before moving on to examine the original study for errors in methodology and/or analysis. Unfortunately, the rise in journal retractions implies that this assumption of credibility is undeserved. It would be arrogant to assume that your replication study is automatically flawless if the results can't be reproduced, but given the current trend in academic publishing to publish only new research, the limited likelihood of getting your replication study published suggests that your checking efforts should be focused on building a robust case against the original study.

Paying Lip Service

We now seem to pursue prestige over credibility. Papers are cited more frequently from the prestigious journals in each field, which conveniently raises the prestige of those same journals, based on the volume of citations. Research is assumed to be solid based on the reputation of the journal and all that implies. It has become implicit that leading journals have first-class peer review processes that would catch any errors, such that replication would not be necessary. This misplaced confidence is only paying lip service to credibility. The reputation of any journal is only as good as the absence of evidence of misconduct. Any evidence of a conflict of interest, or unethical conduct can do irreparable harm to that reputation in just a few days.

Searching for a Stamp of Approval

With increasing numbers of researchers chasing a declining number of publishing opportunities in journals that have enough perceived prestige to boost the scholarly value of those eager researchers, the credibility of reproducibility has fallen by the wayside. Validation of a study, especially one that produces counterintuitive results, should come from replication of the results of that study. That world has changed.

Counterintuitive results attract a lot of attention, but a lot of that attention will come from fellow researchers criticizing the protocol and the results. However, that criticism is likely to be based on theoretical disagreements alone, because it would be very unlikely that any of those critics would be using replication data. Replication studies take time and resources, and the window of opportunity to speak out against the latest study will have closed by then. This leaves reproducibility in a quandary.

If solid evidence of flawed results can’t be produced while interest in the topic is at its height, by the time those results are made available, journals will have moved on to the next big thing. If there’s no interest in publication, it’s unlikely that the replication study would get funded in the first place, and even if it did get funding and produced results that directly challenged the integrity of the original study, the lack of interest may leave that original study without a retraction or any type of response from the journal in which it was published.



Replication Study

A replication study involves repeating a study using the same methods but with different subjects and experimenters.


In a replication study, the researchers apply the existing theory to new situations in order to determine its generalizability to different subjects, age groups, races, locations, cultures, or any other such variables.

The main purposes of a replication study include:

  • To assure that results are reliable and valid
  • To determine the role of extraneous variables
  • To apply the previous results to new situations
  • To inspire new research combining previous findings from related studies

Suppose you are part of a healthcare team facing a problem, for instance, regarding the use and efficacy of a certain painkiller in patients before surgery. You search the literature for the same problem and identify an article that addresses exactly this problem.

Now the question arises: how can you be sure that the results of the study in hand are applicable and transferable to your clinical setting? You therefore decide to prepare and carry out a replication study. By deliberately repeating the previous research procedures in your clinical setting, you can strengthen the evidence for the previous findings and address their limitations; the overall results may support the previous study, or you may find completely different results.

How do you decide whether a replication study can be carried out? The following guidelines, or criteria, have been proposed for replicating an original study.

A replication study is possible and should be carried out when:

  • The original research question is important and can contribute to the body of information supporting the discipline.
  • The existing literature and policies relating to the topic support its relevance.
  • The replication study, if carried out, has the potential to empirically support the results of the original study, either by clarifying issues raised by the original study or by extending its generalizability.
  • The research team has expertise in the subject area and access to adequate information about the original study to be able to design and execute a replication.
  • Any extensions or modifications of the original study can be based on current knowledge in the same field.
  • Lastly, the replication can be conducted with the same rigor as the original study.

Field conditions offer researchers opportunities that are not open to investigations in laboratory settings.

Laboratory investigators also commonly have only a small number of potential participants in their research trials. In applied settings such as schools, classrooms, and hospitals, however, large numbers of participants are often readily available.

It is therefore possible in field settings to repeat or replicate a study on a large scale, and more than once.


Explorable.com (Jun 12, 2009). Replication Study. Retrieved Apr 25, 2024 from Explorable.com: https://explorable.com/replication-study

You Are Allowed To Copy The Text

The text in this article is licensed under the Creative Commons-License Attribution 4.0 International (CC BY 4.0) .

This means you're free to copy, share and adapt any parts (or all) of the text in the article, as long as you give appropriate credit and provide a link/reference to this page.

That is it. You don't need our permission to copy the article; just include a link/reference back to this page. You can use it freely (with some kind of link), and we're also okay with people reprinting in publications like books, blogs, newsletters, course-material, papers, wikipedia and presentations (with clear attribution).


Stanford University

Rigorous research practices improve scientific replication

Science has suffered a crisis of replication—too few scientific studies can be repeated by peers. A new study from Stanford and three leading research universities shows that using rigorous research practices can boost the replication rate of studies.

Science has a replication problem. In recent years, it has come to light that the findings of many studies, particularly those in social psychology, cannot be reproduced by other scientists. When this happens, the data, methods, and interpretation of the study’s results are often called into question, creating a crisis of confidence.

“When people don’t trust science, that’s bad for society,” said Jon Krosnick, the Frederic O. Glover Professor of Humanities and Social Sciences in the Stanford School of Humanities and Sciences. Krosnick is one of four co-principal investigators on a study that explored ways scientists in fields ranging from physics to psychology can improve the replicability of their research. The study, published Nov. 9 in Nature Human Behaviour, found that using rigorous methodology can yield near-perfect rates of replication.


“Replicating others’ scientific results is fundamental to the scientific process,” Krosnick argues. According to a paper published in 2015 in Science, fewer than half of the findings of psychology studies could be replicated—and only 30 percent for studies in the field of social psychology. Such findings “damage the credibility of all scientists, not just those whose findings cannot be replicated,” Krosnick explained.

Publish or perish

“Scientists are people, too,” said Krosnick, who is a professor of communication and of political science in H&S and of social sciences in the Stanford Doerr School of Sustainability. “Researchers want to make their funders happy and to publish head-turning results. Sometimes, that inspires researchers to make up or misrepresent data.

“Almost every day, I see a new story about a published study being retracted—in physics, neuroscience, medicine, you name it. Showing that scientific findings can be replicated is the only pathway to solving the credibility problem.”

Accordingly, Krosnick added that the publish-or-perish environment creates the temptation to fake the data, or to analyze and reanalyze the data with various methods until a desired result finally pops out even though it is not actually real—a practice known as p-hacking.
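To make this concrete, the following is a small simulation sketch. It is not drawn from the study described in this article; the group sizes, number of outcomes, and number of simulated studies are assumed values chosen purely for illustration. It shows that when there is no true effect at all, testing several outcome measures and reporting only the smallest p-value produces spurious "findings" far more often than the nominal 5 percent error rate.

```python
# Illustrative simulation (assumed parameters, not from the study described above):
# how often does "test many outcomes and keep the best p-value" yield a false
# positive when there is no real effect anywhere?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_simulations = 2000   # number of simulated "studies"
n_per_group = 30       # participants per group
n_outcomes = 10        # outcome measures tested, none with a true effect

false_positives = 0
for _ in range(n_simulations):
    p_values = []
    for _ in range(n_outcomes):
        treatment = rng.normal(0, 1, n_per_group)  # null data: identical distributions
        control = rng.normal(0, 1, n_per_group)
        p_values.append(stats.ttest_ind(treatment, control).pvalue)
    if min(p_values) < 0.05:  # report only the "best" outcome
        false_positives += 1

print("Nominal false-positive rate: 0.05")
print(f"Rate when cherry-picking among {n_outcomes} outcomes: "
      f"{false_positives / n_simulations:.2f}")  # roughly 1 - 0.95**10, about 0.40
```

Preregistering which outcome will be tested, one of the practices described below, removes exactly this degree of freedom.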


In an effort to assess the true potential of rigorous social science findings to be replicated, Krosnick’s lab at Stanford and labs at the University of California, Santa Barbara; the University of Virginia; and the University of California, Berkeley set out to discover new experimental effects using best practices and to assess how often they could be reproduced. The four teams attempted to replicate the results of 16 studies using rigor-enhancing practices.

“The results reassure me that painstakingly rigorous methods pay off,” said Bo MacInnis, a Stanford lecturer and study co-author whose research on political communication was conducted under the parameters of the replicability study. “Scientific researchers can effectively and reliably govern themselves in a way that deserves and preserves the public’s highest trust.”

Matthew DeBell, director of operations at the American National Election Studies program at the Stanford Institute for Research in the Social Sciences, is also a co-author.

“The quality of scientific evidence depends on the quality of the research methods,” DeBell said. “Research findings do hold up when everything is done as well as possible, underscoring the importance of adhering to the highest standards in science.”


Transparent methods

In the end, the team found that when four “rigor-enhancing” practices were implemented, the replication rate was almost 90 percent. Although the recommended steps place additional burdens on the researchers, those practices are relatively straightforward and not particularly onerous.

These practices call for researchers to run confirmatory tests on their own studies to corroborate results prior to publication. Data should be collected from a sufficiently large sample of participants. Scientists should preregister all studies, committing to the hypotheses to be tested and the methods to be used to test them before data are collected, to guard against p-hacking. And researchers must fully document their procedures to ensure that peers can precisely repeat them.
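Of these practices, collecting data from a sufficiently large sample is normally made concrete with a prospective power analysis before any data are gathered. The sketch below is a generic illustration of that step, not the protocol the four labs actually used; the effect size, significance level, and power target are assumed values.

```python
# Generic power-analysis sketch (assumed numbers, not the study's actual procedure):
# how many participants per group does a two-sample t-test need to detect a given effect?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.4,          # assumed standardized effect size (Cohen's d)
    alpha=0.05,               # two-sided false-positive rate
    power=0.90,               # desired probability of detecting a real effect
    alternative="two-sided",
)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 130 per group
```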

The four labs conducted original research using these recommended rigor-enhancing practices. Then they submitted their work to the other labs for replication. Overall, of the 16 studies produced by the four labs during the five-year project, replication was successful in 86 percent of the attempts.

“The bottom line in this study is that when science is done well, it produces believable, replicable, and generalizable findings,” Krosnick said. “What I and the other authors of this study hope will be the takeaway is a wake-up call to other disciplines to doubt their own work, to develop and adopt their own best practices, and to change how we all publish by building in replication routinely. If we do these things, we can restore confidence in the scientific process and in scientific findings.”

Acknowledgements

Krosnick is also a professor, by courtesy, of psychology in H&S. Additional authors include lead author John Protzko of Central Connecticut State University; Leif Nelson, a principal investigator from the University of California, Berkeley; Brian Nosek, a principal investigator from the University of Virginia; Jordan Axt of McGill University; Matt Berent of Matt Berent Consulting; Nicholas Buttrick and Charles R. Ebersole of the University of Virginia; Sebastian Lundmark of the University of Gothenburg, Gothenburg, Sweden; Michael O’Donnell of Georgetown University; Hannah Perfecto of Washington University, St. Louis; James E. Pustejovsky of the University of Wisconsin, Madison; Scott Roeder of the University of South Carolina; Jan Walleczek of the Fetzer Franklin Fund; and senior author and project principal investigator Jonathan Schooler of the University of California, Santa Barbara.

This research was funded by the Fetzer Franklin Fund of the John E. Fetzer Memorial Trust.

Competing Interests

Nosek is the executive director of the nonprofit Center for Open Science. Walleczek was the scientific director of the Fetzer Franklin Fund that sponsored this research, and Nosek was on the fund’s scientific advisory board. Walleczek made substantive contributions to the design and execution of this research but as a funder did not have controlling interest in the decision to publish or not. All other authors declared no conflicts of interest.


Why is Replication in Research Important?

Replication in research is important because it allows for the verification and validation of study findings, building confidence in their reliability and generalizability. It also fosters scientific progress by promoting the discovery of new evidence, expanding understanding, and challenging existing theories or claims.

Updated on June 30, 2023


Often viewed as a cornerstone of science, replication builds confidence in the scientific merit of a study's results. The philosopher Karl Popper argued that "we do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them."

As such, creating the potential for replication is a common goal for researchers. The methods section of scientific manuscripts is vital to this process as it details exactly how the study was conducted. From this information, other researchers can replicate the study and evaluate its quality.

This article discusses replication as a rational concept integral to the philosophy of science and as a process validating the continuous loop of the scientific method. By considering both the ethical and practical implications, we may better understand why replication is important in research.

What is replication in research?

As a fundamental tool for building confidence in the value of a study’s results, replication has power. Some would say it has the power to make or break a scientific claim when, in reality, it is simply part of the scientific process, neither good nor bad.

When Nosek and Errington propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research, they revive its neutrality. The true purpose of replication, therefore, is to advance scientific discovery and theory by introducing new evidence that broadens the current understanding of a given question.

Why is replication important in research?

The great philosopher and scientist Aristotle asserted that a science is possible if and only if there are knowable objects involved. There cannot be a science of unicorns, for example, because unicorns do not exist. Therefore, a 'science' of unicorns lacks knowable objects and is not a 'science'.

This philosophical foundation of science perfectly illustrates why replication is important in research. When an outcome is not replicable, it is not knowable and does not truly exist. This means that each time replication of a study or a result is possible, its credibility and validity expand.

The lack of replicability is just as vital to the scientific process. It pushes researchers in new and creative directions, compelling them to continue asking questions and to never become complacent. Replication is as much a part of the scientific method as formulating a hypothesis or making observations.

Types of replication

Historically, replication has been divided into two broad categories: 

  • Direct replication : performing a new study that follows a previous study’s original methods and then comparing the results. While direct replication follows the protocols from the original study, the samples and conditions, time of day or year, lab space, research team, etc. are necessarily different. In this way, a direct replication uses empirical testing to reflect the prevailing beliefs about what is needed to produce a particular finding.
  • Conceptual replication : performing a study that employs different methodologies to test the same hypothesis as an existing study. By applying diverse manipulations and measures, conceptual replication aims to operationalize a study’s underlying theoretical variables. In doing so, conceptual replication promotes collaborative research and explanations that are not based on a single methodology.

Though these general divisions provide a helpful starting point for both conducting and understanding replication studies, they are not polar opposites. There are nuances that produce countless subcategories such as:

  • Internal replication : when the same research team conducts the same study while taking negative and positive factors into account
  • Microreplication : conducting partial replications of the findings of other research groups
  • Constructive replication : both manipulations and measures are varied
  • Participant replication : changes only the participants

Many researchers agree these labels should be confined to study design, as direction for the research team, not a preconceived notion. In fact, Nosek and Errington conclude that distinctions between “direct” and “conceptual” are at least irrelevant and possibly counterproductive for understanding replication and its role in advancing knowledge.

How do researchers replicate a study?

Like all research studies, replication studies require careful planning. The Open Science Framework (OSF) offers a practical guide which details the following steps:

  • Identify a study that is feasible to replicate given the time, expertise, and resources available to the research team.
  • Determine and obtain the materials used in the original study.
  • Develop a plan that details the type of replication study and research design intended.
  • Outline and implement the study’s best practices.
  • Conduct the replication study, analyze the data, and share the results.

These broad guidelines are expanded in Brown's and Wood's article, "Which tests not witch hunts: a diagnostic approach for conducting replication research." Their findings are further condensed by Brown into a blog outlining four main procedural categories:

  • Assumptions : identifying the contextual assumptions of the original study and research team
  • Data transformations : using the study data to answer questions about data transformation choices by the original team
  • Estimation : determining if the most appropriate estimation methods were used in the original study and if the replication can benefit from additional methods
  • Heterogeneous outcomes : establishing whether the data from an original study lends itself to exploring separate heterogeneous outcomes

At the suggestion of peer reviewers from the e-journal Economics, Brown elaborates with a discussion of what not to do when conducting a replication study that includes:

  • Do not use critiques of the original study's design as a basis for replication findings.
  • Do not perform robustness testing before completing a direct replication study.
  • Do not omit communicating with the original authors, before, during, and after the replication.
  • Do not label the original findings as errors solely based on different outcomes in the replication.

Again, replication studies are full-blown, legitimate research endeavors that contribute directly to scientific knowledge. They require the same level of planning and dedication as any other study.

What happens when replication fails?

There are some obvious and agreed upon contextual factors that can result in the failure of a replication study such as: 

  • The detection of unknown effects
  • Inconsistencies in the system
  • The inherent nature of complex variables
  • Substandard research practices
  • Pure chance

While these variables affect all research studies, they have particular impact on replication as the outcomes in question are not novel but predetermined.

The constant flux of contexts and variables makes assessing replicability, and determining success or failure, very tricky. A publication from the National Academy of Sciences points out that replicability means obtaining consistent, not identical, results across studies aimed at answering the same scientific question. It further provides eight core principles that are applicable to all disciplines.

While there are no straightforward criteria for determining whether a replication is a failure or a success, the National Library of Science and the Open Science Collaboration suggest asking some key questions (a minimal numerical sketch follows this list), such as:

  • Does the replication produce a statistically significant effect in the same direction as the original?
  • Is the effect size in the replication similar to the effect size in the original?
  • Does the original effect size fall within the confidence or prediction interval of the replication?
  • Does a meta-analytic combination of results from the original experiment and the replication yield a statistically significant effect?
  • Do the results of the original experiment and the replication appear to be consistent?
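Under assumed numbers, the sketch below illustrates how a few of these questions might be checked for two independent-group studies: the direction and significance of the replication effect, whether the original effect size falls inside the replication's 95% confidence interval, and a simple fixed-effect meta-analytic combination of the two estimates. The data, effect sizes, and standard errors are invented for illustration and do not come from any study cited here.

```python
# Illustrative check of several replication questions (all numbers assumed, not from
# any cited study): effect direction and significance, whether the original effect
# size lies in the replication's confidence interval, and a fixed-effect meta-analysis.
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Standardized mean difference using a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
treatment = rng.normal(0.30, 1.0, 120)   # hypothetical replication data
control = rng.normal(0.00, 1.0, 120)

d_rep = cohens_d(treatment, control)
p_rep = stats.ttest_ind(treatment, control).pvalue

d_orig, se_orig = 0.45, 0.20   # effect size and SE reported by the (hypothetical) original study

# Q1: significant effect in the same direction as the original?
same_direction_and_significant = (p_rep < 0.05) and (np.sign(d_rep) == np.sign(d_orig))

# Q3: does the original effect size fall within the replication's 95% CI?
n1, n2 = len(treatment), len(control)
se_rep = np.sqrt((n1 + n2) / (n1 * n2) + d_rep**2 / (2 * (n1 + n2)))  # approximate SE of d
ci_low, ci_high = d_rep - 1.96 * se_rep, d_rep + 1.96 * se_rep
original_inside_ci = ci_low <= d_orig <= ci_high

# Q4: fixed-effect (inverse-variance weighted) combination of both effect estimates
weights = np.array([1 / se_orig**2, 1 / se_rep**2])
d_combined = np.average([d_orig, d_rep], weights=weights)
se_combined = np.sqrt(1 / weights.sum())
p_combined = 2 * stats.norm.sf(abs(d_combined / se_combined))

print(f"Replication: d = {d_rep:.2f}, p = {p_rep:.4f}")
print(f"Same direction and significant: {same_direction_and_significant}")
print(f"Original d in replication 95% CI [{ci_low:.2f}, {ci_high:.2f}]: {original_inside_ci}")
print(f"Combined: d = {d_combined:.2f}, p = {p_combined:.2g}")
```

A random-effects model, or the prediction-interval check mentioned in the list above, would be natural refinements, since effect sizes are rarely expected to be identical across settings.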

While many clearly have an opinion about how and why replication fails, declaring a replication a failure is at best a null statement and at worst an unfair accusation. It misses the point and sidesteps the role of replication as a mechanism for furthering scientific endeavor by presenting new evidence on an existing question.

Can the replication process be improved?

The need both to restructure the definition of replication to account for variations across scientific fields and to recognize the degrees of potential outcomes when comparing results with the original data comes in response to the replication crisis. Listen to this Hidden Brain podcast from NPR for an intriguing case study of this phenomenon.

Considered academia's self-made disaster, the replication crisis is spurring other improvements in the replication process. Most broadly, it has prompted the resurgence and expansion of metascience, a field with roots in both philosophy and science that is widely referred to as "research on research" and "the science of science." By holding a mirror up to the scientific method, metascience is not only elucidating the purpose of replication but also guiding the rigors of its techniques.

Further efforts to improve replication are threaded throughout the industry, from updated research practices and study design to revised publication practices and oversight organizations, such as:

  • Requiring full transparency of the materials and methods used in a study
  • Pushing for statistical reform, including redefining the significance of the p-value
  • Using preregistration reports that present the study's plan for methods and analysis
  • Adopting result-blind peer review, allowing journals to accept a study based on its methodological design and justifications, not its results
  • Founding organizations like the EQUATOR Network that promote transparent and accurate reporting

Final thoughts

In the realm of scientific research, replication is a form of checks and balances. Neither the probability of a finding nor the prominence of a scientist makes a study immune to the process.

And, while a single replication does not validate or nullify the original study’s outcomes, accumulating evidence from multiple replications does boost the credibility of its claims. At the very least, the findings offer insight to other researchers and enhance the pool of scientific knowledge.

After exploring the philosophy and the mechanisms behind replication, it is clear that the process is not perfect, but evolving. Its value lies within the irreplaceable role it plays in the scientific method. Replication is no more or less important than the other parts, simply necessary to perpetuate the infinite loop of scientific discovery.

Charla Viera, MS



Reflections on Plagiarism, Part 1: A Guide for the Perplexed

Peter Charles Hoffer | Feb 1, 2004

William J. Cronon, vice president of the AHA's Professional Division, writes: The AHA's Professional Division is commissioning a series of essays and advisory documents about common challenges historians face in their work. Although these essays will be reviewed and edited by members of the Professional Division, and although they will appear in Perspectives and on the AHA web site, they should not be regarded as official statements of either the Professional Division or the AHA. Instead, their goal is to offer wise counsel by thoughtful members of our guild in an effort to promote wide-ranging conversations among historians about our professional practice. Because plagiarism has generated so much public comment and controversy in recent years, we have focused some of our earliest efforts on this critical issue. We are most grateful to Peter Hoffer, an eminent legal historian at the University of Georgia and a member of the Professional Division, for producing the following "Reflections on Plagiarism" (the concluding part, " The Object of Trials ," can be found in the March 2004 issue of Perspectives ).

There are seven causes of inconsistencies and contradictions to be met with in a literary work. The first cause arises from the fact that the author collects the opinions of various men, each differing from the other, but neglects to mention the name of the author of any particular opinion. 1 —Maimonides, A Guide for the Perplexed

For writers, readers, and teachers of history, as for Maimonides long ago, plagiarism is rightly both a mortifying and perplexing form of professional misconduct. It is mortifying because it is a species of crime—the theft of another person's contribution to knowledge—that educated, respectable people commit. It is perplexing because, despite the public shame that invariably accompanies revelations of plagiarism, it continues to occur at every level of the profession, from prizewinning historians to students just beginning their careers. While many of these infractions have come to light because readers and writers of history are keen-eyed and implacable critics of the offense, additional cases might be avoided if authors and reviewers knew more about the offense. That is the purpose of this essay, the first installment of two on the subject of plagiarism.

Plagiarism is commonly defined as the appropriation of another's work as one's own. 2 Some definitions add the purposive element of gaining an advantage of some kind. 3 Others include the codicil, "with the intent to deceive." 4 The historical profession has adopted a broad and stern definition of plagiarism, based upon ethical rather than purely legal conceptions. Its definition of plagiarism is the "expropriation of another author's text, and the presentation of it as one's own." 5 It does not require that the act be intentional, nor that the offender gain some advantage from it. 6 Nor for historians is the ultimate sanction against the offense a legal one, but instead the public infamy that accompanies egregious misconduct. As historians, we know, in the words of Lord Acton, the "undying penalty which history has the power to inflict on wrong." 7

Plagiarism may include copyright violations, but the two are conceptually independent. Massive plagiarism may not involve a single instance of copyright infringement. Copyright is a property right defined by statute. In general, copyrighted materials can only be reproduced with permission of the copyright holder, but the "fair use exception" in the law permits quotation from most scholarly works. Plagiarism is first and foremost an ethical matter, and whether or not permission is required or obtained for use of another's work, the rules for source references and against impermissible copying or borrowing apply whether or not the source is under copyright protection.

The Statement on Standards of Professional Conduct prepared by the AHA reminds us that plagiarism "takes many forms." These may include "the limited borrowing, without attribution, of another person's distinctive and significant research findings . . . or an extended borrowing even with attribution." 8 The bottom line is: work presented as original must be original; phrasings and research findings derived from others must be credited to others or the entire scholarly enterprise is undermined.

Historians adhere to these standards with the full knowledge that not everyone has the same attitude toward plagiarism as the historical profession. Some observers have noted that plagiarism may not only be common in painting, architecture, music, literature, and other forms of fine artistic expression; it is often regarded as a form of compliment. Uncredited borrowing occurs in popular art forms with disconcerting regularity. One best-selling mystery-adventure novel based on a supposed code in the work of Leonardo Da Vinci relies upon and repeats the discoveries of many art historians, but neither mentions nor cites any of them. 9 Folk music artists routinely rewrite and rearrange their predecessors' tunes in what Pete Seeger has called, according to Arlo Guthrie, "'the folk process.'" 10 Legal writing is more accommodating to plagiarism than historians are. According to one leading legal scholar, "an individual act of plagiarism usually does little or no harm," 11 a perspective perhaps influenced by the fact that appellate judges' legal opinions are supposed to be derivative, based on lines of precedent elucidated in earlier judicial decisions.

At the same time, any realistic treatment of this highly complex and long-lived issue within the historical profession must include descriptive as well as prescriptive language. That is to say, one should not ignore the existence of long accepted usages, conventions, and occupationally mandated variations, nor the evolution of our standards in this area of professional ethics. 12

I. Avoiding Plagiarism

The first line of defense against plagiarism is the author. Even the most original historical scholarship rests in part upon earlier (secondary) studies. Historians should always give credit to those whose work they have consulted and to those who render assistance in the course of their work. Whether the form of circulation of historians' work is an article, book, museum exhibit, or other kind of publication, historians recognize scholarly debts in three ways: exact quotation, paraphrase, and general citations to works consulted. By their care and integrity in crediting these sources, and by limiting the extent and monitoring the form of their copying or borrowing from these sources, historians both avert the suspicion of plagiarism and avoid its commission.

All exact reproductions of another's words (direct quotations) should appear within quotation marks, or if in a block quotation, set off at the margins. All missing material from within the quotation must be indicated by ellipses. No words may be added except in square brackets. The order of the passages in the original may be altered by the author of the new work for literary or argumentative purposes, so long as the reference notes indicate the order of the passages in the original. The source of every direct quotation must either be cited in the text and fully described in a "works cited" section at the end of the piece (MLA style), or referenced with foot or end notes ( Chicago Manual of Style ). Publications without in-text reference apparatus (most textbooks, for example) should report all secondary sources in the text or a bibliographical essay. Failure to put the borrowing of exact words in quotation marks; failure to cite the source of the quotation in the reference notes with sufficient precision for a reader to check the quotation; and changing a few words in a nearly exact replication of another's text and then not giving any reference, whether inadvertently, through negligence, or intentionally, may be read as plagiarism. But even with full and correct references to the source, historians must take care not to borrow or copy excessively from any one source or group of sources.

A special case arises when an author quotes from a primary source quoted in part or fully in a secondary source. If the author relies on the secondary source for a portion or the entire text of the primary source, citation of the latter should take something like the form "A [the primary source], quoted in B [the secondary source]." This alerts the reader that the author has borrowed the quotation from the secondary source and has not consulted the original source. The author should not simply cite the primary source. If the author, however, guided by a secondary source, finds the entire primary source in the original, reads it, and then uses some portion of it, there is no need to cite the secondary source in which it was initially encountered. The purpose of scholarly citation of primary sources is to enable other readers to find and examine them for themselves. By contrast, in no case whatsoever should an author simply reproduce another author's documentary evidence with or without that author's reference notes, without fully crediting the author, giving the impression that the borrower had done the research. This is another form of plagiarism.

The second common form of indebtedness to another work is the paraphrase, the rephrasing of another's arguments or findings in one's own language. When in doubt, one should always prefer quotations to paraphrases, but there are reasons for preferring paraphrase to quotation including the inelegance of the prose in the original, the author's desire to avoid stringing together a series of long quotations, and the need to blend into a single paragraph the arguments of many secondary sources. Authors must paraphrase with great care if they are to avoid falling into plagiarism, for paraphrasing lends itself to a wide range of errors. In particular, a paraphrase, particularly after some time has passed in the course of research, may be mistaken by the author for his or her own idea or language and reappear in the author's piece without any attribution. Mosaic paraphrases patching together quotations from a variety of secondary sources, and close paraphrases, wherein the author changes a word or two and reuses a passage from another author without quotation marks, also constitute plagiarism.

In print, all paraphrases, no matter how long or how many works are paraphrased, must be followed by citations to the sources that are as clear and precise as those provided for a direct quotation. The citation should refer to the exact page(s) from which the material was taken, rather than a block of pages or a list of pages containing the material somewhere. If the material comes from a web site (for example another teacher's original lecture notes on an open web site), citation should include the entire web address and the date that it was accessed.

The third common manner of giving credit for a scholarly debt is the general citation to work in the field. Sometimes this will follow the author's summary of arguments or evidence from a number of works. In textbooks, a single paragraph may encapsulate three or four prior publications on the topic. All works an author consults should be either cited in the reference apparatus or in the bibliography. If particular pages were consulted, these should appear in references. By contrast, works not consulted by the author, even though they may be relevant to the topic, should not be cited. Such a citation would give the false impression that the author had used the work. By the same token, when an author makes a general citation to a work that contradicts the author's findings or conclusions, that fact should be noted in the citation.

If an author employs research assistants, their errors—for example the omission of quotation marks around a direct quotation or the omission of a reference at the end of a paraphrase—become the author's responsibility. The general rule that the supervisor is responsible for the acts of the employee applies here. What is more, the author had the chance, before publication, to review the entire text, and with that last clear chance goes the onus for all errors.

II. Conventions and Usages

Often it is hard to determine where plagiarism has occurred. Readers may disagree whether and how often an author has crossed the line between the permissible and the impermissible. Another way to formulate this general issue is that historians' use of others' work lies along a spectrum, a "continuum of intellectual indebtedness" in the words of William Cronon, in which possible misconduct in each work must be weighed on its own merits.

I would add to this another dimension ruled along an axis of long-established usages and conventions. As the AHA Statement on Standards reminds us, "historical knowledge is cumulative, and thus in some contexts—such as textbooks, encyclopedia articles, or broad syntheses—the form of attribution, and the permissible extent of dependence on prior scholarship, citation, and other forms of attribution, will differ from what is expected in more limited monographs." 13

The advice on avoiding plagiarism in part I of the essay and the suggestions on dealing with plagiarism in part II apply with particular force to those works whose authors promulgate them as original contributions to scholarship, offering new findings, interpretations, and approaches. 14 Conversely, personal letters, working documents, or in-house memos thus rarely exhibit the formalities of citation. Victoria Harden of the AHA Task Force on Public History suggests that this is particularly true when they are prepared as précis or summaries of existing scholarship by subordinates for their superiors, often on short notice. If at some time the author presents these reports or statements as original contributions to knowledge, or offers them as credentials for hiring, promotion, employment benefits, fellowships, or prizes, they must give credit to all sources consulted.

Certain kinds of historical writing or oral presentation of historical materials for general public consumption also commonly omit reference notes. Such materials may include guidebooks, captions at museum exhibitions, pamphlets distributed at historical sites, and talks or performances by re-enactors or historical interpreters. In the context in which these works are used or performed, their utility might be impaired if their authors or presenters were required to credit their scholarly debts. At the same time, it would be ideal if print or electronic versions of these materials include recognition, in some form, of the contributions of individuals to them and the scholarly sources on which they relied.

Lectures by history teachers to their classes and speeches at public meetings rarely include explicit references to the secondary sources on which the lecturer relies, particularly if the lecture is not presented as an original work of independent scholarship and the materials borrowed from others constitute only a small portion of the whole. The debt that teachers of history owe to their own teachers is pervasive and often results in lectures that borrow structure and theme from those mentors. While acknowledgment of this debt will never go out of fashion, it is commonly omitted.

Textbooks, like lectures to classes, are assumed to be cumulative and synthetic. In fact authors of textbooks rarely quote or cite precisely each secondary source they have used, and the topical structure and rhetorical formulae of new textbooks bear a remarkable similarity to older ones. It would be best if textbook authors limited their borrowing from any one secondary source and cited in a bibliography all the sources used. It is mandatory that any direct quotation from another work (excluding of course prior editions of the same textbook) be correctly identified.

What is generally termed popular history—journalistic accounts, memoirs and autobiographies, and articles by professional historians in general or popular journals of opinion, for example—rarely conforms to the same standards of citation as scholarly monographs and interpretive essays. Many popular histories, for example, have only a short list of works consulted. But wholesale borrowing from another work, even with attribution, is unacceptable. Ideas themselves cannot be plagiarized, but authors may not claim as their own the full-dress presentation, according to the AHA Statement on Standards , of "another person's distinctive and significant research findings, hypotheses, theories, rhetorical strategies, or interpretations." 15

A particular case of book-length scholarship without footnotes or endnotes arises in some noteworthy series of books—for example, the Library of American Biography from Penguin/Viking, and the Landmark Law Cases and American Society from the University Press of Kansas. These are original contributions to knowledge by leading scholars designed primarily for classroom use. Series authors and editors are nevertheless very careful to observe the rules against plagiarism in these books to avert even the suspicion of surreptitious borrowing or copying from other works.

Common understandings, widely shared ideas, dates, names, places, and events in history do not need to be referenced, even if they were obtained from a particular source in print. For example, one does not need to cite a source to say that Washington was a Virginian, or that the Declaration of Independence was signed in 1776. Similarly, catchphrases, "conventional wisdom" (itself a phrase coined by the prizewinning economist John Kenneth Galbraith), and even longer quotations borrowed from sources in common currency, like Shakespeare, the Bible, and Monty Python, can be repeated without references. 16 But if an argument or thesis is unique to a secondary source, and not a matter of general currency, the source should be cited.

The foregoing paragraphs in section II refer to secondary or scholarly sources. The appropriate use and citation of primary sources raises slightly different questions. It is plagiarism to take as one's own work portions of primary sources without citation, or through excessive borrowing even with citation; but when these are clearly indicated as the work of another, formal citation of the place where the author found the primary source (a printed documentary edition or an archive, for instance) is sometimes omitted. I believe that historians should treat primary sources with the same scholarly care as they apply to secondary sources, indicating exactly where they found a primary source (so that those who follow them can find it as well). Failure to credit a primary source fully (failing, for example, to give the title of the docket book or the file-paper collection as well as the courthouse when citing a legal document) may not only lead to confusion among readers and suspicions of research misconduct but also lend itself to plagiarism.

A final, somewhat special case involves work for hire. If an author hires research assistants or "ghost writers," and by the terms of their contracts the latter agree that their names will not appear on the work as its author, they cannot argue that the final product plagiarizes their work. As Linda Shopes of the Task Force on Public History suggests, it is always good practice, however, for any supervisor who uses in his or her own work the research or writing of an employee to credit that employee by name. When an author relies upon the research of others not hired for that purpose—students in the author's class or individuals whose graduate studies the author is directing, for example—and those researchers' own language (as opposed to the documents or other evidence they find) is adopted or adapted by the author, it is unethical not to give credit to the researcher. This may be done in the acknowledgments section of the publication, in a note or notes, or in the text. If the author has depended upon the researcher to write up the results of the research and then uses these reports verbatim, the researcher should be given co-authorship. 17

III. Detecting Plagiarism

Both academic and lay readers rely on the integrity of scholars. Authors owe it to their audiences, as well as to themselves, to avoid even a hint of plagiarism, and they are the best detectors of their own inadvertent mistakes in attribution or of excessive copying even when references are given. This is true from the inception of the research to the closing stages of preparing a manuscript for publication. Before any piece of scholarly research is presented orally, circulated, or submitted to a publisher, the author should review it carefully for plagiarism. Returning to the research notes and laying them against the text may reveal errors. The author should look for omissions and commissions that might have slipped into successive drafts over time.

All scholarly journals and academic presses will send the work out to readers ("referees") to advise on publication, but referees cannot catch every instance of questionable use of secondary sources; nor should referees be held responsible if plagiarism slips past them. In particular, citation checking is not ordinarily part of their job. If the publisher (as is true of most trade houses) does not employ outside readers, the author has to be doubly careful. Book reviewers (or referees asked to help with hiring or tenure and promotion decisions) may uncover instances of plagiarism, but because that is not the primary reason for which they are reading the author's work, one cannot expect them to catch plagiarists in the act.

Despite all the reasons for which authors should and can avoid plagiarism, it occurs. The suspicion that a work contains plagiarism and its subsequent exposure are not pleasant occasions. Historical scholarship depends upon trust. Readers and publishers both rely on authors' claims of originality (indeed, book publishing contracts require authors to "warranty" that they have not plagiarized any other work).

In all cases of suspected plagiarism, the single most effective method of detection is the meticulous, side-by-side comparison of texts. This parallel reading of the source (original) text and the target (new) text will not absolutely prove plagiarism except in the most egregious cases, but it can raise or allay the level of suspicion. A reader comparing texts should not look only for similar words or phrases (groups of three or four words, for example), as these may in fact come from more than one author using the same primary sources or from the argot of a specific field. Instead, the reader should concentrate on unusual phrasing, for example, uncommon verbs and unique combinations of modifiers. An example of parallel text comparison appears on the AHA web site. 18
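By way of illustration only, the short sketch below (written in Python, a choice made here purely for convenience) shows one mechanical way a reader might surface candidate passages for such a side-by-side reading: it lists every run of six consecutive words that appears in both a source text and a target text. The sample sentences and the six-word window are assumptions made for the example; this is not the AHA's procedure nor the algorithm of any detection service, and any phrase it flags still requires the human judgment described above.

    # Illustrative sketch only: list word sequences shared by two texts so a
    # reader can then compare the surrounding passages side by side.
    # The sample texts and the six-word window are assumptions for this example.
    import re
    from typing import Set

    def word_ngrams(text: str, n: int = 6) -> Set[str]:
        """Return the set of lowercase n-word sequences in a text."""
        words = re.findall(r"[a-z']+", text.lower())
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def shared_phrases(source: str, target: str, n: int = 6) -> Set[str]:
        """Sequences of n consecutive words that appear in both texts."""
        return word_ngrams(source, n) & word_ngrams(target, n)

    if __name__ == "__main__":
        # Hypothetical sentences invented for the example.
        source = ("The settlers carried with them a distinctive folk memory "
                  "of dispossession and loss across the ocean.")
        target = ("He argues that the settlers carried with them a distinctive "
                  "folk memory of dispossession and exile.")
        for phrase in sorted(shared_phrases(source, target)):
            print(phrase)

A longer window makes chance matches from shared primary sources or a field's argot less likely, at the cost of missing lightly reworded borrowings; that trade-off is exactly why the flagged passages, not the word counts, are what the reader must then examine.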

If the reader of parallel texts finds a few examples of questionable practices in a long work, they may, with the profession's accustomed charity, be attributed to mere coincidence. In the uncovering of plagiarism, as in all misconduct, one presumes innocence. But discovery of plagiarism throughout a manuscript or plagiarism in a series of publications suggests wanton and cynical disregard of ethical and professional standards, and will not be forgiven. The discovery may take years, but plagiarism is an offense that cannot be hidden forever.

—Peter Hoffer, Distinguished Research Professor at the University of Georgia, is a member of the Professional Division of the AHA.

Acknowledgments

The author is grateful to Maureen Murphy Nutting, Susan Mosher Stuard, and Denise Youngblood, members of the AHA's Professional Division, and William Cronon, its vice president; Arnita Jones, executive director of the AHA; Stanley N. Katz, chair of the AHA Task Force on Intellectual Property, and its members Michael Les Benedict and Michael Grossberg; the AHA Task Force on Public History, its chair, Linda Shopes, and its members Victoria A. Harden and Jamil Zainaldin; James Grossman, vice president for research and education at the Newberry Library; Nan McMurry, history and social science acquisitions librarian, University of Georgia Libraries; Lewis Bateman, senior acquisitions editor, Cambridge University Press; Fred Woodward, director, and Michael Briggs, editor in chief at the University Press of Kansas; Charles Grench, assistant director, and Amanda McMillan, assistant editor, University of North Carolina Press; Ashley Dodge, senior editor, Longman Publishing, College Division; Robert Brugger, senior editor, Johns Hopkins University Press; Williamjames Hoffer, Seton Hall University; and the members of the University of Georgia history colloquium for assistance in the preparation of this document.

Part 2 of this essay will be published in the March 2004 issue of Perspectives.

1. Maimonides [Moses Ben Maimon], "Introductory Remarks on Method," The Guide for the Perplexed, trans. M. Friedlander, 2nd rev. ed. ([1904]; reprinted New York: Dover, 1956), 9.

2. Black's Law Dictionary, 7th ed., Bryan A. Garner, ed., 1170.

3. Joseph Gibaldi, MLA Handbook for Writers of Research Papers, 6th ed. (New York, 2003), 66.

4. Black's Law Dictionary, 1170, quoting Paul Goldstein, Copyright's Highway, 12.

5. The Statement on Standards of Professional Conduct (Washington, D.C.: American Historical Association, 2003), 10.

6. In this we are in accord with the Modern Language Association; see Gibaldi, MLA Handbook, 66.

7. John Edward Emerich Acton, A Lecture on the Study of History, Delivered at Cambridge, June 11, 1895 (London: Macmillan, 1895), 63.

8. Statement on Standards, 10. See below for examples of these.

9. Dan Brown, The Da Vinci Code: A Novel (New York: Doubleday, 2003). The book's "Acknowledgments" (n.p.) has only this: "My thanks also to Water Street Book Store for tracking down so many of my research books" and does not mention the individual titles and authors.

10. Arlo Guthrie quoted in Jon Pareles, "Critic's Notebook: Honoring Alan Lomax, Folk Music Crusader," New York Times, April 14, 2003, E3.

11. Richard A. Posner, "The Truth About Plagiarism," Newsday, May 18, 2003, reprinted at www.law.uchicago.edu/news/posner-r-plagiarism.html (accessed May 1, 2003).

12. See, for example, Anthony Grafton, The Footnote: A Curious History (Cambridge, Mass.: Harvard University Press, 1997), 190–91.

13. AHA, Statement on Standards, 10. Older usages and conventions of citation were often not as precise or complete as those for citation in use today. For example, Oscar Handlin's Pulitzer Prize-winning The Uprooted (Boston: Little, Brown, 1937), Daniel Boorstin's Bancroft Prize-winning The Americans: The Colonial Experience (New York: Knopf, 1958), and his Parkman Prize-winning The Americans: The National Experience (New York: Knopf, 1965) did not have any notes. Neither did Perry Miller's much admired The New England Mind: From Colony to Province (Boston: Beacon, 1954). They merely had detailed bibliographies without page references to the quotations in the text.

14. This essay does not consider the question of falsification of research findings. A good survey of the issues raised in these cases appears in Ellen Altman and Peter Hernon, eds., Research Misconduct: Issues, Implications, and Strategies (Greenwich, Ct., 1997).

15. Statement on Standards, 10.

16. Maurice Isserman, "Plagiarism: A Lie of the Mind," The Chronicle Review: Chronicle of Higher Education, May 2, 2003, B12–B13. "No one expects the Spanish Inquisition."

17. In the area of electronic publishing, editors are taking an increasingly active role in areas that traditionally were categorized as "authorship." See Kate Wittenberg, "Scholarly Editing in the Digital Age," Chronicle of Higher Education, June 20, 2003, B12. It is not clear to what extent this development will continue, nor whether it will raise questions of proprietorship of electronically published scholarship.

18. See Susan Mosher Stuard and William Cronon, "How to Detect and Demonstrate Plagiarism."
