
The Scientific Method by Science Made Simple

Understanding and using the scientific method.

The Scientific Method is a process used to design and perform experiments. It helps minimize experimental error and bias, and increases confidence in the accuracy of your results.


In the previous sections, we talked about how to pick a good topic and specific question to investigate. Now we will discuss how to carry out your investigation.

Steps of the Scientific Method

  • Observation/Research
  • Hypothesis
  • Prediction
  • Experimentation
  • Conclusion

Now that you have settled on the question you want to ask, it's time to use the Scientific Method to design an experiment to answer that question.

If your experiment isn't designed well, you may not get the correct answer. You may not even get any definitive answer at all!

The Scientific Method is a logical and rational order of steps by which scientists come to conclusions about the world around them. The Scientific Method helps to organize thoughts and procedures so that scientists can be confident in the answers they find.

OBSERVATION is the first step, so that you know how you want to go about your research.

HYPOTHESIS is the answer you think you'll find.

PREDICTION is your specific belief about the scientific idea: If my hypothesis is true, then I predict we will discover this.

EXPERIMENT is the tool that you invent to answer the question, and

CONCLUSION is the answer that the experiment gives.

Don't worry, it isn't that complicated. Let's take a closer look at each one of these steps. Then you can understand the tools scientists use for their science experiments, and use them for your own.

OBSERVATION


This step could also be called "research." It is the first stage in understanding the problem.

After you decide on a topic and narrow it down to a specific question, you will need to research everything that you can find about it. You can collect information from your own experiences, books, the internet, or even smaller "unofficial" experiments.

Let's continue the example of a science fair idea about tomatoes in the garden. You like to garden, and notice that some tomatoes are bigger than others and wonder why.

Because of this personal experience and an interest in the problem, you decide to learn more about what makes plants grow.

For this stage of the Scientific Method, it's important to use as many sources as you can find. The more information you have on your science fair topic, the better the design of your experiment is going to be, and the better your science fair project is going to be overall.

Also try to get information from your teachers or librarians, or professionals who know something about your science fair project. They can help to guide you to a solid experimental setup.


HYPOTHESIS

The next stage of the Scientific Method is known as the "hypothesis." This word basically means "a possible solution to a problem, based on knowledge and research."

The hypothesis is a simple statement that defines what you think the outcome of your experiment will be.

All of the first stage of the Scientific Method -- the observation, or research stage -- is designed to help you express a problem in a single question ("Does the amount of sunlight in a garden affect tomato size?") and propose an answer to the question based on what you know. The experiment that you will design is done to test the hypothesis.

Using the example of the tomato experiment, here is an example of a hypothesis:

TOPIC: "Does the amount of sunlight a tomato plant receives affect the size of the tomatoes?"

HYPOTHESIS: "I believe that the more sunlight a tomato plant receives, the larger the tomatoes will grow."

This hypothesis is based on:

(1) Tomato plants need sunshine to make food through photosynthesis, and logically, more sun means more food, and;

(2) Through informal, exploratory observations of plants in a garden, those with more sunlight appear to grow bigger.


PREDICTION

The hypothesis is your general statement of how you think the scientific phenomenon in question works.

Your prediction lets you get specific -- how will you demonstrate that your hypothesis is true? The experiment that you will design is done to test the prediction.

An important thing to remember during this stage of the scientific method is that once you develop a hypothesis and a prediction, you shouldn't change it, even if the results of your experiment show that you were wrong.

An incorrect prediction does NOT mean that you "failed." It just means that the experiment brought some new facts to light that maybe you hadn't thought about before.

Continuing our tomato plant example, a good prediction would be: Increasing the amount of sunlight tomato plants in my experiment receive will cause an increase in their size compared to identical plants that received the same care but less light.

EXPERIMENT

This is the part of the scientific method that tests your hypothesis. An experiment is a tool that you design to find out if your ideas about your topic are right or wrong.

It is absolutely necessary to design a science fair experiment that will accurately test your hypothesis. The experiment is the most important part of the scientific method. It's the logical process that lets scientists learn about the world.

On the next page, we'll discuss the ways that you can go about designing a science fair experiment idea.

CONCLUSION

The final step in the scientific method is the conclusion. This is a summary of the experiment's results, and how those results match up to your hypothesis.

You have two options for your conclusions: based on your results, either:

(1) YOU CAN REJECT the hypothesis, or

(2) YOU CAN NOT REJECT the hypothesis.

This is an important point!

You cannot PROVE the hypothesis with a single experiment, because there is a chance that you made an error somewhere along the way.

What you can say is that your results SUPPORT the original hypothesis.

If your original hypothesis didn't match up with the final results of your experiment, don't change the hypothesis.

Instead, try to explain what might have been wrong with your original hypothesis. What information were you missing when you made your prediction? What are the possible reasons the hypothesis and experimental results didn't match up?

Remember, a science fair experiment isn't a failure simply because its results do not agree with your hypothesis. No one will take points off if your prediction wasn't accurate. Many important scientific discoveries were made as a result of experiments gone wrong!

A science fair experiment is only a failure if its design is flawed. A flawed experiment is one that (1) doesn't keep its variables under control, and (2) doesn't sufficiently answer the question that you asked of it.

Copyright © 2006 - 2023, Science Made Simple, Inc. All Rights Reserved.



How to Write Discussions and Conclusions

The discussion section contains the results and outcomes of a study. An effective discussion informs readers what can be learned from your experiment and provides context for the results.

What makes an effective discussion?

When you’re ready to write your discussion, you’ve already introduced the purpose of your study and provided an in-depth description of the methodology. The discussion informs readers about the larger implications of your study based on the results. Highlighting these implications while not overstating the findings can be challenging, especially when you’re submitting to a journal that selects articles based on novelty or potential impact. Regardless of what journal you are submitting to, the discussion section always serves the same purpose: concluding what your study results actually mean.

A successful discussion section puts your findings in context. It should include:

  • the results of your research,
  • a discussion of related research, and
  • a comparison between your results and initial hypothesis.

Tip: Not all journals share the same naming conventions.

You can apply the advice in this article to the conclusion, results or discussion sections of your manuscript.

Our Early Career Researcher community tells us that the conclusion is often considered the most difficult section of a manuscript to write. To help, this guide provides questions to ask yourself, a basic structure on which to model your discussion, and examples from published manuscripts.


Questions to ask yourself:

  • Was my hypothesis correct?
  • If my hypothesis is partially correct or entirely different, what can be learned from the results? 
  • How do the conclusions reshape or add onto the existing knowledge in the field? What does previous research say about the topic? 
  • Why are the results important or relevant to your audience? Do they add further evidence to a scientific consensus or disprove prior studies? 
  • How can future research build on these observations? What are the key experiments that must be done? 
  • What is the “take-home” message you want your reader to leave with?

How to structure a discussion

Trying to fit a complete discussion into a single paragraph can add unnecessary stress to the writing process. If possible, give yourself two or three paragraphs to give the reader a comprehensive understanding of your study as a whole. One way to structure an effective discussion: open with a clear statement of the principal findings, relate them to previous research and your initial hypothesis, and close with the implications and next steps for future work.


Writing Tips

While the above sections can help you brainstorm and structure your discussion, there are several common mistakes writers fall into when a paper proves difficult. Writing a discussion can be a delicate balance between summarizing your results, providing proper context for your research, and avoiding introducing new information. Remember that your paper should be both confident and honest about the results!

What to do

  • Read the journal’s guidelines on the discussion and conclusion sections. If possible, learn about the guidelines before writing the discussion to ensure you’re writing to meet their expectations. 
  • Begin with a clear statement of the principal findings. This will reinforce the main take-away for the reader and set up the rest of the discussion. 
  • Explain why the outcomes of your study are important to the reader. Discuss the implications of your findings realistically based on previous literature, highlighting both the strengths and limitations of the research. 
  • State whether the results support or refute your hypothesis. If your hypothesis was not supported, what might be the reasons? 
  • Introduce new or expanded ways to think about the research question. Indicate what next steps can be taken to further pursue any unresolved questions. 
  • If dealing with a contemporary or ongoing problem, such as climate change, discuss the possible consequences if the problem is left unaddressed. 
  • Be concise. Adding unnecessary detail can distract from the main findings. 

What not to do


  • Rewrite your abstract. Statements with “we investigated” or “we studied” generally do not belong in the discussion. 
  • Include new arguments or evidence not previously discussed. Necessary information and evidence should be introduced in the main body of the paper. 
  • Apologize. Even if your research contains significant limitations, don’t undermine your authority by including statements that doubt your methodology or execution. 
  • Shy away from speaking on limitations or negative results. Including limitations and negative results will give readers a complete understanding of the presented research. Potential limitations include sources of potential bias, threats to internal or external validity, barriers to implementing an intervention and other issues inherent to the study design. 
  • Overstate the importance of your findings. Making grand statements about how a study will fully resolve large questions can lead readers to doubt the success of the research. 

Snippets of Effective Discussions:

Consumer-based actions to reduce plastic pollution in rivers: A multi-criteria decision analysis approach

Identifying reliable indicators of fitness in polar bears



6.6 - Confidence Intervals & Hypothesis Testing

Confidence intervals and hypothesis tests are similar in that they are both inferential methods that rely on an approximated sampling distribution. Confidence intervals use data from a sample to estimate a population parameter. Hypothesis tests use data from a sample to test a specified hypothesis. Hypothesis testing requires that we have a hypothesized parameter. 

The simulation methods used to construct bootstrap distributions and randomization distributions are similar. One primary difference is a bootstrap distribution is centered on the observed sample statistic while a randomization distribution is centered on the value in the null hypothesis. 

In Lesson 4, we learned confidence intervals contain a range of reasonable estimates of the population parameter. All of the confidence intervals we constructed in this course were two-tailed. These two-tailed confidence intervals go hand-in-hand with the two-tailed hypothesis tests we learned in Lesson 5. The conclusion drawn from a two-tailed confidence interval is usually the same as the conclusion drawn from a two-tailed hypothesis test. In other words, if the 95% confidence interval contains the hypothesized parameter, then a hypothesis test at the 0.05 \(\alpha\) level will almost always fail to reject the null hypothesis. If the 95% confidence interval does not contain the hypothesized parameter, then a hypothesis test at the 0.05 \(\alpha\) level will almost always reject the null hypothesis.
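This duality can be sketched in a few lines of Python. This is an illustration of the general principle, not part of the lesson: the data are simulated, and a t-based interval is paired with a one-sample t-test so the two decisions match exactly.

```python
# Illustration with simulated data: a 95% confidence interval and a
# two-tailed test at the 0.05 alpha level reach the same decision.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=98.25, scale=0.73, size=130)  # simulated temperatures

mu0 = 98.6     # hypothesized population mean
alpha = 0.05

# 95% t-based confidence interval for the population mean
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1,
                                   loc=sample.mean(), scale=stats.sem(sample))

# Two-tailed one-sample t-test of H0: mu = 98.6
t_stat, p_value = stats.ttest_1samp(sample, mu0)

in_ci = ci_low <= mu0 <= ci_high
reject = p_value <= alpha
print(in_ci, reject)  # mu0 outside the interval exactly when H0 is rejected
```

Because the t-interval and the t-test are built from the same sampling distribution, `in_ci` is always the opposite of `reject`; with simulation-based methods the agreement is only "almost always," as noted above.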

Example: Mean

This example uses the Body Temperature dataset built into StatKey for constructing a bootstrap confidence interval and conducting a randomization test.

Let's start by constructing a 95% confidence interval using the percentile method in StatKey:

  

The 95% confidence interval for the mean body temperature in the population is [98.044, 98.474].

Now, what if we want to know if there is enough evidence that the mean body temperature is different from 98.6 degrees? We can conduct a hypothesis test. Because 98.6 is not contained within the 95% confidence interval, it is not a reasonable estimate of the population mean. We should expect to have a p value less than 0.05 and to reject the null hypothesis.

\(H_0: \mu=98.6\)

\(H_a: \mu \ne 98.6\)

\(p = 2 \times 0.00080 = 0.00160\)

\(p \leq 0.05\), reject the null hypothesis

There is evidence that the population mean is different from 98.6 degrees. 
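What StatKey does behind the scenes can be sketched directly in numpy. This is a minimal, hedged version of the two simulation methods described above; the `temps` data are simulated stand-ins, not the real Body Temperature dataset, so the numbers will differ from the lesson's.

```python
# Bootstrap percentile interval and randomization test for a single mean,
# using simulated data (not StatKey's Body Temperature dataset).
import numpy as np

rng = np.random.default_rng(0)
temps = rng.normal(98.25, 0.73, size=130)   # hypothetical sample
mu0 = 98.6                                  # null-hypothesis mean
observed = temps.mean()

# Bootstrap distribution: resample with replacement; it is centered on the
# observed sample statistic. The middle 95% gives a percentile interval.
boot_means = np.array([rng.choice(temps, size=temps.size).mean()
                       for _ in range(10_000)])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

# Randomization distribution: shift the sample so it is centered on the
# null value, then resample to see what means the null would produce.
shifted = temps - observed + mu0
null_means = np.array([rng.choice(shifted, size=temps.size).mean()
                       for _ in range(10_000)])

# Two-tailed p-value: proportion of null means at least as far from mu0
# as the observed sample mean.
p = np.mean(np.abs(null_means - mu0) >= abs(observed - mu0))
print((round(ci_low, 3), round(ci_high, 3)), p)
```

Note how the code mirrors the primary difference stated earlier: `boot_means` is centered on `observed`, while `null_means` is centered on `mu0`.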

Selecting the Appropriate Procedure

The decision of whether to use a confidence interval or a hypothesis test depends on the research question. If we want to estimate a population parameter, we use a confidence interval. If we are given a specific population parameter (i.e., hypothesized value), and want to determine the likelihood that a population with that parameter would produce a sample as different as our sample, we use a hypothesis test. Below are a few examples of selecting the appropriate procedure. 

Example: Cheese Consumption

Research question: How much cheese (in pounds) does an average American adult consume annually? 

What is the appropriate inferential procedure? 

Cheese consumption, in pounds, is a quantitative variable. We have one group: American adults. We are not given a specific value to test, so the appropriate procedure here is a  confidence interval for a single mean .

Example: Age

Research question:  Is the average age in the population of all STAT 200 students greater than 30 years?

There is one group: STAT 200 students. The variable of interest is age in years, which is quantitative. The research question includes a specific population parameter to test: 30 years. The appropriate procedure is a  hypothesis test for a single mean .

Try it!

For each research question, identify the variables and the parameter of interest, and decide on the appropriate inferential procedure.

Research question:  How strong is the correlation between height (in inches) and weight (in pounds) in American teenagers?

There are two variables of interest: (1) height in inches and (2) weight in pounds. Both are quantitative variables. The parameter of interest is the correlation between these two variables.

We are not given a specific correlation to test. We are being asked to estimate the strength of the correlation. The appropriate procedure here is a  confidence interval for a correlation . 

Research question:  Are the majority of registered voters planning to vote in the next presidential election?

The parameter that is being tested here is a single proportion. We have one group: registered voters. "The majority" would be more than 50%, or p>0.50. This is a specific parameter that we are testing. The appropriate procedure here is a  hypothesis test for a single proportion .
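A hypothesis test for a single proportion like this one can be run in a couple of lines. The survey numbers below are made up for illustration; the test is H0: p = 0.50 against Ha: p > 0.50.

```python
# Hedged sketch: exact test of a single proportion with hypothetical
# polling data. H0: p = 0.50, Ha: p > 0.50 ("a majority plan to vote").
from scipy import stats

n, planning_to_vote = 1000, 540   # hypothetical survey results
result = stats.binomtest(planning_to_vote, n, p=0.50, alternative='greater')
print(result.pvalue < 0.05)
```

With 540 of 1,000 respondents, the one-sided p-value falls below 0.05, so this hypothetical sample would provide evidence that a majority plan to vote.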

Research question:  On average, are STAT 200 students younger than STAT 500 students?

We have two independent groups: STAT 200 students and STAT 500 students. We are comparing them in terms of average (i.e., mean) age.

If STAT 200 students are younger than STAT 500 students, that translates to \(\mu_{200}<\mu_{500}\) which is an alternative hypothesis. This could also be written as \(\mu_{200}-\mu_{500}<0\), where 0 is a specific population parameter that we are testing. 

The appropriate procedure here is a  hypothesis test for the difference in two means .

Research question:  On average, how much taller are adult male giraffes compared to adult female giraffes?

There are two groups: males and females. The response variable is height, which is quantitative. We are not given a specific parameter to test, instead we are asked to estimate "how much" taller males are than females. The appropriate procedure is a  confidence interval for the difference in two means .

Research question:  Are STAT 500 students more likely than STAT 200 students to be employed full-time?

There are two independent groups: STAT 500 students and STAT 200 students. The response variable is full-time employment status which is categorical with two levels: yes/no.

If STAT 500 students are more likely than STAT 200 students to be employed full-time, that translates to \(p_{500}>p_{200}\) which is an alternative hypothesis. This could also be written as \(p_{500}-p_{200}>0\), where 0 is a specific parameter that we are testing. The appropriate procedure is a  hypothesis test for the difference in two proportions.

Research question:  Is there a relationship between outdoor temperature (in Fahrenheit) and coffee sales (in cups per day)?

There are two variables here: (1) temperature in Fahrenheit and (2) cups of coffee sold in a day. Both variables are quantitative. The parameter of interest is the correlation between these two variables.

If there is a relationship between the variables, that means the correlation is different from zero. This is a specific parameter that we are testing. The appropriate procedure is a hypothesis test for a correlation.
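The correlation test above can be sketched with made-up temperature and coffee-sales data. The numbers are fabricated for illustration, with a negative relationship deliberately built in so the test has something to detect.

```python
# Hedged sketch of a hypothesis test for a correlation (H0: rho = 0),
# using simulated temperature/coffee-sales data with a built-in
# negative relationship.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
temp_f = rng.uniform(30, 90, size=50)                  # outdoor temperature (F)
sales = 400 - 3 * temp_f + rng.normal(0, 30, size=50)  # cups sold per day

r, p_value = stats.pearsonr(temp_f, sales)
print(r < 0, p_value < 0.05)  # negative and statistically significant here
```

The sign of `r` describes the direction of the relationship, while the p-value addresses the research question: whether the correlation differs from zero.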

This is the Difference Between a Hypothesis and a Theory

What to Know

A hypothesis is an assumption made before any research has been done. It is formed so that it can be tested to see if it might be true. A theory is a principle formed to explain the things already shown in data. Because of the rigors of experiment and control, it is much more likely that a theory will be true than a hypothesis.

As anyone who has worked in a laboratory or out in the field can tell you, science is about process: that of observing, making inferences about those observations, and then performing tests to see if the truth value of those inferences holds up. The scientific method is designed to be a rigorous procedure for acquiring knowledge about the world around us.


In scientific reasoning, a hypothesis is constructed before any applicable research has been done. A theory, on the other hand, is supported by evidence: it's a principle formed as an attempt to explain things that have already been substantiated by data.

Toward that end, science employs a particular vocabulary for describing how ideas are proposed, tested, and supported or disproven. And that's where we see the difference between a hypothesis and a theory .

A hypothesis is an assumption, something proposed for the sake of argument so that it can be tested to see if it might be true.

In the scientific method, the hypothesis is constructed before any applicable research has been done, apart from a basic background review. You ask a question, read up on what has been studied before, and then form a hypothesis.

What is a Hypothesis?

A hypothesis is usually tentative, an assumption or suggestion made strictly for the objective of being tested.

When a character which has been lost in a breed, reappears after a great number of generations, the most probable hypothesis is, not that the offspring suddenly takes after an ancestor some hundred generations distant, but that in each successive generation there has been a tendency to reproduce the character in question, which at last, under unknown favourable conditions, gains an ascendancy. (Charles Darwin, On the Origin of Species, 1859)

According to one widely reported hypothesis, cell-phone transmissions were disrupting the bees' navigational abilities. (Few experts took the cell-phone conjecture seriously; as one scientist said to me, "If that were the case, Dave Hackenberg's hives would have been dead a long time ago.") (Elizabeth Kolbert, The New Yorker, 6 Aug. 2007)

What is a Theory?

A theory , in contrast, is a principle that has been formed as an attempt to explain things that have already been substantiated by data. It is used in the names of a number of principles accepted in the scientific community, such as the Big Bang Theory . Because of the rigors of experimentation and control, its likelihood as truth is much higher than that of a hypothesis.

It is evident, on our theory, that coasts merely fringed by reefs cannot have subsided to any perceptible amount; and therefore they must, since the growth of their corals, either have remained stationary or have been upheaved. Now, it is remarkable how generally it can be shown, by the presence of upraised organic remains, that the fringed islands have been elevated: and so far, this is indirect evidence in favour of our theory. (Charles Darwin, The Voyage of the Beagle, 1839)

An example of a fundamental principle in physics, first proposed by Galileo in 1632 and extended by Einstein in 1905, is the following: All observers traveling at constant velocity relative to one another, should witness identical laws of nature. From this principle, Einstein derived his theory of special relativity. (Alan Lightman, Harper's, December 2011)

Non-Scientific Use

In non-scientific use, however, hypothesis and theory are often used interchangeably to mean simply an idea, speculation, or hunch (though theory is more common in this regard):

The theory of the teacher with all these immigrant kids was that if you spoke English loudly enough they would eventually understand. (E. L. Doctorow, Loon Lake, 1979)

Chicago is famous for asking questions for which there can be no boilerplate answers. Example: given the probability that the federal tax code, nondairy creamer, Dennis Rodman and the art of mime all came from outer space, name something else that has extraterrestrial origins and defend your hypothesis. (John McCormick, Newsweek, 5 Apr. 1999)

In his mind's eye, Miller saw his case suddenly taking form: Richard Bailey had Helen Brach killed because she was threatening to sue him over the horses she had purchased. It was, he realized, only a theory, but it was one he felt certain he could, in time, prove. Full of urgency, a man with a mission now that he had a hypothesis to guide him, he issued new orders to his troops: Find out everything you can about Richard Bailey and his crowd. (Howard Blum, Vanity Fair, January 1995)

And sometimes one term is used as a genus, or a means for defining the other:

Laplace's popular version of his astronomy, the Système du monde, was famous for introducing what came to be known as the nebular hypothesis, the theory that the solar system was formed by the condensation, through gradual cooling, of the gaseous atmosphere (the nebulae) surrounding the sun. (Louis Menand, The Metaphysical Club, 2001)

Researchers use this information to support the gateway drug theory — the hypothesis that using one intoxicating substance leads to future use of another. (Jordy Byrd, The Pacific Northwest Inlander, 6 May 2015)

Fox, the business and economics columnist for Time magazine, tells the story of the professors who enabled those abuses under the banner of the financial theory known as the efficient market hypothesis. (Paul Krugman, The New York Times Book Review, 9 Aug. 2009)

Incorrect Interpretations of "Theory"

Since this casual use does away with the distinctions upheld by the scientific community, hypothesis and theory are prone to being wrongly interpreted even when they are encountered in scientific contexts—or at least, contexts that allude to scientific study without making the critical distinction that scientists employ when weighing hypotheses and theories.

The most common occurrence is when theory is interpreted—and sometimes even gleefully seized upon—to mean something having less truth value than other scientific principles. (The word law applies to principles so firmly established that they are almost never questioned, such as the law of gravity.)

This mistake is one of projection: since we use theory in general use to mean something lightly speculated, then it's implied that scientists must be talking about the same level of uncertainty when they use theory to refer to their well-tested and reasoned principles.

The distinction has come to the forefront particularly on occasions when the content of science curricula in schools has been challenged—notably, when a school board in Georgia put stickers on textbooks stating that evolution was "a theory, not a fact, regarding the origin of living things." As Kenneth R. Miller, a cell biologist at Brown University, has said , a theory "doesn’t mean a hunch or a guess. A theory is a system of explanations that ties together a whole bunch of facts. It not only explains those facts, but predicts what you ought to find from other observations and experiments.”

While theories are never completely infallible, they form the basis of scientific reasoning because, as Miller said, "to the best of our ability, we’ve tested them, and they’ve held up."



Analogy and Analogical Reasoning

An analogy is a comparison between two objects, or systems of objects, that highlights respects in which they are thought to be similar. Analogical reasoning is any type of thinking that relies upon an analogy. An analogical argument is an explicit representation of a form of analogical reasoning that cites accepted similarities between two systems to support the conclusion that some further similarity exists. In general (but not always), such arguments belong in the category of ampliative reasoning, since their conclusions do not follow with certainty but are only supported with varying degrees of strength. However, the proper characterization of analogical arguments is subject to debate (see §2.2 ).

Analogical reasoning is fundamental to human thought and, arguably, to some nonhuman animals as well. Historically, analogical reasoning has played an important, but sometimes mysterious, role in a wide range of problem-solving contexts. The explicit use of analogical arguments, since antiquity, has been a distinctive feature of scientific, philosophical and legal reasoning. This article focuses primarily on the nature, evaluation and justification of analogical arguments. Related topics include metaphor , models in science , and precedent and analogy in legal reasoning .

1. Introduction: the many roles of analogy

  • 2.1 Examples
  • 2.2 Characterization
  • 2.3 Plausibility
  • 2.4 Analogical inference rules
  • 3.1 Commonsense guidelines
  • 3.2 Aristotle’s theory
  • 3.3 Material criteria: Hesse’s theory
  • 3.4 Formal criteria: the structure-mapping theory
  • 3.5 Other theories
  • 3.6 Practice-based approaches
  • 4.1 Deductive justification
  • 4.2 Inductive justification
  • 4.3 A priori justification
  • 4.4 Pragmatic justification
  • 5.1 Analogy and confirmation
  • 5.2 Conceptual change and theory development
  • Online manuscript
  • Related entries

Analogies are widely recognized as playing an important heuristic role, as aids to discovery. They have been employed, in a wide variety of settings and with considerable success, to generate insight and to formulate possible solutions to problems. According to Joseph Priestley, a pioneer in chemistry and electricity,

analogy is our best guide in all philosophical investigations; and all discoveries, which were not made by mere accident, have been made by the help of it. (1769/1966: 14)

Priestley may be over-stating the case, but there is no doubt that analogies have suggested fruitful lines of inquiry in many fields. Because of their heuristic value, analogies and analogical reasoning have been a particular focus of AI research. Hájek (2018) examines analogy as a heuristic tool in philosophy.

Example 1 . Hydrodynamic analogies exploit mathematical similarities between the equations governing ideal fluid flow and torsional problems. To predict stresses in a planned structure, one can construct a fluid model, i.e., a system of pipes through which water passes (Timoshenko and Goodier 1970). Within the limits of idealization, such analogies allow us to make demonstrative inferences, for example, from a measured quantity in the fluid model to the analogous value in the torsional problem. In practice, there are numerous complications (Sterrett 2006).

At the other extreme, an analogical argument may provide very weak support for its conclusion, establishing no more than minimal plausibility. Consider:

Example 2 . Thomas Reid’s (1785) argument for the existence of life on other planets (Stebbing 1933; Mill 1843/1930; Robinson 1930; Copi 1961). Reid notes a number of similarities between Earth and the other planets in our solar system: all orbit and are illuminated by the sun; several have moons; all revolve on an axis. In consequence, he concludes, it is “not unreasonable to think, that those planets may, like our earth, be the habitation of various orders of living creatures” (1785: 24).

Such modesty is not uncommon. Often the point of an analogical argument is just to persuade people to take an idea seriously. For instance:

Example 3 . Darwin takes himself to be using an analogy between artificial and natural selection to argue for the plausibility of the latter:

Why may I not invent the hypothesis of Natural Selection (which from the analogy of domestic productions, and from what we know of the struggle of existence and of the variability of organic beings, is, in some very slight degree, in itself probable) and try whether this hypothesis of Natural Selection does not explain (as I think it does) a large number of facts…. ( Letter to Henslow , May 1860 in Darwin 1903)

Here it appears, by Darwin’s own admission, that his analogy is employed to show that the hypothesis is probable to some “slight degree” and thus merits further investigation. Some, however, reject this characterization of Darwin’s reasoning (Richards 1997; Gildenhuys 2004).

Sometimes analogical reasoning is the only available form of justification for a hypothesis. The method of ethnographic analogy is used to interpret

the nonobservable behaviour of the ancient inhabitants of an archaeological site (or ancient culture) based on the similarity of their artifacts to those used by living peoples. (Hunter and Whitten 1976: 147)

For example:

Example 4 . Shelley (1999, 2003) describes how ethnographic analogy was used to determine the probable significance of odd markings on the necks of Moche clay pots found in the Peruvian Andes. Contemporary potters in Peru use these marks (called sígnales ) to indicate ownership; the marks enable them to reclaim their work when several potters share a kiln or storage facility. Analogical reasoning may be the only avenue of inference to the past in such cases, though this point is subject to dispute (Gould and Watson 1982; Wylie 1982, 1985). Analogical reasoning may have similar significance for cosmological phenomena that are inaccessible due to limits on observation (Dardashti et al. 2017). See §5.1 for further discussion.

As philosophers and historians such as Kuhn (1996) have repeatedly pointed out, there is not always a clear separation between the two roles that we have identified, discovery and justification. Indeed, the two functions are blended in what we might call the programmatic (or paradigmatic ) role of analogy: over a period of time, an analogy can shape the development of a program of research. For example:

Example 5 . An ‘acoustical analogy’ was employed for many years by certain nineteenth-century physicists investigating spectral lines. Discrete spectra were thought to be

completely analogous to the acoustical situation, with atoms (and/or molecules) serving as oscillators originating or absorbing the vibrations in the manner of resonant tuning forks. (Maier 1981: 51)

Guided by this analogy, physicists looked for groups of spectral lines that exhibited frequency patterns characteristic of a harmonic oscillator. This analogy served not only to underwrite the plausibility of conjectures, but also to guide and limit discovery by pointing scientists in certain directions.

More generally, analogies can play an important programmatic role by guiding conceptual development (see §5.2 ). In some cases, a programmatic analogy culminates in the theoretical unification of two different areas of inquiry.

Example 6 . Descartes’s (1637/1954) correlation between geometry and algebra provided methods for systematically handling geometrical problems that had long been recognized as analogous. A very different relationship between analogy and discovery exists when a programmatic analogy breaks down, as was the ultimate fate of the acoustical analogy. That atomic spectra have an entirely different explanation became clear with the advent of quantum theory. In this case, novel discoveries emerged against background expectations shaped by the guiding analogy. There is a third possibility: an unproductive or misleading programmatic analogy may simply become entrenched and self-perpetuating as it leads us to “construct… data that conform to it” (Stepan 1996: 133). Arguably, the danger of this third possibility provides strong motivation for developing a critical account of analogical reasoning and analogical arguments.

Analogical cognition , which embraces all cognitive processes involved in discovering, constructing and using analogies, is broader than analogical reasoning (Hofstadter 2001; Hofstadter and Sander 2013). Understanding these processes is an important objective of current cognitive science research, and an objective that generates many questions. How do humans identify analogies? Do nonhuman animals use analogies in ways similar to humans? How do analogies and metaphors influence concept formation?

This entry, however, concentrates specifically on analogical arguments. Specifically, it focuses on three central epistemological questions:

  • What criteria should we use to evaluate analogical arguments?
  • What philosophical justification can be provided for analogical inferences?
  • How do analogical arguments fit into a broader inferential context (i.e., how do we combine them with other forms of inference), especially theoretical confirmation?

Following a preliminary discussion of the basic structure of analogical arguments, the entry reviews selected attempts to provide answers to these three questions. To find such answers would constitute an important first step towards understanding the nature of analogical reasoning. To isolate these questions, however, is to make the non-trivial assumption that there can be a theory of analogical arguments —an assumption which, as we shall see, is attacked in different ways by both philosophers and cognitive scientists.

2. Analogical arguments

Analogical arguments vary greatly in subject matter, strength and logical structure. In order to appreciate this variety, it is helpful to increase our stock of examples. First, a geometric example:

Example 7 (Rectangles and boxes). Suppose that you have established that of all rectangles with a fixed perimeter, the square has maximum area. By analogy, you conjecture that of all boxes with a fixed surface area, the cube has maximum volume.
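The conjecture in Example 7 can be spot-checked numerically. The sketch below is illustrative only: the choice of surface area and the grid resolution are my own, not from the text. It scans boxes \(a \times b \times c\) of fixed surface area and confirms that the maximum volume is attained at the cube.

```python
# Numerical check of Example 7: among boxes of fixed surface area,
# the cube has maximum volume.  (Illustrative sketch; the surface
# area and grid are arbitrary choices.)

def box_volume(a, b, area):
    """Volume of the box a x b x c whose surface area equals `area`,
    solving 2(ab + ac + bc) = area for c; None if no positive c exists."""
    c = (area / 2 - a * b) / (a + b)
    if c <= 0:
        return None
    return a * b * c

AREA = 6.0          # surface area of the unit cube, so the cube's volume is 1
best_v, best_dims = 0.0, None
steps = 200
for i in range(1, steps):
    for j in range(1, steps):
        a = 2.0 * i / steps        # a and b range over (0, 2)
        b = 2.0 * j / steps
        v = box_volume(a, b, AREA)
        if v is not None and v > best_v:
            best_v, best_dims = v, (a, b)

print(best_dims, best_v)   # the maximum is attained at a = b = 1 (the cube)
```

The same scan with rectangles and a fixed perimeter reproduces the established fact about squares; the analogy lies in the shared optimization structure.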

Two examples from the history of science:

Example 8 (Morphine and meperidine). In 1934, the pharmacologist Schaumann was testing synthetic compounds for their anti-spasmodic effect. These drugs had a chemical structure similar to morphine. He observed that one of the compounds— meperidine , also known as Demerol —had a physical effect on mice that was previously observed only with morphine: it induced an S-shaped tail curvature. By analogy, he conjectured that the drug might also share morphine’s narcotic effects. Testing on rats, rabbits, dogs and eventually humans showed that meperidine, like morphine, was an effective pain-killer (Lembeck 1989: 11; Reynolds and Randall 1975: 273).

Example 9 (Priestley on electrostatic force). In 1769, Priestley suggested that the absence of electrical influence inside a hollow charged spherical shell was evidence that charges attract and repel with an inverse square force. He supported his hypothesis by appealing to the analogous situation of zero gravitational force inside a hollow shell of uniform density.
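Priestley's analogy trades on the Newtonian shell theorem: an inverse-square force exerts zero net attraction anywhere inside a uniform shell, and the cancellation is peculiar to the exponent 2. A numerical sketch (the parametrization, shell radius, test point and grid are my own choices) integrates the axial force at an off-centre interior point for force laws \(1/r^p\):

```python
import math

def axial_force(p, R=1.0, d=0.5, n=100_000):
    """Net axial force (arbitrary units) at a point a distance d from the
    centre of a uniform shell of radius R, for a force law 1/r**p.
    Integrates over rings at polar angle theta (midpoint rule)."""
    total = 0.0
    dtheta = math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta
        r = math.sqrt(R * R + d * d - 2 * R * d * math.cos(theta))
        # ring mass ~ sin(theta); axial component of the unit vector
        # from the test point to the ring is (R cos(theta) - d)/r
        total += math.sin(theta) * (R * math.cos(theta) - d) / r ** (p + 1) * dtheta
    return total

print(axial_force(2))   # ~0: the inverse-square cancellation (shell theorem)
print(axial_force(3))   # clearly nonzero: for p != 2 the cancellation fails
```

On Priestley's reasoning, the observed absence of electrical influence inside the charged shell plays the role of the vanishing integral, supporting the inverse-square exponent for the electrostatic force.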

Finally, an example from legal reasoning:

Example 10 (Duty of reasonable care). In a much-cited case ( Donoghue v. Stevenson 1932 AC 562), the United Kingdom House of Lords found the manufacturer of a bottle of ginger beer liable for damages to a consumer who became ill as a result of a dead snail in the bottle. The court argued that the manufacturer had a duty to take “reasonable care” in creating a product that could foreseeably result in harm to the consumer in the absence of such care, and where the consumer had no possibility of intermediate examination. The principle articulated in this famous case was extended, by analogy, to allow recovery for harm against an engineering firm whose negligent repair work caused the collapse of a lift ( Haseldine v. CA Daw & Son Ltd. 1941 2 KB 343). By contrast, the principle was not applicable to a case where a workman was injured by a defective crane, since the workman had opportunity to examine the crane and was even aware of the defects ( Farr v. Butters Brothers & Co. 1932 2 KB 606).

What, if anything, do all of these examples have in common? We begin with a simple, quasi-formal characterization. Similar formulations are found in elementary critical thinking texts (e.g., Copi and Cohen 2005) and in the literature on argumentation theory (e.g., Govier 1999, Guarini 2004, Walton and Hyra 2018). An analogical argument has the following form:

  • (1) \(S\) is similar to \(T\) in certain (known) respects.
  • (2) \(S\) has some further feature \(Q\).
  • (3) Therefore, \(T\) also has the feature \(Q\), or some feature \(Q^*\) similar to \(Q\).

(1) and (2) are premises. (3) is the conclusion of the argument. The argument form is ampliative ; the conclusion is not guaranteed to follow from the premises.

\(S\) and \(T\) are referred to as the source domain and target domain , respectively. A domain is a set of objects, properties, relations and functions, together with a set of accepted statements about those objects, properties, relations and functions. More formally, a domain consists of a set of objects and an interpreted set of statements about them. The statements need not belong to a first-order language, but to keep things simple, any formalizations employed here will be first-order. We use unstarred symbols \((a, P, R, f)\) to refer to items in the source domain and starred symbols \((a^*, P^*, R^*, f^*)\) to refer to corresponding items in the target domain. In Example 9 , the source domain items pertain to gravitation; the target items pertain to electrostatic attraction.

Formally, an analogy between \(S\) and \(T\) is a one-to-one mapping between objects, properties, relations and functions in \(S\) and those in \(T\). Not all of the items in \(S\) and \(T\) need to be placed in correspondence. Commonly, the analogy only identifies correspondences between a select set of items. In practice, we specify an analogy simply by indicating the most significant similarities (and sometimes differences).
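The formal definition just given can be rendered as a small data structure: a partial mapping from source items to target items, with a check that the asserted correspondences are one-to-one. The dictionary entries below are my own paraphrase of the correspondences in Example 9, and the function name is invented.

```python
# An analogy as a partial one-to-one mapping from source items (keys)
# to target items (values).  Items not listed are simply not placed
# in correspondence.

def is_one_to_one(mapping):
    """An analogy need not cover every item, but the correspondences
    it does assert must be one-to-one (no two source items may be
    mapped to the same target item)."""
    return len(set(mapping.values())) == len(mapping)

# Rough rendering of Priestley's analogy (Example 9):
priestley = {
    "uniform shell of matter": "hollow charged sphere",
    "gravitational force":     "electrostatic force",
    "zero force inside shell": "zero electrical influence inside",
}

print(is_one_to_one(priestley))   # True
```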

We can improve on this preliminary characterization of the argument from analogy by introducing the tabular representation found in Hesse (1966). We place corresponding objects, properties, relations and propositions side-by-side in a table of two columns, one for each domain. For instance, Reid’s argument ( Example 2 ) can be represented as follows (using \(\Rightarrow\) for the analogical inference):

      Earth (S)                 Mars (T)
      orbits the sun            orbits the sun
      has a moon                has moons
      revolves on its axis      revolves on its axis
      supports life             \(\Rightarrow\) may support life

Hesse introduced useful terminology based on this tabular representation. The horizontal relations in an analogy are the relations of similarity (and difference) in the mapping between domains, while the vertical relations are those between the objects, relations and properties within each domain. The correspondence (similarity) between earth’s having a moon and Mars’ having moons is a horizontal relation; the causal relation between having a moon and supporting life is a vertical relation within the source domain (with the possibility of a distinct such relation existing in the target as well).

In an earlier discussion of analogy, Keynes (1921) introduced some terminology that is also helpful.

Positive analogy . Let \(P\) stand for a list of accepted propositions \(P_1 , \ldots ,P_n\) about the source domain \(S\). Suppose that the corresponding propositions \(P^*_1 , \ldots ,P^*_n\), abbreviated as \(P^*\), are all accepted as holding for the target domain \(T\), so that \(P\) and \(P^*\) represent accepted (or known) similarities. Then we refer to \(P\) as the positive analogy .

Negative analogy . Let \(A\) stand for a list of propositions \(A_1 , \ldots ,A_r\) accepted as holding in \(S\), and \(B^*\) for a list \(B_1^*, \ldots ,B_s^*\) of propositions holding in \(T\). Suppose that the analogous propositions \(A^* = A_1^*, \ldots ,A_r^*\) fail to hold in \(T\), and similarly the propositions \(B = B_1 , \ldots ,B_s\) fail to hold in \(S\), so that \(A, {\sim}A^*\) and \({\sim}B, B^*\) represent accepted (or known) differences. Then we refer to \(A\) and \(B\) as the negative analogy .

Neutral analogy . The neutral analogy consists of accepted propositions about \(S\) for which it is not known whether an analogue holds in \(T\).

Finally we have:

Hypothetical analogy . The hypothetical analogy is simply the proposition \(Q\) in the neutral analogy that is the focus of our attention.
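Keynes's partition can be computed mechanically once each domain is represented as sets of accepted and rejected propositions, with corresponding propositions sharing a label (so \(P\) in \(S\) corresponds to \(P^*\) in \(T\)). The proposition labels below are invented, loosely modelled on Example 2; anything not explicitly accepted or rejected in the target counts as unknown.

```python
def partition(source_accepts, source_rejects, target_accepts, target_rejects):
    """Keynes's partition of corresponding propositions into positive,
    negative and neutral analogy."""
    positive = source_accepts & target_accepts
    negative = (source_accepts & target_rejects) | (source_rejects & target_accepts)
    neutral  = source_accepts - target_accepts - target_rejects
    return positive, negative, neutral

# Hypothetical labels loosely based on Example 2 (Earth vs. Mars):
S_yes = {"orbits sun", "has moon(s)", "revolves on axis",
         "dense atmosphere", "supports life"}
S_no  = set()
T_yes = {"orbits sun", "has moon(s)", "revolves on axis"}
T_no  = {"dense atmosphere"}

pos, neg, neut = partition(S_yes, S_no, T_yes, T_no)
print(pos)    # the known similarities
print(neg)    # the known differences
print(neut)   # "supports life": the hypothetical analogy lives here
```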

These concepts allow us to provide a characterization for an individual analogical argument that is somewhat richer than the original one.

Augmenting the earlier schema with Keynes’s terminology, and again using \(\Rightarrow\) for the analogical inference, we obtain:

(4)
      Source (S)        Target (T)
      \(P\)             \(P^*\)               [positive analogy]
      \(A\)             \({\sim}A^*\)         [negative analogy]
      \({\sim}B\)       \(B^*\)
      \(Q\)             \(\Rightarrow Q^*\)   [plausible inference]

An analogical argument may thus be summarized:

It is plausible that \(Q^*\) holds in the target, because of certain known (or accepted) similarities with the source domain, despite certain known (or accepted) differences.

In order for this characterization to be meaningful, we need to say something about the meaning of ‘plausibly.’ To ensure broad applicability over analogical arguments that vary greatly in strength, we interpret plausibility rather liberally as meaning ‘with some degree of support’. In general, judgments of plausibility are made after a claim has been formulated, but prior to rigorous testing or proof. The next sub-section provides further discussion.

Note that this characterization is incomplete in a number of ways. The manner in which we list similarities and differences, the nature of the correspondences between domains: these things are left unspecified. Nor does this characterization accommodate reasoning with multiple analogies (i.e., multiple source domains), which is ubiquitous in legal reasoning and common elsewhere. To characterize the argument form more fully, however, is not possible without either taking a step towards a substantive theory of analogical reasoning or restricting attention to certain classes of analogical arguments.

Arguments by analogy are extensively discussed within argumentation theory. There is considerable debate about whether they constitute a species of deductive inference (Govier 1999; Waller 2001; Guarini 2004; Kraus 2015). Argumentation theorists also make use of tools such as speech act theory (Bermejo-Luque 2012), argumentation schemes and dialogue types (Macagno et al. 2017; Walton and Hyra 2018) to distinguish different types of analogical argument.

Arguments by analogy are also discussed in the vast literature on scientific models and model-based reasoning, following the lead of Hesse (1966). Bailer-Jones (2002) draws a helpful distinction between analogies and models. While “many models have their roots in an analogy” (2002: 113) and analogy “can act as a catalyst to aid modeling,” Bailer-Jones observes that “the aim of modeling has nothing intrinsically to do with analogy.” In brief, models are tools for prediction and explanation, whereas analogical arguments aim at establishing plausibility. An analogy is evaluated in terms of source-target similarity, while a model is evaluated on how successfully it “provides access to a phenomenon in that it interprets the available empirical data about the phenomenon.” If we broaden our perspective beyond analogical arguments , however, the connection between models and analogies is restored. Nersessian (2009), for instance, stresses the role of analog models in concept-formation and other cognitive processes.

To say that a hypothesis is plausible is to convey that it has epistemic support: we have some reason to believe it, even prior to testing. An assertion of plausibility within the context of an inquiry typically has pragmatic connotations as well: to say that a hypothesis is plausible suggests that we have some reason to investigate it further. For example, a mathematician working on a proof regards a conjecture as plausible if it “has some chances of success” (Polya 1954 (v. 2): 148). On both points, there is ambiguity as to whether an assertion of plausibility is categorical or a matter of degree. These observations point to the existence of two distinct conceptions of plausibility, probabilistic and modal , either of which may reflect the intended conclusion of an analogical argument.

On the probabilistic conception, plausibility is naturally identified with rational credence (rational subjective degree of belief) and is typically represented as a probability. A classic expression may be found in Mill’s analysis of the argument from analogy in A System of Logic :

There can be no doubt that every resemblance [not known to be irrelevant] affords some degree of probability, beyond what would otherwise exist, in favour of the conclusion. (Mill 1843/1930: 333)

In the terminology introduced in §2.2, Mill’s idea is that each element of the positive analogy boosts the probability of the conclusion. Contemporary ‘structure-mapping’ theories ( §3.4 ) employ a restricted version: each structural similarity between two domains contributes to the overall measure of similarity, and hence to the strength of the analogical argument.
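Mill's idea can be caricatured in Bayesian terms: treat each relevant element of the positive analogy as supplying a likelihood ratio that multiplies the prior odds for \(Q^*\). The sketch below is my own gloss, and every number in it is invented for illustration; nothing in the text fixes them.

```python
def update_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by one likelihood ratio per relevant
    resemblance -- a Bayesian caricature of Mill's view that each
    resemblance affords 'some degree of probability'."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

prior = 0.05                      # invented prior odds for Q*
resemblances = [1.5, 1.5, 1.2]    # one modest boost per element of the positive analogy
posterior_odds = update_odds(prior, resemblances)
posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_prob, 3))
```

On this picture, a similarity known to be irrelevant would carry likelihood ratio 1 and leave the odds unchanged, which anticipates the difficulty for rule (5) discussed in §2.4.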

On the alternative modal conception, ‘it is plausible that \(p\)’ is not a matter of degree. The meaning, roughly speaking, is that there are sufficient initial grounds for taking \(p\) seriously, i.e., for further investigation (subject to feasibility and interest). Informally: \(p\) passes an initial screening procedure. There is no assertion of degree. Instead, ‘It is plausible that’ may be regarded as an epistemic modal operator that aims to capture a notion, prima facie plausibility, that is somewhat stronger than ordinary epistemic possibility. The intent is to single out \(p\) from an undifferentiated mass of ideas that remain bare epistemic possibilities. To illustrate: in 1769, Priestley’s argument ( Example 9 ), if successful, would establish the prima facie plausibility of an inverse square law for electrostatic attraction. The set of epistemic possibilities—hypotheses about electrostatic attraction compatible with knowledge of the day—was much larger. Individual analogical arguments in mathematics (such as Example 7 ) are almost invariably directed towards prima facie plausibility.

The modal conception figures importantly in some discussions of analogical reasoning. The physicist N. R. Campbell (1957) writes:

But in order that a theory may be valuable it must … display an analogy. The propositions of the hypothesis must be analogous to some known laws…. (1957: 129)

Commenting on the role of analogy in Fourier’s theory of heat conduction, Campbell writes:

Some analogy is essential to it; for it is only this analogy which distinguishes the theory from the multitude of others… which might also be proposed to explain the same laws. (1957: 142)

The interesting notion here is that of a “valuable” theory. We may not agree with Campbell that the existence of analogy is “essential” for a novel theory to be “valuable.” But consider the weaker thesis that an acceptable analogy is sufficient to establish that a theory is “valuable”, or (to qualify still further) that an acceptable analogy provides defeasible grounds for taking the theory seriously. (Possible defeaters might include internal inconsistency, inconsistency with accepted theory, or the existence of a (clearly superior) rival analogical argument.) The point is that Campbell, following the lead of 19th-century philosopher-scientists such as Herschel and Whewell, thinks that analogies can establish this sort of prima facie plausibility. Snyder (2006) provides a detailed discussion of the latter two thinkers and their ideas about the role of analogies in science.

In general, analogical arguments may be directed at establishing either sort of plausibility for their conclusions; they can have a probabilistic use or a modal use. Examples 7 through 9 are best interpreted as supporting modal conclusions. In those arguments, an analogy is used to show that a conjecture is worth taking seriously. To insist on putting the conclusion in probabilistic terms distracts attention from the point of the argument. The conclusion might be modeled (by a Bayesian) as having a certain probability value because it is deemed prima facie plausible, but not vice versa. Example 2 , perhaps, might be regarded as directed primarily towards a probabilistic conclusion.

There should be connections between the two conceptions. Indeed, we might think that the same analogical argument can establish both prima facie plausibility and a degree of probability for a hypothesis. But it is difficult to translate between epistemic modal concepts and probabilities (Cohen 1980; Douven and Williamson 2006; Huber 2009; Spohn 2009, 2012). We cannot simply take the probabilistic notion as the primitive one. It seems wise to keep the two conceptions of plausibility separate.

Schema (4) is a template that represents all analogical arguments, good and bad. It is not an inference rule. Despite the confidence with which particular analogical arguments are advanced, nobody has ever formulated an acceptable rule, or set of rules, for valid analogical inferences. There is not even a plausible candidate. This situation is in marked contrast not only with deductive reasoning, but also with elementary forms of inductive reasoning, such as induction by enumeration.

Of course, it is difficult to show that no successful analogical inference rule will ever be proposed. But consider the following candidate, formulated using the concepts of schema (4) and taking us only a short step beyond that basic characterization.

(5) Suppose \(S\) and \(T\) are the source and target domains. Suppose \(P_1 , \ldots ,P_n\) (with \(n > 0\)) represents the positive analogy, \(A_1 , \ldots ,A_r\) and \({\sim}B_1 , \ldots ,{\sim}B_s\) represent the (possibly vacuous) negative analogy, and \(Q\) represents the hypothetical analogy. In the absence of reasons for thinking otherwise, infer that \(Q^*\) holds in the target domain with degree of support \(p > 0\), where \(p\) is an increasing function of \(n\) and a decreasing function of \(r\) and \(s\).

Rule (5) is modeled on the straight rule for enumerative induction and inspired by Mill’s view of analogical inference, as described in §2.3. We use the generic phrase ‘degree of support’ in place of probability, since other factors besides the analogical argument may influence our probability assignment for \(Q^*\).

It is pretty clear that (5) is a non-starter. The main problem is that the rule justifies too much. The only substantive requirement introduced by (5) is that there be a nonempty positive analogy. Plainly, there are analogical arguments that satisfy this condition but establish no prima facie plausibility and no measure of support for their conclusions.

Here is a simple illustration. Achinstein (1964: 328) observes that there is a formal analogy between swans and line segments if we take the relation ‘has the same color as’ to correspond to ‘is congruent with’. Both relations are reflexive, symmetric, and transitive. Yet it would be absurd to find positive support from this analogy for the idea that we are likely to find congruent lines clustered in groups of two or more, just because swans of the same color are commonly found in groups. The positive analogy is antecedently known to be irrelevant to the hypothetical analogy. In such a case, the analogical inference should be utterly rejected. Yet rule (5) would wrongly assign non-zero degree of support.

To generalize the difficulty: not every similarity increases the probability of the conclusion and not every difference decreases it. Some similarities and differences are known to be (or accepted as being) utterly irrelevant and should have no influence whatsoever on our probability judgments. To be viable, rule (5) would need to be supplemented with considerations of relevance , which depend upon the subject matter, historical context and logical details particular to each analogical argument. To search for a simple rule of analogical inference thus appears futile.
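The failure mode just described can be made concrete. Below, a naive support function in the spirit of rule (5) looks only at the size of the positive analogy, so Achinstein's swans/line-segments analogy receives nonzero support; filtering by a relevance set blocks it. The function names, the particular support formula, and the relevance-set representation are all my own; note that the relevance set is supplied by judgment, not computed, which is precisely the point about subject matter and context.

```python
def naive_support(positive_analogy):
    """Rule (5) caricatured: any nonempty positive analogy yields
    support, growing with its size."""
    n = len(positive_analogy)
    return n / (n + 1)

def filtered_support(positive_analogy, relevant):
    """The needed amendment: only similarities antecedently judged
    relevant to the conclusion count."""
    return naive_support(positive_analogy & relevant)

# Achinstein's formal analogy between 'has the same color as'
# and 'is congruent with':
swans_segments = {"reflexive", "symmetric", "transitive"}
relevant_to_clustering = set()   # none of these bear on clustering in groups

print(naive_support(swans_segments))       # rule (5) wrongly grants support
print(filtered_support(swans_segments, relevant_to_clustering))   # 0.0
```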

Carnap and his followers (Carnap 1980; Kuipers 1988; Niiniluoto 1988; Maher 2000; Romeijn 2006) have formulated principles of analogy for inductive logic, using Carnapian \(\lambda \gamma\) rules. Generally, this body of work relates to “analogy by similarity”, rather than the type of analogical reasoning discussed here. Romeijn (2006) maintains that there is a relation between Carnap’s concept of analogy and analogical prediction. His approach is a hybrid of Carnap-style inductive rules and a Bayesian model. Such an approach would need to be generalized to handle the kinds of arguments described in §2.1 . It remains unclear that the Carnapian approach can provide a general rule for analogical inference.

Norton (2010, and 2018—see Other Internet Resources) has argued that the project of formalizing inductive reasoning in terms of one or more simple formal schemata is doomed. His criticisms seem especially apt when applied to analogical reasoning. He writes:

If analogical reasoning is required to conform only to a simple formal schema, the restriction is too permissive. Inferences are authorized that clearly should not pass muster… The natural response has been to develop more elaborate formal templates… The familiar difficulty is that these embellished schema never seem to be quite embellished enough; there always seems to be some part of the analysis that must be handled intuitively without guidance from strict formal rules. (2018: 1)

Norton takes the point one step further, in keeping with his “material theory” of inductive inference. He argues that there is no universal logical principle that “powers” analogical inference “by asserting that things that share some properties must share others.” Rather, each analogical inference is warranted by some local constellation of facts about the target system that he terms “the fact of analogy”. These local facts are to be determined and investigated on a case by case basis.

To embrace a purely formal approach to analogy and to abjure formalization entirely are two extremes in a spectrum of strategies. There are intermediate positions. Most recent analyses (both philosophical and computational) have been directed towards elucidating criteria and procedures, rather than formal rules, for reasoning by analogy. So long as these are not intended to provide a universal ‘logic’ of analogy, there is room for such criteria even if one accepts Norton’s basic point. The next section discusses some of these criteria and procedures.

3. Criteria for evaluating analogical arguments

Logicians and philosophers of science have identified ‘textbook-style’ general guidelines for evaluating analogical arguments (Mill 1843/1930; Keynes 1921; Robinson 1930; Stebbing 1933; Copi and Cohen 2005; Moore and Parker 1998; Woods, Irvine, and Walton 2004). Here are some of the most important ones:

  • (G1) The more similarities (between the two domains), the stronger the analogy.
  • (G2) The more differences, the weaker the analogy.
  • (G3) The greater the extent of our ignorance about the two domains, the weaker the analogy.
  • (G4) The weaker the conclusion, the more plausible the analogy.
  • (G5) Analogies involving causal relations are more plausible than those not involving causal relations.
  • (G6) Structural analogies are stronger than those based on superficial similarities.
  • (G7) The relevance of the similarities and differences to the conclusion (i.e., to the hypothetical analogy) must be taken into account.
  • (G8) Multiple analogies supporting the same conclusion make the argument stronger.

These principles can be helpful, but are frequently too vague to provide much insight. How do we count similarities and differences in applying (G1) and (G2)? Why are the structural and causal analogies mentioned in (G5) and (G6) especially important, and which structural and causal features merit attention? More generally, in connection with the all-important (G7): how do we determine which similarities and differences are relevant to the conclusion? Furthermore, what are we to say about similarities and differences that have been omitted from an analogical argument but might still be relevant?

An additional problem is that the criteria can pull in different directions. To illustrate, consider Reid’s argument for life on other planets ( Example 2 ). Stebbing (1933) finds Reid’s argument “suggestive” and “not unplausible” because the conclusion is weak (G4), while Mill (1843/1930) appears to reject the argument on account of our vast ignorance of properties that might be relevant (G3).

There is a further problem that relates to the distinction just made (in §2.3 ) between two kinds of plausibility. Each of the above criteria apart from (G7) is expressed in terms of the strength of the argument, i.e., the degree of support for the conclusion. The criteria thus appear to presuppose the probabilistic interpretation of plausibility. The problem is that a great many analogical arguments aim to establish prima facie plausibility rather than any degree of probability. Most of the guidelines are not directly applicable to such arguments.

Aristotle sets the stage for all later theories of analogical reasoning. In his theoretical reflections on analogy and in his most judicious examples, we find a sober account that lays the foundation both for the commonsense guidelines noted above and for more sophisticated analyses.

Although Aristotle employs the term analogy ( analogia ) and discusses analogical predication , he never talks about analogical reasoning or analogical arguments per se . He does, however, identify two argument forms, the argument from example ( paradeigma ) and the argument from likeness ( homoiotes ), both closely related to what we would now recognize as an analogical argument.

The argument from example ( paradeigma ) is described in the Rhetoric and the Prior Analytics :

Enthymemes based upon example are those which proceed from one or more similar cases, arrive at a general proposition, and then argue deductively to a particular inference. ( Rhetoric 1402b15) Let \(A\) be evil, \(B\) making war against neighbours, \(C\) Athenians against Thebans, \(D\) Thebans against Phocians. If then we wish to prove that to fight with the Thebans is an evil, we must assume that to fight against neighbours is an evil. Conviction of this is obtained from similar cases, e.g., that the war against the Phocians was an evil to the Thebans. Since then to fight against neighbours is an evil, and to fight against the Thebans is to fight against neighbours, it is clear that to fight against the Thebans is an evil. ( Pr. An. 69a1)

Aristotle notes two differences between this argument form and induction (69a15ff.): it “does not draw its proof from all the particular cases” (i.e., it is not a “complete” induction), and it requires an additional (deductively valid) syllogism as the final step. The argument from example thus amounts to single-case induction followed by deductive inference. It has the following structure (using \(\supset\) for the conditional):

[Figure: a tree diagram, with \(S\) the source domain and \(T\) the target domain. The node \(P(S) \mathbin{\&} Q(S)\) (lower left) is connected by a dashed arrow to \(\forall x(P(x) \supset Q(x))\) (top middle), which is connected by a solid arrow to \(P(T)\) and \(P(T) \supset Q(T)\) (lower right), which in turn is connected by a solid arrow to \(Q(T)\) below it.]

In the terminology of §2.2, \(P\) is the positive analogy and \(Q\) is the hypothetical analogy. In Aristotle’s example, \(S\) (the source) is war between Phocians and Thebans, \(T\) (the target) is war between Athenians and Thebans, \(P\) is war between neighbours, and \(Q\) is evil. The first inference (dashed arrow) is inductive; the second and third (solid arrows) are deductively valid.

The paradeigma has an interesting feature: it is amenable to an alternative analysis as a purely deductive argument form. Let us concentrate on Aristotle’s assertion, “we must assume that to fight against neighbours is an evil,” represented as \(\forall x(P(x) \supset Q(x))\). Instead of regarding this intermediate step as something reached by induction from a single case, we might instead regard it as a hidden presupposition. This transforms the paradeigma into a syllogistic argument with a missing or enthymematic premise, and our attention shifts to possible means for establishing that premise (with single-case induction as one such means). Construed in this way, Aristotle’s paradeigma argument foreshadows deductive analyses of analogical reasoning (see §4.1 ).
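On the deductive reading, the final steps check out as a formally valid inference. The following short Lean sketch (with hypothetical names; the universal claim is assumed outright as the hidden premise rather than induced from the source case) verifies this:

```lean
-- Deductive reading of the paradeigma: the universal proposition is a
-- hidden premise, and the claim about the target follows deductively.
example (α : Type) (P Q : α → Prop) (T : α)
    (univ : ∀ x, P x → Q x)  -- hidden premise: every P-case is a Q-case
    (hT : P T)               -- the target instantiates P
    : Q T :=                 -- hence the target instantiates Q
  univ T hT
```

On this reconstruction, all of the argument's inductive risk is concentrated in establishing the premise `univ`, which is exactly where Aristotle's single-case induction enters.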

The argument from likeness ( homoiotes ) seems to be closer than the paradeigma to our contemporary understanding of analogical arguments. This argument form receives considerable attention in Topics I, 17 and 18 and again in VIII, 1. The most important passage is the following.

Try to secure admissions by means of likeness; for such admissions are plausible, and the universal involved is less patent; e.g. that as knowledge and ignorance of contraries is the same, so too perception of contraries is the same; or vice versa, that since the perception is the same, so is the knowledge also. This argument resembles induction, but is not the same thing; for in induction it is the universal whose admission is secured from the particulars, whereas in arguments from likeness, what is secured is not the universal under which all the like cases fall. ( Topics 156b10–17)

This passage occurs in a work that offers advice for framing dialectical arguments when confronting a somewhat skeptical interlocutor. In such situations, it is best not to make one’s argument depend upon securing agreement about any universal proposition. The argument from likeness is thus clearly distinct from the paradeigma , where the universal proposition plays an essential role as an intermediate step in the argument. The argument from likeness, though logically less straightforward than the paradeigma , is exactly the sort of analogical reasoning we want when we are unsure about underlying generalizations.

In Topics I 17, Aristotle states that any shared attribute contributes some degree of likeness. It is natural to ask when the degree of likeness between two things is sufficiently great to warrant inferring a further likeness. In other words, when does the argument from likeness succeed? Aristotle does not answer explicitly, but a clue is provided by the way he justifies particular arguments from likeness. As Lloyd (1966) has observed, Aristotle typically justifies such arguments by articulating a (sometimes vague) causal principle which governs the two phenomena being compared. For example, Aristotle explains the saltiness of the sea, by analogy with the saltiness of sweat, as a kind of residual earthy stuff exuded in natural processes such as heating. The common principle is this:

Everything that grows and is naturally generated always leaves a residue, like that of things burnt, consisting in this sort of earth. ( Mete 358a17)

From this method of justification, we might conjecture that Aristotle believes that the important similarities are those that enter into such general causal principles.

Summarizing, Aristotle’s theory provides us with four important and influential criteria for the evaluation of analogical arguments:

  • The strength of an analogy depends upon the number of similarities.
  • Similarity reduces to identical properties and relations.
  • Good analogies derive from underlying common causes or general laws.
  • A good analogical argument need not presuppose acquaintance with the underlying universal (generalization).

These four principles form the core of a common-sense model for evaluating analogical arguments (which is not to say that they are correct; indeed, the first three will shortly be called into question). The first, as we have seen, appears regularly in textbook discussions of analogy. The second is largely taken for granted, with important exceptions in computational models of analogy ( §3.4 ). Versions of the third are found in most sophisticated theories. The final point, which distinguishes the argument from likeness and the argument from example, is endorsed in many discussions of analogy (e.g., Quine and Ullian 1970).

A slight generalization of Aristotle’s first principle helps to prepare the way for discussion of later developments. As that principle suggests, Aristotle, in common with just about everyone else who has written about analogical reasoning, organizes his analysis of the argument form around overall similarity. In the terminology of section 2.2, horizontal relationships drive the reasoning: the greater the overall similarity of the two domains, the stronger the analogical argument . Hume makes the same point, though stated negatively, in his Dialogues Concerning Natural Religion :

Wherever you depart, in the least, from the similarity of the cases, you diminish proportionably the evidence; and may at last bring it to a very weak analogy, which is confessedly liable to error and uncertainty. (1779/1947: 144)

Most theories of analogy agree with Aristotle and Hume on this general point. Disagreement relates to the appropriate way of measuring overall similarity. Some theories assign greatest weight to material analogy , which refers to shared, and typically observable, features. Others give prominence to formal analogy , emphasizing high-level structural correspondence. The next two sub-sections discuss representative accounts that illustrate these two approaches.

Hesse (1966) offers a sharpened version of Aristotle’s theory, specifically focused on analogical arguments in the sciences. She formulates three requirements that an analogical argument must satisfy in order to be acceptable:

  • Requirement of material analogy . The horizontal relations must include similarities between observable properties.
  • Causal condition . The vertical relations must be causal relations “in some acceptable scientific sense” (1966: 87).
  • No-essential-difference condition . The essential properties and causal relations of the source domain must not have been shown to be part of the negative analogy.

3.3.1 Requirement of material analogy

For Hesse, an acceptable analogical argument must include “observable similarities” between domains, which she refers to as material analogy . Material analogy is contrasted with formal analogy . Two domains are formally analogous if both are “interpretations of the same formal theory” (1966: 68). Nomic isomorphism (Hempel 1965) is a special case in which the physical laws governing two systems have identical mathematical form. Heat and fluid flow exhibit nomic isomorphism. A second example is the analogy between the flow of electric current in a wire and fluid in a pipe. Ohm’s law,

\[ \Delta V = IR, \]

states that the voltage difference along a wire equals the current times a constant resistance. This has the same mathematical form as Poiseuille’s law (for ideal fluids),

\[ \Delta p = k\dot{V}, \]

which states that the pressure difference along a pipe equals the volumetric flow rate \(\dot{V}\) times a constant. Both of these systems can be represented by a common equation. While formal analogy is linked to common mathematical structure, it should not be limited to nomic isomorphism (Bartha 2010: 209). The idea of formal analogy generalizes to cases where there is a common mathematical structure between models for two systems. Bartha offers an even more liberal definition (2010: 195): “Two features are formally similar if they occupy corresponding positions in formally analogous theories. For example, pitch in the theory of sound corresponds to color in the theory of light.”

By contrast, material analogy consists of what Hesse calls “observable” or “pre-theoretic” similarities. These are horizontal relationships of similarity between properties of objects in the source and the target. Similarities between echoes (sound) and reflection (light), for instance, were recognized long before we had any detailed theories about these phenomena. Hesse (1966, 1988) regards such similarities as metaphorical relationships between the two domains and labels them “pre-theoretic” because they draw on personal and cultural experience. We have both material and formal analogies between sound and light, and it is significant for Hesse that the former are independent of the latter.

There are good reasons not to accept Hesse’s requirement of material analogy, construed in this narrow way. First, it is apparent that formal analogies are the starting point in many important inferences. That is certainly the case in mathematics, a field in which material analogy, in Hesse’s sense, plays no role at all. Analogical arguments based on formal analogy have also been extremely influential in physics (Steiner 1989, 1998).

In Norton’s broad sense, however, ‘material analogy’ simply refers to similarities rooted in factual knowledge of the source and target domains. With reference to this broader meaning, Hesse proposes two additional material criteria.

3.3.2 Causal condition

Hesse requires that the hypothetical analogy, the feature transferred to the target domain, be causally related to the positive analogy. In her words, the essential requirement for a good argument from analogy is “a tendency to co-occurrence”, i.e., a causal relationship. She states the requirement as follows:

The vertical relations in the model [source] are causal relations in some acceptable scientific sense, where there are no compelling a priori reasons for denying that causal relations of the same kind may hold between terms of the explanandum [target]. (1966: 87)

The causal condition rules out analogical arguments where there is no causal knowledge of the source domain. It derives support from the observation that many analogies do appear to involve a transfer of causal knowledge.

The causal condition is on the right track, but is arguably too restrictive. For example, it rules out analogical arguments in mathematics. Even if we limit attention to the empirical sciences, persuasive analogical arguments may be founded upon strong statistical correlation in the absence of any known causal connection. Consider ( Example 11 ) Benjamin Franklin’s prediction, in 1749, that pointed metal rods would attract lightning, by analogy with the way they attracted the “electrical fluid” in the laboratory:

Electrical fluid agrees with lightning in these particulars: 1. Giving light. 2. Colour of the light. 3. Crooked direction. 4. Swift motion. 5. Being conducted by metals. 6. Crack or noise in exploding. 7. Subsisting in water or ice. 8. Rending bodies it passes through. 9. Destroying animals. 10. Melting metals. 11. Firing inflammable substances. 12. Sulphureous smell.—The electrical fluid is attracted by points.—We do not know whether this property is in lightning.—But since they agree in all the particulars wherein we can already compare them, is it not probable they agree likewise in this? Let the experiment be made. ( Benjamin Franklin’s Experiments , 334)

Franklin’s hypothesis was based on a long list of properties common to the target (lightning) and source (electrical fluid in the laboratory). There was no known causal connection between the twelve “particulars” and the thirteenth property, but there was a strong correlation. Analogical arguments may be plausible even where there are no known causal relations.

3.3.3 No-essential-difference condition

Hesse’s final requirement is that the “essential properties and causal relations of the [source] have not been shown to be part of the negative analogy” (1966: 91). Hesse does not provide a definition of “essential,” but suggests that a property or relation is essential if it is “causally closely related to the known positive analogy.” For instance, an analogy with fluid flow was extremely influential in developing the theory of heat conduction. Once it was discovered that heat was not conserved, however, the analogy became unacceptable (according to Hesse) because conservation was so central to the theory of fluid flow.

This requirement, though once again on the right track, seems too restrictive. It can lead to the rejection of a good analogical argument. Consider the analogy between a two-dimensional rectangle and a three-dimensional box ( Example 7 ). Broadening Hesse’s notion, it seems that there are many ‘essential’ differences between rectangles and boxes. This does not mean that we should reject every analogy between rectangles and boxes out of hand. The problem derives from the fact that Hesse’s condition is applied to the analogy relation independently of the use to which that relation is put. What counts as essential should vary with the analogical argument. Absent an inferential context, it is impossible to evaluate the importance or ‘essentiality’ of similarities and differences.

Despite these weaknesses, Hesse’s ‘material’ criteria constitute a significant advance in our understanding of analogical reasoning. The causal condition and the no-essential-difference condition incorporate local factors, as urged by Norton, into the assessment of analogical arguments. These conditions, singly or taken together, imply that an analogical argument can fail to generate any support for its conclusion, even when there is a non-empty positive analogy. Hesse offers no theory about the ‘degree’ of analogical support. That makes her account one of the few that is oriented towards the modal, rather than probabilistic, use of analogical arguments ( §2.3 ).

Many people take the concept of model-theoretic isomorphism to set the standard for thinking about similarity and its role in analogical reasoning. They propose formal criteria for evaluating analogies, based on overall structural or syntactical similarity. Let us refer to theories oriented around such criteria as structuralist .

A number of leading computational models of analogy are structuralist. They are implemented in computer programs that begin with (or sometimes build) representations of the source and target domains, and then construct possible analogy mappings. Analogical inferences emerge as a consequence of identifying the ‘best mapping.’ In terms of criteria for analogical reasoning, there are two main ideas. First, the goodness of an analogical argument is based on the goodness of the associated analogy mapping . Second, the goodness of the analogy mapping is given by a metric that indicates how closely it approximates isomorphism.

The most influential structuralist theory has been Gentner’s structure-mapping theory, implemented in a program called the structure-mapping engine (SME). In its original form (Gentner 1983), the theory assesses analogies on purely structural grounds. Gentner asserts:

Analogies are about relations, rather than simple features. No matter what kind of knowledge (causal models, plans, stories, etc.), it is the structural properties (i.e., the interrelationships between the facts) that determine the content of an analogy. (Falkenhainer, Forbus, and Gentner 1989/90: 3)

In order to clarify this thesis, Gentner introduces a distinction between properties , or monadic predicates, and relations , which have multiple arguments. She further distinguishes among different orders of relations and functions, defined inductively (in terms of the order of the relata or arguments). The best mapping is determined by systematicity : the extent to which it places higher-order relations, and items that are nested in higher-order relations, in correspondence. Gentner’s Systematicity Principle states:

A predicate that belongs to a mappable system of mutually interconnecting relationships is more likely to be imported into the target than is an isolated predicate. (1983: 163)

A systematic analogy (one that places high-order relations and their components in correspondence) is better than a less systematic analogy. Hence, an analogical inference has a degree of plausibility that increases monotonically with the degree of systematicity of the associated analogy mapping. Gentner’s fundamental criterion for evaluating candidate analogies (and analogical inferences) thus depends solely upon the syntax of the given representations and not at all upon their content.
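The purely syntactic character of this criterion can be made concrete with a toy sketch. The following is not Gentner's SME; the representation scheme and scoring rule are illustrative simplifications of my own, in which each matched expression contributes according to its order, so nested higher-order relational matches dominate isolated feature matches:

```python
# Toy illustration (not Gentner's SME): score a candidate mapping by summing
# the "order" of each matched expression, so that deeply nested, higher-order
# relational matches count for more than isolated low-order matches.

def order(expr):
    """Order of an expression: objects are 0; a relation is 1 plus the
    maximum order of its arguments."""
    if isinstance(expr, str):          # an object/entity
        return 0
    head, *args = expr                 # a relation: ("cause", arg1, arg2)
    return 1 + max(order(a) for a in args)

def systematicity_score(matched_exprs):
    """Sum of orders of the matched expressions."""
    return sum(order(e) for e in matched_exprs)

# Solar-system / atom analogy: one matched second-order causal structure
# plus one isolated first-order match.
matched = [
    ("cause", ("attracts", "sun", "planet"), ("revolves", "planet", "sun")),
    ("hotter", "sun", "planet"),
]
print(systematicity_score(matched))    # 2 + 1 = 3
```

On this toy scoring, the matched second-order causal relation outweighs the isolated first-order match, mirroring the Systematicity Principle's preference for interconnected relational structure; notice that nothing in the computation looks at what "cause" or "hotter" mean.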

Later versions of the structure-mapping theory incorporate refinements (Forbus, Ferguson, and Gentner 1994; Forbus 2001; Forbus et al. 2007; Forbus et al. 2008; Forbus et al. 2017). For example, the earliest version of the theory is vulnerable to worries about hand-coded representations of source and target domains. Gentner and her colleagues have attempted to solve this problem in later work that generates LISP representations from natural language text (see Turney 2008 for a different approach).

The most important challenges for the structure-mapping approach relate to the Systematicity Principle itself. Does the value of an analogy derive entirely, or even chiefly, from systematicity? There appear to be two main difficulties with this view. First: it is not always appropriate to give priority to systematic, high-level relational matches. Material criteria, and notably what Gentner refers to as “superficial feature matches,” can be extremely important in some types of analogical reasoning, such as ethnographic analogies which are based, to a considerable degree, on surface resemblances between artifacts. Second and more significantly: systematicity seems to be at best a fallible marker for good analogies rather than the essence of good analogical reasoning.

Greater systematicity is neither necessary nor sufficient for a more plausible analogical inference. It is obvious that increased systematicity is not sufficient for increased plausibility. An implausible analogy can be represented in a form that exhibits a high degree of structural parallelism. High-order relations can come cheap, as we saw with Achinstein’s “swan” example ( §2.4 ).

More pointedly, increased systematicity is not necessary for greater plausibility. Indeed, in causal analogies, it may even weaken the inference. That is because systematicity takes no account of the type of causal relevance, positive or negative. McKay (1993) notes that microbes have been found in frozen lakes in Antarctica; by analogy, simple life forms might exist on Mars. Freezing temperatures are preventive or counteracting causes; they are negatively relevant to the existence of life. The climate of Mars was probably more favorable to life 3.5 billion years ago than it is today, because temperatures were warmer. Yet the analogy between Antarctica and present-day Mars is more systematic than the analogy between Antarctica and ancient Mars. According to the Systematicity Principle , the analogy with Antarctica provides stronger support for life on Mars today than it does for life on ancient Mars.

The point of this example is that increased systematicity does not always increase plausibility, and reduced systematicity does not always decrease it (see Lee and Holyoak 2008). The more general point is that systematicity can be misleading, unless we take into account the nature of the relationships between various factors and the hypothetical analogy. Systematicity does not magically produce or explain the plausibility of an analogical argument. When we reason by analogy, we must determine which features of both domains are relevant and how they relate to the analogical conclusion. There is no short-cut via syntax.

Schlimm (2008) offers an entirely different critique of the structure-mapping theory from the perspective of analogical reasoning in mathematics—a domain where one might expect a formal approach such as structure mapping to perform well. Schlimm introduces a simple distinction: a domain is object-rich if the number of objects is greater than the number of relations (and properties), and relation-rich otherwise. Proponents of the structure-mapping theory typically focus on relation-rich examples (such as the analogy between the solar system and the atom). By contrast, analogies in mathematics typically involve domains with an enormous number of objects (like the real numbers), but relatively few relations and functions (addition, multiplication, less-than).

Schlimm provides an example of an analogical reasoning problem in group theory that involves a single relation in each domain. In this case, attaining maximal systematicity is trivial. The difficulty is that, compatible with maximal systematicity, there are different ways in which the objects might be placed in correspondence. The structure-mapping theory appears to yield the wrong inference. We might put the general point as follows: in object-rich domains, systematicity ceases to be a reliable guide to plausible analogical inference.
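Schlimm's worry can be illustrated with a minimal sketch (my own toy example, not Schlimm's group-theoretic one): when a domain has many objects but a single relation, structure-preservation alone leaves the object correspondence radically underdetermined. Below, both domains are the cyclic group of order 5 under addition mod 5, and several distinct mappings preserve the one relation perfectly:

```python
# Toy illustration of object-rich underdetermination: both "domains" are
# Z5 under addition mod 5, the only structure being the addition relation.
# Every map x -> k*x (k = 1..4) preserves that structure, so maximal
# "systematicity" fails to single out one correspondence among objects.

n = 5

def add(x, y):
    return (x + y) % n

def preserves_structure(f):
    """True if the mapping f commutes with addition mod n."""
    return all(f[add(x, y)] == add(f[x], f[y])
               for x in range(n) for y in range(n))

maps = [{x: (k * x) % n for x in range(n)} for k in range(1, n)]
print(sum(preserves_structure(f) for f in maps))  # 4 distinct structure-preserving maps
```

All four mappings are equally good by any purely structural metric, yet they disagree about which object corresponds to which; only content-specific considerations could break the tie.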

3.5.1 Connectionist models

During the past thirty-five years, cognitive scientists have conducted extensive research on analogy. Gentner’s SME is just one of many computational theories, implemented in programs that construct and use analogies. Three helpful anthologies that span this period are Helman 1988; Gentner, Holyoak, and Kokinov 2001; and Kokinov, Holyoak, and Gentner 2009.

One predominant objective of this research has been to model the cognitive processes involved in using analogies. Early models tended to be oriented towards “understanding the basic constraints that govern human analogical thinking” (Hummel and Holyoak 1997: 458). Recent connectionist models have been directed towards uncovering the psychological mechanisms that come into play when we use analogies: retrieval of a relevant source domain, analogical mapping across domains, and transfer of information and learning of new categories or schemas.

In some cases, such as the structure-mapping theory (§3.4), this research overlaps directly with the normative questions that are the focus of this entry; indeed, Gentner’s Systematicity Principle may be interpreted normatively. In other cases, we might view the projects as displacing those traditional normative questions with up-to-date, computational forms of naturalized epistemology . Two approaches are singled out here because both raise important challenges to the very idea of finding sharp answers to those questions, and both suggest that connectionist models offer a more fruitful approach to understanding analogical reasoning.

The first is the constraint-satisfaction model (also known as the multiconstraint theory ), developed by Holyoak and Thagard (1989, 1995). Like Gentner, Holyoak and Thagard regard the heart of analogical reasoning as analogy mapping , and they stress the importance of systematicity, which they refer to as a structural constraint. Unlike Gentner, they acknowledge two additional types of constraints. Pragmatic constraints take into account the goals and purposes of the agent, recognizing that “the purpose will guide selection” of relevant similarities. Semantic constraints represent estimates of the degree to which people regard source and target items as being alike, rather like Hesse’s “pre-theoretic” similarities.

The novelty of the multiconstraint theory is that these structural , semantic and pragmatic constraints are implemented not as rigid rules, but rather as ‘pressures’ supporting or inhibiting potential pairwise correspondences. The theory is implemented in a connectionist program called ACME (Analogical Constraint Mapping Engine), which assigns an initial activation value to each possible pairing between elements in the source and target domains (based on semantic and pragmatic constraints), and then runs through cycles that update the activation values based on overall coherence (structural constraints). The best global analogy mapping emerges under the pressure of these constraints. Subsequent connectionist models, such as Hummel and Holyoak’s LISA program (1997, 2003), have made significant advances and hold promise for offering a more complete theory of analogical reasoning.
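The flavour of constraint satisfaction as 'pressure' rather than rule can be conveyed by a toy sketch. This is not the actual ACME program; the items, weights, and update rule are illustrative assumptions. Candidate pairings are network nodes, pairings that fit together excite each other, competing pairings inhibit each other, and repeated updates let a coherent global mapping win out:

```python
# Toy connectionist sketch in the spirit of ACME (not the actual program):
# candidate pairings excite or inhibit one another, and activation updates
# let a globally coherent mapping emerge.

sources = ["sun", "planet"]
targets = ["nucleus", "electron"]
pairs = [(s, t) for s in sources for t in targets]

# Hypothetical structural evidence: these two pairings occur together in a
# matched relation (attracts/revolves), so they support each other.
excite = {(("sun", "nucleus"), ("planet", "electron"))}

def weight(p, q):
    if p == q:
        return 0.0
    if (p, q) in excite or (q, p) in excite:
        return 0.3                      # excitatory link between coherent pairings
    if p[0] == q[0] or p[1] == q[1]:
        return -0.3                     # competing pairings inhibit each other
    return 0.0

act = {p: 0.1 for p in pairs}
for _ in range(50):                     # synchronous activation updates, clamped to [0, 1]
    act = {p: min(1.0, max(0.0, act[p] + sum(weight(p, q) * act[q] for q in pairs)))
           for p in pairs}

best = {s: max(targets, key=lambda t: act[(s, t)]) for s in sources}
print(best)  # {'sun': 'nucleus', 'planet': 'electron'}
```

The winning correspondence is not derived by applying a rule; it emerges because the mutually supporting pairings reinforce one another while suppressing their rivals, which is the sense in which the constraints act as pressures.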

The second example is Hofstadter and Mitchell’s Copycat program (Hofstadter 1995; Mitchell 1993). The program is “designed to discover insightful analogies, and to do so in a psychologically realistic way” (Hofstadter 1995: 205). Copycat operates in the domain of letter-strings. The program handles the following type of problem:

Suppose the letter-string abc were changed to abd ; how would you change the letter-string ijk in “the same way”?

Most people would answer ijl , since it is natural to think that abc was changed to abd by the “transformation rule”: replace the rightmost letter with its successor. Alternative answers are possible, but do not agree with most people’s sense of what counts as the natural analogy.
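The inference most people draw can be sketched by simply applying that rule. (This is just the rule's application; Copycat's interesting work lies in perceiving and choosing the rule in the first place.)

```python
# The transformation rule most people infer in the Copycat example:
# "replace the rightmost letter with its successor".

def successor(ch):
    """Next letter in the alphabet (no wrap-around handling needed here)."""
    return chr(ord(ch) + 1)

def apply_rule(s):
    """Replace the rightmost letter of s with its successor."""
    return s[:-1] + successor(s[-1])

print(apply_rule("abc"))  # abd
print(apply_rule("ijk"))  # ijl
```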

Hofstadter and Mitchell believe that analogy-making is in large part about the perception of novel patterns, and that such perception requires concepts with “fluid” boundaries. Genuine analogy-making involves “slippage” of concepts. The Copycat program combines a set of core concepts pertaining to letter-sequences ( successor , leftmost and so forth) with probabilistic “halos” that link distinct concepts dynamically. Orderly structures emerge out of random low-level processes and the program produces plausible solutions. Copycat thus shows that analogy-making can be modeled as a process akin to perception, even if the program employs mechanisms distinct from those in human perception.

The multiconstraint theory and Copycat share the idea that analogical cognition involves cognitive processes that operate below the level of abstract reasoning. Both computational models—to the extent that they are capable of performing successful analogical reasoning—challenge the idea that a successful model of analogical reasoning must take the form of a set of quasi-logical criteria. Efforts to develop a quasi-logical theory of analogical reasoning, it might be argued, have failed. In place of faulty inference schemes such as those described earlier ( §2.2 , §2.4 ), computational models substitute procedures that can be judged on their performance rather than on traditional philosophical standards.

In response to this argument, we should recognize the value of the connectionist models while acknowledging that we still need a theory that offers normative principles for evaluating analogical arguments. In the first place, even if the construction and recognition of analogies are largely a matter of perception, this does not eliminate the need for subsequent critical evaluation of analogical inferences. Second and more importantly, we need to look not just at the construction of analogy mappings but at the ways in which individual analogical arguments are debated in fields such as mathematics, physics, philosophy and the law. These high-level debates require reasoning that bears little resemblance to the computational processes of ACME or Copycat. (Ashley’s HYPO (Ashley 1990) is one example of a non-connectionist program that focuses on this aspect of analogical reasoning.) There is, accordingly, room for both computational and traditional philosophical models of analogical reasoning.

3.5.2 Articulation model

Most prominent theories of analogy, philosophical and computational, are based on overall similarity between source and target domains—defined in terms of some favoured subset of Hesse’s horizontal relations (see §2.2 ). Aristotle and Mill, whose approach is echoed in textbook discussions, suggest counting similarities. Hesse’s theory ( §3.3 ) favours “pre-theoretic” correspondences. The structure-mapping theory and its successors ( §3.4 ) look to systematicity, i.e., to correspondences involving complex, high-level networks of relations. In each of these approaches, the problem is twofold: overall similarity is not a reliable guide to plausibility, and it fails to explain the plausibility of any analogical argument.

Bartha’s articulation model (2010) proposes a different approach, beginning not with horizontal relations, but rather with a classification of analogical arguments on the basis of the vertical relations within each domain. The fundamental idea is that a good analogical argument must satisfy two conditions:

Prior Association . There must be a clear connection, in the source domain, between the known similarities (the positive analogy) and the further similarity that is projected to hold in the target domain (the hypothetical analogy). This relationship determines which features of the source are critical to the analogical inference.

Potential for Generalization . There must be reason to think that the same kind of connection could obtain in the target domain. More pointedly: there must be no critical disanalogy between the domains.

The first order of business is to make the prior association explicit. The standards of explicitness vary depending on the nature of this association (causal relation, mathematical proof, functional relationship, and so forth). The two general principles are fleshed out via a set of subordinate models that allow us to identify critical features and hence critical disanalogies.

To see how this works, consider Example 7 (Rectangles and boxes). In this analogical argument, the source domain is two-dimensional geometry: we know that of all rectangles with a fixed perimeter, the square has maximum area. The target domain is three-dimensional geometry: by analogy, we conjecture that of all boxes with a fixed surface area, the cube has maximum volume. This argument should be evaluated not by counting similarities, looking to pre-theoretic resemblances between rectangles and boxes, or constructing connectionist representations of the domains and computing a systematicity score for possible mappings. Instead, we should begin with a precise articulation of the prior association in the source domain, which amounts to a specific proof for the result about rectangles. We should then identify, relative to that proof, the critical features of the source domain: namely, the concepts and assumptions used in the proof. Finally, we should assess the potential for generalization: whether, in the three-dimensional setting, those critical features are known to lack analogues in the target domain. The articulation model is meant to reflect the conversations that can and do take place between an advocate and a critic of an analogical argument.
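For instance, one standard way of articulating the prior association in the source domain runs through the arithmetic–geometric mean inequality (this is a textbook proof; the particular articulation is mine, not a quotation from Bartha):

```latex
% Source domain: rectangles with sides a, b and fixed perimeter P = 2(a+b).
\[
  ab \;\le\; \Bigl(\tfrac{a+b}{2}\Bigr)^{2} \;=\; \tfrac{P^{2}}{16},
\]
% with equality iff a = b: among rectangles of fixed perimeter, the square
% maximizes area. The critical feature is the AM--GM inequality, and here it
% does generalize: for a box with edges a, b, c and fixed surface area
% S = 2(ab + bc + ca), applying AM--GM to the three face areas gives
\[
  (abc)^{2} \;=\; (ab)(bc)(ca)
  \;\le\; \Bigl(\tfrac{ab+bc+ca}{3}\Bigr)^{3}
  \;=\; \Bigl(\tfrac{S}{6}\Bigr)^{3},
\]
% with equality iff ab = bc = ca, i.e., iff a = b = c: the cube maximizes volume.
```

Relative to this articulation, the critical features are the AM–GM inequality and the way the constraint enters it, and the potential for generalization amounts to checking that these have three-dimensional analogues, which they do.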

3.6.1 Norton’s material theory of analogy

As noted in §2.4 , Norton rejects analogical inference rules. But even if we agree with Norton on this point, we might still be interested in having an account that gives us guidelines for evaluating analogical arguments. How does Norton’s approach fare on this score?

According to Norton, each analogical argument is warranted by local facts that must be investigated and justified empirically. First, there is “the fact of the analogy”: in practice, a low-level uniformity that embraces both the source and target systems. Second, there are additional factual properties of the target system which, when taken together with the uniformity, warrant the analogical inference. Consider Galileo’s famous inference ( Example 12 ) that there are mountains on the moon (Galileo 1610). Through his newly invented telescope, Galileo observed points of light on the moon ahead of the advancing edge of sunlight. Noting that the same thing happens on earth when sunlight strikes the mountains, he concluded that there must be mountains on the moon and even provided a reasonable estimate of their height. In this example, Norton tells us, the fact of the analogy is that shadows and other optical phenomena are generated in the same way on the earth and on the moon; the additional fact about the target is the existence of points of light ahead of the advancing edge of sunlight on the moon.

What are the implications of Norton’s material theory when it comes to evaluating analogical arguments? The fact of the analogy is a local uniformity that powers the inference. Norton’s theory works well when such a uniformity is patent or naturally inferred. It doesn’t work well when the uniformity is itself the target (rather than the driver) of the inference. That happens with explanatory analogies such as Example 5 (the Acoustical Analogy), and mathematical analogies such as Example 7 (Rectangles and Boxes). Similarly, the theory doesn’t work well when the underlying uniformity is unclear, as in Example 2 (Life on other Planets), Example 4 (Clay Pots), and many other cases. In short, if Norton’s theory is accepted, then for most analogical arguments there are no useful evaluation criteria.

3.6.2 Field-specific criteria

For those who sympathize with Norton’s skepticism about universal inductive schemes and theories of analogical reasoning, yet recognize that his approach may be too local, an appealing strategy is to move up one level. We can aim for field-specific “working logics” (Toulmin 1958; Wylie and Chapman 2016; Reiss 2015). This approach has been adopted by philosophers of archaeology, evolutionary biology and other historical sciences (Wylie and Chapman 2016; Currie 2013; Currie 2016; Currie 2018). In place of schemas, we find ‘toolkits’, i.e., lists of criteria for evaluating analogical reasoning.

For example, Currie (2016) explores in detail the use of ethnographic analogy (Example 13) between shamanistic motifs used by the contemporary San people and similar motifs in ancient rock art, found both among ancestors of the San (direct historical analogy) and in European rock art (indirect historical analogy). Analogical arguments support the hypothesis that in each of these cultures, rock art symbolizes hallucinogenic experiences. Currie examines criteria that focus on assumptions about stability of cultural traits and environment-culture relationships. Currie (2016, 2018) and Wylie (Wylie and Chapman 2016) also stress the importance of robustness reasoning that combines analogical arguments of moderate strength with other forms of evidence to yield strong conclusions.

Practice-based approaches can thus yield specific guidelines unlikely to be matched by any general theory of analogical reasoning. One caveat is worth mentioning. Field-specific criteria for ethnographic analogy are elicited against a background of decades of methodological controversy (Wylie and Chapman 2016). Critics and defenders of ethnographic analogy have appealed to general models of scientific method (e.g., hypothetico-deductive method or Bayesian confirmation). To advance the methodological debate, practice-based approaches must either make connections to these general models or explain why the lack of any such connection is unproblematic.

3.6.3 Formal analogies in physics

Close attention to analogical arguments in practice can also provide valuable challenges to general ideas about analogical inference. In an interesting discussion, Steiner (1989, 1998) suggests that many of the analogies that played a major role in early twentieth-century physics count as “Pythagorean.” The term is meant to connote mathematical mysticism: a “Pythagorean” analogy is a purely formal analogy, one founded on mathematical similarities that have no known physical basis at the time it is proposed. One example is Schrödinger’s use of analogy (Example 14) to “guess” the form of the relativistic wave equation. In Steiner’s view, Schrödinger’s reasoning relies upon manipulations and substitutions based on purely mathematical analogies. Steiner argues that the success, and even the plausibility, of such analogies “evokes, or should evoke, puzzlement” (1989: 454). Both Hesse (1966) and Bartha (2010) reject the idea that a purely formal analogy, with no physical significance, can support a plausible analogical inference in physics. Thus, Steiner’s arguments provide a serious challenge.

Bartha (2010) suggests a response: we can decompose Steiner’s examples into two or more steps, and then establish that at least one step does, in fact, have a physical basis. Fraser (forthcoming), however, offers a counterexample that supports Steiner’s position. Complex analogies between classical statistical mechanics (CSM) and quantum field theory (QFT) have played a crucial role in the development and application of renormalization group (RG) methods in both theories ( Example 15 ). Fraser notes substantial physical disanalogies between CSM and QFT, and concludes that the reasoning is based entirely on formal analogies.

4. Philosophical foundations for analogical reasoning

What philosophical basis can be provided for reasoning by analogy? What justification can be given for the claim that analogical arguments deliver plausible conclusions? There have been several ideas for answering this question. One natural strategy assimilates analogical reasoning to some other well-understood argument pattern, a form of deductive or inductive reasoning (§4.1, §4.2). A few philosophers have explored the possibility of a priori justification (§4.3). A pragmatic justification may be available for practical applications of analogy, notably in legal reasoning (§4.4).

Any attempt to provide a general justification for analogical reasoning faces a basic dilemma. The demands of generality require a high-level formulation of the problem and hence an abstract characterization of analogical arguments, such as schema (4). On the other hand, as noted previously, many analogical arguments that conform to schema (4) are bad arguments. So a general justification of analogical reasoning cannot provide support for all arguments that conform to (4), on pain of proving too much. Instead, it must first specify a subset of putatively ‘good’ analogical arguments, and link the general justification to this specified subset. The problem of justification is linked to the problem of characterizing good analogical arguments. This difficulty afflicts some of the strategies described in this section.

4.1 Deductive justification

Analogical reasoning may be cast in a deductive mold. If successful, this strategy neatly solves the problem of justification. A valid deductive argument is as good as it gets.

An early version of the deductivist approach is exemplified by Aristotle’s treatment of the argument from example (§3.2), the paradeigma. On this analysis, an analogical argument between source domain \(S\) and target \(T\) begins with the assumption of positive analogy \(P(S)\) and \(P(T)\), as well as the additional information \(Q(S)\). It proceeds via the generalization \(\forall x(P(x) \supset Q(x))\) to the conclusion: \(Q(T)\). Provided we can treat that intermediate generalization as an independent premise, we have a deductively valid argument. Notice, though, that the existence of the generalization renders the analogy irrelevant. We can derive \(Q(T)\) from the generalization and \(P(T)\), without any knowledge of the source domain. The literature on analogy in argumentation theory (§2.2) offers further perspectives on this type of analysis, and on the question of whether analogical arguments are properly characterized as deductive.

Some recent analyses follow Aristotle in treating analogical arguments as reliant upon extra (sometimes tacit) premises, typically drawn from background knowledge, that convert the inference into a deductively valid argument––but without making the source domain irrelevant. Davies and Russell introduce a version that relies upon what they call determination rules (Russell 1986; Davies and Russell 1987; Davies 1988). Suppose that \(Q\) and \(P_1 , \ldots ,P_m\) are variables, and we have background knowledge that the value of \(Q\) is determined by the values of \(P_1 , \ldots ,P_m\). In the simplest case, where \(m = 1\) and both \(P\) and \(Q\) are binary Boolean variables, this reduces to

\[\forall x \forall y [(P(x) \equiv P(y)) \supset (Q(x) \equiv Q(y))],\]

i.e., whether or not \(P\) holds determines whether or not \(Q\) holds. More generally, the form of a determination rule is

\[Q = F(P_1 , \ldots ,P_m),\]

i.e., \(Q\) is a function of \(P_1,\ldots\), \(P_m\). If we assume such a rule as part of our background knowledge, then an analogical argument with conclusion \(Q(T)\) is deductively valid. More precisely, and allowing for the case where \(Q\) is not a binary variable: if we have such a rule, and also premises stating that the source \(S\) agrees with the target \(T\) on all of the values \(P_i\), then we may validly infer that \(Q(T) = Q(S)\).

The “determination rule” analysis provides a clear and simple justification for analogical reasoning. Note that, in contrast to the Aristotelian analysis via the generalization \(\forall x(P(x) \supset Q(x))\), a determination rule does not trivialize the analogical argument. Only by combining the rule with information about the source domain can we derive the value of \(Q(T)\). To illustrate by adapting one of the examples given by Russell and Davies (Example 16), let’s suppose that the value \(Q\) of a used car (relative to a particular buyer) is determined by its year, make, mileage, condition, color and accident history (the variables \(P_i\)). It doesn’t matter if one or more of these factors are redundant or irrelevant. Provided two cars are indistinguishable on each of these points, they will have the same value. Knowledge of the source domain is necessary; we can’t derive the value of the second car from the determination rule alone. Weitzenfeld (1984) proposes a variant of this approach, advancing the slightly more general thesis that analogical arguments are deductive arguments with a missing (enthymematic) premise that amounts to a determination rule.
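The used-car illustration can be sketched computationally. This is a minimal, hypothetical rendering (all field names and values are invented for illustration, not drawn from Russell and Davies): the determination rule is the assumption that the \(P_i\) jointly fix \(Q\), so agreement on every \(P_i\) licenses a deductive transfer of \(Q\) from source to target.

```python
# Hypothetical sketch of a determination-rule inference.
# Assumed rule: the variables P_i jointly determine Q. If the source and
# target agree on every P_i, the source's Q-value transfers deductively.

def infer_q(source, target, p_keys, q_key):
    """Return source's Q-value if source and target agree on all P_i;
    otherwise None (the rule licenses no inference)."""
    if all(source[k] == target[k] for k in p_keys):
        return source[q_key]
    return None

P_KEYS = ["year", "make", "mileage", "condition", "color", "accidents"]

car_a = {"year": 2015, "make": "X", "mileage": 60000, "condition": "good",
         "color": "blue", "accidents": 0, "value": 9000}
car_b = {k: car_a[k] for k in P_KEYS}  # agrees with car_a on every P_i

print(infer_q(car_a, car_b, P_KEYS, "value"))                    # 9000
print(infer_q(car_a, dict(car_b, color="red"), P_KEYS, "value"))  # None
```

Note that the source's data is genuinely needed: the rule alone tells us only that some function \(F\) exists, not what value it returns.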

Do determination rules give us a solution to the problem of providing a justification for analogical arguments? In general: no. Analogies are commonly applied to problems such as Example 8 (morphine and meperidine), where we are not even aware of all relevant factors, let alone in possession of a determination rule. Medical researchers conduct drug tests on animals without knowing all attributes that might be relevant to the effects of the drug. Indeed, one of the main objectives of such testing is to guard against reactions unanticipated by theory. On the “determination rule” analysis, we must either limit the scope of such arguments to cases where we have a well-supported determination rule, or focus attention on formulating and justifying an appropriate determination rule. For cases such as animal testing, neither option seems realistic.

Recasting analogy as a deductive argument may help to bring out background assumptions, but it makes little headway with the problem of justification. That problem re-appears as the need to state and establish the plausibility of a determination rule, and that is at least as difficult as justifying the original analogical argument.

4.2 Inductive justification

Some philosophers have attempted to portray, and justify, analogical reasoning in terms of some well-understood inductive argument pattern. There have been three moderately popular versions of this strategy. The first treats analogical reasoning as generalization from a single case. The second treats it as a kind of sampling argument. The third recognizes the argument from analogy as a distinctive form, but treats past successes as evidence for future success.

4.2.1 Single-case induction

Let’s reconsider Aristotle’s argument from example or paradeigma (§3.2), but this time regard the generalization as justified via induction from a single case (the source domain). Can such a simple analysis of analogical arguments succeed? In general: no.

A single instance can sometimes lead to a justified generalization. Cartwright (1992) argues that we can sometimes generalize from a single careful experiment, “where we have sufficient control of the materials and our knowledge of the requisite background assumptions is secure” (51). Cartwright thinks that we can do this, for example, in experiments with compounds that have stable “Aristotelian natures.” In a similar spirit, Quine (1969) maintains that we can have instantial confirmation when dealing with natural kinds.

Even if we accept that there are such cases, the objection to understanding all analogical arguments as single-case induction is obvious: the view is simply too restrictive. Most analogical arguments will not meet the requisite conditions. We may not know that we are dealing with a natural kind or Aristotelian nature when we make the analogical argument. We may not know which properties are essential. An insistence on the ‘single-case induction’ analysis of analogical reasoning is likely to lead to skepticism (Agassi 1964, 1988).

Interpreting the argument from analogy as single-case induction is also counter-productive in another way. The simplistic analysis does nothing to advance the search for criteria that help us to distinguish between relevant and irrelevant similarities, and hence between good and bad analogical arguments.

4.2.2 Sampling arguments

On the sampling conception of analogical arguments, acknowledged similarities between two domains are treated as statistically relevant evidence for further similarities. The simplest version of the sampling argument is due to Mill (1843/1930). An argument from analogy, he writes, is “a competition between the known points of agreement and the known points of difference.” Agreement of \(A\) and \(B\) in 9 out of 10 properties implies a probability of 0.9 that \(B\) will possess any other property of \(A\): “we can reasonably expect resemblance in the same proportion” (367). His only restriction has to do with sample size: we must be relatively knowledgeable about both \(A\) and \(B\). Mill saw no difficulty in using analogical reasoning to infer characteristics of newly discovered species of plants or animals, given our extensive knowledge of botany and zoology. But if the extent of unascertained properties of \(A\) and \(B\) is large, similarity in a small sample would not be a reliable guide; hence, Mill’s dismissal of Reid’s argument about life on other planets (Example 2).

The sampling argument is presented in more explicit mathematical form by Harrod (1956). The key idea is that the known properties of \(S\) (the source domain) may be considered a random sample of all \(S\)’s properties—random, that is, with respect to the attribute of also belonging to \(T\) (the target domain). If the majority of known properties that belong to \(S\) also belong to \(T\), then we should expect most other properties of \(S\) to belong to \(T\), for it is unlikely that we would have come to know just the common properties. In effect, Harrod proposes a binomial distribution, modeling ‘random selection’ of properties on random selection of balls from an urn.
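Harrod's urn picture can be made concrete with a toy calculation. This is only an illustrative sketch, not Harrod's own mathematics: it assumes each known property of the source is independently shared with the target with some fixed probability \(p\), and reads Mill's ratio of shared to known properties as an estimate of \(p\).

```python
# Toy rendering of the urn model: each of n known properties of the source
# is treated as a ball drawn from an urn, "shared with the target" with a
# fixed probability p, independently of the other draws. Both the
# independence assumption and the numbers are purely illustrative.
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k shared properties among n known ones."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Mill's ratio: 9 of 10 known properties shared.
k, n = 9, 10
p_hat = k / n   # read, as Mill does, as the chance a further property transfers
print(p_hat)                      # 0.9
print(binomial_pmf(k, n, p_hat))  # likelihood of observing 9 shared of 10
```

The counting problem noted below bites immediately here: the output depends entirely on how the "population" of properties is individuated, which the model leaves unspecified.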

There are grave difficulties with Harrod’s and Mill’s analyses. One obvious difficulty is the counting problem: the ‘population’ of properties is poorly defined. How are we to count similarities and differences? The ratio of shared to total known properties varies dramatically according to how we do this. A second serious difficulty is the problem of bias: we cannot justify the assumption that the sample of known features is random. In the case of the urn, the selection process is arranged so that the result of each choice is not influenced by the agent’s intentions or purposes, or by prior choices. By contrast, the presentation of an analogical argument is always partisan. Bias enters into the initial representation of similarities and differences: an advocate of the argument will highlight similarities, while a critic will play up differences. The paradigm of repeated selection from an urn seems totally inappropriate. Additional variations of the sampling approach have been developed (e.g., Russell 1988), but ultimately these versions also fail to solve either the counting problem or the problem of bias.

4.2.3 Argument from past success

Section 3.6 discussed Steiner’s view that appeal to ‘Pythagorean’ analogies in physics “evokes, or should evoke, puzzlement” (1989: 454). Liston (2000) offers a possible response: physicists are entitled to use Pythagorean analogies on the basis of induction from their past success:

[The scientist] can admit that no one knows how [Pythagorean] reasoning works and argue that the very fact that similar strategies have worked well in the past is already reason enough to continue pursuing them hoping for success in the present instance. (200)

Setting aside familiar worries about arguments from success, the real problem here is to determine what counts as a similar strategy. In essence, that amounts to isolating the features of successful Pythagorean analogies. As we have seen (§2.4), nobody has yet provided a satisfactory scheme that characterizes successful analogical arguments, let alone successful Pythagorean analogical arguments.

4.3 A priori justification

An a priori approach traces the validity of a pattern of analogical reasoning, or of a particular analogical argument, to some broad and fundamental principle. Three such approaches will be outlined here.

The first is due to Keynes (1921), who appeals to his famous Principle of the Limitation of Independent Variety.

Armed with this Principle and some additional assumptions, Keynes is able to show that in cases where there is no negative analogy, knowledge of the positive analogy increases the (logical) probability of the conclusion. If there is a non-trivial negative analogy, however, then the probability of the conclusion remains unchanged, as was pointed out by Hesse (1966). Those familiar with Carnap’s theory of logical probability will recognize that in setting up his framework, Keynes settled on a measure that permits no learning from experience.

Hesse offers a refinement of Keynes’s strategy, once again along Carnapian lines. In her (1974), she proposes what she calls the Clustering Postulate: the assumption that our epistemic probability function has a built-in bias towards generalization. The objections to such postulates of uniformity are well-known (see Salmon 1967), but even if we waive them, her argument fails. The main objection here—which also applies to Keynes—is that a purely syntactic axiom such as the Clustering Postulate fails to discriminate between analogical arguments that are good and those that are clearly without value (according to Hesse’s own material criteria, for example).

A different a priori strategy, proposed by Bartha (2010), limits the scope of justification to analogical arguments that satisfy tentative criteria for ‘good’ analogical reasoning. The criteria are those specified by the articulation model (§3.5). In simplified form, they require the existence of non-trivial positive analogy and no known critical disanalogy. The scope of Bartha’s argument is also limited to analogical arguments directed at establishing prima facie plausibility, rather than degree of probability.

Bartha’s argument rests on a principle of symmetry reasoning articulated by van Fraassen (1989: 236): “problems which are essentially the same must receive essentially the same solution.” A modal extension of this principle runs roughly as follows: if problems might be essentially the same, then they might have essentially the same solution. There are two modalities here. Bartha argues that satisfaction of the criteria of the articulation model is sufficient to establish the modality in the antecedent, i.e., that the source and target domains ‘might be essentially the same’ in relevant respects. He further suggests that prima facie plausibility provides a reasonable reading of the modality in the consequent, i.e., that the problems in the two domains ‘might have essentially the same solution.’ To call a hypothesis prima facie plausible is to elevate it to the point where it merits investigation, since it might be correct.

The argument is vulnerable to two sorts of concerns. First, there are questions about the interpretation of the symmetry principle. Second, there is a residual worry that this justification, like all the others, proves too much. The articulation model may be too vague or too permissive.

4.4 Pragmatic justification

Arguably, the most promising available defense of analogical reasoning may be found in its application to case law (see Precedent and Analogy in Legal Reasoning). Judicial decisions are based on the verdicts and reasoning that have governed relevantly similar cases, according to the doctrine of stare decisis (Levi 1949; Llewellyn 1960; Cross and Harris 1991; Sunstein 1993). Individual decisions by a court are binding on that court and lower courts; judges are obligated to decide future cases ‘in the same way.’ That is, the reasoning applied in an individual decision, referred to as the ratio decidendi, must be applied to similar future cases (see Example 10). In practice, of course, the situation is extremely complex. No two cases are identical. The ratio must be understood in the context of the facts of the original case, and there is considerable room for debate about its generality and its applicability to future cases. If a consensus emerges that a past case was wrongly decided, later judgments will distinguish it from new cases, effectively restricting the scope of the ratio to the original case.

The practice of following precedent can be justified by two main practical considerations. First, and above all, the practice is conservative: it provides a relatively stable basis for replicable decisions. People need to be able to predict the actions of the courts and formulate plans accordingly. Stare decisis serves as a check against arbitrary judicial decisions. Second, the practice is still reasonably progressive: it allows for the gradual evolution of the law. Careful judges distinguish bad decisions; new values and a new consensus can emerge in a series of decisions over time.

In theory, then, stare decisis strikes a healthy balance between conservative and progressive social values. This justification is pragmatic. It presupposes a common set of social values, and links the use of analogical reasoning to optimal promotion of those values. Notice also that justification occurs at the level of the practice in general; individual analogical arguments sometimes go astray. A full examination of the nature and foundations for stare decisis is beyond the scope of this entry, but it is worth asking the question: might it be possible to generalize the justification for stare decisis? Is a parallel pragmatic justification available for analogical arguments in general?

Bartha (2010) offers a preliminary attempt to provide such a justification by shifting from social values to epistemic values. The general idea is that reasoning by analogy is especially well suited to the attainment of a common set of epistemic goals or values. In simple terms, analogical reasoning—when it conforms to certain criteria—achieves an excellent (perhaps optimal) balance between the competing demands of stability and innovation. It supports both conservative epistemic values, such as simplicity and coherence with existing belief, and progressive epistemic values, such as fruitfulness and theoretical unification (McMullin (1993) provides a classic list).

5. Beyond analogical arguments

As emphasized earlier, analogical reasoning takes in a great deal more than analogical arguments. In this section, we examine two broad contexts in which analogical reasoning is important.

The first, still closely linked to analogical arguments, is the confirmation of scientific hypotheses. Confirmation is the process by which a scientific hypothesis receives inductive support on the basis of evidence (see evidence, confirmation, and Bayes’ Theorem). Confirmation may also signify the logical relationship of inductive support that obtains between a hypothesis \(H\) and a proposition \(E\) that expresses the relevant evidence. Can analogical arguments play a role, either in the process or in the logical relationship? Arguably yes (to both), but this role has to be delineated carefully, and several obstacles remain in the way of a clear account.

The second context is conceptual and theoretical development in cutting-edge scientific research. Analogies are used to suggest possible extensions of theoretical concepts and ideas. The reasoning is linked to considerations of plausibility, but there is no straightforward analysis in terms of analogical arguments.

5.1 Analogy and confirmation

How is analogical reasoning related to the confirmation of scientific hypotheses? The examples and philosophical discussion from earlier sections suggest that a good analogical argument can indeed provide support for a hypothesis. But there are good reasons to doubt the claim that analogies provide actual confirmation.

In the first place, there is a logical difficulty. To appreciate this, let us concentrate on confirmation as a relationship between propositions. Christensen (1999: 441) offers a helpful general characterization:

Some propositions seem to help make it rational to believe other propositions. When our current confidence in \(E\) helps make rational our current confidence in \(H\), we say that \(E\) confirms \(H\).

In the Bayesian model, ‘confidence’ is represented in terms of subjective probability. A Bayesian agent starts with an assignment of subjective probabilities to a class of propositions. Confirmation is understood as a three-place relation:

\[E \text{ confirms } H, \text{ relative to } K, \text{ iff } Pr(H \mid E \cdot K) > Pr(H \mid K).\]

\(E\) represents a proposition about accepted evidence, \(H\) stands for a hypothesis, \(K\) for background knowledge and \(Pr\) for the agent’s subjective probability function. To confirm \(H\) is to raise its conditional probability, relative to \(K\). The shift from prior probability \(Pr(H \mid K)\) to posterior probability \(Pr(H \mid E \cdot K)\) is referred to as conditionalization on \(E\). The relation between these two probabilities is typically given by Bayes’ Theorem (setting aside more complex forms of conditionalization):

\[Pr(H \mid E \cdot K) = \frac{Pr(E \mid H \cdot K)\, Pr(H \mid K)}{Pr(E \mid K)}.\]

For Bayesians, here is the logical difficulty: it seems that an analogical argument cannot provide confirmation. In the first place, it is not clear that we can encapsulate the information contained in an analogical argument in a single proposition, \(E\). Second, even if we can formulate a proposition \(E\) that expresses that information, it is typically not appropriate to treat it as evidence because the information contained in \(E\) is already part of the background, \(K\). This means that \(E \cdot K\) is equivalent to \(K\), and hence \(Pr(H \mid E \cdot K) = Pr(H \mid K)\). According to the Bayesian definition, we don’t have confirmation. (This is a version of the problem of old evidence; see confirmation.) Third, and perhaps most important, analogical arguments are often applied to novel hypotheses \(H\) for which the prior probability \(Pr(H \mid K)\) is not even defined. Again, the definition of confirmation in terms of Bayesian conditionalization seems inapplicable.
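The old-evidence point can be seen in a two-line calculation. The probabilities below are invented purely for illustration: conditionalization raises the probability of \(H\) only when \(E\) was not already certain; once \(E\) is absorbed into the background \(K\), \(Pr(E \mid K) = 1\) and the posterior collapses to the prior.

```python
# Illustrative numbers only: why "old evidence" yields no Bayesian
# confirmation. Once E is part of the background K, Pr(E|K) = 1 and
# Pr(E|H.K) = 1, so conditionalizing on E leaves Pr(H) unchanged.

def posterior(prior_h, lik_e_given_h, prob_e):
    """Bayes' theorem: Pr(H | E.K) = Pr(E | H.K) * Pr(H | K) / Pr(E | K)."""
    return lik_e_given_h * prior_h / prob_e

# Genuinely new evidence: Pr(E|K) < 1, so E can confirm H.
print(posterior(0.2, 0.9, 0.5))  # ≈ 0.36 > 0.2: confirmation

# Old evidence, already entailed by K: posterior equals prior.
print(posterior(0.2, 1.0, 1.0))  # 0.2: no confirmation
```

The third difficulty in the text, undefined priors for novel hypotheses, is not even expressible here: `prior_h` would have no value to plug in.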

If analogies don’t provide inductive support via ordinary conditionalization, is there an alternative? Here we face a second difficulty, once again most easily stated within a Bayesian framework. Van Fraassen (1989) has a well-known objection to any belief-updating rule other than conditionalization. This objection applies to any rule that allows us to boost credences when there is no new evidence. The criticism, made vivid by the tale of Bayesian Peter, is that these ‘ampliative’ rules are vulnerable to a Dutch Book. Adopting any such rule would lead us to acknowledge as fair a system of bets that foreseeably leads to certain loss. Any rule of this type for analogical reasoning appears to be vulnerable to van Fraassen’s objection.

There appear to be at least three routes to avoiding these difficulties and finding a role for analogical arguments within Bayesian epistemology. First, there is what we might call minimal Bayesianism. Within the Bayesian framework, some writers (Jeffreys 1973; Salmon 1967, 1990; Shimony 1970) have argued that a ‘seriously proposed’ hypothesis must have a sufficiently high prior probability to allow it to become preferred as the result of observation. Salmon has suggested that analogical reasoning is one of the most important means of showing that a hypothesis is ‘serious’ in this sense. If analogical reasoning is directed primarily towards prior probability assignments, it can provide inductive support while remaining formally distinct from confirmation, avoiding the logical difficulties noted above. This approach is minimally Bayesian because it provides nothing more than an entry point into the Bayesian apparatus, and it only applies to novel hypotheses. An orthodox Bayesian, such as de Finetti (de Finetti and Savage 1972, de Finetti 1974), might have no problem in allowing that analogies play this role.

The second approach is liberal Bayesianism: we can change our prior probabilities in a non-rule-based fashion. Something along these lines is needed if analogical arguments are supposed to shift opinion about an already existing hypothesis without any new evidence. This is common in fields such as archaeology, as part of a strategy that Wylie refers to as “mobilizing old data as new evidence” (Wylie and Chapman 2016: 95). As Hawthorne (2012) notes, some Bayesians simply accept that both initial assignments and ongoing revision of prior probabilities (based on plausibility arguments) can be rational, but

the logic of Bayesian induction (as described here) has nothing to say about what values the prior plausibility assessments for hypotheses should have; and it places no restrictions on how they might change.

In other words, by not stating any rules for this type of probability revision, we avoid the difficulties noted by van Fraassen. This approach admits analogical reasoning into the Bayesian tent, but acknowledges a dark corner of the tent in which rationality operates without any clear rules.

Recently, a third approach has attracted interest: analogue confirmation or confirmation via analogue simulation. As described in (Dardashti et al. 2017), the idea is as follows:

Our key idea is that, in certain circumstances, predictions concerning inaccessible phenomena can be confirmed via an analogue simulation in a different system. (57)

Dardashti and his co-authors concentrate on a particular example (Example 17): ‘dumb holes’ and other analogues to gravitational black holes (Unruh 1981; Unruh 2008). Unlike real black holes, some of these analogues can be (and indeed have been) implemented and studied in the lab. Given the exact formal analogy between our models for these systems and our models of black holes, and certain important additional assumptions, Dardashti et al. make the controversial claim that observations made about the analogues provide evidence about actual black holes. For instance, the observation of phenomena analogous to Hawking radiation in the analogue systems would provide confirmation for the existence of Hawking radiation in black holes. In a second paper (Dardashti et al. 2018, Other Internet Resources), the case for confirmation is developed within a Bayesian framework.

The appeal of a clearly articulated mechanism for analogue confirmation is obvious. It would provide a tool for exploring confirmation of inaccessible phenomena not just in cosmology, but also in historical sciences such as archaeology and evolutionary biology, and in areas of medical science where ethical constraints rule out experiments on human subjects. Furthermore, as noted by Dardashti et al., analogue confirmation relies on new evidence obtained from the analogue system, and is therefore not vulnerable to the logical difficulties noted above.

Although the concept of analogue confirmation is not entirely new (think of animal testing, as in Example 8), the claims of (Dardashti et al. 2017, 2018 [Other Internet Resources]) require evaluation. One immediate difficulty for the black hole example: if we think in terms of ordinary analogical arguments, there is no positive analogy because, to put it simply, we have no basis of known similarities between a ‘dumb hole’ and a black hole. As Crowther et al. (2018, Other Internet Resources) argue, “it is not known if the particular modelling framework used in the derivation of Hawking radiation actually describes black holes in the first place.” This may not concern Dardashti et al., since they claim that analogue confirmation is distinct from ordinary analogical arguments. It may turn out that analogue confirmation is different for cases such as animal testing, where we have a basis of known similarities, and for cases where our only access to the target domain is via a theoretical model.

In §3.6, we saw that practice-based studies of analogy provide insight into the criteria for evaluating analogical arguments. Such studies also point to dynamical or programmatic roles for analogies, which appear to require evaluative frameworks that go beyond those developed for analogical arguments.

Knuuttila and Loettgers (2014) examine the role of analogical reasoning in synthetic biology, an interdisciplinary field that draws on physics, chemistry, biology, engineering and computational science. The main role for analogies in this field is not the construction of individual analogical arguments but rather the development of concepts such as “noise” and “feedback loops”. Such concepts undergo constant refinement, guided by both positive and negative analogies to their analogues in engineered and physical systems. Analogical reasoning here is “transient, heterogeneous, and programmatic” (87). Negative analogies, seen as problematic obstacles for individual analogical arguments, take on a prominent and constructive role when the focus is theoretical construction and concept refinement.

Similar observations apply to analogical reasoning in its application to another cutting-edge field: emergent gravity. In this area of physics, distinct theoretical approaches portray gravity as emerging from different microstructures (Linnemann and Visser 2018). “Novel and robust” features not present at the micro-level emerge in the gravitational theory. Analogies with other emergent phenomena, such as hydrodynamics and thermodynamics, are exploited to shape these proposals. As with synthetic biology, analogical reasoning is not directed primarily towards the formulation and assessment of individual arguments. Rather, its role is to develop different theoretical models of gravity.

These studies explore fluid and creative applications of analogy to shape concepts on the front lines of scientific research. An adequate analysis would certainly take us beyond the analysis of individual analogical arguments, which have been the focus of our attention. Knuuttila and Loettgers (2014) are led to reject the idea that the individual analogical argument is the “primary unit” in analogical reasoning, but this is a debatable conclusion. Linnemann and Visser (2018), for instance, explicitly affirm the importance of assessing the case for different gravitational models through “exemplary analogical arguments”:

We have taken up the challenge of making explicit arguments in favour of an emergent gravity paradigm… That arguments can only be plausibility arguments at the heuristic level does not mean that they are immune to scrutiny and critical assessment tout court. The philosopher of physics’ job in the process of discovery of quantum gravity… should amount to providing exactly this kind of assessments. (Linnemann and Visser 2018: 12)

Accordingly, Linnemann and Visser formulate explicit analogical arguments for each model of emergent gravity, and assess them using familiar criteria for evaluating individual analogical arguments. Arguably, even the most ambitious heuristic objectives still depend upon considerations of plausibility that benefit from being expressed, and examined, in terms of analogical arguments.

  • Achinstein, P., 1964, “Models, Analogies and Theories,” Philosophy of Science , 31: 328–349.
  • Agassi, J., 1964, “Discussion: Analogies as Generalizations,” Philosophy of Science , 31: 351–356.
  • –––, 1988, “Analogies Hard and Soft,” in D.H. Helman (ed.) 1988, 401–19.
  • Aristotle, 1984, The Complete Works of Aristotle , J. Barnes (ed.), Princeton: Princeton University Press.
  • Ashley, K.D., 1990, Modeling Legal Argument: Reasoning with Cases and Hypotheticals , Cambridge: MIT Press/Bradford Books.
  • Bailer-Jones, D., 2002, “Models, Metaphors and Analogies,” in Blackwell Guide to the Philosophy of Science , P. Machamer and M. Silberstein (eds.), 108–127, Cambridge: Blackwell.
  • Bartha, P., 2010, By Parallel Reasoning: The Construction and Evaluation of Analogical Arguments , New York: Oxford University Press.
  • Bermejo-Luque, L., 2012, “A unitary schema for arguments by analogy,” Informal Logic , 11(3): 161–172.
  • Biela, A., 1991, Analogy in Science , Frankfurt: Peter Lang.
  • Black, M., 1962, Models and Metaphors , Ithaca: Cornell University Press.
  • Campbell, N.R., 1920, Physics: The Elements , Cambridge: Cambridge University Press.
  • –––, 1957, Foundations of Science , New York: Dover.
  • Carbonell, J.G., 1983, “Learning by Analogy: Formulating and Generalizing Plans from Past Experience,” in Machine Learning: An Artificial Intelligence Approach , vol. 1 , R. Michalski, J. Carbonell and T. Mitchell (eds.), 137–162, Palo Alto: Tioga.
  • –––, 1986, “Derivational Analogy: A Theory of Reconstructive Problem Solving and Expertise Acquisition,” in Machine Learning: An Artificial Intelligence Approach, vol. 2 , J. Carbonell, R. Michalski, and T. Mitchell (eds.), 371–392, Los Altos: Morgan Kaufmann.
  • Carnap, R., 1980, “A Basic System of Inductive Logic Part II,” in Studies in Inductive Logic and Probability, vol. 2 , R.C. Jeffrey (ed.), 7–155, Berkeley: University of California Press.
  • Cartwright, N., 1992, “Aristotelian Natures and the Modern Experimental Method,” in Inference, Explanation, and Other Frustrations , J. Earman (ed.), Berkeley: University of California Press.
  • Christensen, D., 1999, “Measuring Confirmation,” Journal of Philosophy 96(9): 437–61.
  • Cohen, L. J., 1980, “Some Historical Remarks on the Baconian Conception of Probability,” Journal of the History of Ideas 41: 219–231.
  • Copi, I., 1961, Introduction to Logic, 2nd edition , New York: Macmillan.
  • Copi, I. and C. Cohen, 2005, Introduction to Logic, 12th edition, Upper Saddle River, New Jersey: Prentice-Hall.
  • Cross, R. and J.W. Harris, 1991, Precedent in English Law, 4th ed., Oxford: Clarendon Press.
  • Currie, A., 2013, “Convergence as Evidence,” British Journal for the Philosophy of Science , 64: 763–86.
  • –––, 2016, “Ethnographic analogy, the comparative method, and archaeological special pleading,” Studies in History and Philosophy of Science , 55: 84–94.
  • –––, 2018, Rock, Bone and Ruin , Cambridge, MA: MIT Press.
  • Dardashti, R., K. Thébault, and E. Winsberg, 2017, “Confirmation via Analogue Simulation: What Dumb Holes Could Tell Us about Gravity,” British Journal for the Philosophy of Science , 68: 55–89.
  • Darwin, C., 1903, More Letters of Charles Darwin, vol. I , F. Darwin (ed.), New York: D. Appleton.
  • Davies, T.R., 1988, “Determination, Uniformity, and Relevance: Normative Criteria for Generalization and Reasoning by Analogy,” in D.H. Helman (ed.) 1988, 227–50.
  • Davies, T.R. and S. Russell, 1987, “A Logical Approach to Reasoning by Analogy,” in IJCAI 87: Proceedings of the Tenth International Joint Conference on Artificial Intelligence , J. McDermott (ed.), 264–70, Los Altos, CA: Morgan Kaufmann.
  • De Finetti, B., 1974, Theory of Probability, vols. 1 and 2 , trans. A. Machí and A. Smith, New York: Wiley.
  • De Finetti, B. and L.J. Savage, 1972, “How to Choose the Initial Probabilities,” in B. de Finetti, Probability, Induction and Statistics , 143–146, New York: Wiley.
  • Descartes, R., 1637/1954, The Geometry of René Descartes , trans. D.E. Smith and M.L. Latham, New York: Dover.
  • Douven, I. and T. Williamson, 2006, “Generalizing the Lottery Paradox,” British Journal for the Philosophy of Science , 57: 755–779.
  • Eliasmith, C. and P. Thagard, 2001, “Integrating structure and meaning: a distributed model of analogical mapping,” Cognitive Science 25: 245–286.
  • Evans, T.G., 1968, “A Program for the Solution of Geometric-Analogy Intelligence-Test Questions,” in M.L. Minsky (ed.), 271–353, Semantic Information Processing , Cambridge: MIT Press.
  • Falkenhainer, B., K. Forbus, and D. Gentner, 1989/90, “The Structure-Mapping Engine: Algorithm and Examples,” Artificial Intelligence 41: 2–63.
  • Forbus, K, 2001, “Exploring Analogy in the Large,” in D. Gentner, K. Holyoak, and B. Kokinov (eds.) 2001, 23–58.
  • Forbus, K., R. Ferguson, and D. Gentner, 1994, “Incremental Structure-mapping,” in Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society , A. Ram and K. Eiselt (eds.), 313–18, Hillsdale, NJ: Lawrence Erlbaum.
  • Forbus, K., C. Riesbeck, L. Birnbaum, K. Livingston, A. Sharma, and L. Ureel, 2007, “A prototype system that learns by reading simplified texts,” in AAAI Spring Symposium on Machine Reading , Stanford University, California.
  • Forbus, K., J. Usher, A. Lovett, K. Lockwood, and J. Wetzel, 2008, “Cogsketch: Open domain sketch understanding for cognitive science research and for education,” in Proceedings of the Fifth Eurographics Workshop on Sketch-Based Interfaces and Modeling , Annecy, France.
  • Forbus, K., R. Ferguson, A. Lovett, and D. Gentner, 2017, “Extending SME to Handle Large-Scale Cognitive Modeling,” Cognitive Science , 41(5): 1152–1201.
  • Franklin, B., 1941, Benjamin Franklin’s Experiments , I.B. Cohen (ed.), Cambridge: Harvard University Press.
  • Fraser, D., forthcoming, “The development of renormalization group methods for particle physics: Formal analogies between classical statistical mechanics and quantum field theory,” Synthese , first online 29 June 2018. doi:10.1007/s11229-018-1862-0
  • Galilei, G., 1610 [1983], The Starry Messenger , S. Drake (trans.) in Telescopes, Tides and Tactics , Chicago: University of Chicago Press.
  • Gentner, D., 1983, “Structure-Mapping: A Theoretical Framework for Analogy,” Cognitive Science 7: 155–70.
  • Gentner, D., K. Holyoak, and B. Kokinov (eds.), 2001, The Analogical Mind: Perspectives from Cognitive Science , Cambridge: MIT Press.
  • Gildenhuys, P., 2004, “Darwin, Herschel, and the role of analogy in Darwin’s Origin,” Studies in the History and Philosophy of Biological and Biomedical Sciences , 35: 593–611.
  • Gould, R.A. and P.J. Watson, 1982, “A Dialogue on the Meaning and Use of Analogy in Ethnoarchaeological Reasoning,” Journal of Anthropological Archaeology 1: 355–381.
  • Govier, T., 1999, The Philosophy of Argument , Newport News, VA: Vale Press.
  • Guarini, M., 2004, “A Defence of Non-deductive Reconstructions of Analogical Arguments,” Informal Logic , 24(2): 153–168.
  • Hadamard, J., 1949, An Essay on the Psychology of Invention in the Mathematical Field , Princeton: Princeton University Press.
  • Hájek, A., 2018, “Creating heuristics for philosophical creativity,” in Creativity and Philosophy , B. Gaut and M. Kieran (eds.), New York: Routledge, 292–312.
  • Halpern, J. Y., 2003, Reasoning About Uncertainty , Cambridge, MA: MIT Press.
  • Harrod, R.F., 1956, Foundations of Inductive Logic , London: Macmillan.
  • Hawthorne, J., 2012, “Inductive Logic”, The Stanford Encyclopedia of Philosophy (Winter 2012 edition), Edward N. Zalta (ed.), URL= < https://plato.stanford.edu/archives/win2012/entries/logic-inductive/ >.
  • Helman, D.H. (ed.), 1988, Analogical Reasoning: perspectives of artificial intelligence, cognitive science, and philosophy , Dordrecht: Kluwer Academic Publishers.
  • Hempel, C.G., 1965, “Aspects of Scientific Explanation,” in Aspects of Scientific Explanation and Other Essays in the Philosophy of Science , 331–496, New York: Free Press.
  • Hesse, M.B., 1964, “Analogy and Confirmation Theory,” Philosophy of Science , 31: 319–327.
  • –––, 1966, Models and Analogies in Science , Notre Dame: University of Notre Dame Press.
  • –––, 1973, “Logic of discovery in Maxwell’s electromagnetic theory,” in Foundations of scientific method: the nineteenth century , R. Giere and R. Westfall (eds.), 86–114, Bloomington: University of Indiana Press.
  • –––, 1974, The Structure of Scientific Inference , Berkeley: University of California Press.
  • –––, 1988, “Theories, Family Resemblances and Analogy,” in D.H. Helman (ed.) 1988, 317–40.
  • Hofstadter, D., 1995, Fluid Concepts and Creative Analogies , New York: BasicBooks (Harper Collins).
  • –––, 2001, “Epilogue: Analogy as the Core of Cognition,” in Gentner, Holyoak, and Kokinov (eds.) 2001, 499–538.
  • Hofstadter, D., and E. Sander, 2013, Surfaces and Essences: Analogy as the Fuel and Fire of Thinking , New York: Basic Books.
  • Holyoak, K. and P. Thagard, 1989, “Analogical Mapping by Constraint Satisfaction,” Cognitive Science , 13: 295–355.
  • –––, 1995, Mental Leaps: Analogy in Creative Thought , Cambridge: MIT Press.
  • Huber, F., 2009, “Belief and Degrees of Belief,” in F. Huber and C. Schmidt-Petri (eds.) 2009, 1–33.
  • Huber, F. and C. Schmidt-Petri (eds.), 2009, Degrees of Belief , Dordrecht: Springer.
  • Hume, D. 1779/1947, Dialogues Concerning Natural Religion , Indianapolis: Bobbs-Merrill.
  • Hummel, J. and K. Holyoak, 1997, “Distributed Representations of Structure: A Theory of Analogical Access and Mapping,” Psychological Review 104(3): 427–466.
  • –––, 2003, “A symbolic-connectionist theory of relational inference and generalization,” Psychological Review 110: 220–264.
  • Hunter, D. and P. Whitten (eds.), 1976, Encyclopedia of Anthropology , New York: Harper & Row.
  • Huygens, C., 1690/1962, Treatise on Light , trans. S. Thompson, New York: Dover.
  • Indurkhya, B., 1992, Metaphor and Cognition , Dordrecht: Kluwer Academic Publishers.
  • Jeffreys, H., 1973, Scientific Inference, 3rd ed. , Cambridge: Cambridge University Press.
  • Keynes, J.M., 1921, A Treatise on Probability , London: Macmillan.
  • Knuuttila, T., and A. Loettgers, 2014, “Varieties of noise: Analogical reasoning in synthetic biology,” Studies in History and Philosophy of Science , 48: 76–88.
  • Kokinov, B., K. Holyoak, and D. Gentner (eds.), 2009, New Frontiers in Analogy Research: Proceedings of the Second International Conference on Analogy ANALOGY-2009 , Sofia: New Bulgarian University Press.
  • Kraus, M., 2015, “Arguments by Analogy (and What We Can Learn about Them from Aristotle),” in Reflections on Theoretical Issues in Argumentation Theory , F.H. van Eemeren and B. Garssen (eds.), Cham: Springer, 171–182. doi: 10.1007/978-3-319-21103-9_13
  • Kroes, P., 1989, “Structural analogies between physical systems,” British Journal for the Philosophy of Science , 40: 145–54.
  • Kuhn, T.S., 1996, The Structure of Scientific Revolutions , 3rd edition, Chicago: University of Chicago Press.
  • Kuipers, T., 1988, “Inductive Analogy by Similarity and Proximity,” in D.H. Helman (ed.) 1988, 299–313.
  • Lakoff, G. and M. Johnson, 1980, Metaphors We Live By , Chicago: University of Chicago Press.
  • Leatherdale, W.H., 1974, The Role of Analogy, Model, and Metaphor in Science , Amsterdam: North-Holland Publishing.
  • Lee, H.S. and Holyoak, K.J., 2008, “Absence Makes the Thought Grow Stronger: Reducing Structural Overlap Can Increase Inductive Strength,” in Proceedings of the Thirtieth Annual Conference of the Cognitive Science Society , V. Sloutsky, B. Love, and K. McRae (eds.), 297–302, Austin: Cognitive Science Society.
  • Lembeck, F., 1989, Scientific Alternatives to Animal Experiments , Chichester: Ellis Horwood.
  • Levi, E., 1949, An Introduction to Legal Reasoning , Chicago: University of Chicago Press.
  • Linnemann, N., and M. Visser, 2018, “Hints towards the emergent nature of gravity,” Studies in History and Philosophy of Modern Physics , 30: 1–13.
  • Liston, M., 2000, “Critical Discussion of Mark Steiner’s The Applicability of Mathematics as a Philosophical Problem,” Philosophia Mathematica , 3(8): 190–207.
  • Llewellyn, K., 1960, The Bramble Bush: On Our Law and its Study , New York: Oceana.
  • Lloyd, G.E.R., 1966, Polarity and Analogy , Cambridge: Cambridge University Press.
  • Macagno, F., D. Walton and C. Tindale, 2017, “Analogical Arguments: Inferential Structures and Defeasibility Conditions,” Argumentation , 31: 221–243.
  • Maher, P., 2000, “Probabilities for Two Properties,” Erkenntnis , 52: 63–91.
  • Maier, C.L., 1981, The Role of Spectroscopy in the Acceptance of the Internally Structured Atom 1860–1920 , New York: Arno Press.
  • Maxwell, J.C., 1890, Scientific Papers of James Clerk Maxwell, Vol. I , W.D. Niven (ed.), Cambridge: Cambridge University Press.
  • McKay, C.P., 1993, “Did Mars once have Martians?” Astronomy , 21(9): 26–33.
  • McMullin, Ernan, 1993, “Rationality and Paradigm Change in Science,” in World Changes: Thomas Kuhn and the Nature of Science , P. Horwich (ed.), 55–78, Cambridge: MIT Press.
  • Mill, J.S., 1843/1930, A System of Logic , London: Longmans-Green.
  • Mitchell, M., 1993, Analogy-Making as Perception , Cambridge: Bradford Books/MIT Press.
  • Moore, B. N. and R. Parker, 1998, Critical Thinking, 5th ed. , Mountain View, CA: Mayfield.
  • Nersessian, N., 2002, “Maxwell and ‘the Method of Physical Analogy’: Model-Based Reasoning, Generic Abstraction, and Conceptual Change,” in Reading Natural Philosophy , D. Malament (ed.), Chicago: Open Court.
  • –––, 2009, “Conceptual Change: Creativity, Cognition, and Culture,” in Models of Discovery and Creativity , J. Meheus and T. Nickles (eds.), Dordrecht: Springer 127–166.
  • Niiniluoto, I., 1988, “Analogy and Similarity in Scientific Reasoning,” in D.H. Helman (ed.) 1988, 271–98.
  • Norton, J., 2010, “There Are No Universal Rules for Induction,” Philosophy of Science , 77: 765–777.
  • Ortony, A. (ed.), 1979, Metaphor and Thought , Cambridge: Cambridge University Press.
  • Oppenheimer, R., 1955, “Analogy in Science,” American Psychologist 11(3): 127–135.
  • Pietarinen, J., 1972, Lawlikeness, Analogy and Inductive Logic , Amsterdam: North-Holland.
  • Poincaré, H., 1952a, Science and Hypothesis , trans. W.J. Greenstreet, New York: Dover.
  • –––, 1952b, Science and Method , trans. F. Maitland, New York: Dover.
  • Polya, G., 1954, Mathematics and Plausible Reasoning , 2nd ed. 1968, two vols., Princeton: Princeton University Press.
  • Prieditis, A. (ed.), 1988, Analogica , London: Pitman.
  • Priestley, J., 1769, 1775/1966, The History and Present State of Electricity, Vols. I and II , New York: Johnson. Reprint.
  • Quine, W.V., 1969, “Natural Kinds,” in Ontological Relativity and Other Essays , 114–138, New York: Columbia University Press.
  • Quine, W.V. and J.S. Ullian, 1970, The Web of Belief , New York: Random House.
  • Radin, M., 1933, “Case Law and Stare Decisis ,” Columbia Law Review 33 (February), 199.
  • Reid, T., 1785/1895, Essays on the Intellectual Powers of Man , in The Works of Thomas Reid, vol. 3, 8th ed., Sir William Hamilton (ed.), Edinburgh: James Thin.
  • Reiss, J., 2015, “A Pragmatist Theory of Evidence,” Philosophy of Science , 82: 341–62.
  • Reynolds, A.K. and L.O. Randall, 1975, Morphine and Related Drugs , Toronto: University of Toronto Press.
  • Richards, R.A., 1997, “Darwin and the inefficacy of artificial selection,” Studies in History and Philosophy of Science , 28(1): 75–97.
  • Robinson, D.S., 1930, The Principles of Reasoning, 2nd ed ., New York: D. Appleton.
  • Romeijn, J.W., 2006, “Analogical Predictions for Explicit Similarity,” Erkenntnis , 64(2): 253–80.
  • Russell, S., 1986, Analogical and Inductive Reasoning , Ph.D. thesis, Department of Computer Science, Stanford University, Stanford, CA.
  • –––, 1988, “Analogy by Similarity,” in D.H. Helman (ed.) 1988, 251–269.
  • Salmon, W., 1967, The Foundations of Scientific Inference , Pittsburgh: University of Pittsburgh Press.
  • –––, 1990, “Rationality and Objectivity in Science, or Tom Kuhn Meets Tom Bayes,” in Scientific Theories (Minnesota Studies in the Philosophy of Science: Volume 14), C. Wade Savage (ed.), Minneapolis: University of Minnesota Press, 175–204.
  • Sanders, K., 1991, “Representing and Reasoning about Open-Textured Predicates,” in Proceedings of the Third International Conference on Artificial Intelligence and Law , New York: Association of Computing Machinery, 137–144.
  • Schlimm, D., 2008, “Two Ways of Analogy: Extending the Study of Analogies to Mathematical Domains,” Philosophy of Science , 75: 178–200.
  • Shelley, C., 1999, “Multiple Analogies in Archaeology,” Philosophy of Science , 66: 579–605.
  • –––, 2003, Multiple Analogies in Science and Philosophy , Amsterdam: John Benjamins.
  • Shimony, A., 1970, “Scientific Inference,” in The Nature and Function of Scientific Theories , R. Colodny (ed.), Pittsburgh: University of Pittsburgh Press, 79–172.
  • Snyder, L., 2006, Reforming Philosophy: A Victorian Debate on Science and Society , Chicago: University of Chicago Press.
  • Spohn, W., 2009, “A Survey of Ranking Theory,” in F. Huber and C. Schmidt-Petri (eds.) 2009, 185-228.
  • –––, 2012, The Laws of Belief: Ranking Theory and its Philosophical Applications , Oxford: Oxford University Press.
  • Stebbing, L.S., 1933, A Modern Introduction to Logic, 2nd edition , London: Methuen.
  • Steiner, M., 1989, “The Application of Mathematics to Natural Science,” Journal of Philosophy , 86: 449–480.
  • –––, 1998, The Applicability of Mathematics as a Philosophical Problem , Cambridge, MA: Harvard University Press.
  • Stepan, N., 1996, “Race and Gender: The Role of Analogy in Science,” in Feminism and Science , E.G. Keller and H. Longino (eds.), Oxford: Oxford University Press, 121–136.
  • Sterrett, S., 2006, “Models of Machines and Models of Phenomena,” International Studies in the Philosophy of Science , 20(March): 69–80.
  • Sunstein, C., 1993, “On Analogical Reasoning,” Harvard Law Review , 106: 741–791.
  • Thagard, P., 1989, “Explanatory Coherence,” Behavioral and Brain Science , 12: 435–502.
  • Timoshenko, S. and J. Goodier, 1970, Theory of Elasticity , 3rd edition, New York: McGraw-Hill.
  • Toulmin, S., 1958, The Uses of Argument , Cambridge: Cambridge University Press.
  • Turney, P., 2008, “The Latent Relation Mapping Engine: Algorithm and Experiments,” Journal of Artificial Intelligence Research , 33: 615–55.
  • Unruh, W., 1981, “Experimental Black-Hole Evaporation?,” Physical Review Letters , 46: 1351–3.
  • –––, 2008, “Dumb Holes: Analogues for Black Holes,” Philosophical Transactions of the Royal Society A , 366: 2905–13.
  • Van Fraassen, Bas, 1980, The Scientific Image , Oxford: Clarendon Press.
  • –––, 1984, “Belief and the Will,” Journal of Philosophy , 81: 235–256.
  • –––, 1989, Laws and Symmetry , Oxford: Clarendon Press.
  • –––, 1995, “Belief and the Problem of Ulysses and the Sirens,” Philosophical Studies , 77: 7–37.
  • Waller, B., 2001, “Classifying and analyzing analogies,” Informal Logic , 21(3): 199–218.
  • Walton, D. and C. Hyra, 2018, “Analogical Arguments in Persuasive and Deliberative Contexts,” Informal Logic , 38(2): 213–261.
  • Weitzenfeld, J.S., 1984, “Valid Reasoning by Analogy,” Philosophy of Science , 51: 137–49.
  • Woods, J., A. Irvine, and D. Walton, 2004, Argument: Critical Thinking, Logic and the Fallacies , 2nd edition, Toronto: Prentice-Hall.
  • Wylie, A., 1982, “An Analogy by Any Other Name Is Just as Analogical,” Journal of Anthropological Archaeology , 1: 382–401.
  • –––, 1985, “The Reaction Against Analogy,” Advances in Archaeological Method and Theory , 8: 63–111.
  • Wylie, A., and R. Chapman, 2016, Evidential Reasoning in Archaeology , Bloomsbury Academic.

Other Internet Resources

  • Crowther, K., N. Linnemann, and C. Wüthrich, 2018, “What we cannot learn from analogue experiments,” online at arXiv.org.
  • Dardashti, R., S. Hartmann, K. Thébault, and E. Winsberg, 2018, “Hawking Radiation and Analogue Experiments: A Bayesian Analysis,” online at PhilSci Archive.
  • Norton, J., 2018, “Analogy”, unpublished draft, University of Pittsburgh.
  • Resources for Research on Analogy: a Multi-Disciplinary Guide (University of Windsor)
  • UCLA Reasoning Lab (UCLA)
  • Dedre Gentner’s publications (Northwestern University)
  • The Center for Research on Concepts and Cognition (Indiana University)


Copyright © 2019 by Paul Bartha <paul.bartha@ubc.ca>


The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab , Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054

Clin Orthop Relat Res, 468(3), March 2010


P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers

David Jean Biau

1 Département de Biostatistique et Informatique Médicale, INSERM UMR-S 717, AP-HP, Hôpital Saint Louis, Université Paris 7, 1 Avenue Claude-Vellefaux, 75475 Paris Cedex 10, France

Brigitte M. Jolles

2 Hôpital Orthopédique Département de l’Appareil Locomoteur Centre Hospitalier, Universitaire Vaudois Université de Lausanne, Lausanne, Switzerland

Raphaël Porcher

Abstract

In the 1920s, Ronald Fisher developed the theory behind the p value and Jerzy Neyman and Egon Pearson developed the theory of hypothesis testing. These distinct theories have provided researchers important quantitative tools to confirm or refute their hypotheses. The p value is the probability of obtaining an effect equal to or more extreme than the one observed presuming the null hypothesis of no effect is true; it gives researchers a measure of the strength of evidence against the null hypothesis. As commonly used, investigators will select a threshold p value below which they will reject the null hypothesis. The theory of hypothesis testing allows researchers to reject a null hypothesis in favor of an alternative hypothesis of some effect. As commonly used, investigators choose Type I error (rejecting the null hypothesis when it is true) and Type II error (accepting the null hypothesis when it is false) levels and determine some critical region. If the test statistic falls into that critical region, the null hypothesis is rejected in favor of the alternative hypothesis. Despite similarities between the two, the p value and the theory of hypothesis testing are different theories that often are misunderstood and confused, leading researchers to improper conclusions. Perhaps the most common misconception is to consider the p value as the probability that the null hypothesis is true rather than the probability of obtaining the difference observed, or one that is more extreme, considering the null is true. Another concern is the risk that an important proportion of statistically significant results are falsely significant. Researchers should have a minimum understanding of these two theories so that they are better able to plan, conduct, interpret, and report scientific experiments.

Introduction

“We are inclined to think that as far as a particular hypothesis is concerned, no test based upon a theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis” [15].

Since their introduction in the 1920s, the p value and the theory of hypothesis testing have permeated the scientific community and medical research almost completely. These theories allow a researcher to address a particular hypothesis, such as the superiority of one treatment over another or the association between a characteristic and an outcome. In these cases, researchers frequently wish to disprove the well-known null hypothesis, that is, the absence of a difference between treatments or of an association between a characteristic and an outcome. Although statistically the null hypothesis does not necessarily relate to no effect or to no association, that presumption frequently is made in medical research, and it is the one we will consider here. The introduction of these theories into scientific reasoning has provided important quantitative tools for researchers to plan studies, report findings, compare results, and even make decisions. However, there is increasing concern that these tools are not properly used [9, 10, 13, 20].

The p value is attributed to Ronald Fisher and represents the probability of obtaining an effect equal to or more extreme than the one observed considering the null hypothesis is true [3]. The lower the p value, the more unlikely the null hypothesis is, and at some point of low probability the null hypothesis is preferably rejected. The p value thus provides a quantitative measure of the strength of evidence against the null hypothesis stated.

The theory of hypothesis testing, formulated by Jerzy Neyman and Egon Pearson [15], holds that regardless of the results of an experiment, one can never be absolutely certain whether a particular treatment is superior to another. However, Neyman and Pearson proposed that one could limit, over numerous experiments, the risks of concluding there is a difference when there is none (Type I error) and of concluding there is no difference when there is one (Type II error) to prespecified levels denoted α and β, respectively. The theory of hypothesis testing thus offers a rule of behavior that, in the long run, ensures its followers will not often be wrong.
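As an illustrative sketch (ours, not the authors'), the Neyman-Pearson machinery can be expressed with only the standard library, under a normal approximation to the test statistic; the exact t-based critical value for 168 degrees of freedom used in Fig. 1 (1.97) differs slightly from the normal value 1.96:

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Two-sided critical value for alpha = 0.05 under the normal
# approximation (the t distribution with 168 df gives ~1.97).
CRIT = 1.96

def reject_null(t_stat, crit=CRIT):
    """Neyman-Pearson rule: reject H0 iff the statistic falls in
    the critical region |t| > crit."""
    return abs(t_stat) > crit

def type_ii_error(mu, crit=CRIT):
    """beta = P(|T| <= crit) when the true standardized effect is mu
    (normal approximation)."""
    return phi(crit - mu) - phi(-crit - mu)

# A larger true standardized effect shrinks beta (raises power);
# beta reaches roughly 0.10 near mu = 3.24:
print(round(type_ii_error(3.24), 2))
```

Both error rates are fixed before the data are seen; the experiment then yields only a binary decision, not a graded strength of evidence.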

Despite simple formulations, both theories frequently are misunderstood and misconceptions have emerged in the scientific community. Therefore, researchers should have a minimum understanding of the p value and hypothesis testing to manipulate these tools adequately and avoid misinterpretation and errors in judgment. In this article, we present the basic statistics behind the p value and hypothesis testing, with historical perspectives, common misunderstandings, and examples of use for each theory. Finally, we discuss the implications of these issues for clinical research.

The p Value

The p value is the probability of obtaining an effect equal to or more extreme than the one observed considering the null hypothesis is true. This effect can be a difference in a measurement between two groups or any measure of association between two variables. Although the p value was introduced by Karl Pearson in 1900 with his chi square test [17], it was the Englishman Sir Ronald A. Fisher, considered by many as the father of modern statistics, who in 1925 first gave the means to calculate the p value in a wide variety of situations [3].

Fisher’s theory may be presented as follows. Let us consider some hypothesis, namely the null hypothesis, of no association between a characteristic and an outcome. For any magnitude of association observed after an experiment is conducted, we can compute a test statistic that measures the difference between what is observed and the null hypothesis. This test statistic may be converted to a probability, namely the p value, using the probability distribution of the test statistic under the null hypothesis. For instance, depending on the situation, the test statistic may follow a χ² distribution (chi square test statistic) or a Student’s t distribution. Its graphically famous form is the bell-shaped curve of the probability distribution function of a t test statistic (Fig. 1A). The null hypothesis is said to be disproven if the effect observed is so large, and consequently the p value so low, that “either an exceptionally rare chance has occurred or the theory is not true” [6]. Fisher, who was an applied researcher, strongly believed the p value was solely an objective aid to assess the plausibility of a hypothesis; the conclusion about differences or associations ultimately remained with the scientist, who had all the available facts at hand. Although he supported a p value of 0.05 or less as indicating evidence against the null, he also considered other, more stringent cutoffs. In his words: “If p is between 0.1 and 0.9 there is certainly no reason to suspect the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at 0.05…” [4].
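As a concrete illustration (a sketch, not taken from the article): for a test statistic that is approximately standard normal under the null, the conversion from statistic to p value can be computed directly from the error function in Python's standard library:

```python
import math

def two_sided_p(z):
    """Two-sided p value for an approximately standard-normal test
    statistic z: p = P(|Z| >= |z|) under the null hypothesis, using
    Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Fisher's conventional line at 0.05 corresponds to |z| near 1.96:
print(round(two_sided_p(1.96), 3))  # prints 0.05
```

For a t or χ² statistic the same conversion would use that statistic's null distribution instead of the normal one.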

Fig. 1

These graphs show the results of three trials (t1, t2, and t3) comparing the 1-month HHS after miniincision or standard incision hip arthroplasty under the theory of (A) Fisher and (B) Neyman and Pearson. For these trials, α = 5% and β = 10%. Trial 1 yields a standardized difference between the groups of 0.5 in favor of the standard incision; Trials 2 and 3 yield standardized differences of 1.8 and 2.05, respectively. The corresponding p values are 0.62, 0.074, and 0.042 for Trials 1, 2, and 3, respectively. (A) Fisher’s p value for Trial 2 is represented by the gray area under the null hypothesis; it corresponds to the probability of observing a standardized difference of 1.8 (Point 2) or more extreme differences (gray area on both sides) considering the null hypothesis is true. According to Fisher, Trials 2 and 3 provide fair evidence against the null hypothesis of no difference between treatments; the decision to reject the null hypothesis of no difference in these cases will depend on other important information (previous data, etc). Trial 1 provides poor evidence against the null, as the difference observed, or one more extreme, had a 62% probability of resulting from chance alone if the treatments were equal. (B) Under the Neyman and Pearson theory, the Type I (α = 0.05, gray area under the null hypothesis) and Type II (β = 0.1, shaded area under the alternative hypothesis) error rates and the difference to be detected (δ = 10) define a critical region for the test statistic (|t| > 1.97). If the test statistic (standardized difference here) falls into that critical region, the null hypothesis is rejected; this is the case for Trial 3. Trials 1 and 2 do not fall into the critical region and the null is not rejected. According to Neyman and Pearson’s theory, the null hypothesis of no difference between treatments is rejected after Trial 3 only.
The distributions depicted are the probability distribution functions of the t test with 168 degrees of freedom.
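The p values quoted in the figure caption can be recovered from the t distribution with 168 degrees of freedom. Below is a minimal sketch using SciPy; the function name is ours, and the trial statistics are taken directly from the caption:

```python
from scipy import stats

DF = 168  # degrees of freedom from the figure (two groups of 85 patients)

def two_sided_p(t_stat: float, df: int = DF) -> float:
    """Probability of a test statistic as extreme as t_stat, or more
    extreme in either tail, under the null hypothesis."""
    return 2 * stats.t.sf(abs(t_stat), df)

# Standardized differences for Trials 1, 2, and 3 from the caption.
for trial, t_stat in [(1, 0.5), (2, 1.8), (3, 2.05)]:
    print(f"Trial {trial}: t = {t_stat}, p = {two_sided_p(t_stat):.3f}")
```

Rounded, these come out to 0.62, 0.074, and 0.042, matching the values reported in the caption.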

For instance, say a researcher wants to test the association between the existence of a radiolucent line in Zone 1 on the postoperative radiograph in cemented cups and the risk of acetabular loosening. He or she can use a score test in a Cox regression model, after adjusting for other potentially important confounding variables. The null hypothesis that he or she implicitly wants to disprove is that a radiolucent line in Zone 1 has no effect on acetabular loosening. The researcher’s hypothetical study shows an increased occurrence of acetabular loosening when a radiolucent line in Zone 1 exists on the postoperative radiograph, and the p value computed using the score test is 0.02. Consequently, the researcher concludes either a rare event has occurred or the null hypothesis of no association is not true. Similarly, the p value may be used to test the null hypothesis of no difference between two or more treatments. The lower the p value, the stronger the evidence against the null hypothesis of no difference between treatments.

The Neyman-Pearson Theory of Hypothesis Testing

We owe the theory of hypothesis testing as we use it today to the Polish mathematician Jerzy Neyman and American statistician Egon Pearson (the son of Karl Pearson). Neyman and Pearson [ 15 ] thought one could not consider a null hypothesis unless one could conceive at least one plausible alternative hypothesis.

Their theory may be presented in a few words this way. Consider a null hypothesis H0 of equal improvement for patients under Treatment A or B and an alternative hypothesis H1 of a difference in improvement of some relevant size δ between the two treatments. Researchers may make two types of incorrect decisions at the end of a trial: they may consider the null hypothesis false when it is true (a Type I error) or consider the null true when it is in fact false (a Type II error) (Table 1). Neyman and Pearson proposed, if we set the risks we are willing to accept for Type I errors, say α (ie, the probability of a Type I error), and Type II errors, say β (ie, the probability of a Type II error), then, “without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not often be wrong.” These Types I and II error rates allow defining a critical region for the test statistic used. For instance, for α set at 5%, the corresponding critical regions would be χ2 > 3.84 for the chi square statistic or |t| > 1.97 for Student’s t test with 168 degrees of freedom (Fig. 1B) (the reader need not know the details of these computations to grasp the point). If, for example, the comparison of the mean improvement under Treatments A and B falls into that critical region, then the null hypothesis is rejected in favor of the alternative; otherwise, the null hypothesis is accepted. In the case of group comparisons, the test statistic measures the likelihood that the groups compared come from the same population (null hypothesis): the more the groups differ, the higher the test statistic, and at some point the null hypothesis is rejected and the alternative is accepted. Although Neyman and Pearson did not view the 5% level for Type I error as a binding threshold, this level has permeated the scientific community.
For the Type II error rate, 0.1 or 0.2 often is chosen and corresponds to powers (defined as 1 − β) of 90% and 80%, respectively.
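The critical values quoted above (3.84 for the chi square statistic, |t| > 1.97 at 168 degrees of freedom) can be checked directly from the corresponding distributions. A sketch using SciPy:

```python
from scipy import stats

alpha = 0.05

# Two-sided critical value for Student's t with 168 degrees of freedom:
# the null hypothesis is rejected when |t| exceeds this threshold.
t_crit = stats.t.ppf(1 - alpha / 2, df=168)

# One-sided critical value for the chi square statistic with 1 degree
# of freedom (the chi square test is inherently one-tailed).
chi2_crit = stats.chi2.ppf(1 - alpha, df=1)

print(f"|t| > {t_crit:.2f}, chi^2 > {chi2_crit:.2f}")
```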

Table 1

Types I and II errors according to the theory of hypothesis tests

                              Null hypothesis true       Null hypothesis false
Null hypothesis rejected      Type I error (α)           Correct decision (power, 1 − β)
Null hypothesis accepted      Correct decision (1 − α)   Type II error (β)

α and β represent the probability of Types I and II errors, respectively.

For instance, say a surgeon wants to compare the 1-month Harris hip score (HHS) after miniincision and standard incision hip arthroplasty. With the help of a statistician, he plans a randomized controlled trial and considers the null hypothesis H0 of no difference between the standard treatment and experimental treatment (miniincision) and the alternative hypothesis H1 of a difference δ of more than 10 points on the HHS, which he considers the minimal clinically important difference. Because the statistician is performing many statistical tests across different studies all day long, she has grown very concerned about false positives and, as a general rule, she is not willing to accept more than a 5% Type I error rate; that is, if no difference exists between treatments, there is only a 5% chance of concluding a difference. However, the surgeon wants to give the best chances of detecting that difference if it exists and chooses a Type II error of 10%, ie, a power of 90%; therefore, if a difference of 10 points exists between treatments, there is an acceptable 10% chance that the trial will not detect it. Let us presume the expected 1-month HHS after standard incision hip arthroplasty is 70 and the expected SD in both groups is 20. The required sample size therefore is 85 patients per group (two-sample t test). The critical region to reject the null hypothesis therefore is |t| > 1.97 (Student’s t test with 168 degrees of freedom). Therefore, if at the end of the trial Student’s t test yields a statistic of 1.97 or greater, the null hypothesis will be rejected; otherwise the null hypothesis will not be rejected and the trial will conclude no difference between the experimental and standard treatment groups. Although the Neyman-Pearson theory of hypothesis testing usually is used for group comparisons, it also may be used for other purposes such as to test the association of a variable and an outcome.
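The figure of 85 patients per group can be reproduced with the standard normal-approximation sample-size formula for a two-sample comparison of means; a sketch (the function name is ours):

```python
import math

from scipy import stats

def two_sample_n(delta: float, sd: float, alpha: float = 0.05,
                 power: float = 0.90) -> int:
    """Approximate per-group sample size for a two-sample t test,
    using n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / delta)^2."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # two-sided Type I error
    z_beta = stats.norm.ppf(power)           # Type II error of 1 - power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

# delta = 10 points on the HHS, SD = 20, alpha = 5%, power = 90%
print(two_sample_n(delta=10, sd=20))  # 85 patients per group
```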

The Difference between Fisher’s P Value and Neyman-Pearson’s Hypothesis Testing

Despite the fiery opposition these two schools of thought mounted against each other for more than 70 years, the two approaches nowadays are embedded in a single exercise that often leads to misuse of the original approaches by naïve researchers and sometimes even statisticians (Table 2) [13]. Fisher’s significance testing with the p value is a practical approach whose statistical properties are derived from a hypothetical infinite population and which applies to any single experiment. Neyman and Pearson’s theory of hypothesis testing is a more mathematical view with statistical properties derived from the long-run frequency of experiments; it does not by itself provide evidence of the truth or falsehood of a particular hypothesis. The confusion between approaches comes from the fact that the critical region of Neyman-Pearson theory can be defined in terms of the p value. For instance, the critical regions defined by thresholds at ±1.96 for the normal distribution, 3.84 for the chi square test at 1 degree of freedom, and ±1.97 for a t test at 168 degrees of freedom all correspond to setting a threshold of 0.05 for the p value. The p value is considered more practical because it represents a single probability across the different distributions of the numerous test statistics; usually the value of the test statistic is omitted and only the p value is reported.

Table 2

Comparison of Fisher’s p value and Neyman-Pearson’s hypothesis testing

                        Fisher’s p value                      Neyman-Pearson hypothesis testing
Statistical properties  Hypothetical infinite population      Long-run frequency of experiments
Applies to              Any single experiment                 Rules of behavior over many experiments
Result                  Graded evidence against the null      Binary decision: reject or accept the null

The difference between approaches may be more easily understood through a hypothetical example. After a trial comparing an experimental Treatment A with a standard Treatment B is conducted, a surgeon has to decide whether Treatment A is or is not superior to Treatment B. Following Fisher’s theory, the surgeon weighs issues such as relevant in vitro tests, the design of the trial, previous results comparing treatments, etc, along with the p value of the comparison, to eventually reach a conclusion. In such cases, p values of 0.052 and 0.047 likely would be similarly weighted in making the conclusion, whereas p values of 0.047 and 0.0001 probably would have differing weights. In contrast, statisticians have to give their opinion on an enormous number of new drugs and medical devices over the course of a career. They cannot be concerned with whether each particular new treatment tested is superior to the standard one because they know the evidence can never be certain. However, they know that, following Neyman and Pearson’s theory, they can control the overall proportion of errors, either Type I or II (Table 1), they make over their entire career. By setting α at, say, 5% and power (1 − β) at 90%, they know that at the end of their career, in 5% of cases they will have concluded the experimental treatment was superior to the standard when it was not, and in 10% of cases they will have concluded the experimental treatment was not different from the standard treatment although it was. In that case, very close p values, such as 0.047 and 0.052, will lead to dramatically opposite actions: in the first case, the treatment studied will be considered superior and used, whereas in the second case the treatment will be rejected for inefficacy, despite very close evidence observed from the two experiments (from a Fisherian point of view).

Misconceptions When Considering Statistical Results

First, the most common and certainly most serious error made is to consider the p value as the probability that the null hypothesis is true. For instance, in the above-mentioned example to illustrate Fisher’s theory, which yielded a p value of 0.02, one should not conclude the data show there is a 2% chance of no association between the existence of a radiolucent line in Zone 1 on the postoperative radiograph in cemented cups and the risk of acetabular loosening. The p value is not the probability of the null hypothesis being true; it is the probability of observing these data, or more extreme data, if the null is true. The p value is computed on the basis that the null hypothesis is true and therefore it cannot give any probability of it being more or less true. The proper interpretation in the example should be: considering no association exists between a radiolucent line in Zone 1 and the risk of acetabular loosening (the null hypothesis), there was only a 2% chance to observe the results of the study (or more extreme results).

Second, there is also a false impression that if trials are conducted with a controlled Type I error, say 5%, and adequate power, say 80%, then significant results almost always are corresponding to a true difference between the treatments compared. This is not the case, however. Imagine we test 1000 null hypotheses of no difference between experimental and control treatments. There is some evidence that the null only rarely is false, namely that only rarely the treatment under study is effective (either superior to a placebo or to the usual treatment) or that a factor under observation has some prognostic value [ 12 , 19 , 20 ]. Say that 10% of these 1000 null hypotheses are false and 90% are true [ 20 ]. Now if we conduct the tests at the aforementioned levels of α = 5% and power = 80%, 36% of significant p values will not report true differences between treatments (Fig.  2 , Scenario 1, 64% true-positive and 36% false-positive significant results; Fig.  3 , Point A). Moreover, in certain contexts, the power of most studies does not exceed 50% [ 1 , 7 ]; in that case, almost ½ of significant p values would not report true differences [ 20 ] (Fig.  3 , Point B).

Fig. 2

The flowchart shows the classification tree for 1000 theoretical null hypotheses with two different scenarios considering 10% false null hypotheses. Scenario 1 has a Type I error rate of 5% and a Type II error rate of 20% (power = 80%); Scenario 2 has a Type I error rate of 1% and a Type II error rate of 10% (power = 90%). The first node (A) separates the 900 true null hypotheses (no effect of treatment) from the 100 false null hypotheses (effect of treatment). For Scenario 1, the second node left (B) separates the 900 true null hypotheses (no treatment effect) at the 5% level: 855 tests are not significant (true-negative [TN] results) and 45 tests are falsely significant (false-positive [FP] results). The second node right (C) separates the 100 false null hypotheses (effect of treatment) at the 20% level (power = 80%): 20 tests are falsely not significant (false-negative [FN] results) and 80 tests are significant (true-positive [TP] results). The corresponding positive predictive value [TP/(TP + FP)] is 64%. The figures in parentheses at the second nodes right and left and at the bottom show the results for Scenario 2. The positive predictive value of significant results for Scenario 2 is 91%.

Fig. 3

This graph shows the effect of the Types I and II error rates and the proportion of false null hypotheses (true effect of treatment) on the positive predictive value of significant results. Three different levels of Types I and II error rates are depicted: α = 5% and β = 20% (power = 80%), α = 5% and β = 50% (power = 50%), and α = 1% and β = 10% (power = 90%). It can be seen that the higher the proportion of false null hypotheses tested, the better is the positive predictive value of significant results. Point A corresponds to a standard α = 5%, β = 20% (power = 80%), and 10% of false null hypotheses tested. The positive predictive value of a significant result is 64% (also see Fig. 2). Point B corresponds to the suspected reality α = 5%, β = 50% (power = 50%), and 10% of false null hypotheses tested. The positive predictive value of a significant result decreases to 53%. Point C corresponds to α = 5%, β = 20% (power = 80%), and 33% of false null hypotheses tested. The positive predictive value of a significant result increases to 89%. Finally, Point D corresponds to α = 1%, β = 10% (power = 90%), and 10% of false null hypotheses tested. The positive predictive value of a significant result increases to 91%. At the extreme, if all null hypotheses tested are true (no effect of treatment), regardless of α and β, the positive predictive value of a significant result is 0.
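The positive predictive values at Points A through D reduce to a few lines of arithmetic; a sketch (the error rates and priors are taken from the figure description):

```python
def ppv(alpha: float, power: float, prior: float) -> float:
    """Positive predictive value of a significant result:
    true positives / (true positives + false positives),
    where `prior` is the proportion of false null hypotheses tested."""
    tp = power * prior        # false nulls correctly rejected
    fp = alpha * (1 - prior)  # true nulls wrongly rejected
    return tp / (tp + fp)

print(round(ppv(0.05, 0.80, 0.10), 2))  # Point A: 0.64
print(round(ppv(0.05, 0.50, 0.10), 2))  # Point B: 0.53
print(round(ppv(0.05, 0.80, 1 / 3), 2)) # Point C: 0.89
print(round(ppv(0.01, 0.90, 0.10), 2))  # Point D: 0.91
```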

Implications for Research

Fisher, who designed studies for agricultural field experiments, insisted “a scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance” [ 5 ]. There are three issues that a researcher should consider when conducting, or when assessing the report of, a study (Table  3 ).

Table 3

Implications for research

1. Test relevant hypotheses, providing the a priori scientific background both when planning the study and when reporting the findings.
2. Plan and conduct studies that limit biases, so that the observed outcome can rightfully be attributed to the effect under study.
3. Choose adequate Types I and II error levels, and replicate experiments before accepting any hypothesis.

First, the relevance of the hypothesis tested is paramount to the solidity of the conclusion inferred. The proportion of false null hypotheses tested has a strong effect on the predictive value of significant results. For instance, say we shift from a presumed 10% of null hypotheses tested being false to a reasonable 33% (ie, from 10% of treatments tested effective to 1/3 of treatments tested effective); then the positive predictive value of significant results improves from 64% to 89% (Fig. 3, Point C). Just as a building cannot be expected to have more resistance to environmental challenges than its own foundation, a study will fail, regardless of its design, materials, and statistical analysis, if the hypothesis tested is not sound. The danger of testing irrelevant or trivial hypotheses is that, owing to chance alone, a small proportion of them eventually will wrongly reject the null and lead to the conclusion that Treatment A is superior to Treatment B or that a variable is associated with an outcome when it is not. Given that positive results are more likely to be reported than negative ones, a misleading impression may arise from the literature that a given treatment is effective when it is not, and it may take numerous studies and a long time to invalidate this incorrect evidence. The requirement to register trials before the first patient is included may prove an important means of mitigating this problem. For instance, by 1981, 246 factors had been reported [12] as potentially predictive of cardiovascular disease, many having little or no relevance at all, such as certain fingerprint patterns, slow beard growth, decreased sense of enjoyment, garlic consumption, etc.
More than 25 years later, only the following few are considered clinically relevant in assessing individual risk: age, gender, smoking status, systolic blood pressure, ratio of total cholesterol to high-density lipoprotein, body mass index, family history of coronary heart disease in first-degree relatives younger than 60 years, area measure of deprivation, and existing treatment with antihypertensive agent [ 19 ]. Therefore it is of prime importance that researchers provide the a priori scientific background for testing a hypothesis at the time of planning the study, and when reporting the findings, so that peers may adequately assess the relevance of the research. For instance, with respect to the first example given, we may hypothesize that the presence of a radiolucent line observed in Zone 1 on the postoperative radiograph is a sign of a gap between cement and bone that will favor micromotion and facilitate the passage of polyethylene wear particles, both of which will favor eventual bone resorption and loosening [ 16 , 18 ]. An important endorsement exists when other studies also have reported the association [ 8 , 11 , 14 ].

Second, it is essential to plan and conduct studies that limit biases so that the outcome rightfully may be attributed to the effect under study. The difference observed at the end of an experiment between two treatments is the sum of the effect of chance, of the treatment or characteristic studied, and of other confounding factors or biases. Therefore, it is essential to minimize the effect of confounding factors by adequately planning and conducting the study, so that the difference observed can be attributed to the treatment studied once we are willing to reject the effect of chance (when the p value, or equivalently the test statistic, engages us to do so). Randomization, when feasible, for example, when comparing the 1-month HHS after miniincision and standard incision hip arthroplasty, is the preferred experimental design to control on average for known and unknown confounding factors. The same principles should apply to other experimental designs. For instance, owing to the rare and late occurrence of certain events, a retrospective study rather than a prospective study may be preferable to judge the association between the existence of a radiolucent line in Zone 1 on the postoperative radiograph in cemented cups and the risk of acetabular loosening. Nonetheless, researchers should ensure there is no systematic difference regarding known confounding factors between patients who have a radiolucent line in Zone 1 and those who do not. For instance, they should identify both groups over the same period of time, and the acetabular components used and the patients under study should be comparable between groups. If the types of acetabular components differ markedly between groups, the researcher will not be able to say whether the difference observed in aseptic loosening is attributable to the existence of a radiolucent line in Zone 1 or to differences in design between acetabular components.

Last, choosing adequate levels of Type I and Type II errors, or alternatively the level of significance for the p value, may improve the confidence we can place in purported significant results (Figs. 2, 3). Decreasing the α level will proportionally decrease the number of false-positive findings and subsequently improve the positive predictive value of significant results. Increasing the power of studies will correspondingly increase the number of true-positive findings and also improve the positive predictive value of significant results. For example, if we shift from a Type I error rate of 5% and power of 80% to a Type I error rate of 1% and power of 90%, the positive predictive value of a significant result increases from 64% to 91% (Fig. 2, Scenario 2; Fig. 3, Point D). Sample size can be used as a lever to control the Types I and II error levels [2]. However, a strong statistical association, p value, or test statistic never implies any causal effect. The case for a causal effect is built, study after study, little by little. Therefore, replication of the experiment by others is crucial before accepting any hypothesis. To that end, the methods used must be described in sufficient detail that other informed investigators can reproduce the study.

The p value and the theory of hypothesis testing are useful tools that help doctors conduct research. They are helpful for planning an experiment, interpreting the results observed, and reporting findings to peers. However, it is paramount that researchers understand precisely what these tools do and do not mean, so that they are not blinded by a statistical value at the expense of sound medical reasoning.

Each author certifies that he or she has no commercial associations (eg, consultancies, stock ownership, equity interest, patent/licensing arrangements, etc) that might pose a conflict of interest in connection with the submitted article.

Statology

Statistics Made Easy

How to Write Hypothesis Test Conclusions (With Examples)

A hypothesis test is used to test whether or not some hypothesis about a population parameter is true.

To perform a hypothesis test in the real world, researchers obtain a random sample from the population and perform a hypothesis test on the sample data, using a null and alternative hypothesis:

  • Null Hypothesis (H0): The sample data can be explained purely by chance.
  • Alternative Hypothesis (HA): The sample data is influenced by some non-random cause.

If the p-value of the hypothesis test is less than some significance level (e.g. α = .05), then we reject the null hypothesis .

Otherwise, if the p-value is not less than some significance level then we fail to reject the null hypothesis .

When writing the conclusion of a hypothesis test, we typically include:

  • Whether we reject or fail to reject the null hypothesis.
  • The significance level.
  • A short explanation in the context of the hypothesis test.

For example, we would write:

We reject the null hypothesis at the 5% significance level.   There is sufficient evidence to support the claim that…

Or, we would write:

We fail to reject the null hypothesis at the 5% significance level.   There is not sufficient evidence to support the claim that…
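The decision rule and the two conclusion templates above can be combined into a small helper. A minimal sketch; the function name and exact phrasing are illustrative:

```python
def hypothesis_conclusion(p_value: float, alpha: float, claim: str) -> str:
    """Return a conclusion sentence following the reject /
    fail-to-reject templates above."""
    level = f"{alpha:.0%}"  # e.g. 0.05 -> "5%"
    if p_value < alpha:
        return (f"We reject the null hypothesis at the {level} significance "
                f"level. There is sufficient evidence to support the claim "
                f"that {claim}.")
    return (f"We fail to reject the null hypothesis at the {level} "
            f"significance level. There is not sufficient evidence to "
            f"support the claim that {claim}.")

print(hypothesis_conclusion(0.002, 0.05, "the fertilizer increases growth"))
```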

The following examples show how to write a hypothesis test conclusion in both scenarios.

Example 1: Reject the Null Hypothesis Conclusion

Suppose a biologist believes that a certain fertilizer will cause plants to grow more during a one-month period than they normally do, which is currently 20 inches. To test this, she applies the fertilizer to each of the plants in her laboratory for one month.

She then performs a hypothesis test at a 5% significance level using the following hypotheses:

  • H0: μ = 20 inches (the fertilizer will have no effect on the mean plant growth)
  • HA: μ > 20 inches (the fertilizer will cause mean plant growth to increase)

Suppose the p-value of the test turns out to be 0.002.

Here is how she would report the results of the hypothesis test:

We reject the null hypothesis at the 5% significance level.   There is sufficient evidence to support the claim that this particular fertilizer causes plants to grow more during a one-month period than they normally do.
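The biologist's test can be sketched with SciPy's one-sample t test. The growth measurements below are made-up illustrative data, not taken from the example:

```python
from scipy import stats

# Hypothetical one-month growth measurements (inches) for fertilized plants.
growth = [22.1, 23.4, 21.8, 24.0, 22.7, 23.1, 21.9, 23.8, 22.5, 23.3]

# H0: mu = 20 inches vs HA: mu > 20 inches (one-sided test).
result = stats.ttest_1samp(growth, popmean=20, alternative="greater")

if result.pvalue < 0.05:
    print("Reject H0: evidence the fertilizer increases mean growth.")
else:
    print("Fail to reject H0.")
```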

Example 2: Fail to Reject the Null Hypothesis Conclusion

Suppose the manager of a manufacturing plant wants to test whether or not some new method changes the number of defective widgets produced per month, which is currently 250. To test this, he measures the mean number of defective widgets produced before and after using the new method for one month.

He performs a hypothesis test at a 10% significance level using the following hypotheses:

  • H0: μ_after = μ_before (the mean number of defective widgets is the same before and after using the new method)
  • HA: μ_after ≠ μ_before (the mean number of defective widgets produced is different before and after using the new method)

Suppose the p-value of the test turns out to be 0.27.

Here is how he would report the results of the hypothesis test:

We fail to reject the null hypothesis at the 10% significance level.   There is not sufficient evidence to support the claim that the new method leads to a change in the number of defective widgets produced per month.

Additional Resources

The following tutorials provide additional information about hypothesis testing:

  • Introduction to Hypothesis Testing
  • 4 Examples of Hypothesis Testing in Real Life
  • How to Write a Null Hypothesis


Published by Zach



Null & Alternative Hypotheses | Definitions, Templates & Examples

Published on May 6, 2022 by Shaun Turney . Revised on June 22, 2023.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :

  • Null hypothesis (H0): There’s no effect in the population.
  • Alternative hypothesis (Ha or H1): There’s an effect in the population.

Table of contents

  • Answering your research question with hypotheses
  • What is a null hypothesis?
  • What is an alternative hypothesis?
  • Similarities and differences between null and alternative hypotheses
  • How to write null and alternative hypotheses
  • Other interesting articles
  • Frequently asked questions

The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”:

  • The null hypothesis (H0) answers “No, there’s no effect in the population.”
  • The alternative hypothesis (Ha) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses .

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.



The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept. Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error . When you incorrectly fail to reject it, it’s a type II error.

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p1 = p2.

The alternative hypothesis ( H a ) is the other answer to your research question . It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.
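In mathematical notation, a two-tailed pair and a one-tailed pair might be written as follows (μ denotes a population mean and μ0 a reference value; the symbols are generic, not tied to any particular test):

```latex
% Two-tailed test of a difference in means
H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 \neq \mu_2

% One-tailed test against a reference value
H_0: \mu \le \mu_0 \qquad H_a: \mu > \mu_0
```

Note how the null always carries the equality (=, ≤, or ≥) and the alternative always carries the inequality (≠, <, or >), as described above.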

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.


To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General template sentences

The only things you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable?

  • Null hypothesis (H0): Independent variable does not affect dependent variable.
  • Alternative hypothesis (Ha): Independent variable affects dependent variable.
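As a sketch of how the filled-in templates connect to an actual test, the example below runs a simple two-sided permutation test for the question “Does fertilizer affect plant height?” The plant-height data and the number of resamples are invented purely for illustration.

```python
import random

random.seed(0)

# Hypothetical data: plant heights (cm) with and without fertilizer.
treated = [21.3, 24.1, 22.8, 23.5, 25.0, 22.2]
control = [19.8, 20.5, 21.1, 20.2, 19.4, 21.6]

# H0: the fertilizer does not affect height (any difference is chance).
# Ha: the fertilizer affects height (two-sided, so use |difference|).
def mean(xs):
    return sum(xs) / len(xs)

observed = abs(mean(treated) - mean(control))

pooled = treated + control
n, reps, extreme = len(treated), 10_000, 0
for _ in range(reps):
    random.shuffle(pooled)  # relabel the groups at random, as H0 allows
    if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
        extreme += 1

p_value = extreme / reps
print(f"p = {p_value:.4f}")
# A small p (below your chosen alpha) favours Ha; otherwise fail to reject H0.
```

A permutation test is used here because it needs no distributional assumptions and fits in a few lines; a t-test would play exactly the same role of deciding between H0 and Ha.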

Test-specific template sentences

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

Note: The template sentences above assume that you’re performing two-tailed tests, which are appropriate for most studies. Use a one-tailed test (with < or > in the alternative hypothesis) only when you have a strong directional prediction.


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

The null hypothesis is often abbreviated as H0. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as Ha or H1. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study, the statistical hypotheses correspond logically to the research hypothesis.

Turney, S. (2023, June 22). Null & Alternative Hypotheses | Definitions, Templates & Examples. Scribbr. Retrieved February 29, 2024, from https://www.scribbr.com/statistics/null-and-alternative-hypotheses/



Are Research Questions and Hypotheses the Same?

  • 23rd March 2022

A researcher might assume that research questions and hypotheses are identical concepts and should be treated equally in their approach to research. Are they really the same? Let’s find out!

While both are powerful research instruments that are determined in the initial experimental phase and help to guide the research planning and processes, the similarities end there. Read on to discover the differences between these two terms and understand how to accurately apply them in your research paper.

Hypothesis

A hypothesis, specifically in academic research, is an educated guess or assumption about an expected relationship or occurrence that can be tested to determine whether it’s correct. Essentially, you’ll make a statement, almost like a prediction, about a specific event or a relationship between two or more variables. You’ll then perform in-depth research into the topic and conduct experiments and tests to see whether the statement holds. A hypothesis is far more structured than a research question and requires you to have a fair amount of existing knowledge about your chosen topic. Using what you already know, you’ll follow a predetermined plan to prove or disprove your initial assumption.

Research Question

A research question is simply an unanswered question the researcher has about a topic that intrigues them or a query about the world or life in general. You’ll pose this question at the start of your research paper. You’ll then accumulate as much information as possible to determine a reasonable answer. Alternatively, with adequate proof, you can explain why the question should be left as is—unanswered. As you can see, research questions offer a more flexible approach since it’s essentially an investigation to appease your curiosity as a researcher. You don’t have to arrive at any specific answer, but your research should provide your readers with enough knowledge to draw their own conclusions.

At a Glance

●  Hypotheses are used in mathematics, engineering, and every branch of science.

●  Research questions are typically used when researching less calculable fields like literature, education, and sociology.


●  Prior knowledge is required to form a hypothesis, following a very structured approach to arrive at a definite answer—was the assumption correct or not?

●  Researchers have more leniency when compiling papers based on research questions since the nature of the questions is more open-ended.

●  For a hypothesis, your conclusion should state whether you were able to prove your hypothesis to be true or not based on the results of your research and testing.

●  With a research question, you’ll use the conclusion section to summarize your answer based on your personal findings.

Now that you understand the correct way to incorporate these terms in your research papers, you can apply each one with confidence.


Hypothesis, Model, Theory, and Law


  • M.S., Mathematics Education, Indiana University
  • B.A., Physics, Wabash College

In common usage, the words hypothesis, model, theory, and law have different interpretations and are at times used without precision, but in science they have very exact meanings.

Hypothesis

Perhaps the most difficult and intriguing step is the development of a specific, testable hypothesis. A useful hypothesis enables predictions by applying deductive reasoning, often in the form of mathematical analysis. It is a limited statement regarding cause and effect in a specific situation, which can be tested by experimentation and observation or by statistical analysis of the probabilities from the data obtained. The outcome of the test should not be known in advance, so that the results can provide useful data regarding the validity of the hypothesis.

Sometimes a hypothesis is developed that must wait for new knowledge or technology to become testable. The concept of atoms was proposed by the ancient Greeks, who had no means of testing it. Centuries later, when more knowledge became available, the hypothesis gained support and was eventually accepted by the scientific community, though it has had to be amended many times over the years. Atoms are not indivisible, as the Greeks supposed.

Model

A model is used for situations when it is known that the hypothesis has a limitation on its validity. The Bohr model of the atom, for example, depicts electrons circling the atomic nucleus in a fashion similar to planets in the solar system. This model is useful in determining the energies of the quantum states of the electron in the simple hydrogen atom, but it by no means represents the true nature of the atom. Scientists (and science students) often use such idealized models to get an initial grasp on analyzing complex situations.

Theory and Law

A scientific theory or law represents a hypothesis (or group of related hypotheses) which has been confirmed through repeated testing, almost always conducted over a span of many years. Generally, a theory is an explanation for a set of related phenomena, like the theory of evolution or the big bang theory . 

The word "law" is often invoked in reference to a specific mathematical equation that relates the different elements within a theory. Pascal's Law refers an equation that describes differences in pressure based on height. In the overall theory of universal gravitation developed by Sir Isaac Newton , the key equation that describes the gravitational attraction between two objects is called the law of gravity .

These days, physicists rarely apply the word "law" to their ideas. In part, this is because so many of the previous "laws of nature" were found to be not so much laws as guidelines that work well within certain parameters but not within others.

Scientific Paradigms

Once a scientific theory is established, it is very hard to get the scientific community to discard it. In physics, the concept of ether as a medium for light wave transmission ran into serious opposition in the late 1800s, but it was not disregarded until the early 1900s, when Albert Einstein proposed alternate explanations for the wave nature of light that did not rely upon a medium for transmission.

The science philosopher Thomas Kuhn developed the term scientific paradigm to explain the working set of theories under which science operates. He did extensive work on the scientific revolutions that take place when one paradigm is overturned in favor of a new set of theories. His work suggests that the very nature of science changes when these paradigms are significantly different. The nature of physics prior to relativity and quantum mechanics is fundamentally different from that after their discovery, just as biology prior to Darwin’s Theory of Evolution is fundamentally different from the biology that followed it. The very nature of the inquiry changes.

One consequence of the scientific method is to try to maintain consistency in the inquiry when these revolutions occur and to avoid attempts to overthrow existing paradigms on ideological grounds.

Occam’s Razor

One principle of note in regard to the scientific method is Occam's Razor (alternately spelled Ockham's Razor), which is named after the 14th-century English logician and Franciscan friar William of Ockham. Occam did not create the concept; the work of Thomas Aquinas and even Aristotle referred to some form of it. The name was first attributed to him (to our knowledge) in the 1800s, indicating that he espoused the philosophy strongly enough that his name became associated with it.

The Razor is often stated in Latin as:

entia non sunt multiplicanda praeter necessitatem
or, translated to English:
entities should not be multiplied beyond necessity

Occam's Razor indicates that the simplest explanation that fits the available data is the preferable one. Assuming that two hypotheses have equal predictive power, the one that makes the fewest assumptions and posits the fewest hypothetical entities takes precedence. This appeal to simplicity has been adopted by most of science, and is invoked in this popular quote by Albert Einstein:

Everything should be made as simple as possible, but not simpler.

It is significant to note that Occam's Razor does not prove that the simpler hypothesis is, indeed, the true explanation of how nature behaves. Scientific principles should be as simple as possible, but that's no proof that nature itself is simple.

However, it is generally the case that when a more complex system is at work, some element of the evidence fails to fit the simpler hypothesis. Occam's Razor is therefore rarely wrong, because it applies only to hypotheses of equal predictive power; predictive power matters more than simplicity.

Edited by Anne Marie Helmenstine, Ph.D.


Alchem Learning

Similarities and Differences Between Hypothesis and Theory

In the realm of scientific inquiry, two terms that are often used interchangeably but hold distinct meanings are “hypothesis” and “theory.” Both play crucial roles in the scientific method, contributing to the understanding and advancement of knowledge. This article delves into the similarities and differences between these two fundamental scientific concepts.

Hypothesis: The Starting Point

A hypothesis is a proposed explanation for a phenomenon. It is an educated guess or a tentative solution to a problem based on existing knowledge. Scientists formulate hypotheses to guide their research and make predictions that can be tested through experimentation or observation.

Characteristics

  • Testability: A good hypothesis is testable, meaning it can be investigated through empirical methods.
  • Falsifiability: It should be possible to prove the hypothesis false through experimentation or observation.
  • Specificity: The hypothesis must be clear and specific, outlining the expected outcome of the experiment.

Example: If plants receive more sunlight, then their growth rate will increase.

Theory: A Comprehensive Explanation

On the other hand, a theory is a well-substantiated explanation of some aspect of the natural world. Unlike a hypothesis, a theory has withstood extensive testing and scrutiny, providing a comprehensive framework for understanding a particular phenomenon.

Characteristics

  • Explanatory Power: Theories explain a wide range of phenomena and observations.
  • Predictive Capability: They can predict future observations and experiments accurately.
  • Consistency: The components of a theory are internally consistent and align with existing scientific knowledge.

Example: The theory of evolution explains the biodiversity of life through the processes of natural selection and genetic variation.

Similarities

1. Both Guide Scientific Inquiry

Both hypotheses and theories play integral roles in the scientific method, guiding researchers in the pursuit of knowledge. Hypotheses set the initial direction for experiments, while theories provide overarching frameworks.

2. Subject to Revision

Scientific knowledge is dynamic, and both hypotheses and theories are subject to revision based on new evidence. As more data becomes available, scientists may refine or even discard hypotheses and theories.

Differences

1. Level of Certainty

The primary distinction lies in the level of certainty associated with each term. A hypothesis is a tentative explanation that requires testing, while a theory is a well-established explanation supported by a substantial body of evidence.

2. Scope

Hypotheses are narrow in scope, addressing specific questions or problems, while theories have a broader scope, encompassing a wide range of related phenomena.

In conclusion, hypotheses and theories are essential components of the scientific process, each serving distinct roles. Hypotheses initiate investigations, while theories provide robust explanations for observed phenomena. Recognizing the differences and similarities between these concepts is crucial for understanding how scientific knowledge evolves and progresses.



  • Open access
  • Published: 01 March 2024

Testing the unique item hypothesis with phrasal verbs in Chinese–English translations of Lu Xun’s short stories: the perspective of translation directionality

  • Juhong Zhan   ORCID: orcid.org/0000-0001-8150-067X
  • Yue Jiang

Humanities and Social Sciences Communications volume 11, Article number: 344 (2024)


  • Language and linguistics

Abstract

The present study revisits the unique item hypothesis (UIH) from the perspective of translation directionality in the Chinese–English (C–E) language pair. The phrasal verb (PV) is used as the linguistic feature to investigate whether the UIH holds true in C–E translations and whether translation directionality plays a role in the representation of unique items, based on a self-built parallel corpus of Lu Xun’s short stories and their English translations by two L1 and two L2 translators, with a reference corpus of BNC short stories as the non-translated baseline. It is found that PVs are significantly over-represented in C–E translated texts compared with English non-translated texts, and that this overrepresentation is mainly attributable to the remarkable use of PVs by L1 translators. There is a significant difference in the use of PVs between translators of different directionality, while no significant difference is found within the same direction. Additionally, L2 translators tend to use a limited range of PVs and prefer transparent PVs to semi-transparent and opaque ones. The results falsify the UIH in general and suggest that the UIH is a conditional translation tendency constrained by translation directionality; that is, the UIH is directionality-dependent. The gravitational pull model is used to analyze and explain the divergence between the two translation directions.

Introduction

Unique item hypothesis

The unique item hypothesis (UIH) is a translation universal tendency proposed by Tirkkonen-Condit (2002), claiming that translations tend to contain fewer unique items than comparable non-translated texts. Unique items are defined as target-language-specific items that lack straightforward translation counterparts or equivalents in the source language (Tirkkonen-Condit 2004). Presumably, translators may ignore these items, as they are not likely to suggest themselves readily as one-to-one equivalents to any particular item in the source text (Tirkkonen-Condit 2002). Tirkkonen-Condit (2004) attributed the underrepresentation of unique items to translators’ tendency to rely on a literal approach when selecting lexical items, syntactic structures, and idiomatic expressions from their bilingual mental lexicon during the translation process. Therefore, the absence of an obvious linguistic trigger for these items in the source language could lead to their underrepresentation or less frequent use in translated texts compared to non-translated texts.

Since its proposal, the unique item hypothesis (UIH) has garnered significant scholarly attention and has been the subject of many empirical investigations and tests. The evidence gathered from these studies presents a diverse picture. Tirkkonen-Condit (2004) conducted a study on Finnish texts translated from English and found that certain typical elements of Finnish were less frequently used compared to non-translated Finnish texts, providing empirical support for the UIH. Similarly, Cappelle (2012) examined the use of manner-of-motion verbs in translated and non-translated English texts. The study revealed that translations from French contained fewer manner-of-motion verbs than original English texts, given the verb-framed nature of the French language. In contrast, no such difference was observed between English translations from German and original English, as Germanic languages share a similar satellite-framed structure. These findings further bolstered the validity of the unique item hypothesis. Further evidence for the UIH was presented by Vilinsky (2012), who compared the frequencies of Spanish verbal periphrases in original Spanish texts and Spanish translations from English. The study revealed a lower frequency of Spanish verbal periphrases in translated texts, indicating a tendency among translators to avoid unique items. Similarly, Tello (2022) explored the translation of diminutives into Spanish using the COVALT corpus and found supporting evidence for the UIH. Translated texts exhibited lower frequencies of diminutives compared to non-translated texts, suggesting a deliberate avoidance by translators. The study also found that the use of diminutives varies depending on the source language, target language, and genre. In another investigation, Cappelle and Loock (2017) observed an underuse of phrasal verbs in English translations from Romance languages compared to non-translated fiction in the British National Corpus.
However, no significant difference in the use of phrasal verbs was found between non-translated English and English translated from Germanic languages. This discrepancy was attributed to source-language interference resulting from typological differences between the source and target languages. These findings suggest that the use of unique items in translated texts is source-language dependent, shaped by typological similarities or differences within the language pair. Taken together, this evidence supports the unique item hypothesis: translators tend to underuse linguistic structures that are not present in the source language.
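To illustrate the kind of frequency comparison these corpus studies rely on, here is a sketch using the log-likelihood (G2) measure that is standard in corpus linguistics (in Rayson and Garside's two-corpus formulation). The phrasal-verb and corpus-size counts are invented for illustration and are not the data of any study cited above.

```python
import math

# Hypothetical counts: phrasal-verb tokens out of the total tokens in a
# translated corpus and a non-translated reference corpus (invented
# numbers for illustration only).
pv_trans, size_trans = 1800, 40000  # translated English
pv_ref, size_ref = 2600, 40000      # non-translated English

def log_likelihood(a, n1, b, n2):
    """Two-corpus log-likelihood (G2) for a frequency comparison."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected count in corpus 1
    e2 = n2 * (a + b) / (n1 + n2)  # expected count in corpus 2
    g2 = 0.0
    for obs, exp in ((a, e1), (b, e2)):
        if obs > 0:
            g2 += 2 * obs * math.log(obs / exp)
    return g2

g2 = log_likelihood(pv_trans, size_trans, pv_ref, size_ref)
print(f"G2 = {g2:.1f}")  # values above 3.84 are significant at p < 0.05 (1 df)
```

With these invented counts the feature is significantly underused in the translated corpus, the pattern the UIH predicts; reversed counts would indicate the overrepresentation reported in the studies that falsify it.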

However, findings from other research have challenged or falsified the unique item hypothesis. Hareide’s studies (Hareide 2017a; Hareide 2017b) on the Spanish gerund in translations from Norwegian found that the Spanish gerund was significantly over-represented. These studies suggest that over-representation of target-language-specific features may occur in translations, in contrast to the under-representation predicted by the UIH, and attribute this over-representation to the normalization process, whereby translated texts tend to conform to the linguistic norms of the target language. Kenny and Satthachai’s study (2018) on the translation of the passive voice from English into Thai also refuted the UIH: most English passives are translated into Thai using the passive voice, indicating that unique items are not underrepresented in Thai translations.

To provide possible explanations for translation universal tendencies, Halverson (2003, 2010) proposed the gravitational pull hypothesis (GPH) and developed it into a cognitive linguistic model (2017). The model depicts three cognitive forces influencing translators’ choice of language items (Hareide 2017b; Halverson 2010, 2017): the gravitational pull exerted by highly salient representational elements in the source language (Halverson 2017), the magnetism exerted by target-language items with high salience/frequency (ibid.), and the connectivity resulting from the strength of links between elements in the source and target languages (ibid.). The interrelation and interplay of the three forces shape the make-up of the translated language. Regarding unique items, the gravitational pull hypothesis suggests that both over- and under-representation of particular target-language items are possible, depending on the specific structure of the bilingual semantic network activated in any given instance (Halverson 2010). Specifically, this model may predict an underrepresentation resulting from insufficient gravitational pull due to the lack of these items in the source language, an overrepresentation caused by magnetism due to their pervasiveness in the English target language, or an underrepresentation due to weak connectivity between the language pair. The ultimate outcome depends on which force outweighs the others.

To determine which force predominates over the others, previous studies have tested the gravitational pull model in different contexts. Marco and Oster (2018) compared the use of diminutive suffixes in Catalan translated from German and English, the former having productive diminutive suffixes while the latter does not. Their major findings were that the force of connectivity overrules that of magnetism, and that the counterbalance of the forces makes the outcome somewhat unpredictable. Marco (2021) examined the use of the Catalan modal verb caldre in translations from English and French into Catalan, using data from two sub-corpora of the COVALT corpus. The study found evidence to support hypotheses of under-representation of caldre in the English–Catalan sub-corpus and over-representation in the French–Catalan sub-corpus. It additionally revealed that the over-representation of caldre was significantly influenced by its strong connectivity with corresponding linguistic triggers in the French source texts, concluding that connectivity may determine the over- or under-representation of language features in translations. Lefer and De Sutter (2022) used Halverson’s gravitational pull hypothesis to explain the interpretation and translation of concatenated nouns in mediated European Parliament discourse and supported the applicability of the GPH in this specific domain. They found that the forces of gravitational pull, magnetism, and connectivity all play a role in the translation and interpretation of concatenated nouns, and suggested that the gravitational pull model is a useful tool for understanding the complex interplay of forces that influence the translation and interpretation of specific linguistic features.

In summary, the three cognitive forces were found to wax and wane, counterbalancing each other and jointly shaping translational language. However, no consistent conclusion has been reached regarding which cognitive force predominates in a specific context.

Phrasal verbs

Unique items in a target language can be found at various linguistic levels, such as the lexical, phraseological, syntactical, textual, collocational, or pragmatic level (Tirkkonen-Condit 2002). The present study is interested in the use of phrasal verbs at the phraseological level. A phrasal verb is usually defined as a structure that consists of a verb proper and a morphologically invariable particle that functions as a single unit both lexically and syntactically (Darwin and Gray 1999). The present study settles on phrasal verbs as the linguistic feature for several reasons.

Firstly, phrasal verbs are a typical phenomenon of the English language and have always held a central place in it. Phrasal verbs occur more frequently than other common linguistic features, such as the verb are, the determiners this or his, the negative not, the conjunction but, or the pronoun they (Gardner and Davies 2007).

Secondly, phrasal verbs exhibit the traits of unique items in the context of Chinese–English translation. According to Tirkkonen-Condit (2004) and Chesterman (2004, 2011), unique items should be semantically or pragmatically the same but structurally different in the language pair. In Chinese, there are verb-particle structures such as the motion verbs guolai (过来, come over here) and guoqu (过去, go over there), where the particles lai (来) and qu (去) are added to the verb guo (过) to indicate direction. However, unlike in English, the particles in these Chinese words are generally considered inseparable from the verbs and are always treated as a single unit or word (Liao and Fukuya 2004). Moreover, English phrasal verbs can be replaced with a single verb (e.g., put off can be replaced with postpone), whereas Chinese verb-particles cannot. Furthermore, English phrasal verbs and Chinese motion verbs differ significantly in number. English has about 3,000 established phrasal verbs, which make up one-third of the English verb vocabulary (Li et al. 2003), while Chinese motion verbs are limited in number (Liao and Fukuya 2004).

Lastly, verb-particle structures rarely take on figurative meanings in Chinese, as they often do in English (ibid.). Phrasal verbs can be categorized into three types based on their semantic transparency: directional, aspectual, and figurative. Directional PVs are semantically transparent: both the verb and the particle retain their literal meanings, with the particle often indicating geographical direction, e.g., stand up. Aspectual PVs are semantically semi-transparent: the verb has a literal meaning and the particle provides an aspectual meaning, e.g., read through. Figurative PVs are semantically opaque: the verb and the particle together have an idiomatic meaning, e.g., figure out (Wierszycka 2013; Riguel 2014).
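The three-way typology above can be sketched as a toy lookup table; the phrasal verbs listed and their class labels are illustrative assumptions rather than an authoritative lexicon.

```python
# A toy lexicon sketching the three transparency classes; entries are
# illustrative assumptions, not an authoritative classification.
PV_CLASSES = {
    "stand up": "directional",     # transparent: literal verb + direction
    "come over": "directional",
    "read through": "aspectual",   # semi-transparent: particle adds aspect
    "eat up": "aspectual",
    "figure out": "figurative",    # opaque: idiomatic meaning as a whole
    "give up": "figurative",
}

def transparency(pv: str) -> str:
    """Look up the semantic-transparency class of a phrasal verb."""
    return PV_CLASSES.get(pv.lower(), "unknown")

print(transparency("figure out"))  # figurative
print(transparency("stand up"))    # directional
```

In a real study this mapping would be annotated by hand or drawn from a reference lexicon, since transparency judgments are gradient rather than categorical.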

Notably, the limited number of semantically transparent directional PVs like stand up have Chinese motion-verb equivalents like zhanqilai (站起来), whereas many semantically semi-transparent aspectual phrasal verbs like read through and semantically opaque figurative ones like figure out have no direct equivalents in Chinese. Given these distinct differences in verb-particle structures between Chinese and English, English phrasal verbs can be regarded as unique items for testing the unique item hypothesis in C–E translation.

Regarding the use of phrasal verbs, studies on second language acquisition have highlighted significant challenges faced by L2 English language users, primarily due to the idiomaticity and polysemy of these constructions (Riguel 2014). Research has shown that second language learners of English often exhibit a preference for single-word verbs over phrasal verbs (Dagut and Laufer 1985; Laufer and Eliasson 1993). This inclination is observed across all proficiency levels, including the most advanced learners (Siyanova and Schmitt 2007), although an improvement in phrasal verb use has been noted from intermediate to advanced proficiency levels (Wei 2021). Scholars refer to this phenomenon as “avoidance behavior” (Kleinmann 1977), suggesting that learners consciously avoid phrasal verbs to minimize errors and maintain linguistic safety. Notably, this tendency is particularly pronounced in English learners whose first language lacks equivalent phrasal verb constructions (Riguel 2014; Liao and Fukuya 2004). Additionally, L2 learners tend to use a limited range of PVs and prefer transparent verbs to semi-transparent and opaque ones (Dagut and Laufer 1985; Wierszycka 2013). These findings highlight the difficulty Chinese L2 English learners face in acquiring and using phrasal verbs and raise the question of how L2 translators use them. Given these considerations, phrasal verbs may serve as an ideal linguistic feature for investigating the unique item hypothesis in Chinese-to-English translation.

Directionality

Translation directionality can be understood in two ways. One is the direction between the source language (SL) and target language (TL), with Chinese to English as one direction and English to Chinese as the other. The other understanding concerns whether translators work from a foreign language into their mother tongue or vice versa (Beeby Lonsdale 1998): translation into one's mother tongue is called native or L1 translation, while translation out of it is non-native or L2 translation. This study adopts the second understanding.

The significance of directionality has been highlighted by findings from studies of cognitive processes during translation. These studies typically utilize neurolinguistic and psycholinguistic approaches, such as eye-tracking, keylogging, and behavioral experiments combined with recall interviews, to compare the cognitive processes involved in L1-L2 and L2-L1 translation and interpretation tasks. The results consistently demonstrate that translation directionality does have an impact on the cognitive processing of translation (Pavlović and Jensen 2009; Chou et al. 2021; Tomczak and Whyatt 2022; Jia et al. 2023) and suggest the importance of considering directionality when investigating the cognitive processes of translation, although directionality may not affect translation quality as decisively as other factors such as L2 proficiency, the breadth of the translator's general knowledge, and text type (Pokorn et al. 2020).

Despite the growing interest in cognitive translation and interpreting studies (Ferreira 2023), translation directionality is not a primary concern in descriptive translation studies. Although factors like text type, language pair, language typology, and translator style are generally taken into consideration when researchers test various translation generalizations and explore the boundaries of their validity in translated texts, directionality receives little attention. This neglect may stem from the conventional belief that translations away from one's own language are not noteworthy except when their difficulties are emphasized. Consequently, the predominance of L1 translations is often assumed, overshadowing L2 translations and obscuring the potential impact of directionality on the final translated products.

Few researchers have studied the impact of translation directionality on translated texts. By observing the use of phrasal verbs, Zhan and Jiang (2023) found that the idiomaticity level of native translations of Lu Xun's short stories is significantly higher than that of non-native translations; Xu et al. (2021) found that translation directionality influences the representation of emotions and thus the reconstruction of images in the target texts. Despite the valuable insights already gained, the role of translation directionality remains underexplored in the context of the translation universal hypothesis. Integrating an analysis of translation directionality into the study of translation universals is anticipated to provide pivotal insights, particularly in validating and expanding our understanding of prevalent translation tendencies, such as those observed in the UIH in the present study. This broadening of research scope holds significant promise for extending the boundary conditions of the UIH and illuminating the complex dynamics inherent in cognitive processes during translation.

Research gaps and questions

In summary, three significant research gaps have been identified. Firstly, testing of the unique item hypothesis has been conducted mainly between Indo-European language pairs, and there is a lack of investigation into the UIH in the linguistically distant Chinese–English pair. Secondly, despite the significance of translation directionality, there is a notable absence of research in descriptive translation studies that specifically explores how it influences the features of translational products. This gap limits our understanding of how translation directionality might support or refute hypotheses of translation universals such as the UIH. Thirdly, in attempting to prove the validity of the gravitational pull model in accounting for the varying outcomes observed in testing the UIH, existing research has overlooked the potential impact of translation directionality on the cognitive semantic networks of translators, which in turn limits the insights into the unique item hypothesis. Addressing these gaps can enhance the understanding of the unique item hypothesis, highlight the differences in cognitive processes between translators working in different directions, and contribute to the ongoing development of descriptive translation studies.

The present study examines the validity of the UIH in the Chinese–English language pair and explores whether the representation of unique items differs between L1 and L2 translations. Specifically, we test the UIH in Chinese–English translation while considering the directionality of translation, to determine whether the UIH holds in Chinese–English translation and whether translation directionality affects the use of unique items in the target texts.

The first question is whether phrasal verbs are underrepresented in Chinese–English translations when compared with original English. Based on the UIH, we predict that phrasal verbs will be underrepresented in Chinese–English (C–E) translated texts compared with non-translated texts. Considering translation directionality, we also seek to explore whether the use of phrasal verbs diverges between L1 and L2 translations. Building upon the findings in second language acquisition, we investigate whether L2 translators with Chinese as their mother tongue show a similar tendency to underuse PVs, particularly semantically opaque ones, compared with L1 English translators. Accordingly, the present study seeks to answer the following questions:

Are phrasal verbs underrepresented in C–E translations when compared with non-translated (original) English texts?

Does the overall frequency of phrasal verb use differ between L1 and L2 translations? If the answer is yes, we will proceed to question 3.

What specific differences do L1 and L2 translations demonstrate in the use of phrasal verbs, with reference to the original English?

The writer and the translators

To guarantee the representativeness and relevance of the parallel corpus, we controlled several key variables, including the reputation of both the author and the translators, the distinctiveness of the source texts' style, and an equal number of translations in both directions. We finally settled on short stories by the renowned Chinese author Lu Xun, translated by four distinguished translators proficient in both languages: the L2 translators Wang Jizhen and Yang Xianyi, and the L1 translators Julia Lovell and William A. Lyell.

The present study chose Lu Xun's short stories as the source texts for two primary reasons: their distinctly casual and colloquial language style, and the abundance of available target texts. Lu Xun (1881–1936) was a pivotal figure in the New Culture Movement in early 20th-century China, particularly renowned for his groundbreaking contributions to the advocacy and use of vernacular Chinese. His departure from highly codified wenyan (classical Chinese, 文言文) to baihua (vernacular Chinese, 白话文) in writing was meant to bridge the gap between literature and the common people, break down the barriers between elite literary circles and the general populace, and better capture the realities of society so as to resonate with a broader audience. Vernacular Chinese is thus characterized as dialectal, plain, unadorned and colloquial, close to the spoken Chinese of ordinary people, in contrast with the “classical Chinese” used by the elites (Kullberg and Watson 2022). This vernacular style involves a high level of colloquiality and idiomaticity, marked by the frequent use of dialects, idioms, and colloquialisms (Wang and Phil 2011). Phrasal verbs are a typical feature of colloquial, everyday English; English translations of Lu Xun's works are therefore expected to make typical use of phrasal verbs if they preserve the stylistic essence of the source texts. Furthermore, Lu Xun's literary works have gained widespread attention worldwide and have been extensively translated by esteemed translators, adding to their significance and relevance in translation studies.

Both L2 translators, Wang and Yang, were highly proficient bilingual translators with extensive experience. They received higher education in English-speaking countries and were immersed in English-language environments for many years. As such, their translations can represent the pinnacle of Chinese-to-English L2 translation. Similarly, both L1 translators, Lovell and Lyell, gained immense recognition for translating Chinese literature into English. Their translations are believed to exemplify the highest quality of Chinese-to-English translational language.

It is noteworthy that Yang Xianyi collaborated with his wife, Gladys, in their translation endeavors (Yang 2003; Wang and Wang 2013). The couple would first read the source material together; Yang Xianyi would then craft a preliminary translation draft, which Gladys would proofread, type, and revise, typically two or three times (Yang 2002). Despite their differing views on translation, with Yang prioritizing loyalty and literal translation while Gladys sought creativity with a target-culture orientation (Fu 2011), their collaboration was notably dominated by Yang, and the resulting translations predominantly mirror Yang's well-established translation habits, as asserted by Wang et al. (2020). Consequently, their translation is conventionally classified as L2 or inverse translation.

Materials and methodology

A parallel corpus was built, composed of 10 Chinese short stories written by Lu Xun and their corresponding English versions by the four L1 and L2 translators.

We initially aimed to include all 25 of Lu Xun's short stories from the collections “Nahan” (呐喊) and “Panghuang” (彷徨) as the source texts. However, Yang and Wang had translated only a subset of these stories rather than the entire collections. To balance the corpus, we used only the ten source texts translated by all four translators. The resulting corpus contains a sub-corpus of 84,018 Chinese words and four sub-corpora totaling 261,888 English words, as shown in Table 1. The target texts were cleaned, tagged with CLAWS7, and aligned at the sentence level with the cleaned source texts.

For the reference corpus of original English, we randomly sampled 40 short stories from the BNC sub-corpus of fiction, totaling 371,155 words (see Appendix). The BNC short stories were also cleaned and tagged with CLAWS7.

Combining quantitative and qualitative analysis, the research follows these steps:

Retrieving phrasal verbs

To facilitate the analysis of phrasal verbs in our study, we utilized Python programming to count and extract these linguistic structures. Specifically, we employed Python scripts to tally the occurrences of phrasal verbs in each text and extract them from both the L1 and L2 corpora. The identified phrasal verbs were subsequently subjected to lemmatization, allowing us to generate comprehensive lists of phrasal verbs for both L1 and L2 translations.

Under the CLAWS tagging system, PVs are identified as lexical verbs, annotated as VV0, VVD, VVG (including VVGK), VVI, VVN, or VVZ, followed by RP, which stands for adverbial particles. The tags, their explanations, and examples are shown in Table 2 for clarity. The number of intervening words between the lexical verb and the adverbial particle is set between 0 and 6 (Gardner and Davies 2007).

To ensure accuracy and double-check the results obtained through Python programming, we conducted additional searches using the Concordance function of AntConc 3.5.7w, specifically targeting the adverbial particle tag (RP) in each text. This manual check allowed us to reconfirm the counts obtained through the Python program. It is important to note that, owing to the inherent limitations of the tagging system, manual screening was necessary to eliminate unwanted concordances, such as cases where the adverbial particle is not part of a phrasal verb (e.g., “all the way down to the floor,” where “down” is tagged as an adverbial particle but is not part of a phrasal verb).
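
As a rough illustration of this retrieval step, the sketch below matches a lexical verb tag (VV0, VVD, VVG/VVGK, VVI, VVN, VVZ) against a following adverbial particle (RP) with 0–6 intervening tokens, as described above. The tagged sample sentence and the helper function are hypothetical; real CLAWS7 output and the study's actual script may differ in detail.

```python
# Hypothetical sample of CLAWS7-style horizontal output (word_TAG tokens);
# real CLAWS7 output formatting may differ.
tagged = ("He_PPHS1 picked_VVD the_AT book_NN1 up_RP and_CC "
          "looked_VVD at_II it_PPH1 ,_, then_RT gave_VVD up_RP ._.")

VERB_TAGS = {"VV0", "VVD", "VVG", "VVGK", "VVI", "VVN", "VVZ"}

def extract_phrasal_verbs(tagged_text, max_gap=6):
    """Return (verb, particle) pairs: a lexical verb (VV*) followed by
    an adverbial particle (RP) with 0-6 intervening tokens."""
    tokens = [t.rsplit("_", 1) for t in tagged_text.split() if "_" in t]
    pvs = []
    for i, (word, tag) in enumerate(tokens):
        if tag in VERB_TAGS:
            # look ahead up to max_gap intervening tokens for an RP particle
            for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
                w2, t2 = tokens[j]
                if t2 == "RP":
                    pvs.append((word.lower(), w2.lower()))
                    break
                if t2 in VERB_TAGS:  # another verb starts; stop searching
                    break
    return pvs

print(extract_phrasal_verbs(tagged))
# → [('picked', 'up'), ('gave', 'up')]
```

Note that "looked at" is correctly skipped: at carries the preposition tag II, not the particle tag RP, which is exactly the distinction the CLAWS tagset provides here.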

Performing statistical significance testing

After counting the occurrences of phrasal verbs in each text, a statistical analysis was performed to determine whether there were significant differences in the frequencies of phrasal verbs. Our analysis focused on three comparisons: (1) translated texts versus non-translated texts, (2) L1 translations versus non-translated texts and L2 translations versus non-translated texts, and (3) L1 versus L2 translations.
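
The paper does not name the statistical test used. One simple option for comparing PV rates between two corpora of different sizes is a two-proportion z-test, sketched below with illustrative counts chosen to approximate the reported rates (15.8‰ vs. 12.9‰), not the study's exact figures.

```python
import math

def two_proportion_z(f1, n1, f2, n2):
    """Two-proportion z-test: are the PV rates f1/n1 and f2/n2
    (PV tokens over corpus size) significantly different?"""
    p1, p2 = f1 / n1, f2 / n2
    p = (f1 + f2) / (n1 + n2)  # pooled rate under the null hypothesis
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Illustrative counts only: ~15.8 PVs per 1,000 words in 261,888 words of
# translation vs. ~12.9 per 1,000 in the 371,155-word BNC sample.
z = two_proportion_z(4138, 261888, 4788, 371155)
print(abs(z) > 3.29)  # |z| > 3.29 corresponds to p < 0.001 (two-tailed)
# → True
```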

Comparing the TTR, ARR, hapax legomena, and high-frequency PVs

TTR is a metric used to assess the level of variation or diversity in the usage of phrasal verbs within a text. A higher TTR indicates a greater variety of phrasal verb lemmas, suggesting a more diverse usage of phrasal verbs; a lower TTR suggests a more concentrated or repetitive use of phrasal verbs within the text. ARR is a measure of the concentration and repetitiveness of phrasal verb lemmas in a given text or corpus, reflecting the morphological concentration and lexical complexity of phrasal verbs. ARR is calculated by dividing the total number of PV tokens by the number of unique PV lemmas. A higher ARR value suggests a more concentrated and repetitive use of PVs, indicating a lower degree of diversification in the choice of PV lemmas; conversely, a lower ARR value indicates a higher level of lexical diversity in the usage of phrasal verbs. Hapax legomena are phrasal verb forms that occur only once within a specific text or corpus (Kenny 2001). They are considered an indicator of creativity, suggesting unique and innovative expression, and can provide insights into the stylistic and linguistic creativity of phrasal verb use in the translated texts.
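
The three measures above can be computed directly from a list of lemmatized PV tokens. The sketch below is a minimal illustration with toy data; note that it takes the hapax proportion over lemma types, which is an assumption, since the paper does not fully specify whether its hapax percentages are over lemmas or tokens.

```python
from collections import Counter

def pv_profile(pv_lemmas):
    """Compute TTR, ARR and hapax proportion from a list of PV lemma
    occurrences (one entry per token, already lemmatized)."""
    counts = Counter(pv_lemmas)
    tokens = len(pv_lemmas)          # total PV tokens
    types = len(counts)              # distinct PV lemmas
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "TTR": types / tokens,       # lexical diversity of PVs
        "ARR": tokens / types,       # average repetition rate per lemma
        "hapax_pct": hapax / types,  # share of lemmas occurring once
    }

# toy list of lemmatized PV tokens (illustrative only)
sample = ["go out", "go out", "come back", "figure out",
          "take off", "go out", "come back", "set off"]
print(pv_profile(sample))
# → {'TTR': 0.625, 'ARR': 1.6, 'hapax_pct': 0.6}
```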

The TTR, ARR, hapax legomena, and the top twenty phrasal verbs were then compared to uncover more specific differences and similarities between L1 and L2 translations.

Conducting semantic analysis of the key PVs

Python programming was used to lemmatize the PVs and generate the PV lists for the L1 and L2 corpora. A keyword (PV) analysis was then conducted to identify the key phrasal verbs in the different lists, on the basis of which a further qualitative semantic analysis of the key phrasal verbs used in L1 and L2 translations was performed.

Are phrasal verbs underrepresented in C–E translations when compared with non-translated (original) English texts?

As shown in Table 3 and Fig. 1, the BNC short stories contain an average of 12.9‰ PVs, with a peak of 25.4‰ and a low of 4.5‰. The proportion of PVs in the translated texts averages 15.8‰, with a peak of 25.5‰ and a low of 8.7‰. Figure 1 shows the standardized proportion of PVs per thousand words across texts in the BNC, with the horizontal axis listing the 40 randomly sampled novels.

figure 1

Standardized proportion of PV (‰) across texts in BNC.

As the BNC corpus of short stories was limited in size, we cross-validated the results by comparing the data from the translated Lu Xun texts with data drawn from another English fictional corpus, reported in Rodríguez-Puente (2019). In a chronological study of phrasal verbs from 1650 to 1990, Rodríguez-Puente found that the use of phrasal verbs in fiction increased steadily over time: the standardized frequency of PVs per 1,000 words in original English fiction was approximately 9.7‰ between 1900 and 1950 and 13.3‰ between 1950 and 1990. The latter figure is very close to the 12.9‰ obtained from the sampled BNC short stories in the present study. With the PV frequency in translated texts averaging 15.8‰, it is evident that phrasal verbs are generally overrepresented in C–E translations relative to original English.

Statistical analysis of the standardized PV proportions (Table 4) showed a significant difference in the use of PVs between the Lu Xun translational corpus and the BNC (P < 0.001). The standard deviation in the BNC was 5.1, while that in the translated texts was 4.6, the former being more dispersed than the latter, suggesting that PV use in original English varies more across texts than in translations.

Thus, our answer to the first question is negative: phrasal verbs are not underrepresented in Chinese–English translated texts when compared with original English. This result is contrary to the underrepresentation of unique items in target texts predicted by the unique item hypothesis; on the contrary, our results suggest an overrepresentation of phrasal verbs in C–E translations.

Does the use of phrasal verbs significantly differ between L1 and L2 translations and original English, respectively?

A further question is whether L1 and L2 translations both contain a higher proportion of phrasal verbs than the original texts. To investigate the influence of translation directionality on the use of phrasal verbs, we compared the proportion of PVs between L1 translations and the BNC, between L2 translations and the BNC, and between L1 and L2 translations. L1 translations contained a significantly higher proportion of phrasal verbs than the BNC short stories, with averages of 19.2‰ and 12.9‰, respectively (P < 0.001). L2 translations, however, had a slightly lower proportion of phrasal verbs than the BNC short stories, with averages of 12.3‰ and 12.9‰, respectively, and this underrepresentation is not statistically significant (P > 0.05). Furthermore, L1 translations contained a significantly higher proportion of phrasal verbs than L2 translations, with averages of 19.2‰ and 12.3‰, respectively, a remarkably significant difference (P < 0.001) (see Table 4).

To test whether the use of PVs significantly differs within the same translation direction, another statistical analysis was carried out. The results, as presented in Table 5 , indicate that there is no significant difference in the use of PVs between the two translators within the same translation direction ( P  > 0.05).

Therefore, our answer to the second research question is that the use of phrasal verbs differs markedly between L1 and L2 translations: all the L1 translated texts contain a significantly higher proportion of PVs than the L2 translated texts (see Fig. 2). As shown in Fig. 3, Wang's translations contain the lowest proportion of PVs (8.7‰), slightly lower than Yang's (8.8‰), while Lyell's contain the highest (23.5‰), giving the ranking order Lyell > Lovell > Yang > Wang. These findings support the effect of translation directionality on the use of phrasal verbs; namely, the use of phrasal verbs is apparently directionality-dependent in C–E translations.

figure 2

PV frequency (‰) in translated texts in both directions.

figure 3

Proportion of PVs (‰) across different texts by each translator.

Even when the search is narrowed to the three most commonly used particles, up, out, and down (Cappelle and Loock 2017), the results remain consistent with those found for all particles. The use of PVs is ranked as L1 translations > BNC > L2 translations (see Table 6). L1 translations contain a markedly higher proportion of the particles up, out, and down than the BNC short stories, while L2 translations contain a lower proportion of them than the BNC short stories. Overall, phrasal verbs containing these three top particles are slightly overrepresented in translations compared with the BNC short stories.

The findings contradict what the unique item hypothesis predicts, as phrasal verbs are slightly more prevalent in Chinese–English translations compared to their occurrence in English originals.

When translation direction is considered, the representation of PVs diverges between directions: phrasal verbs are slightly underrepresented in L2 translations but markedly overrepresented in L1 translations. This suggests that while phrasal verbs exhibit an overrepresentation in translations compared with original English texts, this overrepresentation is predominantly driven by the behavior of L1 translators. By contrast, L2 translators tend to slightly underuse phrasal verbs. This divergence indicates that the unique item hypothesis is not a universal tendency as claimed, but is instead constrained by translation directionality.

What specific differences do L1 translations and L2 translations demonstrate in the use of phrasal verbs, with reference to the original English?

TTR, ARR, hapax legomena, and high-frequency PVs

To make an in-depth analysis of PV use in the two directions, we ran a self-designed Python program to retrieve all the PV tokens, count their occurrences, manually screen out noisy items, lemmatize the tokens, count the number of PV lemmas, and output the results to Excel files.
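
As an illustration of the lemmatization step, the sketch below maps an inflected verb form plus its particle onto a PV lemma. The rule list is deliberately naive and hypothetical; the study's actual pipeline presumably used a full lemmatizer (e.g., NLTK's WordNetLemmatizer or spaCy), which the paper does not specify.

```python
# Tiny irregular-verb table for illustration only; a real pipeline needs
# a complete lemmatizer, not this hand-picked subset.
IRREGULAR = {"gave": "give", "went": "go", "came": "come",
             "took": "take", "stood": "stand", "threw": "throw"}

def lemmatize_verb(form):
    """Naive rule-based lemmatization of an English verb form."""
    form = form.lower()
    if form in IRREGULAR:
        return IRREGULAR[form]
    # crude suffix stripping; will over/under-strip on many real verbs
    for suffix, repl in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        if form.endswith(suffix) and len(form) > len(suffix) + 2:
            return form[: -len(suffix)] + repl
    return form

def lemmatize_pv(verb, particle):
    """Combine a lemmatized verb with its particle into a PV lemma."""
    return f"{lemmatize_verb(verb)} {particle.lower()}"

print(lemmatize_pv("picked", "up"))   # → pick up
print(lemmatize_pv("gave", "up"))     # → give up
```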

The comparison of TTR, ARR, and hapax between the translated and original English texts yielded several findings (Table 7). Firstly, the TTR of the translated texts is lower than that of the BNC (0.50 versus 0.62), suggesting that the translated texts contain a less diverse range of phrasal verbs than the original English texts. Secondly, the average repetition of each PV in the translated texts is higher than in the BNC (1.97 versus 1.59), indicating a higher degree of repetitiveness and concentration in the usage of PVs in the translated texts. Thirdly, the proportion of PV hapax in the translated texts is much higher than in the BNC: PV hapax accounts for 64% of all PVs in the translated texts but only 22% in the BNC, indicating that the translated texts contain a much greater number of unique or rare PVs and reflecting a higher level of creativity and distinctiveness in the use of phrasal verbs. These findings suggest that translated texts may be less diverse in their use of PVs, but compensate by using some PVs more intensively and by introducing a greater number of unique PVs.

Regarding translation directionality, the total count of PVs in L1 translations (2,781) is strikingly higher than that in L2 translations (1,622); native translators used 1,298 PV lemmas versus 747 by non-natives. L1 translations thus contain almost twice as many distinct phrasal verbs as L2 translations, although the total L1 text length is only 11% greater than that of L2 (see Table 1). The numbers of both PV types and tokens in L1 translations are dramatically higher than those in L2, even though they are translated from the same source texts: L1 translators use PVs far more boldly and diversely than L2 translators. However, when lexical diversity is measured by TTR, no significant difference is observed between L1 and L2 translations (0.50 versus 0.51). Similarly, the ARR of PVs in L1 translations (2.01) is slightly higher than in L2 translations (1.98), but the difference is insignificant, suggesting comparable repetition rates of PVs in both. In terms of hapax legomena, L1 translators used 823 unique PVs, 1.7 times the 481 used by L2 translators; however, the percentage of hapax legomena in both corpora is similar, around 63–64%, indicating that hapax legomena account for more than half of the total PV lemmas in both L1 and L2 translations. The overall analysis indicates that L1 translators employ a more extensive range and a higher frequency of phrasal verbs than their L2 counterparts. Yet despite these similarities in TTR, ARR, and hapax proportion, the data suggest that L1 translators demonstrate a more diverse and varied use of phrasal verbs than L2 translators.

A comparison of the PV lists of L1 and L2 translations shows an overall similarity (Table 8). Specifically, the majority of the top 20 phrasal verbs are semantically transparent directional and semi-transparent aspectual phrasal verbs, with only turn out, make out, set out, and take on being semantically opaque in L1 translations, and only turn out and make up in L2 translations. Evidently, both L1 and L2 translators use directional and aspectual phrasal verbs frequently, with L2 translators relying on them even more heavily than L1 translators.

It is also found that the top 20 PV lemmas account for a substantial proportion of the total PV occurrences, about one-fifth (21.5%) in L1 texts and one-third (31.1%) in L2 texts. The keyness of phrasal verbs used in L1 translations is lower than that in L2 translations, reconfirming that L1 translators make a more balanced and varied use of PVs, while L2 translators rely more heavily on high-frequency PVs.

Keyword analysis of PVs

A keyword is defined as “a word which occurs with unusual frequency in a given text compared with a reference corpus of some kind” (Scott 1997). A key PV in the present study is thus a PV that occurs in the L1 or L2 translational corpus more often than in the BNC. It is identified by a statistical test that compares PV frequencies in the L1 and L2 corpora against those in the BNC as a reference. Keyness is the statistical significance of a keyword's frequency in the corpus under study, relative to BNC fiction.
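
Keyness of this kind is commonly computed with a log-likelihood (G²) statistic, as in WordSmith-style keyword analysis; the paper does not state its exact formula, so the sketch below is one plausible implementation, with invented corpus sizes and PV counts.

```python
import math

def keyness(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of a single PV: how unusually frequent
    it is in the study corpus relative to a reference corpus."""
    # expected frequencies under the null hypothesis of equal rates
    e1 = size_study * (freq_study + freq_ref) / (size_study + size_ref)
    e2 = size_ref * (freq_study + freq_ref) / (size_study + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_study, e1), (freq_ref, e2)):
        if observed:
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Invented counts: PV frequencies in a hypothetical 100,000-word
# translational corpus vs. the 371,155-word BNC sample.
SIZE_STUDY, SIZE_REF = 100_000, 371_155
counts = {"go out": (60, 80), "figure out": (12, 40)}
ranked = sorted(
    ((pv, keyness(fs, SIZE_STUDY, fr, SIZE_REF))
     for pv, (fs, fr) in counts.items()),
    key=lambda item: item[1], reverse=True)
for pv, g2 in ranked:
    print(f"{pv}: G2 = {g2:.1f}")
```

Ranking all PVs by this score and keeping the top ten would reproduce the kind of key-PV table analyzed in Table 9.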

To limit the data to a manageable range, the top ten key PVs are chosen for analysis (Table 9 ). It is evident that the key PVs used in L2 translations demonstrate a higher level of keyness than those in L1 translations, indicating that L2 translators exhibit a pronounced preference for and concentration on particular high-frequency PVs.

A semantic comparison of the ten key phrasal verbs used in L2 and L1 translations shows an overall difference. Specifically, almost all the key PVs used by L2 translators consist of a dynamic action verb such as go, come, rush, walk, or look, combined with commonly used particles, and are semantically transparent or semi-transparent. By contrast, among the ten key PVs used by L1 translators, only come along and stare up are semantically transparent, whereas all the others, including work up, take back, figure out, head back, let out, scrub up, head off, and throw in, are idiomatic in context. This observation suggests that L2 translators may rely more on semantically explicit PVs, probably because they prioritize clarity and accuracy in translation. L1 translators, as native speakers of the target language with a presumably greater command of linguistic nuance, use phrasal verbs with a higher level of diversity and a better balance between semantically implicit and explicit ones.

Key PV analysis reveals a distinct preference among L2 translators for literal PVs over idiomatic ones, a tendency not as pronounced in L1 translators who exhibit a more balanced usage across different PV categories. In Chinese–English translations, certain items in the source language act as catalysts for the use of directional and aspectual phrasal verbs in the target language, as in the case of “出去” in Chinese triggering the English phrasal verb “go out” in translations. However, there appears to be no direct linguistic trigger in the source text for the use of idiomatic phrasal verbs, making these idiomatic PVs genuinely unique items.

The present study found that phrasal verbs are significantly overused in the translated texts overall compared with the non-translated texts. This result falsifies the unique item hypothesis in general, but how does this come about? When translation directionality is considered, a significant divergence emerges between L1 and L2 translators: L1 translators significantly overuse phrasal verbs, while L2 translators slightly underuse them, so the overrepresentation of phrasal verbs in translated texts is primarily due to the significant overuse by L1 translators. While both L1 and L2 translators prefer semantically transparent phrasal verbs, L1 translators use more semantically opaque phrasal verbs than L2 translators.

L2 translators' more limited and less frequent use of phrasal verbs and their preference for semantically transparent over semantically implicit phrasal verbs are in line with the tendencies of L2 learners, but we think it insufficient to attribute these simply to the translators' strategic choice to minimize errors or ensure safety, given L2 translators' language proficiency as professional bilinguals. Based on the gravitational pull hypothesis (Halverson 2017), it is highly possible that the use of phrasal verbs involves a cognitive process in which translators subconsciously choose to use or avoid phrasal verbs.

Gravitational pull model

Halverson’s gravitational pull hypothesis, initially proposed in 2003 and 2010 and later developed into a cognitive linguistic model in 2017, offers insights into translation universals. The model outlines three cognitive forces that shape translators’ choices of linguistic elements (Hareide 2013b; Halverson 2010, 2017): the gravitational pull exerted by highly salient features of the source language (Halverson 2017), the magnetism of salient or frequently used items in the target language, and the connectivity derived from the strength of the links between elements of the source and target languages. The combined effect and interaction of these forces ultimately determine the features of the translated texts. Our findings on translation directionality add an important variable to this model: the translator. Within the model, L1 and L2 translators may be subject to different degrees of cognitive force. For L2 translators, the gravitational pull of salient source-language items is often stronger, because those items belong to their native language and culture. While they may understand the source text well, the cognitive effort required to render idiomatic or culturally specific elements in the target language is greater, and their translations may be more literal and faithful to the source, or less nuanced in idiomatic usage, as the magnetism of their native language (here the source language) influences their cognitive processing. For L1 translators, the magnetism comes from the target language, their native tongue. Deeply rooted in its cultural and linguistic nuances, they tend to produce translations that are more idiomatic and culturally resonant than those of L2 translators. This strong magnetism towards the native language allows L1 translators to handle semantically complex constructs with greater ease and intuition.

In the specific case of idiomatic phrasal verbs, which are often semantically opaque and culturally loaded, these differences in cognitive forces play a crucial role. L1 translators are more likely to use such constructs frequently because of the strong magnetism of phrasal verbs in target-language norms. L2 translators, lacking any gravitational pull from similar structures in their native language, may opt for more literal renderings or underuse such idiomatic expressions. This helps to explain why phrasal verbs are significantly overrepresented in L1 translations compared with both the non-translated texts and the L2 translations, and why they are slightly underrepresented in L2 translations.

For semantically transparent phrasal verbs, L1 and L2 translators may be similarly influenced by the force of connectivity and equally likely to select corresponding, equivalent items in the Chinese–English language pair. Although L2 translators use phrasal verbs less frequently in general, they still prefer semantically transparent ones. As these phrasal verbs have close equivalents in the Chinese–English language pair, all translators can be expected to experience the same forces of magnetism, gravitational pull, and connectivity. The corresponding items in the source and target languages exert a particularly strong force of connectivity, making them activated and readily available in translators’ minds regardless of translation directionality. This result confirms Halverson’s (2017) hypothesis that the more established a link is between the source and target languages, the more likely it is to be activated and used in translation, and vice versa.

Traditionally, the gravitational pull model has focused on the influence of source and target languages’ salient features on translation choices. It considered factors like the saliency of linguistic elements and the strength of connections between the source and target language. However, this approach primarily views translation as a language-centered process, without explicitly considering the role of the translator, specifically, whether they are translating into their first language (L1) or second language (L2).

The present study broadens the model’s scope and application boundaries by incorporating a crucial factor, translation directionality, into the semantic network analysis. This expansion allows a more comprehensive understanding of how the directionality of translation influences translators’ decision-making and enhances the model’s applicability in real-world scenarios.

Other possible explanations

The findings of the present study do not align with what Cappelle and Loock (2017) found for English translations from Germanic and Romance languages. They found that in translations from Romance languages, phrasal verbs are significantly underused compared with non-translated texts in the BNC, while no significant difference was found for translations from Germanic languages. They therefore propose that typological differences between source and target languages may result in a significant underuse of unique linguistic items, while typological similarities in the S-T language pair may result in no significant difference. In contrast, the present study did not find a comparable underuse of phrasal verbs in translations between Chinese and English. Chinese belongs to the Sino-Tibetan language family, while English is part of the Indo-European family, making the pair far more distinct than the Romance–English or Germanic–English pairs. It is possible that “typological difference shining through” (ibid.) is more prominent when the language pair belongs to the same language family; when the pair is typologically more distant, as with Chinese and English, translators may be more aware of the typological difference. This heightened awareness could lead them to emphasize the use of unique items typical of the target language, possibly to counterbalance the typological interference effect.

The discrepancy between L1 and L2 translations prompts an investigation into other possible influencing factors, including publication years. This inquiry is grounded in the historical evolution of phrasal verb usage in English, illustrated in Fig. 4. Particularly in fiction, there has been a steady increase in the use of PVs since the 1800s, with a more pronounced escalation after 1900 (Rodríguez-Puente 2019). This chronological progression of PV usage might therefore have influenced translation choices in different eras and could account for some of the variation among the translations.

Fig. 4: Diachronic development of phrasal verb use in English originals (per 10,000 words). Note: Fig. 4 is reproduced from Rodríguez-Puente (2019).

For instance, Wang’s translation “Ah Q and Others, Selected Stories of Lusin” was published by Columbia University Press in 1941. Yang’s “Complete Stories of Lu Xun” came out in 1981 through Indiana University Press in collaboration with Foreign Language Press. Later, Lyell’s “Diary of a Madman and Other Stories” was published by the University of Hawaii Press in 1999, and Lovell’s “The Real Story of Ah-Q and Other Tales of China: The Complete Fiction of Lu Xun” by Penguin Classics in 2010. The publication years of these works thus span almost seven decades. Given this considerable temporal spread, the question arises: does the difference in publication time play a role in the divergence between L1 and L2 translations?

Given the chronological evolution of PV usage in English fiction, we might expect the order of PV use to be Lovell (2010) > Lyell (1999) > Yang (1981) > Wang (1941), with Wang’s translations showing the lowest and Lovell’s the highest frequency of PVs. If the use of PVs increases steadily over time, we might also expect a more marked difference between Wang and the other three translators, since at least forty years separate Wang from the others. However, the data reveal a different order (see Table 5): Lyell > Lovell > Yang > Wang. Wang’s translations show a slightly, but not significantly, lower frequency of PVs (11.2‰) than Yang’s (12.7‰), while Lyell’s exhibit the highest frequency (19.2‰). This unexpected order challenges the assumption that the chronological evolution of PV use is the primary determinant of PV frequency in translations. Instead, the data suggest that time of publication may not play as decisive a role as expected, and point toward translation directionality as a more significant factor behind the discrepancies in PV usage between L1 and L2 translations.
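The per-mille figures above are simple normalized frequencies (PV tokens per 1,000 running words). As a sketch of the comparison — the token and corpus-size counts below are invented for illustration, chosen only so that the reported rates and ranking come out as in Table 5 (Lovell's exact rate is not given in this passage and is hypothetical):

```python
def per_mille(pv_tokens: int, corpus_words: int) -> float:
    """Normalized phrasal-verb frequency per 1,000 words."""
    return pv_tokens * 1000 / corpus_words

# Hypothetical counts reproducing the reported per-mille rates
rates = {
    "Lyell": per_mille(1920, 100_000),   # 19.2 per mille (reported)
    "Lovell": per_mille(1560, 100_000),  # 15.6 per mille (illustrative)
    "Yang": per_mille(1270, 100_000),    # 12.7 per mille (reported)
    "Wang": per_mille(1120, 100_000),    # 11.2 per mille (reported)
}
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)  # ['Lyell', 'Lovell', 'Yang', 'Wang']
```

Normalizing per 1,000 (or per 10,000, as in Fig. 4) words is what makes frequencies comparable across translations of different lengths; the raw token counts alone would conflate corpus size with PV preference.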

The unique style of the source texts may also be an important factor in the results of the present study. Lu Xun’s short stories are written in vernacular Chinese and are rich in direct quotations and everyday conversations, marked by a colloquial, casual, and informal style (Zhan and Jiang 2017). Experienced translators are presumably well aware of this stylistic feature and try to preserve the original style in the target texts, as evidenced by an interview with Yang Xianyi (Qian and Almberg 2001). In other words, the translators’ sensitivity to Lu Xun’s distinctive narrative style, and their conscientious efforts to reflect it in the target language, could have overridden the patterns of PV usage predicted by the UIH. Thus, contrary to the UIH prediction of significantly lower usage of phrasal verbs in translations compared with original English, the L1 translated texts did not exhibit such a trend, and only the L2 texts displayed a marginal underrepresentation. This finding underscores the complexity of translation as an art form, in which linguistic choices are deeply intertwined with cognitive, cultural, and stylistic considerations, and in which translators navigate between the source text’s idiosyncrasies and the target language’s norms to create a text that resonates with the source in both content and style.

This is the first study to test the unique item hypothesis from the perspective of translation directionality. It finds that L1 translators tend to normalize target-language-specific features to make their translations sound native-like, resulting in significant overuse and a broader range and richness of phrasal verbs, while L2 translators are comparatively more source-language-dependent: without the natural trigger of unique items in the source language, they tend to slightly underuse phrasal verbs. Although both L1 and L2 translators show a preference for semantically transparent phrasal verbs, L1 translators use more semantically opaque phrasal verbs than their L2 counterparts. The results falsify the UIH in its general form and suggest that the UIH is a conditional translation tendency constrained by translation directionality; in other words, the UIH is directionality-dependent.

The gravitational pull model is used to explain the divergence in the representation of phrasal verbs between L1 and L2 translations. According to this model, three forces — gravitational pull, magnetism, and connectivity — interrelate and interact to shape translated texts. Both over- and under-representation of unique items are possible, depending on the specific structure of the bilingual semantic network activated (Halverson 2010) and on which force outweighs the other two. When the language pair and the translation task are both controlled, translation directionality is identified as an important factor influencing translators’ cognitive processing. Specifically, L1 translators are under a stronger force of magnetism exerted by the target language, L2 translators are seemingly under more influence of the source language, and both are affected by the connectivity between the source and target texts. It is therefore proposed that the semantic network is not a closed system but one in which translation directionality is a vital variable influencing the interplay and counteraction of the three cognitive forces.

Traditionally, research on the translation universal hypothesis has not taken translation directionality into account. Its inclusion opens new avenues for future research. Upcoming studies could, for example, compare the translation outputs of L1 and L2 translators in various contexts to test translation universals, investigate how translation directionality influences the handling of specific linguistic constructs, and explore the psychological and cognitive aspects of translation from the perspectives of L1 and L2 translators.

Methodologically, the present study uses a source-text-controlled corpus, which enables a finer-grained observation of the phenomenon in the language pair and significantly improves the comparability of the data. This design is particularly suitable for testing translation universal hypotheses within corpus translation studies, which typically rely on large, balanced corpora, and it can thus probe boundary conditions for the applicability of translation universal tendencies.

However, even as a case study, the corpus is still relatively small, constrained by the controlled variable of identical source texts and the limited number of translators in each direction. We therefore only cautiously suggest considering translation directionality as a conditional factor when testing translation universal tendencies, and acknowledge that further academic justification and statistical evidence are required.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on request.

Beeby Lonsdale A (1998) Direction of Translation (directionality). In: Baker M (ed), Routledge encyclopedia of translation studies. Routledge, London/New York, p 63–67

Cappelle B (2012) English is less rich in manner-of-motion verbs when translated from French. Across Lang Cult 13(2):173–195. https://doi.org/10.1556/Acr.13.2012.2.3


Cappelle B, Loock R (2017) Typological differences shining through: The case of phrasal verbs in translated English. In: Sutter et al. (eds) Empirical translation studies: new methodological and theoretical traditions. De Gruyter Mouton, Berlin/Boston, p 235–262. https://doi.org/10.1515/9783110459586-009

Chesterman A (2004) Beyond the particular. In: Mauranen A, Kujamaki P (eds) Translation universals: do they exist? Benjamins, Amsterdam, p 33–49

Chesterman A (2011) Reflections on the literal translation hypothesis. In: Alvstad C et al. (eds) Methods and strategies of process research: integrative approaches in translation studies. John Benjamins, Amsterdam/Philadelphia, p 23–35

Chou I, Liu KL, Zhao N (2021) Effects of directionality on interpreting performance: evidence from interpreting between Chinese and English by trainee interpreters. Front Psychol 12:781610. https://doi.org/10.3389/fpsyg.2021.781610


Dagut M, Laufer B (1985) Avoidance of phrasal verbs: a case for contrastive analysis. Stud Second Lang Acquis 7(1):73–79. https://doi.org/10.1017/S0272263100005167

Darwin CM, Gray LS (1999) Going after the phrasal verb: an alternative approach to classification. TESOL Q 33(1):65–83. https://doi.org/10.2307/3588191

Ferreira A (2023) Directionality in cognitive translation and interpreting studies. In: Ferreira A, Schwieter JW (eds.) The Routledge handbook of translation, interpreting and bilingualism. Routledge, London

Fu WH (2011) Interpreting Gladys’ English translations under multiple cultural identities. Chin Translators J 6:16–20


Gardner D, Davies M (2007) Pointing out frequent phrasal verbs: a corpus‐based analysis. TESOL Q 41(2):339–359. https://doi.org/10.1002/j.1545-7249.2007.tb00062.x

Halverson SL (2003) The cognitive basis of translation universals. Target Int J Translation Stud 15(2):197–241. https://doi.org/10.1075/target.15.2.02hal

Halverson S (2010) Cognitive translation studies: development in theory and method. In: Shreve GM, Angelone E (eds.) Translation and cognition. John Benjamins, Amsterdam/Philadelphia, p 349–369

Halverson S (2017) Gravitational pull in translation. Testing a revised model. In: De Sutter G, Lefer MA, Delaere I (eds.) Empirical translation studies. De Gruyter Mouton, Berlin, pp 9-45

Hareide L (2017a) The translation of formal source-language lacunas. An empirical study of the over-representation of target-language specific features and the unique items hypothesis. In: Ji M et al. (eds) Corpus methodologies explained. an empirical approach to translation studies. Routledge, London/New York, p 137–187

Hareide L (2017b) Is there gravitational pull in translation? A corpus-based test of the gravitational pull hypothesis on the language pairs Norwegian-Spanish and English-Spanish. In: Ji M et al. (eds) Corpus methodologies explained. an empirical approach to translation studies. Routledge, London/New York, p 188–231

Jia J et al. (2023) Translation directionality and translator anxiety: Evidence from eye movements in L1-L2 translation. Front Psychol 14:1120140. https://doi.org/10.3389/fpsyg.2023.1120140

Pokorn KN et al. (2020) The influence of directionality on the quality of translation output in educational settings. Interpreter Translator Train 14(1):58–78. https://doi.org/10.1080/1750399X.2019.1594563

Kenny D, Satthachai M (2018) Explicitation, unique items and the translation of English passives in Thai legal texts. Meta 63(3):604–626. https://doi.org/10.7202/1060165ar

Kenny D (2001) Lexis and creativity in translation: a corpus-based approach, 1st edn. Routledge, London. https://doi.org/10.4324/9781315759968

Tomczak E, Whyatt B (2022) Directionality and lexical selection in professional translators: evidence from verbal fluency and translation tasks. Translation Interpreting 14(2):120–136. https://doi.org/10.12807/ti.114202.2022.a08

Kleinmann HH (1977) Avoidance behavior in adult second language acquisition. Lang Learn 27:93–107. https://doi.org/10.1111/j.1467-1770.1977.tb00294.x

Kullberg C, Watson D (2022) Vernaculars in an age of world literatures. Bloomsbury Academic, New York


Laufer B, Eliasson S (1993) What causes avoidance in L2 learning: L1-L2 difference, L1-L2 similarity, or L2 complexity. Stud Second Lang Acquis 15(1):35–48. https://doi.org/10.1017/S0272263100011657

Lefera MA, De Sutterb G (2022) Using the Gravitational Pull Hypothesis to explain patterns in interpreting and translation: The case of concatenated nouns in mediated European Parliament discourse. In: Kajzer-Wietrzny M et al.(eds) Mediated discourse at the European Parliament: empirical investigations in translation and multilingual natural language processing, vol 19. Language Science Press, pp 133–159

Li W et al. (2003) An Expert Lexicon Approach to Identifying English Phrasal Verbs. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. Sapporo, Japan, p 513–520

Liao Y, Fukuya YJ (2004) Avoidance of phrasal verbs: the case of Chinese learners of English. Lang Learn 54(2):193–226. https://doi.org/10.1111/j.1467-9922.2004.00254.x

Marco J, Oster U (2018) The gravitational pull of diminutives in Catalan translated and non-translated texts. Using Corpora in Contrastive and Translation Studies Conference, 5th edn., Louvain-la-Neuve, Belgium

Marco J (2021) Testing the gravitational pull hypothesis on modal verbs expressing obligation and necessity in Catalan through the COVALT corpus. In: Bisiada M (ed) Empirical studies in translation and discourse. Language Science Press, Berlin, p 27–52

Pavlović N, Jensen K (2009) Eye tracking translation directionality. In: Pym A, Perekrestenko A (eds) Translation research projects 2. Intercultural Studies Group, Tarragona, p 93–109. http://isg.urv.es/publicity/isg/publications/trp_2_2009/index.htm

Qian DX, Almberg ES-P (2001) Interview with Yang Xianyi. Translation Rev 62(1):17–25. https://doi.org/10.1080/07374836.2001.10523795

Riguel E (2014) Phrasal verbs, “the scourge of the learner”. 9th Lanc Univ Postgrad Conf Linguist Lang Teach 9:1–20

Rodríguez-Puente P (2019) The English Phrasal Verb, 1650-Present: history, stylistic drifts, and lexicalisation. Cambridge University Press, Cambridge

Scott M (1997) PC analysis of key words—and key key words. System 25(2):233–245. https://doi.org/10.1016/S0346-251X(97)00011-0

Siyanova A, Schmitt N (2007) Native and nonnative use of multi-word vs. one-word verbs. IRAL-Int Rev Appl Linguist Lang Teach 45(2):119–139. https://doi.org/10.1515/IRAL.2007.005

Tello I (2022) The translation of diminutives into Spanish: testing the unique items hypothesis with COVALT corpus. Book of Abstracts. Translation Transit 6:190–194

Tirkkonen-Condit S (2002) Translationese—a myth or an empirical fact? A study into the linguistic identifiability of translated language. Target Int J Translation Stud 14(2):207–220. https://doi.org/10.1075/target.14.2.02tir

Tirkkonen-Condit S (2004) Unique items—over-or under-represented in translated language? In: Mauranen A, Kujamäki P(eds). Translation universals: do they exist? John Benjamins, Amsterdam, p 177–184

Vilinsky B (2012) On the lower frequency of occurrence of Spanish verbal periphrases in translated texts as evidence for the unique items hypothesis. Across Lang Cult 13(2):197–210. https://doi.org/10.1556/Acr.13.2012.2.4

Wang BR, Li WR (2020) A comparative analysis of the translatorial habitus of Yang Xianyi and Gladys. Fudan J Foreign Lang Lit 01:141–146


Wang BR, Phil M (2011) Lu Xun’s Fiction in English Translation: the early years. Dissertation, University of Hong Kong http://hdl.handle.net/10722/173974

Wang YC, Wang KF (2013) The development of translator’s working patterns in rendering Chinese fictions into English. Foreign Lang Lit 29(2):118–124


Wei Y (2021) Use of English phrasal verbs of Chinese students across proficiency levels: a corpus-based analysis. Int J TESOL Stud 3(4):25–41. https://doi.org/10.46451/ijts.2021.12.03

Wierszycka J (2013) Phrasal verbs in learner English: a semantic approach. A study based on a POS-tagged spoken corpus of learner English. Res Corpus Linguist 1:81–93. https://ricl.aelinco.es/index.php/ricl/article/view/14

Xu ZH, Jiang Y, Zhan JH (2021) Direct and inverse translations of “Li Sao”: Emotion-related elements and reconstruction of Qu Yuan’s image. Foreign Lang Res 4:81–88. https://doi.org/10.13978/j.cnki.wyyj.2021.04.014

Yang XY (2002) White Tiger: An Autobiography of Yang Xianyi. The Chinese University Press, Hong Kong

Yang XY (2003) I Have Two Motherlands - Gladys and Her World. Guangxi Normal University Press, Guilin

Zhan JH, Jiang Y (2017) A study of the contractions in Chinese-English literary translation: a case study of Lu Xun’s novels. Foreign Lang Res 5:75–82. https://doi.org/10.13978/j.cnki.wyyj.2017.05.015

Zhan JH, Jiang Y (2023) A comparative study on the idiomaticity between native and non-native translations by comparing the use of phrasal verbs. J Xi’ Int Stud Univ 3:103–108. https://doi.org/10.16362/j.cnki.cn61-1457/h.2023.03.023


Acknowledgements

This work is supported by the Shaanxi Provincial Social Science Fund (Grant No. 2021K007).

Author information

Authors and affiliations

School of Foreign Studies, Xi’an Jiaotong University, Xi’an, Shaanxi, China

Juhong Zhan & Yue Jiang


Contributions

All authors contributed to the design of the work. The first author implemented the work, interpreted the data and drafted the manuscript. The corresponding authors revised and proofread it. All authors approved the final version.

Corresponding author

Correspondence to Yue Jiang .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors. Lu Xun’s short stories, the different versions of translations, and the BNC fictional corpus belong to the public domain. Informed consent is thus not applicable in the context of our specific study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Zhan, J., Jiang, Y. Testing the unique item hypothesis with phrasal verbs in Chinese–English translations of Lu Xun’s short stories: the perspective of translation directionality. Humanit Soc Sci Commun 11 , 344 (2024). https://doi.org/10.1057/s41599-024-02814-y


Received : 12 August 2023

Accepted : 06 February 2024

Published : 01 March 2024

DOI : https://doi.org/10.1057/s41599-024-02814-y



similarities between hypothesis and conclusion

  • Anatomy & Physiology
  • Astrophysics
  • Earth Science
  • Environmental Science
  • Organic Chemistry
  • Precalculus
  • Trigonometry
  • English Grammar
  • U.S. History
  • World History

... and beyond

  • Socratic Meta
  • Featured Answers

Search icon

What is the difference between inference and hypothesis?

similarities between hypothesis and conclusion

Related questions

  • How can the scientific method be applied to everyday life?
  • What are some common mistakes students make with the scientific method?
  • What are hypotheses according to the scientific method?
  • What is a theory according to the scientific method?
  • Do scientists have to record all data precisely in order to follow the scientific method?
  • What is the goal of peer review in the scientific method?
  • Why is the scientific method important to follow?
  • How did Tycho Brahe and Kepler employ the scientific method?
  • Do all scientists use the scientific method?
  • Why should scientists provide an abstract for, or summary of their research?

Impact of this question

similarities between hypothesis and conclusion

IMAGES

  1. Null Hypothesis and Alternative Hypothesis

    similarities between hypothesis and conclusion

  2. 10 Difference between Hypothesis and Prediction with Comparison Table

    similarities between hypothesis and conclusion

  3. Hypothesis vs Theory|Difference between hypothesis and theory|Hypothesis and theory difference

    similarities between hypothesis and conclusion

  4. Similarities Between Hypothesis and Theory

    similarities between hypothesis and conclusion

  5. Identifying Hypothesis and Conclusion

    similarities between hypothesis and conclusion

  6. 🐈 Psychology hypothesis topics. 100+ Psychology Research Topics Ideas to Explore in 2021. 2022-11-05

    similarities between hypothesis and conclusion

VIDEO

  1. What is Hypothesis #hypothesis

  2. 1.5. Hypothesis statement

  3. Word Of The Day

  4. State your hypothesis! #shorts

  5. Forming the Conclusion of a Hypothesis Test

  6. HypothesisTesting

COMMENTS

  1. A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles

    Hypothesis-testing (Quantitative hypothesis-testing research) - Quantitative research uses deductive reasoning. - This involves the formation of a hypothesis, collection of data in the investigation of the problem, analysis and use of the data from the investigation, and drawing of conclusions to validate or nullify the hypotheses.

  2. PDF Conclusions

    you think about your conclusion. Begin with the "what" In a short paper—even a research paper—you don't need to provide an exhaustive summary as part of your conclusion. But you do need to make some kind of transition between your final body paragraph and your concluding paragraph. This may come in the form of a few sentences of summary.

  3. The Scientific Method

    CONCLUSION. The final step in the scientific method is the conclusion. This is a summary of the experiment's results, and how those results match up to your hypothesis. You have two options for your conclusions: based on your results, either: (1) YOU CAN REJECT the hypothesis, or (2) YOU CAN NOT REJECT the hypothesis. This is an important point!

  4. What is the difference between a hypothesis and a conclusion?

    Well, an hypothesis is something that is proposed.... And the mark of a good "hypothesis" is its "testability". That is there exist a few simple experiments whose results would confirm or deny the original hypothesis. And a conclusion is drawn AFTER the experiment is performed, and reports whether or not the results of the experiment supported ...

  5. Hypothesis vs Conclusion

    As nouns the difference between hypothesis and conclusion is that hypothesis is used loosely, a tentative conjecture explaining an observation, phenomenon or scientific problem that can be tested by further observation, investigation and/or experimentation. As a scientific term of art, see the attached quotation. Compare to theory, and quotation given there while conclusion is the end, finish ...

  6. How to Write Discussions and Conclusions

    Begin with a clear statement of the principal findings. This will reinforce the main take-away for the reader and set up the rest of the discussion. Explain why the outcomes of your study are important to the reader. Discuss the implications of your findings realistically based on previous literature, highlighting both the strengths and ...

  7. 6.6

    The conclusion drawn from a two-tailed confidence interval is usually the same as the conclusion drawn from a two-tailed hypothesis test. In other words, if the the 95% confidence interval contains the hypothesized parameter, then a hypothesis test at the 0.05 \(\alpha\) level will almost always fail to reject the null hypothesis.

  8. This is the Difference Between a Hypothesis and a Theory

    A hypothesis is an assumption made before any research has been done. It is formed so that it can be tested to see if it might be true. A theory is a principle formed to explain the things already shown in data. Because of the rigors of experiment and control, it is much more likely that a theory will be true than a hypothesis.

  9. How to Write a Thesis or Dissertation Conclusion

    Step 1: Answer your research question. Step 2: Summarize and reflect on your research. Step 3: Make future recommendations. Step 4: Emphasize your contributions to your field. Step 5: Wrap up your thesis or dissertation.

  10. Writing a Research Paper Conclusion

    Step 1: Restate the problem. Step 2: Sum up the paper. Step 3: Discuss the implications.

  11. Analogy and Analogical Reasoning

    An analogy is a comparison between two objects, or systems of objects, that highlights respects in which they are thought to be similar. Analogical reasoning is any type of thinking that relies upon an analogy. An analogical argument is an explicit representation of a form of analogical reasoning that cites accepted similarities between two systems to support the conclusion that some further similarity exists.

  12. Theory vs. Hypothesis: Basics of the Scientific Method

    Though you may hear the terms "theory" and "hypothesis" used interchangeably, these two scientific terms have drastically different meanings in the world of science. (MasterClass, last updated Jun 7, 2021.)

  13. Conjecture and hypothesis: The importance of reality checks

    In origins of life research, it is important to understand the difference between conjecture and hypothesis. This commentary explores the difference and recommends alternative hypotheses as a way to advance our understanding of how life can begin on the Earth and other habitable planets.

  14. P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers

    If the test statistic falls into that critical region, the null hypothesis is rejected in favor of the alternative hypothesis. Despite similarities between the two, the p value and the theory of hypothesis testing are different frameworks that are often misunderstood and confused, leading researchers to improper conclusions.

  15. How to Write Hypothesis Test Conclusions (With Examples)

    Example 1: Reject the Null Hypothesis Conclusion. Suppose a biologist believes that a certain fertilizer will cause plants to grow more during a one-month period than they normally do, which is currently 20 inches. To test this, she applies the fertilizer to each of the plants in her laboratory for one month. She then performs a hypothesis test to determine whether the mean growth during that month exceeds 20 inches.
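
The fertilizer example above can be sketched as a one-sample t test in Python. The growth measurements and the one-tailed critical value are made-up assumptions for illustration, not data from the original example:

```python
# Minimal sketch of the fertilizer example: does mean one-month growth
# exceed the usual 20 inches? Data are hypothetical.
from statistics import mean, stdev

growth = [21.2, 22.5, 20.8, 23.1, 21.9, 22.0, 20.5, 21.7]  # inches, made up
n = len(growth)
null_mean = 20.0                       # H0: mu = 20 (no extra growth)
se = stdev(growth) / n ** 0.5          # standard error of the mean
t_stat = (mean(growth) - null_mean) / se

# One-tailed critical t for df = 7 at alpha = 0.05 is about 1.895
# (from a t table); HA: mu > 20.
t_crit = 1.895
conclusion = "reject H0" if t_stat > t_crit else "fail to reject H0"
print(round(t_stat, 2), conclusion)
```

With this sample the t statistic is well above the critical value, so the conclusion is to reject the null hypothesis: the data support the claim that the fertilizer increases growth.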

  16. Null & Alternative Hypotheses

    A null hypothesis claims that there is no effect in the population, while an alternative hypothesis claims that there is an effect. More broadly, inferential statistics help you come to conclusions and make predictions based on your data.

  17. Are Research Questions and Hypotheses the Same?

    A hypothesis, specifically in academic research, is an educated guess or assumption about an expected relationship or occurrence that can be tested to determine if it's correct. Essentially, you'll make a statement, almost like a prediction, about a specific event or a relationship between two or more variables.

  18. Hypothesis, Model, Theory, and Law

    A scientific theory or law represents a hypothesis (or group of related hypotheses) which has been confirmed through repeated testing, almost always conducted over a span of many years. Generally, a theory is an explanation for a set of related phenomena, like the theory of evolution or the big bang theory. The word "law" is often reserved for a concise statement, frequently mathematical, of a relationship, such as Newton's law of universal gravitation.

  19. The Relationship Between Hypothesis Testing and Confidence Intervals

    Consider a typical two-tailed hypothesis test about some parameter p. First, we state our two kinds of hypothesis. The null hypothesis (H0) is the "status quo" or known/accepted fact: it states that there is no statistically significant relationship between two variables, and it is usually what we are looking to disprove.

  20. Similarities and Differences Between Hypothesis and Theory

    A hypothesis is a tentative explanation that requires testing, while a theory is a well-established explanation supported by a substantial body of evidence. They also differ in scope: hypotheses are narrow, addressing specific questions or problems, while theories are broader, encompassing a wide range of related phenomena.

  21. What are the similarities and differences between descriptive and inferential statistics?

    Inferential statistics is one of the two branches of statistics; the other is descriptive statistics. Descriptive statistics describe a population using graphs and tables, while inferential statistics draw conclusions about a population by setting up and testing hypotheses. Statistical testing is the method used to draw those conclusions.
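
The contrast between the two branches can be sketched with a small Python example. The sample data, the hypothesized mean, and the t-table critical value are all made-up assumptions for illustration:

```python
# Descriptive statistics summarize the sample itself; inferential
# statistics use the sample to draw a conclusion about the wider
# population (here via a rough one-sample t test). Data are made up.
from statistics import mean, median, stdev

sample = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.5]

# Descriptive: just describe what we observed.
summary = {"mean": mean(sample), "median": median(sample), "sd": stdev(sample)}

# Inferential: ask whether the population mean differs from 12.0.
t_stat = (mean(sample) - 12.0) / (stdev(sample) / len(sample) ** 0.5)

# Compare |t| to a critical value from a t table (about 2.365 for
# df = 7 at alpha = 0.05, two-tailed) to decide about H0: mu = 12.
decision = "reject" if abs(t_stat) > 2.365 else "fail to reject"
print(summary, decision)
```

The descriptive summary makes no claim beyond the eight observations; the inferential step generalizes from them, which is why it needs a null hypothesis and a decision rule at all.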

  22. Gender-Similarities Hypothesis

    The gender similarities hypothesis, proposed by Hyde (2005), states that males and females are similar on most, but not all, psychological variables. Based on a meta-analysis of 46 meta-analyses of psychological gender differences, 30% of effect sizes were trivial in magnitude (d between 0 and 0.10) and an additional 48% were small (0.10-0.35).

  23. Testing the unique item hypothesis with phrasal verbs in ...

    The present study revisits the unique item hypothesis (UIH) from the perspective of translation directionality in the Chinese-English (C-E) language pair, with the phrasal verb (PV) as its focus.

  24. What is the difference between inference and hypothesis?

    A hypothesis is a prediction about the outcome of an experiment. An inference is a conclusion drawn based on observations and prior knowledge. Hypothesis (made before an experiment): "If I do this (the independent variable), then this will happen (the dependent variable)." Inference (made from results): "Based on my observations (concrete, provable things found via the five senses), I can conclude that this is what happened."