
Inductive Learning Algorithm

In this article, we will learn about the Inductive Learning Algorithm (ILA), which falls under the domain of machine learning.

What is the Inductive Learning Algorithm?

The Inductive Learning Algorithm (ILA) is an iterative, inductive machine learning algorithm for generating a set of classification rules from a set of examples. It produces rules of the form “IF-THEN”, deriving new rules at each iteration and appending them to the rule set.

There are basically two methods for knowledge extraction: asking domain experts and applying machine learning. For very large amounts of data, domain experts are neither practical nor reliable, so we turn to machine learning. One option is to replicate the expert’s logic in the form of hand-crafted algorithms, but this work is tedious, time-consuming, and expensive. Inductive algorithms instead generate the strategy for performing a task directly from examples and need not be instructed separately at each step.

Why Should You Use Inductive Learning?

ILA was needed even though other inductive learning algorithms, such as ID3 and AQ, were already available.

  • The need arose from the pitfalls present in the previous algorithms; one of the major pitfalls was the lack of generalization of rules.
  • ID3 and AQ used decision-tree production, which was too specific: the resulting trees were difficult to analyze and very slow to evaluate even for basic, short classification problems.
  • The decision-tree-based algorithms were unable to handle a new problem if some attribute values were missing.
  • ILA instead produces a general set of rules rather than a decision tree, which overcomes the above problems.

Basic Requirements to Apply Inductive Learning Algorithm

  • List the examples in the form of a table ‘T’ where each row corresponds to an example and each column contains an attribute value.
  • Create a set of m training examples, each example composed of k attributes and a class attribute with n possible decisions.
  • Create a rule set R, initially empty.
  • Initially, all rows in the table are unmarked.

Necessary Steps for Implementation

  • Step 1: Divide the table ‘T’ containing m examples into n sub-tables (t1, t2, ..., tn), one for each possible value of the class attribute. (Repeat steps 2–8 for each sub-table.)
  • Step 2: Initialize the attribute combination count j = 1.
  • Step 3: For the sub-table under consideration, divide the attribute list into distinct combinations, each containing j distinct attributes.
  • Step 4: For each combination of attributes, count the number of occurrences of attribute values that appear under that combination in unmarked rows of the current sub-table but do not appear under the same combination in any other sub-table. Call the first combination with the maximum number of occurrences the max-combination ‘MAX’.
  • Step 5: If ‘MAX’ == null, increase j by 1 and go to Step 3.
  • Step 6: Mark all rows of the current sub-table in which the values of ‘MAX’ appear as classified.
  • Step 7: Add a rule (IF attribute = “XYZ” THEN decision is YES/NO) to R, whose left-hand side lists the attribute names of ‘MAX’ with their values, separated by AND, and whose right-hand side is the decision attribute value associated with the sub-table.
  • Step 8: If all rows are marked as classified, move on to another sub-table and go to Step 2; otherwise go to Step 4. If no sub-tables remain, exit with the set of rules obtained so far. (A Python sketch of these steps follows.)
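
The steps above can be sketched compactly in Python. This is a minimal illustration, not the article’s own code: examples are assumed to arrive as dictionaries mapping attribute names (plus the class attribute) to values, and all function and variable names are ours.

    from itertools import combinations

    def ila(examples, attributes, class_attr):
        """Sketch of the Inductive Learning Algorithm (ILA).

        examples   : list of dicts mapping attribute names (and the
                     class attribute) to values.
        attributes : list of non-class attribute names.
        class_attr : name of the class attribute.
        Returns a list of (condition_dict, decision) rules.
        """
        rules = []
        # Step 1: one sub-table per value of the class attribute.
        for c in {e[class_attr] for e in examples}:
            unmarked = [e for e in examples if e[class_attr] == c]
            others = [e for e in examples if e[class_attr] != c]
            j = 1  # Step 2: attribute combination count
            while unmarked and j <= len(attributes):
                # Steps 3-4: among all combinations of j attributes, find
                # the value tuple that occurs most often in unmarked rows
                # of this sub-table and never in any other sub-table (MAX).
                best, best_count = None, 0
                for combo in combinations(attributes, j):
                    counts = {}
                    for row in unmarked:
                        vals = tuple(row[a] for a in combo)
                        if all(tuple(o[a] for a in combo) != vals
                               for o in others):
                            counts[vals] = counts.get(vals, 0) + 1
                    for vals, n in counts.items():
                        if n > best_count:
                            best, best_count = (combo, vals), n
                if best is None:
                    j += 1  # Step 5: no MAX found, widen the combinations
                    continue
                combo, vals = best
                # Step 6: mark the covered rows as classified.
                unmarked = [r for r in unmarked
                            if tuple(r[a] for a in combo) != vals]
                # Step 7: add the IF-THEN rule for this sub-table.
                rules.append((dict(zip(combo, vals)), c))
            # Step 8: this sub-table is done; move on to the next one.
        return rules

Called on a table like the seven-example one described below (with attributes such as place type, weather, and location), this routine produces IF-THEN rules of the same form as Rules 1–5 listed at the end of the article.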

As an example of ILA in use, suppose we have an example set with the attributes place type, weather, and location, plus a decision attribute, and seven examples. Our task is to generate a set of rules that describe under what conditions each decision is reached.

Subset 1 contains the examples whose decision is “yes”, and Subset 2 contains the examples whose decision is “no”. (The two sub-tables themselves are not reproduced here.)

  • At iteration 1, the weather column is selected for rows 3 and 4, and rows 3 and 4 are marked. The rule added to R: IF the weather is warm THEN the decision is yes.
  • At iteration 2, the place type column is selected for row 1, and row 1 is marked. The rule added to R: IF the place type is hilly THEN the decision is yes.
  • At iteration 3, the location column is selected for row 2, and row 2 is marked. The rule added to R: IF the location is Shimla THEN the decision is yes.
  • At iteration 4, the location column is selected for rows 5 and 6, and rows 5 and 6 are marked. The rule added to R: IF the location is Mumbai THEN the decision is no.
  • At iteration 5, the place type and weather columns are selected for row 7, and row 7 is marked. The rule added to R: IF the place type is beach AND the weather is windy THEN the decision is no.

Finally, we get the following rule set:

  • Rule 1: IF the weather is warm THEN the decision is yes.
  • Rule 2: IF the place type is hilly THEN the decision is yes.
  • Rule 3: IF the location is Shimla THEN the decision is yes.
  • Rule 4: IF the location is Mumbai THEN the decision is no.
  • Rule 5: IF the place type is beach AND the weather is windy THEN the decision is no.


Inductive Learning: Examples, Definition, Pros, Cons


Inductive learning is a teaching strategy where students discover operational principles by observing examples.

It is used in inquiry-based and project-based learning where the goal is to learn through observation rather than being ‘told’ the answers by the teacher.

It is consistent with a constructivist approach to learning as it holds that knowledge should be constructed in the mind rather than transferred from the teacher to student.

Inductive Learning Definition

Inductive learning involves the students ‘constructing’ theories and ideas through observation. We contrast it with deductive learning, where the teacher presents the theories and students then examine examples.

It is argued that learning with the inductive approach results in deep cognitive processing of information, creative independent thinking, and a rich understanding of the concepts involved.

It can also lead to long memory retention and strong transferability of knowledge to other situations.

Prince and Felder (2006) highlight that this concept explains a range of approaches to teaching and learning:

“Inductive teaching and learning is an umbrella term that encompasses a range of instructional methods, including inquiry learning, problem-based learning, project-based learning, case-based teaching, discovery learning, and just-in-time teaching” (Prince & Felder, 2006, p. 124).

Inductive Learning vs Deductive Learning

While both inductive and deductive learning are used in education, they are distinct in terms of their underlying principles and teaching methods.

Generally, inductive learning is a bottom-up approach, meaning the observations precede the conclusions. It involves making observations, recognizing patterns, and forming generalizations.

On the other hand, deductive learning is a top-down approach, meaning that it involves a teacher presenting general principles, which are then examined using scientific research.

Both are legitimate methods, and in fact, despite its limitations, many students get a lot of pleasure out of doing deductive research in a physics or chemistry class.


Inductive Learning Strengths and Limitations

Inductive learning is praised as an effective approach because it involves students constructing knowledge through observation, active learning and trial and error.

As a result, it helps develop critical thinking skills and fosters creativity because students must create the theories rather than being presented with them at the beginning of the lesson.

However, inductive learning isn’t always beneficial. To start with, students often don’t understand what the end goal of the activity is, which leads to confusion and disillusionment.

Secondly, it can be more challenging for novice learners who don’t have strong frameworks for systematic analysis and naturalistic observation.


Inductive Learning Examples

  • Mrs. Williams shows her art students a wide range of masterpieces from different genres. Students then develop their own categorical definitions and classify the artwork accordingly.
  • Children in third grade are shown photos of different musical instruments and then asked to categorize them based on their own definitions.
  • A company has customers try out a new product while the design team observes behind a two-way mirror. The team tries to identify common concerns, operational issues, and desirable features.
  • A team of researchers observes the verbal interactions between parents and children in households. They then try to identify patterns and characteristics that affect language acquisition.
  • A biologist observes the foraging and hunting behavior of the Arctic fox to determine types of terrain and environmental features conducive to survival.
  • Researchers interested in group dynamics and decision-making analyze the functional statements of personnel during meetings and try to find patterns that facilitate problem-solving.
  • Chef Phillips presents 5 desserts to his students and asks them to identify the qualities that make each one distinct (and tasty).
  • Dr. Guttierrez gives each team of students in his advertising class a set of effective and ineffective commercials. Each team then develops a set of criteria for what makes a good commercial.
  • The Career Center shows a range of video-recorded job interviews and asks students to identify the characteristics that make some of them impressive and others not.
  • Kumar demonstrates different yoga poses in a Far East Religions class and then the students try to identify the areas of the body and the problems each pose is meant to address.

Case Studies and Research Basis

1. Inductive Learning in an Inquiry-Based Classroom

This case study centers on a deceptively simple question: what is life? On the surface, it would appear to be a very straightforward question with a very straightforward answer. Many formal definitions share several common characteristics: the existence of a metabolism, replication, evolution, responsiveness, growth, movement, and cellular structure.

However, Prud’homme-Généreux (2013) points out that in one popular biology textbook there are 48 different experts offering different definitions.

In this inductive learning class activity by Prud’homme-Généreux (2013), the instructor prepares two sets of cards (A and B). Each card in set A contains an image of a living organism; each card in set B contains an image of an object that is not living.

Before distributing the cards, teams of 3 are formed and asked:

Why do we need a definition of life?

Each team then generates a new definition of life. Afterwards, the teams receive 3 cards from both sets.

For class discussion, one characteristic of a team’s definition is written on the board. Teams examine their cards and determine if that characteristic applies.

Prud’homme-Généreux states:

“…that the approach elicits curiosity, triggers questions, and leads to a more nuanced understanding of the concept…leads to confidence in their ability to think.”

2. Inductive Learning in Peer Assessment

Inductive learning methods can be applied in a wide range of circumstances. One strategy is aimed at helping students understand grading criteria and how to develop a critical eye for their work and the work of others.

The procedure involves having students form teams of 3-5. The instructor then supplies each team with 5 essays that vary in terms of quality and assigned grade.

Each team examines the essays, discusses them amongst themselves, and then tries to identify the grading criteria.

Class discussion can ensue with the instructor projecting new essays on the board and asking the class to apply their team’s criteria.

This activity is an excellent way for students to develop a deeper understanding of the grading process.

3. Problem-Based Inductive Learning in Medical School

The conventional approach to teaching involves the teacher presenting the principles of a subject and then having students apply that knowledge to different situations. As effective as that approach is, medical schools have found that student learning is more advanced with a problem-based inductive approach.

So, instead of students being told what the symptoms are for a specific disease, students are presented with a clinical case and then work together to identify the ailment.

Although each team is assigned an experienced tutor, the tutor tries to provide as little assistance as possible.

Medical schools have found that this form of inductive learning leads to a much deeper understanding of medical conditions and helps students develop the kind of advanced critical-thinking skills they will need throughout their careers.

4. Inductive Learning in Traffic Management

Traffic management involves controlling the movement of people and vehicles. The goal is to ensure safety and improve flow efficiency. In the early days of traffic management, personnel would monitor traffic conditions at various times of the day, and try to identify patterns in traffic dynamics and the causal factors involved.

Those insights were then extrapolated to the broader city context and various rules and regulations were devised.

Today, much of that inductive analysis is conducted through sophisticated software algorithms. Through carefully placed cameras, the software tracks traffic flow, identifies operating parameters, and then devises solutions to improve flow rate and safety.

For example, the software will monitor average traffic speed, detect congestion, estimate journey times between key locations, and track vehicle counts and flow rates.

Traffic management software is thus an example of a system capable of inductive learning.

5. Inductive Learning in Theory Development

Inductive learning is a key way in which scholars and researchers come up with ground-breaking theories. One example is in Mary Ainsworth’s observational research, where she used observations to induce a theory, as explained below.

Although most people mention the Strange Situation test developed by Dr. Mary Ainsworth, she conducted naturalistic observations for many years prior to its creation.

For two years, starting in 1954, she visited the homes of families in Uganda. She took detailed notes on infant/caregiver interactions, in addition to interviewing mothers about their parenting practices.

Through inductive reasoning and learning, she was able to identify patterns of behavior that could be categorized into several distinct attachment profiles.

Along with her work with John Bowlby, these notes formed the basis of her theory of attachment.

As reported by Bretherton (2013),

“…secure-attached infants cried little and engaged in exploration when their mother was present, while insecure-attached infants were frequently fussy even with mother in the same room” (p. 461).

Inductive learning is when students are presented with examples and case studies from which they are to derive fundamental principles and characteristics.

In many ways, it is the opposite of conventional instructional strategies where teachers define the principles and then students apply them to examples.

Inductive learning is a powerful approach. It leads to students developing a very rich understanding of the subject under study, increases student engagement, prolongs retention, and helps build student confidence in their ability to learn.

We can see examples of inductive learning in the world’s best medical schools, research that has had a profound impact on our understanding of infant/caregiver relations, and even its use by sophisticated algorithms that control traffic in our largest cities.

References

Ainsworth, M. D. S. (1967). Infancy in Uganda. Baltimore: Johns Hopkins University Press.

Bretherton, I. (2013). Revisiting Mary Ainsworth’s conceptualization and assessments of maternal sensitivity-insensitivity. Attachment & Human Development, 15(5–6), 460–484. http://dx.doi.org/10.1080/14616734.2013.835128

Lahav, N. (1999). Biogenesis: Theories of Life’s Origin. Oxford, U.K.: Oxford University Press.

Prince, M., & Felder, R. (2006). Inductive teaching and learning methods: Definitions, comparisons, and research bases. Journal of Engineering Education, 95, 123–137. https://doi.org/10.1002/j.2168-9830.2006.tb00884.x

Prud’homme-Généreux, A. (2013). What is life? An activity to convey the complexities of this simple question. The American Biology Teacher, 75(1), 53–57.

Shemwell, J. T., Chase, C. C., & Schwartz, D. L. (2015). Seeking the general explanation: A test of inductive activities for learning and transfer. Journal of Research in Science Teaching, 52(1), 58–83.


A Concept Learning Task and the Inductive Learning Hypothesis

Concept Learning is a way to find all the consistent hypotheses or concepts. This article will help you understand the concept better. 

We have already covered designing the learning system in the previous article, and to complete that design we need a good representation of the target concept.

Why Concept Learning?

A lot of our learning revolves around grouping or categorizing a large data set. Each concept of learning can be viewed as describing some subset of objects or events defined over a larger set. For example, a subset of vehicles that constitute cars. 

Alternatively, each dataset has certain attributes. For example, if you consider a car, its attributes will be color, size, number of seats, etc. These attributes can be defined as binary-valued attributes.

Let’s take another, more elaborate example: EnjoySport. The attribute EnjoySport indicates whether a person takes part in their favorite water activity on a particular day.

The goal is to learn to anticipate the value of EnjoySport on any given day based on its other qualities’ values.

To simplify,

Task T: Determine the value of EnjoySport for every given day based on the values of the day’s qualities.

Performance measure P: the total proportion of days for which EnjoySport is accurately anticipated.

Experience E: A collection of days with pre-determined labels (EnjoySport: Yes/No).

Each hypothesis can be considered as a set of six constraints, with the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast specified.

Here the concept = <Sky, AirTemp, Humidity, Wind, Forecast>.

If each of the d attributes is binary-valued, then:

The number of possible instances = 2^d.

The total number of concepts = 2^(2^d),

where d is the number of features or attributes. In this case, d = 5, so:

=> The number of possible instances = 2^5 = 32.

=> The total number of concepts = 2^(2^5) = 2^32.

Your machine does not have to learn all of these 2^32 concepts; you select only a few of them to teach the machine.

The chosen hypothesis needs to be consistent with the training examples at all times. The concept to be learned is called the target concept, and the set of candidate hypotheses from which it is drawn is called the hypothesis space.

Hypothesis Space:

To define it formally: the collection of all feasible legal hypotheses is known as the hypothesis space. This is the set from which the machine learning algorithm selects the function that best describes the target function.

For each attribute, the hypothesis will either

  • indicate with a “?” that any value is acceptable for this attribute,
  • specify a single required value (e.g., Warm) for the attribute, or
  • indicate with a “0” that no value is acceptable for this attribute.

For example, the hypothesis that a person enjoys their favorite sport only on cold days with high humidity (regardless of the values of the other attributes) is represented by

  < ?, Cold, High, ?, ?, ? >

  • The most general hypothesis, that every day is a positive example, is represented by

                   <?, ?, ?, ?, ?, ?>

  • The most specific hypothesis, that no day is a positive example, is represented by

                         <0, 0, 0, 0, 0, 0>

Concept Learning as Search: 

The main goal is to find the hypothesis that best fits the training data set. 

Consider, for example, the instances X and hypotheses H in the EnjoySport learning task.

With three possible values for the attribute Sky and two each for AirTemp, Humidity, Wind, Water, and Forecast, the instance space X contains exactly 3*2*2*2*2*2 = 96 distinct instances.
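
These counts are easy to verify in Python; the following snippet is purely illustrative and is not part of the original article:

    # Binary view used earlier: d = 5 binary attributes.
    d = 5
    possible_instances = 2 ** d          # 2^5 = 32
    possible_concepts = 2 ** (2 ** d)    # 2^(2^5) = 2^32 = 4294967296

    # EnjoySport instance space: Sky has 3 values, the rest have 2 each.
    values_per_attribute = [3, 2, 2, 2, 2, 2]
    instance_space = 1
    for v in values_per_attribute:
        instance_space *= v              # 3*2*2*2*2*2 = 96

    print(possible_instances, possible_concepts, instance_space)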

Inductive Learning Hypothesis

The learning aim is to find a hypothesis h that agrees with the target concept c across all instances X, when the only knowledge available about c is its value on the training examples.

The inductive learning hypothesis can be stated as follows: any hypothesis that accurately approximates the target function over a large enough collection of training examples will likewise accurately approximate the target function over unseen cases.

Inductive learning algorithms can only guarantee that the output hypothesis fits the target concept over the training data.

We assume that the hypothesis that best matches the observed training data is also the optimal hypothesis for unseen instances. This is the basic premise of inductive learning.

Assumptions of Inductive Learning Algorithms:

  • The training sample is representative of the population.
  • The input features permit discrimination between the classes.

Concept learning may be considered as the job of searching through a wide set of hypotheses implicitly described by the hypothesis representation.

The purpose of this search is to identify the hypothesis that most closely matches the training instances.


Inductive Learning Hypothesis

  • With n attributes, each with 3 values, we have |H| = 3^n.
  • We assume that one of those hypotheses will match the target function c(x).
  • Furthermore, all we know about c(x) is given by the examples we have seen. We must assume that the future examples will resemble past ones.
  • The inductive learning hypothesis states that any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
  • Why should this be true? It’s not true for the stock market, or is it?

Chapter 2: Concept Learning and the General-to-Specific Ordering

  • Concept Learning: Inferring a boolean valued function from training examples of its input and output.
  • X: set of instances
  • x: one instance
  • c: target concept, c:X → {0, 1}
  • <x, c(x)>: a training instance, which can be a positive example or a negative example
  • D: set of training instances
  • H: set of possible hypotheses
  • h: one hypothesis, h: X → {0, 1}; the goal is to find h such that h(x) = c(x) for all x in X

Inductive Learning Hypothesis

Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Let h_j and h_k be boolean-valued functions defined over X. h_j is more general than or equal to h_k (written h_j ≥_g h_k) if and only if (∀x ∈ X) [(h_k(x) = 1) → (h_j(x) = 1)].

This is a partial order since it is reflexive, antisymmetric and transitive.

Find-S Algorithm

Outputs a description of the most specific hypothesis consistent with the training examples.

  • Initialize h to the most specific hypothesis in H.
  • For each positive training instance x: for each attribute constraint a_i in h, if a_i is satisfied by x, do nothing; otherwise, replace a_i in h by the next more general constraint that is satisfied by x.
  • Output hypothesis h.

For this particular algorithm, there is a bias that the target concept can be represented by a conjunction of attribute constraints.
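
A minimal Python sketch of Find-S for this conjunctive representation is shown below; '0' stands for "no value allowed" and '?' for "any value". The training data D is the standard EnjoySport sample from Mitchell's Chapter 2, while the function name and data layout are ours.

    def find_s(examples):
        """Return the most specific conjunctive hypothesis consistent
        with the positive training examples.

        examples: list of (attribute_tuple, label) pairs, label in {0, 1}.
        """
        n = len(examples[0][0])
        h = ['0'] * n                    # most specific hypothesis in H
        for x, label in examples:
            if label != 1:
                continue                 # Find-S ignores negative examples
            for i, a in enumerate(x):
                if h[i] == '0':
                    h[i] = a             # adopt the first positive example
                elif h[i] != a:
                    h[i] = '?'           # minimally generalize the constraint
        return h

    # EnjoySport training examples (Mitchell, Chapter 2):
    D = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
        (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), 1),
        (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),
        (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), 1),
    ]
    print(find_s(D))   # -> ['Sunny', 'Warm', '?', 'Strong', '?', '?']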

Candidate Elimination Algorithm

Outputs a description of the set of all hypotheses consistent with the training examples.

A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D: Consistent(h, D) ≡ (∀ <x, c(x)> ∈ D) h(x) = c(x)

The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D: VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.

The specific boundary S, with respect to hypothesis space H and training data D, is the set of maximally specific members of H consistent with D.

Version Space Representation

Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c: X → {0,1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined: VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}

  • Initialize G to the set of maximally general hypotheses in H.
  • Initialize S to the set of maximally specific hypotheses in H.
  • For each training example d:
      • If d is a positive example:
          • Remove from G any hypothesis inconsistent with d.
          • For each hypothesis s in S that is not consistent with d: remove s from S; add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h; then remove from S any hypothesis that is more general than another hypothesis in S.
      • If d is a negative example:
          • Remove from S any hypothesis inconsistent with d.
          • For each hypothesis g in G that is not consistent with d: remove g from G; add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h; then remove from G any hypothesis that is less general than another hypothesis in G.

(A Python sketch of this algorithm follows.)
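
The sketch below is a compact, illustrative Python version of the algorithm for the same conjunctive representation. It reuses the '?'/'0' conventions and the EnjoySport data D from the Find-S sketch above; all helper names are ours, and more_general checks the constraint positions individually, which suffices for the boundary sets used here.

    def matches(h, x):
        """True if hypothesis h covers instance x."""
        return all(hi == '?' or hi == xi for hi, xi in zip(h, x))

    def more_general(h1, h2):
        """True if h1 >=_g h2, checked position by position."""
        return all(a == '?' or a == b or b == '0' for a, b in zip(h1, h2))

    def min_generalizations(h, x):
        """Minimal generalizations of h that cover x (unique here)."""
        g = list(h)
        for i, xi in enumerate(x):
            if g[i] == '0':
                g[i] = xi
            elif g[i] != xi:
                g[i] = '?'
        return [tuple(g)]

    def min_specializations(h, x, domains):
        """Minimal specializations of h that exclude x."""
        out = []
        for i, xi in enumerate(x):
            if h[i] == '?':
                for v in domains[i]:
                    if v != xi:
                        s = list(h)
                        s[i] = v
                        out.append(tuple(s))
        return out

    def candidate_elimination(examples, domains):
        n = len(domains)
        G = {('?',) * n}                 # maximally general boundary
        S = {('0',) * n}                 # maximally specific boundary
        for x, label in examples:
            if label == 1:               # positive example
                G = {g for g in G if matches(g, x)}
                for s in [s for s in S if not matches(s, x)]:
                    S.remove(s)
                    for h in min_generalizations(s, x):
                        if any(more_general(g, h) for g in G):
                            S.add(h)
                S = {s for s in S if not any(
                    s != t and more_general(s, t) for t in S)}
            else:                        # negative example
                S = {s for s in S if not matches(s, x)}
                for g in [g for g in G if matches(g, x)]:
                    G.remove(g)
                    for h in min_specializations(g, x, domains):
                        if any(more_general(h, s) for s in S):
                            G.add(h)
                G = {g for g in G if not any(
                    g != t and more_general(t, g) for t in G)}
        return S, G

    domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"),
               ("Normal", "High"), ("Strong", "Weak"),
               ("Warm", "Cool"), ("Same", "Change")]
    S, G = candidate_elimination(D, domains)
    print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
    print(G)   # {('Sunny','?','?','?','?','?'), ('?','Warm','?','?','?','?')}

On Mitchell's four EnjoySport examples this reproduces the familiar final version space: one maximally specific hypothesis and two maximally general ones.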

Candidate Elimination Algorithm Issues

  • Will it converge to the correct hypothesis? Yes, if (1) the training examples are error free and (2) the correct hypothesis can be represented by a conjunction of attributes.
  • If the learner can request a specific training example, which one should it select?
  • How can a partially learned concept be used?

Inductive Bias

  • Definition: Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X and let D_c = {<x, c(x)>} be an arbitrary set of training examples of c. Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c. The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c: (∀x_i ∈ X) [L(x_i, D_c) follows deductively from (B ∧ D_c ∧ x_i)]
  • Thus, one advantage of an inductive bias is that it gives the learner a rational basis for classifying unseen instances.
  • What is another advantage of bias?
  • What is one disadvantage of bias?
  • What is the inductive bias of the candidate elimination algorithm? Answer: the target concept c is a conjunction of attributes.
  • What is meant by a weak bias versus a strong bias?

Sample Exercise

Work exercise 2.4 on page 48.



Formal Learning Theory

Formal learning theory is the mathematical embodiment of a normative epistemology. It deals with the question of how an agent should use observations about her environment to arrive at correct and informative conclusions. Philosophers such as Putnam, Glymour and Kelly have developed learning theory as a normative framework for scientific reasoning and inductive inference.

Terminology. Cognitive science and related fields typically use the term “learning” for the process of gaining information through observation—hence the name “learning theory”. To most cognitive scientists, the term “learning theory” suggests the empirical study of human and animal learning stemming from the behaviourist paradigm in psychology. The epithet “formal” distinguishes the subject of this entry from behaviourist learning theory. Philosophical terms for learning-theoretic epistemology include “logical reliability” (Kelly [1996], Glymour [1991]) and “means-ends epistemology” (Schulte [1999]).

Because many developments in, and applications of, formal learning theory come from computer science, the term “computational learning theory” is also common. Many results on learning theory in computer science are concerned with Valiant’s and Vapnik’s notion of learning generalizations that are probably approximately correct (PAC learning) (Valiant [1984]). This notion of empirical success was introduced to philosophers by Gilbert Harman in his Nicod lectures, and elaborated in a subsequent book [Harman and Kulkarni 2007]. Valiant himself provides an accessible account of PAC learning and its relationship to the problem of induction in a recent book (Valiant [2013, Ch. 5]). The present article describes a nonstatistical tradition of learning theory stemming from the seminal work of Hilary Putnam [1963] and Mark E. Gold [1967]. Recent research has extended the reliabilist means-ends approach to a statistical setting where inductive methods assign probabilities to statistical hypotheses from random samples. The new statistical framework is described at the end of this entry, in the Section on Reliable Statistical Inquiry.

Philosophical characteristics. In contrast to other philosophical approaches to inductive inference, learning theory does not aim to describe a universal inductive method or explicate general axioms of inductive rationality. Rather, learning theory pursues a context-dependent means-ends analysis [Steele 2010]: For a given empirical problem and a set of cognitive goals, what is the best method for achieving the goals? Most of learning theory examines which investigative strategies reliably and efficiently lead to correct beliefs about the world.

Article Overview. Compared to traditional philosophical discussions of inductive inference, learning theory provides a radically new way of thinking about induction and scientific method. The main aim of this article is to explain the main concepts of the theory through examples. Running examples are repeated throughout the entry; at the same time, the sections are meant to be as independent of each other as possible. We use the examples to illustrate some theorems of philosophical interest, and to highlight the key philosophical ideas and insights behind learning theory.

Readers interested in the mathematical substance of learning theory will find some references in the Bibliography, and a summary of the basic definitions in the Supplementary Document. A text by Jain et al. collects many of the main definitions and theorems [1999]. New results appear in proceedings of annual conferences, such as the Conferences on Learning Theory (COLT) and Algorithmic Learning Theory (ALT). The philosophical issues and motivation pertaining to learning-theoretic epistemology are discussed extensively in the works of philosophers such as Putnam, Glymour and Kelly (Putnam [1963], Glymour [1991], Glymour and Kelly [1992], Kelly [1996]).

1. Convergence to the Truth and Nothing but the Truth

Learning-theoretic analysis assesses dispositions for forming beliefs. Several terms for belief acquisition processes are in common use in philosophy; I will use “inductive strategy”, “inference method” and most frequently “inductive method” to mean the same thing. The best way to understand how learning theory evaluates inductive methods is to work through some examples. The following presentation begins with some very simple inductive problems and moves on to more complicated and more realistic settings.

1.1 Simple Universal Generalization

Let’s revisit the classic question of whether all ravens are black. Imagine an ornithologist who tackles this problem by examining one raven after another. There is exactly one observation sequence in which only black ravens are found; all others feature at least one nonblack raven. The figure below illustrates the possible observation sequences. Dots in the figure denote points at which an observation may be made. A black bird to the left of a dot indicates that at this stage, a black raven is observed. Similarly, a white bird to the right of a dot indicates that a nonblack raven is observed. Given a complete sequence of observations, either all observed ravens are black or at least one is nonblack; the figure labels complete observation sequences with the statement that is true of them. The gray fan indicates that after the observation of a white raven, the claim that not all ravens are black holds on all observation sequences resulting from further observations.

Figure 1 [An extended description of figure 1 is in a supplement.]

If the world is such that only black ravens are found, we would like the ornithologist to settle on this generalization. (It may be possible that some nonblack ravens remain forever hidden from sight, but even then the generalization “all ravens are black” at least gets the observations right.) If the world is such that eventually a nonblack raven is found, then we would like the ornithologist to arrive at the conclusion that not all ravens are black. This specifies a set of goals of inquiry. For any given inductive method that might represent the ornithologist’s disposition to adopt conjectures in the light of the evidence, we can ask whether that method measures up to these goals or not. There are infinitely many possible methods to consider; we’ll look at just two, a skeptical one and one that boldly generalizes. The bold method conjectures that all ravens are black after seeing that the first raven is black. It hangs on to this conjecture unless some nonblack raven appears. The skeptical method does not go beyond what is entailed by the evidence. So if a nonblack raven is found, the skeptical method concludes that not all ravens are black, but otherwise the method does not make a conjecture one way or another. The figure below illustrates both the bold and the skeptical method.

Figure 2 [An extended description of figure 2 is in a supplement.]

Do these methods attain the goals we set out? Consider the bold method. There are two possibilities: either all observed ravens are black, or some nonblack raven is found. In the first case, the method conjectures that all ravens are black and never abandons this conjecture. In the second case, the method concludes that not all ravens are black as soon as the first nonblack raven is found. Hence no matter how the evidence comes in, eventually the method gives the right answer as to whether all ravens are black and sticks with this answer. Learning theorists call such methods reliable because they settle on the right answer no matter what observations the world provides.

The skeptical method does not measure up so well. If a nonblack raven appears, then the method does arrive at the correct conclusion that not all ravens are black. But if all ravens are black, the skeptic never takes an “inductive leap” to adopt this generalization. So in that case, the skeptic fails to provide the right answer to the question of whether all ravens are black.

This illustrates how means-ends analysis can evaluate methods: the bold method meets the goal of reliably arriving at the right answer, whereas the skeptical method does not. Note the character of this argument against the skeptic: The problem, in this view, is not that the skeptic violates some canon of rationality, or fails to appreciate the “uniformity of nature”. The learning-theoretic analysis concedes to the skeptic that no matter how many black ravens have been observed in the past, the next one could be white. The issue is that if all observed ravens are indeed black, then the skeptic never answers the question “are all ravens black?”. Getting the right answer to that question requires generalizing from the evidence even though the generalization could be wrong.

As for the bold method, it’s important to be clear on what it does and does not achieve. The method will eventually settle on the right answer—but it (or we) may never be certain that it has done so. As William James put it, “no bell tolls” when science has found the right answer. We are certain that the method will eventually settle on the right answer; but we may never be certain that the current answer is the right one. This is a subtle point; the next example illustrates it further.
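
As a toy illustration (ours, not the entry’s), the two methods can be written as functions from finite evidence sequences to conjectures; on a stream of black ravens the bold method settles on the truth immediately, while the skeptic never answers:

    def bold(evidence):
        """Generalize after a black raven; retract on a counterexample."""
        if "nonblack" in evidence:
            return "not all ravens are black"
        return "all ravens are black" if evidence else None

    def skeptical(evidence):
        """Assert only what the evidence entails."""
        return "not all ravens are black" if "nonblack" in evidence else None

    stream = ["black"] * 10
    print(bold(stream))        # 'all ravens are black' (and stays there)
    print(skeptical(stream))   # None -- the skeptic never answers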

1.2 The New Riddle of Induction

Nelson Goodman posed a famous puzzle about inductive inference known as the (New) Riddle of Induction ([Goodman 1983]). Our next example is inspired by his puzzle. Goodman considered generalizations about emeralds, involving the familiar colours of green and blue, as well as certain unusual ones:

Suppose that all emeralds examined before a certain time \(t\) are green …. Our evidence statements assert that emerald \(a\) is green, that emerald \(b\) is green, and so on…. Now let us introduce another predicate less familiar than “green”. It is the predicate “grue” and it applies to all things examined before \(t\) just in case they are green but to other things just in case they are blue. Then at time \(t\) we have, for each evidence statement asserting that a given emerald is green, a parallel evidence statement asserting that emerald is grue. The question is whether we should conjecture that all emeralds are green rather than that all emeralds are grue when we obtain a sample of green emeralds examined before time \(t\), and if so, why.

Clearly we have a family of grue predicates in this problem, one for each different “critical time” \(t\); let’s write grue\((t)\) to denote these grue predicates. Following Goodman, let us refer to methods as projection rules in discussing this example. A projection rule succeeds in a world just in case it settles on a generalization that is correct in that world. Thus in a world in which all examined emeralds are found to be green, we want our projection rule to converge to the proposition that all emeralds are green. If all examined emeralds are grue\((t)\), we want our projection rule to converge to the proposition that all emeralds are grue\((t)\). Note that this stipulation treats green and grue predicates completely on a par, with no bias towards either. As before, let us consider two rules: the natural projection rule which conjectures that all emeralds are green as long as only green emeralds are found, and the gruesome rule which keeps projecting the next grue predicate consistent with the available evidence. Expressed in the green-blue vocabulary, the gruesome projection rule conjectures that after observing some number of \(n\) green emeralds, all future ones will be blue. The figures below illustrate the possible observation sequences, the natural projection rule, and the gruesome projection rule.

Figure 3 [An extended description of figure 3 is in a supplement.]

The following figure shows the gruesome projection rule.

Figure 4 [An extended description of figure 4 is in a supplement.]

How do these rules measure up to the goal of arriving at a true generalization? Suppose for the sake of the example that the only serious possibilities under consideration are: (1) Either all emeralds are green or (2) all emeralds are grue\((t)\) for some critical time \(t\). Then the natural projection rule settles on the correct generalization no matter what the correct generalization is. For if all emeralds are green, the natural projection rule asserts this fact from the beginning. And suppose that all emeralds are grue\((t)\) for some critical time \(t\). Then at time \(t\), a blue emerald will be observed. At this point the natural projection rule settles on the conjecture that all emeralds are grue\((t)\), which must be correct given our assumption about the possible observation sequences. Thus no matter what evidence is obtained in the course of inquiry—consistent with our background assumptions—the natural projection rule eventually settles on a correct generalization about the colour of emeralds.

The gruesome rule does not do as well. For if all emeralds are green, the rule will never conjecture this fact because it keeps projecting grue predicates. Hence there is a possible observation sequence—namely those on which all emeralds are green—on which the gruesome rule fails to converge to the right generalization. So means-ends analysis would recommend the natural projection rule over the gruesome rule.

1.3 Discussion

The means-ends analysis of the Riddle of Induction illustrates a number of philosophically important points that hold for learning-theoretic analysis in general.

Equal Treatment of All Hypotheses. As in the previous example, nothing in this argument hinges on arguments to the effect that certain possibilities are not to be taken seriously a priori. In particular, nothing in the argument says that generalizations with grue predicates are ill-formed, unlawlike, or in some other way a priori inferior to “all emeralds are green”.

Language Invariance. The analysis does not depend on the vocabulary in which the evidence and generalizations are framed. For ease of exposition, I have mostly used the green-blue reference frame. However, grue-bleen speakers would agree that the aim of reliably settling on a correct generalization requires the natural projection rule rather than the gruesome one, even if they would want to express the conjectures of the natural rule in their grue-bleen language rather than the blue-green language that we have used so far.

Dependence on Context. Though the analysis does not depend on language, it does depend on assumptions about what the possible observation sequences are. The example as described above seems to comprise the possibilities that correspond to the colour predicates Goodman himself discussed. But means-ends analysis applies just as much to other sets of possible predicates. Schulte [1999] and Chart [2000] discuss a number of other versions of the Riddle of Induction, in some of which means-ends analysis favours projecting that all emeralds are grue on a sample of all green emeralds.

1.4 Falsificationism and Generalizations with Exceptions

Our first two examples feature simple universal generalizations. Some subtle aspects of the concept of long-run reliability, particularly its relationship to falsificationism, become apparent if we consider generalizations that allow for exceptions. To illustrate, let us consider another ornithological example. Two competing hypotheses are under investigation.

  • All but finitely many swans are white. That is, basically all swans are white, except for a finite number of exceptions to the rule.
  • All but finitely many swans are black. That is, basically all swans are black, except for a finite number of exceptions to the rule.

Assuming that one or the other of these hypotheses is correct, is there an inductive method that reliably settles on the right one? What makes this problem more difficult than our first two is that each hypothesis under investigation is consistent with any finite amount of evidence. If 100 white swans and 50 black swans are found, either the 50 black swans or the 100 white swans may be the exception to the rule. In terminology made familiar by Karl Popper’s work, we may say that neither hypothesis is falsifiable. As a consequence, the inductive strategy from the previous two examples will not work here. This strategy was basically to adopt a “bold” universal generalization, such as “all ravens are black” or “all emeralds are green”, and to hang on to this conjecture as long as it “passes muster”. However, when rules with possible exceptions are under investigation, this strategy is unreliable. For example, suppose that an inquirer first adopts the hypothesis that “all but finitely many swans are white”. It may be the case that from then on, only black swans are found. But each of these apparent counterinstances can be “explained away” as an exception. If the inquirer follows the principle of hanging on to her conjecture until the evidence is logically inconsistent with the conjecture, she will never abandon her false belief that all but finitely many swans are white, much less arrive at the correct belief that all but finitely many swans are black.

Reliable inquiry requires a more subtle investigative strategy. Here is one (of many). Begin inquiry with either competing hypothesis, say “all but finitely many swans are black”. Choose some cut-off ratio to represent a “clear majority”; for definiteness, let’s say 70%. If the current conjecture is that all but finitely many swans are black, change your mind to conjecture that all but finitely many swans are white just in case over 70% of observed swans are in fact white. Proceed likewise if the current conjecture is that all but finitely many swans are white when over 70% of observed swans are in fact black.

A bit of thought shows that this rule reliably identifies the correct hypothesis in the long run, no matter which of the two competing hypotheses is correct. For if all but finitely many swans are black, eventually the nonblack exceptions to the rule will be exhausted, and an arbitrarily large majority of observed swans will be black. Similarly if all but finitely many swans are white.
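
Here is a small Python sketch of this cutoff rule, assuming observations arrive as a list of 'white'/'black' sightings; the 70% threshold and the convergence claim mirror the text, while the function name and data are ours:

    def cutoff_update(conjecture, observations, threshold=0.7):
        """One update of the majority-cutoff rule: switch conjectures only
        when a clear majority of observed swans favours the rival rule."""
        rival = "white" if conjecture == "black" else "black"
        share = observations.count(rival) / len(observations)
        return rival if share > threshold else conjecture

    # If all but finitely many swans are white, the white share eventually
    # exceeds any fixed cutoff, so the rule converges to 'white'.
    obs = ["black"] * 3 + ["white"] * 20
    conjecture = "black"
    for t in range(1, len(obs) + 1):
        conjecture = cutoff_update(conjecture, obs[:t])
    print(conjecture)   # -> 'white'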

Generalizations with exceptions illustrate the relationship between Popperian falsificationism and the learning-theoretic idea of reliable convergence to the truth. In some settings of inquiry, notably those involving universal generalizations, a naively Popperian “conjectures-and-refutations” approach of hanging on to conjectures until the evidence falsifies them does yield a reliable inductive method. In other problems, like the current example, it does not. Relying on falsifications is sometimes, but not always, the best way for inquiry to proceed. Learning theory has provided mathematical theorems that clarify the relationship between a conjectures-and-refutations approach and reliable inquiry. The details are discussed in Section 3 (The Limits of Inquiry and the Complexity of Empirical Problems). Generally speaking, methods that solve a learning problem with unfalsifiable hypotheses can be represented as employing a refined hypothesis space where the original hypotheses are replaced by strengthened hypotheses that are falsifiable.

2. Case Studies from Scientific Practice

This section provides further examples to illustrate learning-theoretic analysis. The examples in this section are more realistic and address methodological issues arising in scientific practice. They are not probabilistic; statistical hypotheses are discussed in Section 6. This entry provides an outline of the full analysis; there are references to more detailed discussions below. More case studies may be found in [Kelly 1996, Ch. 7.7, Harrell 2000]. Readers who wish to proceed to the further development of the theory and philosophy of means-ends epistemology can skip this section without loss of continuity.

2.1 Conservation Laws in Particle Physics

One of the hallmarks of elementary particle physics is the discovery of new conservation laws that apply only in the subatomic realm [Ford 1963, Ne’eman and Kirsh 1983, Feynman 1965]. (Feynman groups one of them, the conservation of Baryon Number, with the other “great conservation laws” of energy, charge and momentum.) Simplifying somewhat, conservation principles serve to explain why certain processes involving elementary particles do not occur: the explanation is that some conservation principle was violated (cf. Omnes [1971, Ch.2] and Ford [1963]). So a goal of particle inquiry is to find a set of conservation principles such that for every process that is possible according to the (already known) laws of physics, but fails to be observed experimentally, there is some conservation principle that rules out that process. And if a process is in fact observed to occur, then it ought to satisfy all conservation laws that we have introduced.

This constitutes an inference problem to which we may apply means-ends analysis. An inference method produces a set of conservation principles in response to reports of observed processes. Means-ends analysis asks which methods are guaranteed to settle on conservation principles that account for all observations, that is, that rule out unobserved processes and allow observed processes. Schulte [2008] describes an inductive method that accomplishes this goal. Informally the method may be described as follows.

  • Suppose we have observed a set of reactions among elementary particles.
  • Conjecture a set of conservation laws that permits the observed reactions and rules out as many unobserved reactions as possible.

The logic of conservation laws is such that the observation of some reactions entails the possibility of other unobserved ones. The learning-theoretic method rules out all reactions that are not entailed. It turns out that the conservation principles that this method would posit on the currently available evidence are empirically equivalent to the ones that physicists have introduced. Specifically, their predictions agree exactly with the conservation of charge, baryon number, muon number, tau number and lepton number that are part of the Standard Model of particle physics.

For some physical processes, the only way to get empirically adequate conservation principles is by positing that some hidden particles have gone undetected. Schulte [2009] extends the analysis such that an inductive method may not only introduce conservation laws, but also posit unseen particles. The basic principle is again to posit unseen particles in such a way that we rule out as many unobserved reactions as possible. When this method is applied to the known particle data, it rediscovers the existence of an electron antineutrino. This is one of the particles of key concern in current particle physics.

2.2 Causal Connections

There has been a substantive body of research on learning causal relationships as represented in a causal graph [Spirtes et al. 2000]. Kelly suggested a learning-theoretic analysis of inferring causality where the evidence is provided in the form of observed significant correlations among variables of interest (a modern version of Hume’s “constant conjunctions”). The following inductive method is guaranteed to converge to an empirically adequate causal graph as more and more correlations are observed [Schulte, Luo and Greiner 2007].

  • Suppose we have observed a set of correlations or associations among a set of variables of interest.
  • Select a causal graph that explains the observed correlations with a minimum number of direct causal links.

2.3 Models of Cognitive Architecture

Some philosophers of mind have argued that the mind is composed of fairly independent modules. Each module has its own “input” from other modules and sends “output” to other modules. For example, an “auditory analysis system” module might take as input a heard word and send a phonetic analysis to an “auditory input lexicon”. The idea of modular organization raises the empirical question of what mental modules there are and how they are linked to each other. A prominent tradition of research in cognitive neuroscience has attempted to develop a model of mental architecture along these lines by studying the responses of normal and abnormal subjects to various stimuli. The idea is to compare normal reactions with abnormal ones—often caused by brain damage—so as to draw inferences about which mental capacities depend on each other and how.

Glymour [1994] asked the reliabilist question whether there are inference methods that are guaranteed to eventually settle on a true theory of mental organization, given exhaustive evidence about normal and abnormal capacities and reactions. He argued that for some possible mental architectures, no amount of evidence of the stimulus-response kind can distinguish between them. Since the available evidence determines the conjectures of an inductive method, it follows that there is no guarantee that a method will settle on the true model of cognitive architecture. Glymour has also explored to what extent richer kinds of evidence would resolve underdetermination of mental architecture. (One example of richer evidence is double dissociations. An example of a double dissociation would be a pair of patients, one who has a normal capacity for understanding spoken words, but fails to understand written ones, and another who understands written words but not spoken ones.)

In further discussion, Bub [1994] showed that if we grant certain restrictive assumptions about how mental modules are connected, then a complete set of behavioural observations would allow a neuropsychologist to ascertain the module structure of a (normal) mind. In fact, under Bub’s assumptions there is a reliable method for identifying the modular structure. The high-level idea of the procedure is as follows.

  • Every hypothesized modular structure can be identified with a graph \(G\) containing an edge from module \(M_1 \rightarrow M_2\) if module \(M_1\) calls on module \(M_2\).
  • Each module graph \(G\) is consistent with a set of possible paths among modules. Say that a graph \(G\) is more constrained than another graph \(G'\) if the paths defined by \(G\) are a subset of those defined by \(G'\).
  • Conjecture any module graph \(G\) that is maximally constrained, that is, there is no other graph \(G'\) more constrained than \(G\). (A computational sketch of this comparison follows.)
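As a rough sketch of this procedure (our own encoding, with hypothetical module names; Bub’s actual assumptions are more restrictive), module graphs can be represented as adjacency dictionaries and compared by their sets of simple paths:

```python
def paths(graph):
    """All simple directed paths (as node tuples) realizable in a module
    graph; graph maps each module to the modules it calls on."""
    found = set()
    def extend(path):
        for nxt in graph.get(path[-1], []):
            if nxt not in path:                     # keep paths simple
                found.add(path + (nxt,))
                extend(path + (nxt,))
    for module in graph:
        extend((module,))
    return found

def more_constrained(g1, g2):
    """g1 is more constrained than g2 if g1's paths are a subset of g2's."""
    return paths(g1) <= paths(g2)

def maximally_constrained(candidates):
    """Keep the candidate graphs that no other candidate strictly constrains
    further (the selection rule above, on this encoding)."""
    return [g for g in candidates
            if not any(more_constrained(h, g) and paths(h) != paths(g)
                       for h in candidates)]

# Hypothetical example: two module structures for the same behavioural data.
g_serial   = {"lexicon": ["semantics"], "semantics": ["speech"]}
g_parallel = {"lexicon": ["semantics", "speech"], "semantics": ["speech"]}
print(maximally_constrained([g_serial, g_parallel]) == [g_serial])   # True
```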

These studies illustrate some general features of learning theory:

Generality. The basic notions of the theory are very general. Essentially, the theory applies whenever one has a question that prompts inquiry, a number of candidate answers, and some evidence for deciding among the answers. Thus means-ends analysis can be applied in any discipline aimed at empirical knowledge, for example physics or psychology.

Context Dependence. Learning theory is pure normative a priori epistemology in the sense that it deals with standards for assessing methods in possible settings of inquiry. But the approach does not aim for universal, context-free methodological maxims. The methodological recommendations depend on contingent factors, such as the operative methodological norms, the questions under investigation, the background assumptions that the agent brings to inquiry, the observational means at her disposal, her cognitive capacities, and her epistemic aims. As a consequence, to evaluate specific methods in a given domain, as in the case studies mentioned, one has to study the details of the case in question. The means-ends analysis often rewards this study by pointing out what the crucial methodological features of a given scientific enterprise are, and by explaining precisely why and how these features are connected to the success of the enterprise in attaining its epistemic aims.

Trade-offs. In the perspective of means-ends epistemology, inquiry involves an ongoing struggle with hard choices, rather than the execution of a universal “scientific method”. The inquirer has to balance conflicting values, and may consider various strategies such as accepting difficulties in the short run hoping to resolve them in the long run. For example in the conservation law problem, there can be conflicts between theoretical parsimony, i.e., positing fewer conservation laws, and ontological parsimony, i.e., introducing fewer hidden particles. For another example, a particle theorist may accept positing undetected particles in the hopes that they will eventually be observed as science progresses. The search for the Higgs boson illustrates this strategy. An important learning-theoretic project is to examine when such tradeoffs arise and what the options for resolving them are. Section 4 extends learning-theoretic analysis to consider goals in addition to long-run reliability.

3. The Limits of Inquiry and the Complexity of Empirical Problems

After seeing a number of examples like the ones described above, one begins to wonder what the pattern is. What is it about an empirical question that allows inquiry to reliably arrive at the correct answer? What general insights can we gain into how reliable methods go about testing hypotheses? Learning theorists answer these questions with characterization theorems. Characterization theorems are generally of the form “it is possible to attain this standard of empirical success in a given inductive problem if and only if the inductive problem meets the following conditions”.

We first cover the case where inquiry can provide certainty as to whether an empirical hypothesis is correct (relative to background knowledge). Then we consider when and how inquiry can converge to a correct hypothesis without ever arriving at certain conclusions, as described in Section 1. We will introduce enough definitions and formal concepts to state the results precisely; the supplementary document provides a full formalization.

A learning problem is defined by a finite or countably infinite set of possible hypotheses \(\mathbf{H} = \{H_1, H_2, \ldots, H_n, \ldots\}\). These hypotheses are mutually exclusive and jointly cover all possibilities consistent with the inquirer’s background assumptions.

  • In the raven color problem of Section 1.1, there are two hypotheses \(H_1 =\) “all (observed) ravens are black” and \(H_2 =\) “some (observed) raven is not black”.
  • In the New Riddle of Induction from Section 1.2, there are infinitely many alternative hypotheses: we have \(H_{green} =\) “all (observed) emeralds are green” and countably many alternatives of the form \(H_t =\) “all (observed) emeralds are grue\((t)\)” where \(t\) is a natural number.
  • A hypothesis \(H\) is consistent with a finite number of observations if \(H\) is correct for some complete data sequence that extends the finite observations.
  • A finite number of observations falsifies hypothesis \(H\) if \(H\) is inconsistent with the observations.
  • A finite number of observations entails hypothesis \(H\) relative to a hypothesis set \(\mathbf{H}\) if \(H\) is the only hypothesis in \(\mathbf{H}\) consistent with the observations. (These three notions are illustrated computationally in the sketch below.)
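A minimal sketch in Python may make the three definitions concrete for the two raven hypotheses; the encoding of the hypotheses as the strings "H1" and "H2" and the helper names are ours:

```python
def falsifies(data, hypothesis):
    """Finite data falsifies H1 ('all observed ravens are black') exactly when
    a nonblack raven has been observed; no finite data falsifies H2
    ('some observed raven is not black')."""
    return hypothesis == "H1" and "nonblack" in data

def entails(data, hypothesis):
    """Relative to the set {H1, H2}: finite data entails a hypothesis when
    the only alternative hypothesis has been falsified."""
    other = "H2" if hypothesis == "H1" else "H1"
    return falsifies(data, other)

print(falsifies(["black", "black"], "H1"))    # False: H1 still consistent
print(entails(["black", "nonblack"], "H2"))   # True: H1 falsified, H2 entailed
print(entails(["black", "black"], "H1"))      # False: H2 is never falsified
```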

Note that since logical entailment does not depend on the language we use to frame evidence and hypotheses, the concepts of consistency, entailment, and falsification are likewise language-invariant.

Examples. Recall the raven scenario from Section 1.1 (diagram repeated for convenience).

[Figure 1 repeated; an extended description is in a supplement.]

The observation that the first raven is black is consistent with both hypotheses \(H_1 =\) “all (observed) ravens are black” and \(H_2 =\) “some (observed) raven is not black”. The observation that the first raven, or any raven, is white falsifies the hypothesis \(H_1\) and entails the hypothesis \(H_2\). The entailment is illustrated by the gray fan structure, which means that after the observation of any white raven, the hypothesis \(H_2\) is correct for any extending complete data sequence that records the color of all further observed ravens.

The next set of concepts we need to understand the structure of hypotheses that can be settled by reliable inquiry are the notions of verifiable and falsifiable hypotheses. The verifiability and falsifiability of claims have been extensively discussed in epistemology and the philosophy of science, especially by philosophers concerned with issues in logical empiricism. This subsection describes how these concepts are used in learning theory, then compares the learning-theoretic concepts with discussions in broader epistemology.

  • A hypothesis \(H\) is verifiable if whenever \(H\) is correct, eventually evidence is observed that entails that \(H\) is correct. More formally: \(H\) is verifiable with respect to a hypothesis set \(\mathbf{H}\) if for every complete data sequence for which \(H\) is correct, there is a finite number of observations that falsify all alternative hypotheses \(H'\) from \(\mathbf{H}\).
  • A hypothesis \(H\) is refutable if whenever \(H\) is incorrect, eventually evidence is observed that falsifies \(H\). More formally: \(H\) is refutable with respect to a hypothesis set \(\mathbf{H}\) if for every complete data sequence for which \(H\) is not correct (but some other hypothesis in \(\mathbf{H}\) is), there is a finite number of observations that falsifies \(H\).
  • The hypothesis \(H_2 =\) “some (observed) raven is not black” is verifiable but not refutable. It is verifiable because any data sequence for which it is correct features a non-black raven at some finite time. The observation of the non-black raven entails \(H_2\). The hypothesis \(H_2\) is not refutable, because if only black ravens are observed forever, then \(H_2\) is incorrect, but there is no finite number of observations that falsifies \(H_2\).
  • The hypothesis \(H_1 =\) “all (observed) ravens are black” is refutable but not verifiable. It is refutable because any data sequence for which it is not correct features a non-black raven at some finite time. The observation of the non-black raven falsifies \(H_1\). \(H_1\) is not verifiable, because if only black ravens are observed forever, then \(H_1\) is correct, but there is no finite number of observations that entails \(H_1\).
  • In the New Riddle of Induction of Section 1.2 (diagram repeated below for convenience), the hypothesis “all (observed) emeralds are green” is falsifiable but not verifiable, for the same reason as “all (observed) ravens are black” is refutable but not verifiable.
  • Any of the grue hypotheses \(H_t =\) “all (observed) emeralds are grue(t)” is verifiable and refutable. \(H_t\) is refutable because any complete data sequence for which a gruesome generalization is incorrect will feature a counterexample that falsifies it. \(H_t\) is verifiable because if it is correct, the first observation of a blue emerald at time \(t\) falsifies the hypothesis “all (observed) emeralds are green” and also falsifies all other grue hypotheses \(H_{t'}\) with \(t' \neq t\), so that \(H_t\) is entailed.

The example of the gruesome hypotheses shows that an empirical hypothesis can be both verifiable and refutable (sometimes called “decidable” in analogy with computation theory). Other typical examples of decidable empirical claims are singular observations, such as “the first raven is black”, and Boolean combinations of singular observations.

[Figure 4 repeated; an extended description is in a supplement.]

We will briefly discuss similarities and differences to related concepts from epistemology and the philosophy of science.

Verificationism is part of the philosophy of logical empiricism. The core idea is that for a claim to be meaningful, it must be empirically verifiable. The main difference with our concept is the philosophical objective: the goal of learning theory is not to separate meaningful from meaningless claims, but to characterize the standards of empirical success we can expect from inquiry for a given set of hypotheses. A hypothesis that is verifiable according to the definition above allows inquiry to provide a positive test: when the hypothesis is correct, inquiry will eventually indicate its correctness with certainty (given background knowledge). The specific definitions of “verifiability” offered by the logical empiricists are not equivalent to verifiability in the learning-theoretic sense. For example, strict verificationism holds that “in order to be meaningful a claim must be implied by a finite number of observation sentences.” No finite number of observation sentences is equivalent to the hypothesis \(H_2 =\) “some (observed) raven is not black”, because this hypothesis is equivalent to an infinite disjunction of observation sentences (i.e., a non-black raven at time 1, a non-black raven at time 2, …).

Falsificationism is a well-known view in the philosophy of science. The core idea is that for a hypothesis to be scientific, rather than pseudo-scientific or metaphysical, it must be falsifiable in the following sense: “statements …, in order to be ranked as scientific, must be capable of conflicting with possible, or conceivable observations” (Popper 1962, 39). The main difference with our development is the philosophical objective: the goal of learning theory is not to demarcate scientific hypotheses from pseudo-scientific theories, but to characterize the standards of empirical success we can expect from inquiry for a given set of hypotheses. A hypothesis that is refutable according to the definition above allows inquiry to provide a negative test: when the hypothesis is incorrect, inquiry will eventually indicate its incorrectness with certainty (given background knowledge). The specific definition of “falsifiability” in the Popper quote above is not equivalent to refutability in the learning-theoretic sense [Schulte and Juhl 1996]. For example, the hypothesis \(H =\) “the first raven is black and some other raven is non-black” conflicts with the possible observation that the first raven is white. However, if in fact all observed ravens are black, then \(H\) is incorrect but not falsified by any finite number of observations, hence not refutable according to the learning-theoretic definition. For further discussion of the relationship between Popperian falsification and learning theory see [Genin 2018].

To further elucidate the learning-theoretic concepts of verifiability and refutability, we note that they satisfy the following fundamental properties. We give informal but rigorous proofs.

Fact (disjunction). A finite or countably infinite disjunction of verifiable hypotheses is verifiable. Proof: Let \(H = H_1\) or \(H_2\) or … or \(H_n\) or … be a disjunction of verifiable hypotheses \(H_i\) (the disjunction may be infinite). Suppose that \(H\) is correct for a complete data sequence. Then some \(H_i\) is correct for the data sequence. Since \(H_i\) is verifiable, there is a finite number of observations that entails \(H_i\), which entails \(H\). So if \(H\) is correct for any complete data sequence, there is a finite number of observations from the sequence that entails \(H\), as required for verifiability. For example, let \(H_i\) be the verifiable hypothesis that there is a non-black raven at time \(i\). Then the hypothesis \(H =\) “some (observed) raven is not black” is equivalent to the disjunction \(H_1\) or \(H_2\) or … or \(H_n\) or …. Since each hypothesis \(H_i\) is verifiable, so is \(H\).
Fact (conjunction). A finite conjunction of verifiable hypotheses is verifiable. Proof: Let \(H = H_1\) and \(H_2\) and … and \(H_n\) be a finite conjunction of verifiable hypotheses \(H_i\). Suppose that \(H\) is correct for a complete data sequence. Then each \(H_i\) is correct for the data sequence. Since \(H_i\) is verifiable, there is a finite number of observations that entails \(H_i\). Because there are only finitely many hypotheses \(H_i\), eventually each hypothesis will be verified by a finite number of observations, which entails their conjunction \(H\). So if \(H\) is correct for any complete data sequence, there is a finite number of observations from the sequence that entails \(H\), as required for verifiability. For example, let \(H_1\) be the verifiable hypothesis that the first raven is non-black and let \(H_2\) be the verifiable hypothesis that the second raven is non-black. If the conjunction \(H = H_1\) and \(H_2\) is correct for a data sequence, then the first two ravens are not black. The observation of the first two ravens therefore entails \(H\).
Fact (tautology and contradiction). Tautologies and contradictions are verifiable. Proof: A tautology (like “the first observed raven is black or is not black”) is correct for any data sequence and entailed by any evidence sequence. A contradiction (like “the first observed raven is black and is not black”) is trivially verified if it is correct, because it is never correct.
Fact (duality). The negation of a hypothesis is refutable if and only if the hypothesis is verifiable. Proof: We consider the only-if direction; the converse is similar. Suppose that the negation not \(H\) of a hypothesis is refutable. Consider any complete data sequence for which hypothesis \(H\) is correct. Then not \(H\) is incorrect, and will be falsified by a finite number of observations, since it is refutable. This finite observation set entails \(H\). So if \(H\) is correct for any complete data sequence, there is a finite number of observations from the sequence that entails \(H\), as required for verifiability. For example, \(H =\) “some (observed) raven is not black” is the negation of the refutable hypothesis not \(H =\) “all (observed) ravens are black”. If not \(H\) is incorrect for a complete data sequence, it will eventually be falsified by the observation of a non-black raven. This observation entails \(H\).
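For reference, the four facts just established can be summarized compactly (the numbering is ours):

\[
\begin{aligned}
&\text{(1) If each } V_i \text{ is verifiable, so is the countable disjunction } V_1 \text{ or } V_2 \text{ or } \ldots\\
&\text{(2) If } V_1, \ldots, V_n \text{ are verifiable, so is the conjunction } V_1 \text{ and } \ldots \text{ and } V_n.\\
&\text{(3) Tautologies and contradictions are verifiable.}\\
&\text{(4) } H \text{ is verifiable if and only if not } H \text{ is refutable.}
\end{aligned}
\]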

Remarkably, the properties listed are exactly the fundamental axioms of an important branch of mathematics known as point-set topology [Abramsky 1987, Vickers 1986]. A topological space is defined by a collection of sets known as open sets or neighbourhoods, that satisfy the axiomatic properties of verifiable hypotheses (closure under arbitrary union and finite intersection, both the empty set and the entire space are open). The set-theoretic complements of open sets are called closed sets, so refutable hypotheses correspond exactly to closed sets. Point-set topology was invented to support a kind of generalized functional analysis without numbers (more precisely, without distances). It is striking that the foundational axioms of topology have an exact epistemological interpretation in terms of the properties of empirical hypotheses that allow verification or falsification with certainty. Current mathematical developments of learning theory often begin by taking as a basic concept a set of verifiable hypotheses satisfying the properties listed. This approach has two advantages.

  • Learning theory can draw on, and contribute to, the rich body of concepts and results from one of the most developed branches of modern mathematics [Kelly 1996, Baltag et al. 2015, de Brecht and Yamamoto 2008].
  • The flexibility to adapt the notion of evidence item to the context of an application makes it easier to apply the general theory in different domains. For example, consider the problem of obtaining increasingly precise measurements of a quantity of interest (e.g., the speed of light in physics). We can take the basic set of verifiable hypotheses to be (unions of) open intervals around the true value of the quantity [Baltag et al. 2015, Genin and Kelly 2017]. Another example is the concept of statistical verifiability covered in Section 6 below.

For the sake of concreteness, this entry describes examples where the basic verifiable hypotheses are disjunctions of finite sequences of evidence items. We will describe definitions and results in such a way that they assume only the axiomatic properties listed so that they are easy to apply in other settings.

A fundamental result describes the conditions under which a method can reliably find the correct hypothesis among a countably infinite or finite number \(\mathbf{H}\) of mutually exclusive hypotheses that jointly cover all possibilities consistent with the inquirer’s background assumptions. A learner for \(\mathbf{H}\) maps a finite sequence of observations to a hypothesis in \(\mathbf{H}\). For example, in the New Riddle of Induction, the natural projection is a learner for the hypothesis set \(\mathbf{H}\) that comprises “all emeralds are green”, \(H_1 =\) “all emeralds are grue(1)”, \(H_2 =\) “all emeralds are grue(2)”, etc., for all critical times \(t\). A learner reliably identifies, or simply identifies, a correct hypothesis from \(\mathbf{H}\) if for every complete data sequence the following holds: if \(H\) from \(\mathbf{H}\) is the correct hypothesis for the data sequence, then there is a finite number of observations such that the learner conjectures the correct hypothesis \(H\) on any further observations from the data sequence. The generalizing method and the natural projection rule are examples of reliable learners for their hypothesis sets.

Theorem. There is a learner that reliably identifies a correct hypothesis from \(\mathbf{H}\) if and only if each hypothesis in \(\mathbf{H}\) is a finite or countable disjunction of refutable hypotheses. For the proof see Kelly [1996, Ch. 3.3].

Example. For illustration, let’s return to the ornithological example with two alternative hypotheses: (1) all but finitely many swans are white, and (2) all but finitely many swans are black. As we saw, it is possible in the long run to reliably settle which of these two hypotheses is correct. Hence by the characterization theorem, each of the two hypotheses must be a disjunction of refutable empirical claims. To see that this indeed is so, observe that “all but finitely many swans are white” is logically equivalent to the disjunction

at most 1 swan is black or at most 2 swans are black … or at most \(n\) swans are black … or … ,

and similarly for “all but finitely many swans are black”. Each of the claims in the disjunction is refutable. For example, take the claim that “at most 3 swans are black”. If this is false, more than 3 black swans will be found, at which point the claim is conclusively falsified. The figure below illustrates how the identifiable hypotheses are structured as disjunctions of refutable hypotheses.


Figure 5 [An extended description of figure 5 is in a supplement.]

The characterization theorem implies that we can think of a reliable method as adopting internal strengthened versions of the original hypotheses under investigation that are refutable. As the example above shows, the theorem does not imply that the strengthened hypotheses are mutually exclusive (e.g., “at most 3 swans are black” is consistent with “at most 2 swans are black”). A characterization theorem due to Baltag, Gierasimczuk, and Smets [2015] provides an alternative structural analysis in which identifiable hypotheses are decomposed into mutually exclusive components, as follows.

A hypothesis \(H\) is verirefutable if it is equivalent to the conjunction of a verifiable and a refutable hypothesis (given background knowledge): \(H = (V\) and \(R)\) where \(V\) is verifiable and \(R\) is refutable. For example, the hypothesis “exactly 2 swans are black” is verirefutable, since it is equivalent to the conjunction of the verifiable hypothesis “at least 2 swans are black” and the refutable hypothesis “at most 2 swans are black”. The term “verirefutable” is due to [Genin and Kelly 2015]; it signifies that when a verirefutable hypothesis is true, there is some initial condition after which the hypothesis is refutable, that is, the hypothesis will be falsified by data if it is false. Baltag et al. refer to verirefutable hypotheses as locally closed. They establish the following characterization theorem for reliable learning [Baltag et al. 2015].

Theorem. There is a learner that reliably identifies a correct hypothesis from \(\mathbf{H}\) if and only if each hypothesis in \(\mathbf{H}\) is equivalent to a finite or countable disjunction of mutually exclusive verirefutable hypotheses.

Since the verirefutable hypotheses are mutually exclusive, they constitute a valid refined hypothesis space whose members each entail exactly one of the original hypotheses. The characterization theorem entails that, without loss of learning power, inductive methods can transform the original hypothesis space into a verirefutable one. The figure below illustrates the decomposition into verirefutable hypotheses.


Figure 6 [An extended description of figure 6 is in a supplement.]

A few points will help explain the significance of characterization theorems.

Structure of Reliable Methods. Characterization theorems tell us how the structure of reliable methods is attuned to the structure of the hypotheses under investigation. For example, the theorem mentioned establishes a connection between falsifiability and testability, but one that is more attenuated than the naïve Popperian envisions: it is not necessary that the hypotheses under test be directly falsifiable; rather, there must be ways of strengthening each hypothesis that yield a countable number of refutable “subhypotheses”. We can think of these refutable subhypotheses as different ways in which the main hypothesis may be true. (For example, one way in which “all but finitely many swans are white” is true is if there are at most 10 black swans; another is if there are at most 100 black swans, etc.) Strengthening the original hypotheses so they become empirically refutable matches the spirit of Lakatos’s methodology, in which a general scientific paradigm is articulated with auxiliary hypotheses to define testable (i.e., falsifiable) claims.

Import of Background Assumptions. The characterization result draws a line between the solvable and unsolvable problems. Background knowledge reduces the inductive complexity of a problem; with enough background knowledge, the problem crosses the threshold between the unsolvable and the solvable. In many domains of empirical inquiry, the pivotal background assumptions are those that make reliable inquiry feasible. (Kuhn [1970] makes related points about the importance of background assumptions embodied in a “paradigm”).

Language Invariance. Learning-theoretic characterization theorems concern what Kelly calls the “temporal entanglement” of various observation sequences [Kelly 2000]. Ultimately they rest on entailment relations between given evidence, background assumptions and empirical claims. Since logical entailment does not depend on the language we use to frame evidence and hypotheses, the inductive complexity of an empirical problem as determined by the characterization theorems is language-invariant.

4. The Long Run in the Short Run: Reliable and Stable Beliefs

A longstanding criticism of convergence to the truth as an aim of inquiry is that, while fine in itself, this aim is consistent with any crazy behaviour in the short run [Salmon 1991]. For example, we saw in the New Riddle of Induction that a reliable projection rule can conjecture that the next emerald will be blue no matter how many green emeralds have been found—as long as eventually the rule projects “all emeralds are green”. One response is that if means-ends analysis takes into account other epistemic aims in addition to long-run convergence, then it can provide strong guidance for what to conjecture in the short run.

To illustrate this point, let us return to the Goodmanian Riddle of Induction. Ever since Plato, philosophers have considered the idea that stable true belief is better than unstable true belief, and epistemologists such as Sklar [1975] have advocated similar principles of “epistemic conservatism”. Kuhn tells us that a major reason for conservatism in paradigm debates is the cost of changing scientific beliefs [Kuhn 1970]. In this spirit, learning theorists have examined methods that minimize the number of times that they change their theories before settling on their final conjecture [Putnam 1965, Kelly 1996, Jain 1999]. Such methods are said to minimize mind changes.

The New Riddle of Induction turns out to be a nice illustration of this idea. Consider the natural projection rule (conjecture that all emeralds are green on a sample of green emeralds). If all emeralds are green, this rule never changes its conjecture. And if all emeralds are grue\((t)\) for some critical time \(t\), then the natural projection rule abandons its conjecture “all emeralds are green” at time \(t\)—one mind change—and thereafter correctly projects “all emeralds are grue\((t)\)”. Remarkably, rules that project grue rather than green do not do as well. For example, consider a rule that conjectures that all emeralds are grue(3) after observing one green emerald. If two more green emeralds are observed, the rule’s conjecture is falsified and it must eventually change its mind, say to conjecture that all emeralds are green (supposing that green emeralds continue to be found). But then at that point, a blue emerald may appear, forcing a second mind change. This argument can be generalized to show that the aim of minimizing mind changes allows only the green predicate to be projected on a sample of all green emeralds [Schulte 1999]. We saw in Section 1.2 above how the natural projection rule changes its mind at most once; the figure below illustrates in a typical case how an unnatural projection rule may have to change its mind twice or more.
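A small simulation can confirm the worst-case counts. This sketch is our own illustration: the grue convention (the first \(t\) emeralds green, all later ones blue), the finite horizon, and the particular unnatural rule are stipulated in the comments.

```python
def world(kind, horizon):
    """Emerald colors up to a finite horizon.  Convention used here:
    grue(t) means the first t emeralds are green and all later ones blue."""
    if kind == "green":
        return ["green"] * horizon
    t = kind[1]
    return ["green"] * t + ["blue"] * (horizon - t)

def mind_changes(rule, data):
    """Count how often a rule's conjecture changes as the data come in."""
    conjectures = [rule(data[:n]) for n in range(1, len(data) + 1)]
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

def natural(seq):
    """Natural projection: 'all green' until a blue emerald appears."""
    return ("grue", seq.index("blue")) if "blue" in seq else "green"

def gruesome(seq):
    """Unnatural rule: project grue(3) on a short all-green sample, switching
    to 'all green' once grue(3) is falsified by a fourth green emerald."""
    if "blue" in seq:
        return ("grue", seq.index("blue"))
    return ("grue", 3) if len(seq) <= 3 else "green"

for w in ["green", ("grue", 3), ("grue", 5)]:
    data = world(w, 8)
    print(w, mind_changes(natural, data), mind_changes(gruesome, data))
# Worst case over these worlds: natural changes its mind at most once,
# while the gruesome rule is forced into two mind changes (on grue(5)).
```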


Figure 7 [An extended description of figure 7 is in a supplement.]

The same reasoning applies to the question about whether all ravens are black. The bold generalizer that conjectures that all ravens are black after observing samples of only black ravens succeeds with at most one mind change: if indeed all ravens are black, the generalizer never changes its mind at all. And if there is a nonblack raven, the refutation occasions one mind change, but afterwards the question is settled.

Contrast this with the contrary method that asserts that there is a nonblack raven after observing a sample of all black ones. If only black ravens continue to be observed, the contrary method has to eventually change its mind and assert that “all ravens are black”, or else it fails to arrive at the correct generalization. But then at that point, a nonblack raven may appear, forcing a second mind change. Thus the goal of stable belief places strong constraints on what a method may conjecture in the short run for this problem: on observing only black ravens, the options are “all ravens are black” or “no opinion yet”, but not “there is a nonblack raven”.

In the conservation law problem, the restrictive method described in Section 2.1 is the only method that minimizes mind changes. Recall that the restrictive method adopts a set of conservation laws that rule out as many unobserved reactions as possible. It can be shown that if there are \(n\) known elementary particles whose reactions are observed, this method requires at most \(n\) mind changes. (The number of elementary particles in the Standard Model is around \(n = 200\).)

For learning causal graphs, the following variant of the method described in Section 2.2 minimizes the number of mind changes.

  • If there is a unique causal graph that explains the observed correlations with a minimum number of direct causal links, select this graph.
  • If there is more than one causal graph that explains the observed correlations with a minimum number of direct causal links, output “no opinion yet” (or conjecture the disjunction of the minimum edge graphs).

This example illustrates that sometimes minimizing mind changes requires withholding beliefs. Intuitively, this occurs when there are two or more equally simple explanations of the data, and the inquirer has to wait until further observations decide between these possibilities. Jumping to one of the simple conclusions might lead to an unnecessary mind change in case an alternative equally simple explanation turns out to be correct. In such cases there is a trade-off between the goals of achieving stable belief, on the one hand, and quickly settling on a true belief on the other [Schulte 1999]. We discuss the connection between simplicity and stable belief in the next section on simplicity.
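As a one-line sketch (assuming a search routine, like the hypothetical minimal_graphs sketched earlier, that returns all minimum-edge graphs consistent with the observed correlations), the withholding variant is:

```python
def cautious_conjecture(minimum_edge_graphs):
    """Select the minimum-edge causal graph only when it is unique;
    otherwise withhold judgement ('no opinion yet')."""
    if len(minimum_edge_graphs) == 1:
        return minimum_edge_graphs[0]
    return None   # ties: wait for further observations to decide
```

The design choice is exactly the trade-off described above: committing on a tie risks an unnecessary mind change, while withholding delays settling on a true belief.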

Genin and Kelly [2015] refine the mind change approach by distinguishing different kinds of mind changes.

  • Abandoning a true hypothesis in favor of a false one. This is an undesirable regressive mind change.
  • Abandoning a false hypothesis in favor of a true one. This is a desirable progressive mind change.
  • Abandoning a false hypothesis in favor of another false one.

The table below illustrates these distinctions in the New Riddle of Induction and the Raven example. Genin and Kelly investigate the principle that inductive methods should minimize the number of regressive mind changes, that is, the number of times new evidence leads the method to abandon a true hypothesis in favor of a false one. The notion that regressive mind changes are a mark of epistemic failure matches a long tradition in epistemology. Defeasibility theories of knowledge (see the link in the Other Internet Resources section below) hold that in order for an agent’s true belief to count as knowledge, it must be indefeasible in the sense that accepting further propositions should not lead the agent to abandon her belief. Translated into the language of mind changes, this means that an inquirer’s true current conjecture can count as knowledge only if there is no further evidence that would lead her to change her mind and adopt an alternative false conjecture. Plato’s Meno conveys this point vividly.

Now this is an illustration of the nature of true opinions: while they abide with us they are beautiful and fruitful, but they run away out of the human soul, and do not remain long, and therefore they are not of much value …. But when they are bound, in the first place, they have the nature of knowledge; and, in the second place, they are abiding.

[Table illustrating regressive and progressive mind changes in the New Riddle of Induction and the Raven example.]

While minimizing regressive mind changes is an even more important epistemic goal than avoiding mind changes in general, it leads to weaker strictures on inductive learning. At the same time any strictures that do follow from it carry even more normative force. The table above illustrates the differences between the two principles in the New Riddle of Induction and the Raven problem. In the New Riddle of Induction, if only green emeralds are ever observed, a projection rule may keep projecting any number of gruesome predicates without producing a regressive mind change: it simply abandons one false gruesome predicate for another false gruesome predicate. Therefore even unnatural projection rules incur 0 regressive mind changes, provided they never abandon the all-green hypothesis once adopted.

The consequences of minimizing regressive mind changes are different for the question of whether all ravens are black. Consider again the contrary method that asserts that there is a nonblack raven after observing a sample of black ones. As shown in the table and discussed above, the contrary method has to eventually change its hypothesis after seeing more black ravens to conjecture that all ravens are black, and then, upon observing a white raven, return to its true initial hypothesis that there is a nonblack raven. Thus the contrary method undergoes at least one regressive mind change in the worst case. On the other hand, the generalizing method that asserts that all ravens are black after observing a sample of black ones changes its conjecture only when a nonblack raven is observed—a progressive mind change from a false hypothesis to a true one. Therefore the principle of avoiding regressive mind changes singles out the generalizing method over the contrary one.

As the example illustrates, regressive mind changes are associated with cycles of conjectures. This is because a reliable method must eventually return a true hypothesis after adopting a false one, so a regressive mind change leads to at least one cycle: true conjecture, false conjecture, true conjecture. Methods that avoid regressive mind changes are therefore studied under the heading of cycle-free learning [Genin and Kelly 2015] or minimizing U-turns [Carlucci et al. 2005]. Genin and Kelly [2015, 2019] provide a general result that elucidates the methodological import of avoiding regressive mind changes and cycles of conjectures (described in Section 5.4). Their result belongs to a family of theorems that establish a striking connection between avoiding mind changes and Ockham’s razor, which we discuss in the next section.

5. Simplicity, Stable Belief, and Ockham’s Razor

A strong intuition about inductive inference and scientific method is that we should prefer simpler hypotheses over complex ones; see the entry on simplicity. Statisticians, computer scientists, and other researchers concerned with learning from observations have made extensive use of a preference for simplicity to solve practical inductive problems [Domingos 1999]. From a foundational point of view, simplicity is problematic for at least two reasons.

The justification problem: Why adopt simple hypotheses? One obvious answer is that the world is simple and therefore a complex theory is false. However, the a priori claim that the world is simple is highly controversial—see the entry on simplicity. From a learning-theoretic perspective, dismissing complex hypotheses impairs the reliability of inductive methods. In Kelly’s metaphor, a fixed bias is like a stopped watch: we may happen to use the watch when it is pointing at the right time, but the watch is not a reliable instrument for telling time [Kelly 2007a, 2010].

The description problem: Epistemologists have worried that simplicity is not an objective feature of a hypothesis, but rather “depends on the mode of presentation”, as Nozick puts it. Goodman’s Riddle illustrates this point. If generalizations are framed in blue-green terms, “all emeralds are green” appears simpler than “all emeralds are first green and then blue”. But in a grue-bleen language, “all emeralds are grue” appears simpler than “all emeralds are first grue and then bleen”.

Learning theorists have engaged in recent and ongoing efforts to apply means-ends epistemology to develop a theory of the connection between simplicity and induction that addresses these concerns [Kelly 2010, Harman and Kulkarni 2007, Luo and Schulte 2006, Steel 2009]. It turns out that a fruitful perspective is to examine the relationship between the structure of a hypothesis space and the mind change complexity of the corresponding inductive problem. The fundamental idea is that, while simplicity does not enjoy an a priori connection with truth, choosing simple hypotheses can help an inquirer find the truth more efficiently, in the sense of avoiding mind changes. Kelly’s road metaphor illustrates the idea. Consider two routes to the destination, one via a straight highway, the other via back roads. Both routes eventually lead to the same point, but the back roads entail more twists and turns [Kelly 2007a, 2010].

A formalization of this idea takes the form of an Ockham Theorem: a theorem that shows (under appropriate restrictions) that an inductive method finds the truth as efficiently as possible for a given problem if and only if the method is the Ockham method, that is, it selects the simplest hypothesis consistent with the data. An Ockham theorem provides a justification for Ockham’s inductive razor as a means towards epistemic aims.

Whether an Ockham theorem is true depends on the description of the Ockham method, that is, on the exact definition of simplicity for a set of hypotheses. There is a body of mathematical results that establish Ockham theorems using a language-invariant simplicity measure, which we explain next.

Say that a hypothesis \(H\) from a background set of possible hypotheses \(\mathbf{H}\) is verifiable if there is an evidence sequence such that \(H\) is the only hypothesis from \(\mathbf{H}\) that is consistent with the evidence sequence. For example, in the black raven problem above, the hypothesis “there is a nonblack raven” is verifiable since it is entailed by an observation of a nonblack raven. The hypothesis “all ravens are black” is not verifiable, since it is not entailed by any finite evidence sequence. The following procedure assigns a simplicity rank to each hypothesis \(H\) from a set of hypotheses \(\mathbf{H}\) [Apsitis 1994, Luo and Schulte 2006].

  • Assign all verifiable hypotheses simplicity rank 0.
  • Remove the verifiable hypotheses from the hypothesis space to form a new hypothesis space \(\mathbf{H}_1.\)
  • Assign simplicity rank 1 to the hypotheses that are verifiable given \(\mathbf{H}_1.\)
  • Remove the newly verifiable hypotheses with simplicity rank 1 from the hypothesis space to form a new hypothesis space \(\mathbf{H}_2.\)
  • Continue removing hypotheses until no new hypotheses are verifiable given the current hypothesis space.
  • The simplicity rank of each hypothesis \(H\) is the first stage at which it is removed by this procedure. In other words, it is the index of the first restricted hypothesis space that makes \(H\) verifiable. (A computational sketch of this elimination procedure follows.)
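The elimination procedure is easy to state computationally. The following minimal sketch is our own: the verifiability test is supplied as an oracle, here hard-coded for the Riddle of Induction (a grue hypothesis is verifiable outright because a blue emerald at its critical time entails it; “all emeralds are green” becomes verifiable only once the grue hypotheses are removed).

```python
def simplicity_ranks(hypotheses, verifiable_in):
    """Assign simplicity ranks by iterated elimination.
    verifiable_in(h, space): True if h is verifiable relative to `space`."""
    ranks, remaining, stage = {}, set(hypotheses), 0
    while remaining:
        newly = {h for h in remaining if verifiable_in(h, remaining)}
        if not newly:
            break                    # no hypothesis verifiable at this stage
        for h in newly:
            ranks[h] = stage
        remaining -= newly
        stage += 1
    return ranks

# Riddle of Induction with three grue hypotheses plus 'all green'.
hyps = ["green"] + [("grue", t) for t in range(1, 4)]
verifiable = lambda h, space: h != "green" or space == {"green"}
print(simplicity_ranks(hyps, verifiable))
# e.g. {('grue', 1): 0, ('grue', 2): 0, ('grue', 3): 0, 'green': 1}
```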

Hypotheses with higher simplicity rank are regarded as simpler than those with lower ranks. Simplicity ranks are defined in terms of logical entailment relations, hence are language-invariant. Simplicity ranks as defined can be seen as degrees of falsifiability in the following sense. Consider a hypothesis of simplicity rank 1. Such a hypothesis is falsifiable because an evidence sequence that verifies an alternative hypothesis of rank 0 falsifies it. Moreover, a hypothesis of simplicity rank 1 is persistently falsifiable in the sense that it remains falsifiable no matter what evidence sequence consistent with it is observed. A hypothesis of simplicity rank \(n+1\) is persistently falsifiable by hypotheses of rank \(n.\) Let us illustrate the definition in our running examples.

In the Riddle of Induction, the verifiable hypotheses are the grue hypotheses with critical time \(t\): any sequence of \(t\) green emeralds followed by blue ones entails the corresponding grue\((t)\) generalization. Thus the grue hypotheses receive simplicity rank 0. After the grue hypotheses are eliminated, the only remaining hypothesis is “all emeralds are green”. Given that it is the only possibility in the restricted hypothesis space, “all emeralds are green” is entailed by any sequence of green emeralds. Therefore “all emeralds are green” has simplicity rank 1. After removing the all-green hypothesis, no hypotheses remain.

In the raven color problem, the verifiable hypothesis is “a nonblack raven will be observed”, which receives simplicity rank 0. After removing the hypothesis that a nonblack raven will be observed, the only remaining possibility is that only black ravens will be observed, hence this hypothesis is verifiable in the restricted hypothesis space and receives simplicity rank 1.

The simplicity rank of a causal graph is given by the number of direct links not contained in the graph. Therefore the fewer direct links are posited by the causal model, the higher its simplicity rank.

The simplicity rank of a set of conservation laws is given by the number of independent laws. (Independence in the sense of linear algebra.) Therefore the more nonredundant laws are introduced by a theory, the higher its simplicity rank. Each law rules out some reactions, so maximizing the number of independent laws given the observed reactions is equivalent to ruling out as many unobserved reactions as possible.
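Since independence here is linear-algebraic, the number of independent laws posited by a theory can be computed as a matrix rank. A minimal sketch follows; the particle encoding and the quantum-number assignments are invented for illustration.

```python
import numpy as np

# Hypothetical toy encoding: each conservation law assigns a quantum number
# to each of four particle types.  The number of independent laws a theory
# posits is the rank of the matrix whose rows are the laws.
laws = np.array([
    [1, -1,  0,  0],   # a charge-like quantity
    [0,  1, -1,  0],   # a second, independent quantity
    [1,  0, -1,  0],   # the sum of the first two: redundant
])
print(np.linalg.matrix_rank(laws))   # 2 independent laws, not 3
```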

The following theorem shows the connection between the mind-change complexity of an inductive problem and the simplicity ranking as defined.

Theorem. Let \(\mathbf{H}\) be a set of empirical hypotheses. Then there is a method that reliably identifies a correct hypothesis from \(\mathbf{H}\) in the limit with at most \(n\) mind changes if and only if the elimination procedure defined above terminates with an empty set of hypotheses after \(n\) stages.

Thus for an inductive problem to be solvable with at most \(n\) mind changes, the maximum simplicity rank of any possible hypothesis is \(n.\) In the Riddle of Induction, the maximum simplicity rank is 1, and therefore this problem can be solved with at most 1 mind change. The next result provides an Ockham theorem connecting simplicity and mind change performance.

Ockham Theorem. Let \(\mathbf{H}\) be a set of empirical hypotheses with optimal mind change bound \(n\). Then an inductive method is mind-change optimal if and only if it satisfies the following conditions. Whenever the method adopts one of the hypotheses from \(\mathbf{H}\), this hypothesis is the uniquely simplest one consistent with the evidence. If the method changes its mind at inquiry time \(t+1\), the uniquely simplest hypothesis at time \(t\) is falsified at time \(t+1\).

This theorem says that a mind-change optimal method may withhold a conjecture as a skeptic would, but if it does adopt a definite hypothesis, the hypothesis must be the simplest one, in the sense of having the maximum simplicity rank. Thus the mind change optimal methods discussed in Section 4 are all Ockham methods that adopt the simplest hypothesis consistent with the data. The Ockham theorem shows a remarkable reversal from the long-standing objection that long-run reliability imposes too few constraints on short-run conjectures: If we add to long-run convergence to the truth the goal of achieving stable belief, then in fact there is a unique inductive method that achieves this goal in a given empirical problem. Thus the methodological analysis switches from offering no short-run prescriptions to offering a complete prescription.

The previous subsection defines a complete simplicity ranking for every hypothesis under investigation. This means that any hypothesis can be compared to another as simpler or equally simple. A less demanding concept is a partial order, which allows that some hypotheses may simply not be comparable, like apples and oranges. Genin and Kelly [2015] show that the following partial order leads to an Ockham principle for avoiding regressive mind changes (see Section 4.3).

  • An observation sequence separates hypothesis \(H_1\) from hypothesis \(H_2\) if the observations are consistent with \(H_1\) and falsify \(H_2\) (given background knowledge).
  • Hypothesis \(H_1\) is inseparable from \(H_2\), written \(H_1 \lt H_2\), if no observation sequence separates \(H_1\) from \(H_2\). Equivalently, \(H_1 \lt H_2\) if and only if any evidence consistent with \(H_1\) is also consistent with \(H_2.\)

The separation terminology is due to Smets et al., who relate it to separation principles in point-set topology. In terms of the epistemological interpretation of point-set topology from Section 3.2, we have \(H_1 \lt H_2\) if and only if every complete data sequence for \(H_1\) is a boundary point for the data sequences of \(H_2.\) In an epistemologically resonant phrase, Genin and Kelly say that hypothesis \(H_1\) “faces the problem of induction” with respect to \(H_2\) whenever \(H_1 \lt H_2\). This is because whenever \(H_1\) is correct, a reliable learner will have to take an “inductive leap” and conjecture \(H_1\) although any finite amount of evidence is also consistent with \(H_2\).

  • In the raven problem, \(H_1 =\) “all ravens are black” \(\lt H_2 =\) “some raven is not black”. But it is not the case that “some raven is not black” \(\lt\) “all ravens are black” because the observation of a white raven separates \(H_2\) from \(H_1.\)
  • In causal graph learning, if graph \(G_1\) contains a subset of the edges (direct causal links) of an alternative graph \(G_2\), then \(G_1 \lt G_2\). This is because any correlations that can be explained by \(G_1\) can also be explained by the larger graph \(G_2\).
  • In curve fitting, \(L \lt Q\) where \(L\) is the set of linear functions, and \(Q\) is the set of quadratic functions. This is because any set of points that can be fit by a linear function can also be fit by a quadratic function.

These examples suggest that the \(\lt\) partial order corresponds to our intuitive simplicity judgements about empirical hypotheses; Genin and Kelly [2019] provide an extensive defense of this claim. It can be shown that the \(\lt\) ordering agrees with the simplicity ranks defined in the previous subsection, in the sense that if \(H_1 \lt H_2\) but not \(H_2 \lt H_1\), then the simplicity rank of \(H_1\) is less than the rank of \(H_2\). These observations motivate an Ockham principle: An inductive method satisfies the Ockham principle with respect to separability if it always conjectures a maximally simple hypothesis \(H\) consistent with the evidence. In our notation, if an Ockham method adopts a hypothesis \(H\) given a finite observation sequence, then there is no alternative simpler hypothesis \(H'\) such that \(H' \lt H\). That is, every alternative hypothesis \(H'\) will eventually be separated from \(H\) by the evidence if \(H'\) is true. In the raven example, the generalizing method satisfies the Ockham principle, but the contrary method does not, because on evidence of only black ravens it adopts \(H_2 =\) “some raven is not black” even though \(H_1 \lt H_2\). The following theorem shows that the connection between the Ockham principle and regressive mind changes is general.

Theorem. If an inductive method avoids conjecture cycles (and hence regressive mind changes), it satisfies the Ockham principle with respect to separability.

For a proof see Genin and Kelly [2015; Theorem 10]. Genin and Kelly also provide sufficient conditions for avoiding conjecture cycles.

While the results in this section establish a fruitful connection between simplicity and mind-change optimality, a limitation of the approach is that it requires that some hypotheses must be conclusively entailed or falsified by some evidence sequence. This is typically not the case for statistical models, where the probability of a hypothesis may become arbitrarily small but usually not 0. For instance, consider a coin flip problem and the hypothesis “the probability of heads is 90%”. If we observe one million tails, the probability of the hypothesis is very small indeed, but it is not 0, because any number of tails is logically consistent with a high probability of heads. The next section discusses how a reliabilist approach can be adapted to statistical hypotheses.

Statistical hypotheses are the most common in practical data-driven decision making, for example in the sciences and engineering. It is therefore important for a philosophical framework of inductive inference to include statistical hypotheses. There are two key differences between statistical hypotheses and the hypothesis sets we have considered so far [Sober 2015].

  • The relationship between observations and a hypothesis is probabilistic, not deductive: A statistical hypothesis assigns a probability to an observation sequence, typically between 0 and 1. A deductive hypothesis is either consistent with an observation sequence or falsified.
  • The analysis of statistical hypotheses typically assumes that observations form a random sample: successive observations are independent of each other and follow the same distribution. It is possible to analyze statistical methods where later observations depend on earlier ones, but the mathematical complexity of inductive methodology is much greater than with independent data.

Because of these properties, learning theory for nonstatistical methods is a more straightforward framework than statistics for traditional philosophical discussions in epistemology, inductive inference, and the philosophy of science. For example, epistemological discussions of justified true belief concern a deductive concept of belief in which the inquirer accepts a proposition rather than assigning a probability to data. Scientific theories typically make a deterministic prediction of future data from past observations (initial conditions), so an independence requirement makes it more difficult to apply a methodological framework to understand scientific inquiry (see our case studies).

Normative means-ends epistemology can be applied to statistical hypotheses as well as deductive ones. In particular we will discuss how the ideas of reliable convergence to the truth and minimizing regressive mind changes can be adapted to the statistical setting. The key idea is to shift the unit of analysis: Whereas previously we considered the behavior of an inductive method for a specific data sequence, in statistical analysis we consider its aggregate behavior over a set of data sequences of the same length. In particular, we consider the probability that a method conjectures a hypothesis H for a given number of observations \(n.\)

Preliminaries on Statistical Hypotheses

We will illustrate the main ideas with a classic simple example, observing coin flips, and indicate how they can be generalized to more complex hypotheses. For more details please see [Genin and Kelly 2017, Genin 2018]. Suppose that an investigator has a question about the unknown bias \(p\) of a coin, where \(p\) represents the chance that a single flip comes out “Heads”. Different possible hypotheses correspond to different ranges of the bias \(p\), that is, a partition of \([0,1]\), the range of the bias. Let us say that the investigator considers a simple point hypothesis: is the coin fair? Then we have

  • \(H_1 =\) “\(p = 0.5\)”
  • \(H_2 =\) “it is not the case that \(p = 0.5\)”. That is, either \(p \lt 0.5\) or \(p \gt 0.5.\)

Extending our previous terminology, we shall say that a true bias value \(p\) is correct for a hypothesis \(H\) if it lies within the set specified by \(H\). In our example, a bias value \(p\) is correct for \(H_1\) if and only if \(p = 0.5\); otherwise \(p\) is correct for \(H_2\). Given a true bias value \(p\), and assuming independence, we can compute a probability for any finite sequence of observations. This probability is known as the sample distribution. For example, for a fair coin with \(p = 0.5\), the probability of observing 3 heads is \(0.5 \times 0.5 \times 0.5 = 0.125\). If the chance of heads is 0.7, the probability of observing 3 heads is \(0.7 \times 0.7 \times 0.7 = 0.343\). Notice how the independence assumption allows us to compute the probability of a sequence of observations as the product of single-observation probabilities. Without the independence assumption, we cannot infer the probability of multiple observations from the probability of a single observation, and the sample distribution is not defined.

As usual in this entry, an inductive method conjectures a hypothesis after observing a finite sequence of observations. A method that conjectures a statistical hypothesis is called a statistical test (see the link in the Other Internet Resources section below). The statistical literature provides an extensive collection of computationally efficient statistical tests for different types of statistical hypotheses. In the following discussion we consider the general learning performance of such methods, with respect to reliable convergence to a true hypothesis and avoiding mind changes. Consider a fixed observation length \(n\), called the sample size. For sample size \(n\), there is a set of samples of length \(n\) such that the method conjectures hypothesis \(H\) given the sample. For example, for \(n = 3\), the method might conjecture \(H_2 =\) “the coin is not fair” after observing 3 heads. The aggregate probability that the method outputs hypothesis \(H\) given some sample of length \(n\) is the sum of the sample probabilities of the samples such that the method conjectures \(H\) given the sample. In the supplement we give example computations of the aggregate probability. Because this aggregate probability is the key quantity for the methodology of statistical hypotheses, we introduce the following notation for it.

\(P_{n,p}(H) =\) the probability that a given inductive method conjectures hypothesis \(H\) after \(n\) observations, given that the true probability of a single observation is \(p\)
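For a concrete illustration, the aggregate probability can be computed by brute force over all samples of length \(n\). The sketch below is our own; the toy decision rule, which conjectures \(H_2\) = “the coin is not fair” exactly when all flips agree, is invented for illustration.

```python
from itertools import product

def sample_probability(sample, p):
    """Probability of a specific heads/tails sequence under bias p (i.i.d.)."""
    prob = 1.0
    for flip in sample:
        prob *= p if flip == "H" else (1 - p)
    return prob

def aggregate_probability(method, hypothesis, n, p):
    """P_{n,p}(H): total chance the method conjectures `hypothesis`
    on a sample of length n when the true bias is p."""
    return sum(sample_probability(s, p)
               for s in product("HT", repeat=n)
               if method(s) == hypothesis)

# Toy test: conjecture 'not fair' (H2) only when every flip agrees.
method = lambda s: "H2" if len(set(s)) == 1 else "H1"
print(aggregate_probability(method, "H2", 3, 0.5))   # 0.125 + 0.125 = 0.25
print(aggregate_probability(method, "H2", 3, 0.7))   # 0.343 + 0.027 = 0.37
```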

In nonstatistical learning, we required a reliable method to eventually settle on the true hypothesis after sufficiently many observations. The statistical version of this criterion is that after sufficiently many observations, the chance of conjecturing the true hypothesis should approach 100%. More technically, say that a method identifies a true statistical hypothesis in chance if for every bias value \(p\) and for every threshold \(0 \lt t \lt 1\), there is a sample size \(n\) such that for all larger sample sizes, the method conjectures the hypothesis \(H\) that is true for \(p\) with probability at least \(t\). In symbols, we have \(P_{n',p}(H) \gt t\) for all sample sizes \(n' \gt n\), where \(H\) is the hypothesis that is true for \(p\). The figure below illustrates how the chance of conjecturing the true hypothesis increases with sample size, whereas the chance of conjecturing a false hypothesis decreases with sample size. The definition can be generalized to more complex statistical hypotheses by replacing the true bias value \(p\) with a list of parameters.


Figure 8 [An extended description of figure 8 is in a supplement.]

The notion of limiting identification in chance is similar to the concept of limiting convergence to a probability estimate in Reichenbach’s pragmatic vindication. Translated to our example, Reichenbach considered inductive rules that output an estimate of the true bias value \(p\), and required that such a rule converge to the true value, in the sense that for every bias value \(p\) and for every threshold \(0 \lt t \lt 1\), there is a sample size \(n\) such that for all larger sample sizes, with probability 1 the rule outputs an estimate that differs from the true value \(p\) by at most \(t\). In statistics, a method is called consistent if with increasing sample size, the method’s chance of conjecturing a correct answer converges to 100% (see the link in the Other Internet Resources section below). The terminology is unfortunate in that it suggests to philosophical readers a connection with the consistency of a formal proof system. In fact, the statistical concept of consistency has nothing to do with deductive logic; rather, it is a probabilistic analogue of the notion of identification in the limit of inquiry that is the main subject of this entry.

Genin and Kelly provide a characterization theorem that gives necessary and sufficient conditions for a set of statistical hypotheses to be identifiable in chance, analogous to the structural conditions we discussed in Section 3.3 [2017; Theorem 4.3]. Genin [2018] discusses a statistical analogue of the requirement of minimizing mind changes. Recall that a regressive mind change occurs when an inquirer abandons a true hypothesis in favor of a false one (Section 4.3). The probabilistic analogue is a chance reversal, which occurs when the chance of conjecturing a true hypothesis decreases as the sample size increases. For instance, consider the question of whether a vaccine is effective for an infectious disease. Suppose the vaccine manufacturer runs a trial with 1000 patients and has designed a statistical method that has a chance of 90% of correctly indicating that the vaccine is effective when that is indeed the case. Now another trial is run with 1500 patients using the same statistical method. A chance reversal would occur if the method’s chance of correctly indicating that the vaccine is effective drops to 80%. As this example illustrates, a chance reversal corresponds to a failure to replicate a true result. A chance reversal is illustrated in the figure above, where the chance of conjecturing the true hypothesis is smaller for 3 samples than for 2. Although chance reversals are clearly undesirable, they are difficult to avoid, and in fact commonly used statistical methods are liable to such reversals [Genin 2018]. A more feasible goal is to bound the reversals by a threshold \(t\), such that if the chance of conjecturing the truth does decrease with increasing sample size, it decreases by at most \(t\). (In symbols, \(P_{n,p}(H) - P_{n+1,p}(H) \lt t\) for all sample sizes \(n\) and true bias values \(p\), where \(H\) is the hypothesis correct for \(p\).) Genin [2018] shows that bounded chance reversals are feasible in many situations, and provides an Ockham theorem that elucidates the constraints that bounding chance reversals places on statistical hypothesis learning.
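The following sketch (our own toy example, reusing the frequency-margin test from above) scans sample sizes and reports chance reversals, together with whether each drop respects a bound t:

```python
from math import comb

def p_not_fair(n, p, margin=0.1):
    """P_{n,p}("not fair") for the frequency-margin test used above."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1)
               if abs(k / n - 0.5) > margin)

# Scan sample sizes and report chance reversals: drops in the chance of
# conjecturing the true hypothesis as n grows. The bounded-reversal
# requirement asks that every such drop stay below the threshold t.
p, t = 0.7, 0.05
prev = p_not_fair(5, p)
for n in range(6, 61):
    cur = p_not_fair(n, p)
    drop = prev - cur
    if drop > 0:
        print(f"n={n}: chance of the truth drops by {drop:.4f}",
              "(violates the bound)" if drop >= t else "(within the bound)")
    prev = cur
```

Because the sample space is discrete, the chance of conjecturing the truth wiggles rather than growing monotonically, so small reversals appear even for this simple test.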

Kant distinguished between categorical imperatives that one ought to follow regardless of one’s personal aim and circumstances, and hypothetical imperatives that direct us to employ our means towards our chosen end. One way to think of learning theory is as the study of hypothetical imperatives for empirical inquiry. Many epistemologists have proposed various categorical imperatives for inductive inquiry, for example in the form of an “inductive logic” or norms of “epistemic rationality”. In principle, there are three possible relationships between hypothetical and categorical imperatives for empirical inquiry.

1. The categorical imperative will lead an inquirer to obtain his cognitive goals. In that case means-ends analysis vindicates the categorical imperative. For example, when faced with a simple universal generalization such as “all ravens are black”, we saw above that following the Popperian recipe of adopting the falsifiable generalization and sticking to it until a counterexample appears leads to a reliable method.

2. The categorical imperative may prevent an inquirer from achieving his aims. In that case the categorical imperative restricts the scope of inquiry. For example, in the case of the two alternative generalizations with exceptions, the principle of maintaining a universal generalization until it is falsified leads to an unreliable method (cf. [Kelly 1996, Ch. 9.4]).

3. Some methods meet both the categorical imperative and the goals of inquiry, and others don’t. Then we may take the best of both worlds and choose those methods that attain the goals of inquiry and satisfy categorical imperatives. (See the further discussion in this section.)

For a proposed norm of inquiry, we can apply means-ends analysis to ask whether the norm helps or hinders the aims of inquiry. This was the spirit of Putnam’s critique of Carnap’s confirmation functions [Putnam 1963]: the thrust of his essay was that Carnap’s methods were not as reliable in detecting general patterns as other methods would be. More recently, learning theorists have investigated the power of Bayesian conditioning (see the entry on Bayesian epistemology ). John Earman has conjectured that if there is any reliable method for a given problem, then there is a reliable method that proceeds by Bayesian updating [Earman 1992, Ch.9, Sec.6]. Cory Juhl [1997] provided a partial confirmation of Earman’s conjecture: He proved that it holds when there are only two potential evidence items (e.g., “emerald is green” vs. “emerald is blue”). The general case is still open.

Epistemic conservatism is a methodological norm that has been prominent in philosophy at least since Quine’s notion of “minimal mutilation” of our beliefs [1951]. One version of epistemic conservatism, as we saw above, holds that inquiry should seek stable belief. Another formulation, closer to Quine’s, is the general precept that belief changes in light of new evidence should be minimal. Fairly recent work in philosophical logic has proposed a number of criteria for minimal belief change known as the AGM axioms [Gärdenfors 1988]. Learning theorists have shown that whenever there is a reliable method for investigating an empirical question, there is one that proceeds via minimal changes (as defined by the AGM postulates). The properties of reliable inquiry with minimal belief changes are investigated in [Martin and Osherson 1998, Kelly 1999, Baltag et al. 2011, Baltag et al. 2015].

Much of computational learning theory focuses on inquirers with bounded rationality , that is, agents with cognitive limitations such as a finite memory or bounded computational capacities. Many categorical norms that do not interfere with empirical success for logically omniscient agents nonetheless limit the scope of cognitively bounded agents. For example, consider the norm of consistency: Believe that a hypothesis is false as soon as the evidence is logically inconsistent with it. The consistency principle is part of both Bayesian confirmation theory and AGM belief revision. Kelly and Schulte [1995] show that consistency prevents even agents with infinitely uncomputable cognitive powers from reliably assessing certain hypotheses. The moral is that if a theory is sufficiently complex, agents who are not logically omniscient may be unable to determine immediately whether a given piece of evidence is consistent with the theory, and need to collect more data to detect the inconsistency. But the consistency principle—and a fortiori, Bayesian updating and AGM belief revision— do not acknowledge the usefulness of “wait and see more” as a scientific strategy.

More reflection on these and other philosophical issues in means-ends epistemology can be found in sources such as Huber [2018], [Glymour 1991], [Kelly 1996, Chs. 2,3], [Glymour and Kelly 1992], [Kelly et al. 1997], [Glymour 1994], [Bub 1994]. Of particular interest in the philosophy of science may be learning-theoretic models that accommodate historicist and relativist conceptions of inquiry, chiefly by expanding the notion of an inductive method so that methods may actively select paradigms for inquiry; for more details on this topic, see [Kelly 2000, Kelly 1996, Ch.13]. Book-length introductions to the mathematics of learning theory are [Kelly 1996, Martin and Osherson 1998, Jain et al. 1999]. “Induction, Algorithmic Learning Theory and Philosophy” is a recent collection of writings on learning theory [Friend et al. 2007]. Contributions include introductory papers (Harizanov, Schulte), mathematical advances (Martin, Sharma, Stephan, Kalantari), philosophical reflections on the strengths and implications of learning theory (Glymour, Larvor, Friend), applications of the theory to philosophical problems (Kelly), and a discussion of learning-theoretic thinking in the history of philosophy (Goethe).

Bibliography

  • Abramsky, S., 1987. Domain Theory and the Logic of Observable Properties, Ph.D. Dissertation, University of London.
  • Apsitis, K., 1994. “Derived sets and inductive inference”, in Proceedings of the 5th International Work on Algorithmic Learning Theory , S. Arikawa, K.P. Jantke (eds.), Berlin, Heidelberg: Springer, pp. 26–39.
  • Baltag, A. and Smets, S., 2011. “Keep changing your beliefs, aiming for the truth”, Erkenntnis , 75(2): 255–270.
  • Baltag, A., Gierasimczuk, N., and Smets, S., 2015. “On the Solvability of Inductive Problems: A Study in Epistemic Topology”, Proceedings of the 15th Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2015), pp. 65–74. Electronic Proceedings in Theoretical Computer Science, available online.
  • Bub, J., 1994. “Testing Models of Cognition Through the Analysis of Brain-Damaged Performance”, British Journal for the Philosophy of Science , 45: 837–55.
  • Carlucci, L., Case, J., Jain, S. and Stephan, F., 2005. “Non U-shaped vacillatory and team learning”, in International Conference on Algorithmic Learning Theory , Berlin, Heidelberg: Springer, pp. 241–255.
  • Chart, D., 2000. “Schulte and Goodman’s Riddle”, British Journal for the Philosophy of Science , 51: 837–55.
  • de Brecht, M. and Yamamoto, A., 2008. “Topological properties of concept spaces”, in International Conference on Algorithmic Learning Theory , Berlin, Heidelberg: Springer, pp. 374–388.
  • Domingos, P., 1999. “The role of Occam’s razor in knowledge discovery”, Data mining and Knowledge discovery , 3(4): 409–425.
  • Earman, J., 1992. Bayes or Bust? , Cambridge, Mass.: MIT Press.
  • Feynman, R., 1965. The Character of Physical Law , Cambridge, Mass.: MIT Press; 19th edition, 1990.
  • Friend, M., N. Goethe, and V. Harazinov (eds.), 2007. Induction, Algorithmic Learning Theory, and Philosophy, Dordrecht: Springer.
  • Ford, K., 1963. The World of Elementary Particles , New York: Blaisdell Publishing.
  • Gärdenfors, P., 1988. Knowledge In Flux: modeling the dynamics of epistemic states , Cambridge, Mass.: MIT Press.
  • Genin, K., 2018. “The Topology of Statistical Inquiry”, Ph.D. Dissertation, Department of Philosophy, Carnegie Mellon University, Genin 2018 available online.
  • Genin, K. and Kelly, K., 2015. “Theory Choice, Theory Change, and Inductive Truth-Conduciveness”, Proceedings of the 15th Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2015). Publisher: Electronic Proceedings in Theoretical Computer Science. Extended Abstract, Genin & Kelly 2015 available online.
  • –––, 2017. “The Topology of Statistical Verifiability”, Proceedings of the 17th Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2017). Electronic Proceedings in Theoretical Computer Science, preprint available online .
  • –––, 2019. “Theory Choice, Theory Change, and Inductive Truth-Conduciveness”, Studia Logica , 107: 949–989.
  • Glymour, C., 1991. “The Hierarchies of Knowledge and the Mathematics of Discovery”, Minds and Machines , 1: 75–95.
  • –––, 1994. “On the Methods of Cognitive Neuropsychology”, British Journal for the Philosophy of Science , 45: 815–35.
  • Glymour, C. and Kelly, K., 1992. “Thoroughly Modern Meno”, in Inference, Explanation and Other Frustrations , John Earman (ed.), Berkeley: University of California Press.
  • Gold, E., 1967. “Language Identification in the Limit”, Information and Control , 10: 447–474.
  • Goodman, N., 1983. Fact, Fiction and Forecast , Cambridge, MA: Harvard University Press.
  • Harrell, M., 2000. Chaos and Reliable Knowledge , Ph.D. Dissertation, University of California at San Diego.
  • Harman, G. and Kulkarni, S., 2007. Reliable Reasoning: Induction and Statistical Learning Theory , Cambridge, MA: The MIT Press.
  • Huber, F., 2018. A Logical Introduction to Probability and Induction , Oxford: Oxford University Press.
  • Jain, S., et al., 1999. Systems That Learn, 2nd edition, Cambridge, MA: MIT Press.
  • James, W., 1982. “The Will To Believe”, in Pragmatism , H.S. Thayer (ed.), Indianapolis: Hackett.
  • Juhl, C., 1997. “Objectively Reliable Subjective Probabilities”, Synthese , 109: 293–309.
  • Kelly, K., 1996. The Logic of Reliable Inquiry , Oxford: Oxford University Press.
  • –––, 1999. “ Iterated Belief Revision, Reliability, and Inductive Amnesia”, Erkenntnis , 50: 11–58.
  • –––, 2000. “The Logic of Success”, British Journal for the Philosophy of Science , 51(4): 639–660.
  • –––, 2007a. “How Simplicity Helps You Find the Truth Without Pointing at it”, in Induction, Algorithmic Learning Theory, and Philosophy , M. Friend, N. Goethe and V. Harazinov (eds.), Dordrecht: Springer, pp. 111–144.
  • –––, 2008. “Ockham’s Razor, Truth, and Information”, in Handbook of the Philosophy of Information, J. van Benthem and P. Adriaans (eds.), Dordrecht: Elsevier.
  • –––, 2010. “Simplicity, Truth, and Probability”, in Handbook for the Philosophy of Statistics , Prasanta S. Bandyopadhyay and Malcolm Forster (eds.), Dordrecht: Elsevier.
  • Kelly, K., and Schulte, O., 1995. “The Computable Testability of Theories Making Uncomputable Predictions”, Erkenntnis , 43: 29–66.
  • Kelly, K., Schulte, O. and Juhl, C., 1997. “Learning Theory and the Philosophy of Science”, Philosophy of Science , 64: 245–67.
  • Kuhn, T., 1970. The Structure of Scientific Revolutions . Chicago: University of Chicago Press.
  • Luo, W. and Schulte, O., 2006. “Mind Change Efficient Learning”, Information and Computation, 204: 989–1011.
  • Martin, E. and Osherson, D., 1998. Elements of Scientific Inquiry , Cambridge, MA: MIT Press.
  • Ne’eman, Y. and Kirsh, Y., 1983. The Particle Hunters , Cambridge: Cambridge University Press.
  • Omnes, R., 1971. Introduction to Particle Physics , London, New York: Wiley Interscience.
  • Popper, Karl, 1962. Conjectures and refutations. The growth of scientific knowledge , New York: Basic Books.
  • Putnam, H., 1963. “Degree of Confirmation and Inductive Logic”, in The Philosophy of Rudolf Carnap , P.A. Schilpp (ed.), La Salle, Ill: Open Court.
  • Putnam, H., 1965. “Trial and Error Predicates and the Solution to a Problem of Mostowski”, Journal of Symbolic Logic , 30(1): 49–57.
  • Quine, W., 1951. “Two Dogmas of Empiricism”, Philosophical Review , 60: 20–43.
  • Salmon, W., 1991. “Hans Reichenbach’s Vindication of Induction”, Erkenntnis , 35: 99–122.
  • Schulte, O., 1999. “Means-Ends Epistemology”, The British Journal for the Philosophy of Science , 50: 1–31.
  • –––, 2008. “The Co-Discovery of Conservation Laws and Particle Families”, Studies in History and Philosophy of Modern Physics , 39(2): 288–314.
  • –––, 2009. “Simultaneous Discovery of Conservation Laws and Hidden Particles With Smith Matrix Decomposition”, in Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), Palo Alto: AAAI Press, pp. 1481–1487.
  • Schulte, O., Luo, W., and Greiner, R., 2007. “Mind Change Optimal Learning of Bayes Net Structure”, in Proceedings of the 20th Annual Conference on Learning Theory (COLT’07, San Diego, CA, June 12–15), N. Bshouti and C. Gentile (eds.), Berlin, Heidelberg: Springer, pp. 187–202.
  • Schulte, O., and Cory Juhl, 1996. “Topology as Epistemology”, The Monist , 79(1): 141–147.
  • Sklar, L., 1975. “Methodological Conservatism”, Philosophical Review , 84: 374–400.
  • Sober, E., 2015. Ockham’s Razors , Cambridge: Cambridge University Press.
  • Spirtes, P., Glymour, C., Scheines, R., 2000. Causation, prediction, and search , Cambridge, MA: MIT Press.
  • Steel, D., 2009. “Testability and Ockham’s Razor: How Formal and Statistical Learning Theory Converge in the New Riddle of Induction,” Journal of Philosophical Logic , 38: 471–489.
  • –––, 2010. “What if the principle of induction is normative? Formal learning theory and Hume’s problem”, International Studies in the Philosophy of Science , 24(2): 171–185.
  • Valiant, L. G., 1984. “A theory of the learnable”, Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing (STOC 84), New York: ACM Press, pp. 436–445.
  • Vickers, S., 1996. Topology Via Logic , Cambridge: Cambridge University Press.
Other Internet Resources

  • Learning Theory in Computer Science
  • Inductive Logic Website on Formal Learning Theory and Belief Revision
  • Defeasibility analyses, Section 2 of the entry on the Gettier Problem in the Routledge Encyclopedia of Philosophy.
  • Statistical hypothesis testing, entry in Wikipedia.
  • Consistency (in Statistics), entry in Wikipedia.



Hypothesis Search: Inductive Reasoning with Language Models

Abstract: Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which can then be robustly generalized to novel scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning tasks by directly prompting them, yielding “in-context learning.” This can work well for straightforward inductive tasks, but performs very poorly on more complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be directly verified by running on the observed examples and generalized to novel inputs. Because of the prohibitive cost of generation with state-of-the-art LLMs, we consider a middle step to filter the set of hypotheses that will be implemented into programs: we either ask the LLM to summarize them into a smaller set of hypotheses, or ask human annotators to select a subset of the hypotheses. We verify our pipeline’s effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, and the string transformation dataset SyGuS. On a random 40-problem subset of ARC, our automated pipeline using LLM summaries achieves 27.5% accuracy, significantly outperforming the direct prompting baseline (accuracy of 12.5%). With the minimal human input of selecting from LLM-generated candidates, the performance is boosted to 37.5%. (And we argue this is a lower bound on the performance of our approach without filtering.) Our ablation studies show that abstract hypothesis generation and concrete program representations are both beneficial for LLMs to perform inductive reasoning tasks.
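The core loop the abstract describes, propose abstract hypotheses, implement them as Python programs, and keep only the programs that verify on the observed examples, can be sketched in a few lines. In the toy sketch below, `propose_hypotheses` and `implement` are hypothetical stand-ins for the paper's LLM calls (no real model is queried); only the verification step is concrete.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, expected output)

def propose_hypotheses(examples: List[Example], k: int) -> List[str]:
    """Hypothetical stand-in for prompting an LLM to propose k abstract
    hypotheses in natural language; here we return canned guesses."""
    return ["reverse the string", "uppercase the string"][:k]

def implement(hypothesis: str) -> Callable[[str], str]:
    """Hypothetical stand-in for asking the LLM to turn a natural-language
    hypothesis into a concrete Python program."""
    table = {
        "reverse the string": lambda s: s[::-1],
        "uppercase the string": lambda s: s.upper(),
    }
    return table[hypothesis]

def hypothesis_search(examples: List[Example], k: int = 2):
    """The verifiable core of the pipeline: run each candidate program on
    the observed examples and keep only those that fit all of them."""
    survivors = []
    for h in propose_hypotheses(examples, k):
        prog = implement(h)
        if all(prog(x) == y for x, y in examples):
            survivors.append((h, prog))
    return survivors

examples = [("abc", "cba"), ("hello", "olleh")]
for h, prog in hypothesis_search(examples):
    print(h, "->", prog("world"))  # the surviving program generalizes
```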


A theory of formal synthesis via inductive learning

Original article by Susmit Jha (SRI International, Menlo Park, CA, USA) and Sanjit A. Seshia (EECS Department, UC Berkeley, Berkeley, CA, USA). Published 15 February 2017; Acta Informatica, Volume 54, pages 693–726 (2017).

Formal synthesis is the process of generating a program satisfying a high-level formal specification. In recent times, effective formal synthesis methods have been proposed based on the use of inductive learning. We refer to this class of methods that learn programs from examples as formal inductive synthesis. In this paper, we present a theoretical framework for formal inductive synthesis. We discuss how formal inductive synthesis differs from traditional machine learning. We then describe oracle-guided inductive synthesis (OGIS), a framework that captures a family of synthesizers that operate by iteratively querying an oracle. An instance of OGIS that has had much practical impact is counterexample-guided inductive synthesis (CEGIS). We present a theoretical characterization of CEGIS for learning any program that computes a recursive language. In particular, we analyze the relative power of CEGIS variants where the types of counterexamples generated by the oracle varies. We also consider the impact of bounded versus unbounded memory available to the learning algorithm. In the special case where the universe of candidate programs is finite, we relate the speed of convergence to the notion of teaching dimension studied in machine learning theory. Altogether, the results of the paper take a first step towards a theoretical foundation for the emerging field of formal inductive synthesis.
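As a concrete illustration of the CEGIS loop described in the abstract, here is a minimal toy sketch of our own: the specification, the linear candidate space, and the bounded domain are all illustrative choices, not the paper's formalization.

```python
from itertools import product

# Specification: the target program computes 2*x + 1 on inputs 0..31.
DOMAIN = range(32)
spec = lambda x: 2 * x + 1

def cegis():
    examples = []                                   # counterexamples so far
    candidates = list(product(range(4), range(4)))  # programs "a*x + b"
    while True:
        # Learner: pick any candidate consistent with the examples seen.
        consistent = [(a, b) for a, b in candidates
                      if all(a * x + b == y for x, y in examples)]
        if not consistent:
            return None                  # candidate space exhausted
        a, b = consistent[0]
        # Verifier (oracle): search the domain for a counterexample.
        cex = next((x for x in DOMAIN if a * x + b != spec(x)), None)
        if cex is None:
            return a, b                  # verified on the whole domain
        examples.append((cex, spec(cex)))  # learn from the mistake

print(cegis())  # converges to (2, 1), i.e. the program 2*x + 1
```

Each iteration either terminates with a verified program or adds a counterexample that shrinks the set of consistent candidates, which is the convergence behavior the paper analyzes in general form.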


Notes

CEGIS techniques in the literature [37, 60] initiate the search for a correct program using positive examples, and use the specification to obtain additional positive examples corresponding to counterexamples.

Note that we can extend this definition to include counterexamples of size bounded by that of the largest positive example seen so far plus a constant. The proof arguments given in Sect.  5 continue to work with only minor modifications.

This holds due to the specialization of \(\Phi \) to a partial specification, and as a trace property. For general \(\Phi \) , the learner need not exclude all counterexamples.

In this framework, a synthesis engine is only required to converge to the correct concept without requiring it to recognize it has converged and terminate. For a finite concept or language, termination can be trivially guaranteed when the oracle is assumed to be non-redundant and does not repeat examples.

References

Alur, R., Bodik, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-Guided Synthesis. In: Proceedings of the IEEE International Conference on Formal Methods in Computer-Aided Design (FMCAD) (2013)

Angluin, D.: Inductive inference of formal languages from positive data. Inf. Control 45 , 117–135 (1980). doi: 10.1016/S0019-9958(80)90285-5


Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75 (2), 87–106 (1987)

Angluin, D.: Queries and concept learning. Mach. Learn. 2 (4), 319–342 (1988). doi: 10.1023/A:1022821128753


Angluin, D.: Queries revisited. Theoretical Computer Science (special issue on Algorithmic Learning Theory), 313(2), 175–194 (2004). doi: 10.1016/j.tcs.2003.11.004. http://www.sciencedirect.com/science/article/pii/S030439750300608X

Angluin, D., Smith, C.H.: Inductive inference: theory and methods. ACM Comput. Surv. 15 , 237–269 (1983)


Atig, M.F., Bouajjani, A., Qadeer, S.: Context-bounded analysis for concurrent programs with dynamic creation of threads. Log. Methods Comput. Sci. (2011). doi: 10.2168/LMCS-7(4:4)2011


Barrett, C., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo theories. In: Biere, A., van Maaren, H., Walsh, T. (eds.) Handbook of Satisfiability, Chapter 8, vol. 4. IOS Press, Amsterdam (2009)


Bengio, Y., Goodfellow, I.J., Courville, A.: Deep Learning . Book in preparation for MIT Press (2015). http://www.iro.umontreal.ca/~bengioy/dlbook

Biere, A.: Bounded model checking. In: Handbook of Satisfiability , pp. 457–481 (2009). doi: 10.3233/978-1-58603-929-5-457

Blum, L., Blum, M.: Toward a mathematical theory of inductive inference. Inf. Control 28 (2), 125–155 (1975). doi: 10.1016/s0019-9958(75)90261-2

Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik–Chervonenkis dimension. J. ACM 36 (4), 929–965 (1989). doi: 10.1145/76359.76371

Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comput. C–35 (8), 677–691 (1986)


Chen, Y., Safarpour, S., Marques-Silva, J.: Automated design debugging with maximum satisfiability. IEEE Trans. CAD Integr. Circuits Syst. 29 (11), 1804–1817 (2010). doi: 10.1109/TCAD.2010.2061270


Clarke, E.M., Emerson, E.A.: Design and synthesis of synchronization skeletons using branching-time temporal logic. In: Kozen, D. (ed.) Logic of Programs, Workshop. Springer, London (1981)

Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (2000)

Clarkson, M.R., Schneider, F.B.: Hyperproperties. J. Comput. Secur. 18 (6), 1157–1210 (2010)

Giannakopoulou, D., Pasareanu, C.S. (eds.): Special issue on learning techniques for compositional reasoning. Formal Methods in System Design 32 (3), pp. 173–174 (2008)

Gold, E.M.: Language identification in the limit. Inf. Control 10 (5), 447–474 (1967). doi: 10.1016/S0019-9958(67)91165-5

Goldman, S.A., Kearns, M.J.: On the complexity of teaching. J. Comput. Syst. Sci. 50 , 303–314 (1992)

Goldman, S.A., Rivest, R.L., Schapire, R.E.: Learning binary relations and total orders. SIAM J. Comput. 22 (5), 1006–1034 (1993). doi: 10.1137/0222062

Gordon, M.J.C., Melham, T.F.: Introduction to HOL: A Theorem Proving Environment for Higher-Order Logic. Cambridge University Press, Cambridge (1993)


Grebenshchikov, S., Lopes, N.P., Popeea, C., Rybalchenko, A.: Synthesizing software verifiers from proof rules. In: ACM SIGPLAN Notices , 47, ACM, pp. 405–416 (2012)

Gulwani, S., Jha, S., Tiwari, A., Venkatesan, R.: Synthesis of loop-free programs. In: PLDI , pp. 62–73 (2011). doi: 10.1145/1993498.1993506

Hegedűs, T.: Geometrical concept learning and convex polytopes. In: Proceedings of the Seventh Annual Conference on Computational Learning Theory, COLT ’94, ACM, New York, NY, USA, pp. 228–236 (1994). doi: 10.1145/180139.181124

Jackson, J.C.: An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. J. Comput. Syst. Sci. 55 (3), 414–440 (1997). doi: 10.1006/jcss.1997.1533

Jain, S.: Systems that Learn: An Introduction to Learning Theory. MIT Press, Cambridge (1999)

Jain, S., Kinber, E.: Iterative learning from positive data and negative counterexamples. Inf. Comput. 205 (12), 1777–1805 (2007). doi: 10.1016/j.ic.2007.09.001

Jantke, K.P., Beick, H.-R.: Combining Postulates of Naturalness in Inductive Inference. Elektronische Informationsverarbeitung und Kybernetik 17 (8/9), 465–484 (1981)

Jha, S., Seshia, S.A.: A theory of formal synthesis via inductive learning. ArXiv e-prints (2015)

Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Oracle-guided Component-based Program Synthesis. ICSE ’10, ACM, New York, NY, USA, pp. 215–224 (2010). doi: 10.1145/1806799.1806833

Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Synthesizing switching logic for safety and dwell-time requirements. In: Proceedings of the International Conference on Cyber-Physical Systems (ICCPS), pp. 22–31 (2010)

Jha, S., Seshia, S.A.: Are there good mistakes? a theoretical analysis of CEGIS. In: 3rd Workshop on Synthesis (SYNT) (2014)

Jha, S., Seshia, S.A., Tiwari, A.: Synthesis of optimal switching logic for hybrid systems. In: Proceedings of the international conference on embedded software (EMSOFT), pp. 107–116 (2011)

Jha, S., Seshia, S.A., Zhu, X.: On the teaching dimension of octagons for formal synthesis. In: 5th Workshop on Synthesis (SYNT) (2016)

Jha, S.K.: Towards automated system synthesis using SCIDUCTION. Ph.D. thesis, EECS Department, University of California, Berkeley (2011). http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-118.html

Jin, X., Donzé, A., Deshmukh, J., Seshia, S.A.: Mining requirements from closed-loop control models. In: HSCC (2013)

Kaufmann, M., Manolios, P., Moore, J.S.: Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers, Dordrecht (2000)

Kuncak, V., Mayer, M., Piskac, R., Suter, P.: Software synthesis procedures. Commun. ACM 55 (2), 103–111 (2012)

Lange, S.: Algorithmic Learning of Recursive Languages. Mensch-und-Buch-Verlag, Berlin (2000)

Lange, S., Zeugmann, T., Zilles, S.: Learning indexed families of recursive languages from positive data: a survey. Theor. Comput. Sci. 397 (1–3), 194–232 (2008). doi: 10.1016/j.tcs.2008.02.030

Lange, S., Zilles, S.: Formal language identification: query learning vs. gold-style learning. Inf. Process. Lett. 91 (6), 285–292 (2004). doi: 10.1016/j.ipl.2004.05.010

Li, W.: Specification mining: new formalisms, algorithms and applications. Ph.D. thesis, EECS Department, University of California, Berkeley (2014)

Li, W., Dworkin, L., Seshia, S.A.: Mining assumptions for synthesis. In: 2011 9th IEEE/ACM International Conference on Formal Methods and Models for Codesign (MEMOCODE), pp. 43–50 (2011)

Malik, S., Zhang, L.: Boolean satisfiability: from theoretical hardness to practical success. Commun. ACM (CACM) 52 (8), 76–82 (2009). doi: 10.1145/1536616.1536637

Manna, Z., Waldinger, R.: A deductive approach to program synthesis. ACM Trans. Program. Lang. Syst. 2 (1), 90–121 (1980). doi: 10.1145/357084.357090

Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill Inc, New York (1997)

Morgado, A., Liffiton, M., Marques-Silva, J.: MaxSAT-based MCS enumeration. In: Biere, A., Nahir, A., Vos, T. (eds.) Hardware and Software: Verification and Testing , Lecture Notes in Computer Science 7857, Springer Berlin Heidelberg, pp. 86–101 (2013). doi: 10.1007/978-3-642-39611-3_13

Owre, S., Rushby, J.M., Shankar, N.: PVS: a prototype verification system. In: Kapur, D., (ed.) In: 11th International Conference on Automated Deduction (CADE), Lecture Notes in Artificial Intelligence 607, Springer-Verlag, pp. 748–752 (1992)

Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: ACM Symposium on Principles of Programming Languages (POPL), pp. 179–190 (1989)

Queille, J.-P., Sifakis, J.: Specification and verification of concurrent systems in CESAR. In: Symposium on programming, LNCS 137 , pp. 337–351 (1982)

Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1 (1), 81–106 (1986). doi: 10.1023/A:1022643204877

Rogers Jr., H.: Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge (1987)

Salzberg, S., Delcher, A.L., Heath, D., Kasif, S.: Best-case results for nearest-neighbor learning. IEEE Trans. Pattern Anal. Mach. Intell. 17 (6), 599–608 (1995). doi: 10.1109/34.387506

Seshia, S.A.: Sciduction: combining induction, deduction, and structure for verification and synthesis. In: Proceedings of the Design Automation Conference (DAC), pp. 356–365 (2012)

Seshia, S.A.: Combining induction, deduction, and structure for verification and synthesis. Proc. IEEE 103 (11), 2036–2051 (2015)

Shapiro, E.Y.: Algorithmic Program Debugging. MIT Press, Cambridge (1982)

Shinohara, A., Miyano, S.: Teachability in computational learning. In: ALT, pp. 247–255 (1990)

Solar-Lezama, A., Rabbah, R., Bodík, R., Ebcioglu, K.: Programming by sketching for bit-streaming programs. In: PLDI (2005)

Solar-Lezama, A., Tancau, L., Bodk, R., Seshia, S.A., Saraswat, V.A.: Combinatorial sketching for finite programs. In: ASPLOS, pp. 404–415 (2006). doi: 10.1145/1168857.1168907

Srivastava, S., Gulwani, S., Foster, J.S.: From program verification to program synthesis. In: Proceedings of ACM Symposium on Principles of Programming Languages, pp. 313–326 (2010)

Summers, P.D.: A methodology for LISP program construction from examples. J. ACM 24 (1), 161–175 (1977)

Udupa, A., Raghavan, A., Deshmukh, J.V., Mador-Haim, S., Martin, M.M.K., Alur, R.: Transit : Specifying protocols with concolic snippets, In: Proceedings of the 34th ACM SIGPLAN conference on Programming Language Design and Implementation, pp. 287–296 (2013)

Valiant, L.G.: A theory of the learnable. Commun. ACM 27 , 1134–1142 (1984)

Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications 16(2), 264–280 (1971). doi: 10.1137/1116025

Weisberg, S.: Applied linear regression, 3rd edn. Wiley, Hoboken (2005). http://www.stat.umn.edu/alr

Wiehagen, R.: Limit detection of recursive functions by specific strategies. Electron. Inf. Process. Cybernet. 12 (1/2), 93–99 (1976)

Wiehagen, R.: A thesis in inductive inference. In: Dix, J., Jantke, K.P., Schmitt, P.H. (eds.) Nonmonotonic and inductive logic, Lecture Notes in Computer Science 543, Springer, pp. 184–207 (1990). doi: 10.1007/BFb0023324

Winskel, G.: The Formal Semantics of Programming Languages: An Introduction. MIT Press, Cambridge (1993)


Acknowledgements

We thank the anonymous reviewers for their detailed and helpful comments. This work was supported in part by the National Science Foundation (Grants CCF-1139138 and CNS-1545126), DARPA under agreement number FA8750-16-C-0043, the Toyota Motor Corporation under the CHESS center, a gift from Microsoft Research, and the TerraSwarm Research Center, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.


About this article

Jha, S., Seshia, S.A. A theory of formal synthesis via inductive learning. Acta Informatica 54, 693–726 (2017). https://doi.org/10.1007/s00236-017-0294-5

Received: 23 April 2015. Accepted: 23 January 2017. Published: 15 February 2017. Issue Date: November 2017.


Difference Between Inductive and Deductive Learning

Introduction

In the field of artificial intelligence known as machine learning, algorithms are developed that can learn from data and make judgments or predictions without being explicitly programmed. Inductive learning and deductive learning are the two main methods used in machine learning. Although either strategy may be used to build models that rely on data for choices or predictions, the techniques used to do so vary. We'll examine the distinction between inductive and deductive learning in this article.

Inductive Learning

Inductive learning is a machine learning technique in which a model learns to generate predictions from examples or observations. During inductive learning, the model picks up knowledge from particular examples or instances and generalizes it so that it can predict outcomes for new data.

When using inductive learning, a rule or method is not explicitly programmed into the model. Instead, the model is trained to spot trends and connections in the input data and then to use this knowledge to predict outcomes for fresh data. The aim of inductive learning is a model that can accurately predict the result of subsequent instances.

Inductive learning is frequently used in supervised learning situations, where the model is trained using labeled data. The model is trained on a set of examples paired with the correct output labels, and from this training data it builds a mapping between inputs and outputs. Once trained, the model can predict the output for new instances.

A number of well-known machine learning algorithms, such as decision trees, k-nearest neighbors, and neural networks, rely on inductive learning. It is an essential method for machine learning because it enables models that can accurately predict new data even when the underlying patterns and relationships are complicated and poorly understood.
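As a concrete illustration, here is a minimal supervised example, assuming scikit-learn is installed; the toy dataset and its feature meanings are invented purely for illustration.

```python
# Inductive learning: the model induces a decision rule from labeled
# examples instead of being given the rule explicitly.
from sklearn.tree import DecisionTreeClassifier

# Toy training data: [hours_studied, hours_slept] -> pass (1) / fail (0)
X = [[1, 4], [2, 5], [3, 6], [7, 7], [8, 6], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

model = DecisionTreeClassifier().fit(X, y)

# The induced model predicts outputs for brand-new inputs.
print(model.predict([[2, 8], [8, 5]]))
```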

Advantages

Flexible and adaptive − Because inductive learning models are flexible and adaptive, they are well suited to handling difficult, complex, and dynamic information.

Finding hidden patterns and relationships in data − Inductive learning models can identify links and patterns in data that may not be immediately apparent to humans, which makes them ideal for tasks like pattern recognition and classification.

Huge datasets − Inductive learning models can efficiently handle enormous volumes of data, making them suitable for applications that process massive quantities of data.

Appropriate when the rules are ambiguous − Since inductive learning models learn from examples without explicit programming, they suit situations where the rules are not precisely described or understood beforehand.

Disadvantages

May overfit to particular data − A model that has overfit to its training data, learning the noise rather than the underlying patterns, may perform badly on fresh data.

Potentially computationally costly − Inductive learning models can be expensive to train, especially on complex datasets, which may constrain their use in real-time applications.

Limited interpretability − It can be difficult to understand how an inductive model arrives at its predictions, which is a problem in applications where the decision-making process must be transparent and explicable.

Dependent on data quality − Inductive learning models are only as good as the data they are trained on; if the data is inaccurate or inadequate, the model may not perform effectively.

Deductive Learning

Deductive learning is a method of machine learning in which a model is built from a set of logical principles and steps. In deductive learning, the model is explicitly designed to follow a set of rules and procedures in order to produce predictions on new, unseen data.

Deductive learning is frequently used in rule-based systems, expert systems, and knowledge-based systems, where the rules and procedures are explicitly specified by domain experts. The model is built to follow these rules and procedures in order to derive judgments or predictions from the input data.

In contrast to inductive learning, which learns from particular examples, deductive learning begins with a set of rules and procedures and uses them to generate predictions on incoming data. The aim of deductive learning is a model that can precisely follow a given set of rules and procedures in order to generate predictions.

Rule-based systems and expert systems are well-known examples of the deductive approach. Deductive learning is a crucial machine learning strategy because it enables models that generate precise predictions in accordance with predetermined rules and guidelines.
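For contrast with the inductive example above, here is a minimal rule-based (deductive) sketch; the rules, thresholds, and field names are invented purely for illustration.

```python
# Deductive learning: the decision logic is written down up front by a
# domain expert; the system applies the rules rather than inducing them.
RULES = [
    (lambda p: p["temperature"] > 38.0 and p["cough"], "suspect flu"),
    (lambda p: p["temperature"] > 38.0, "fever of unknown origin"),
    (lambda p: True, "no action"),  # default rule
]

def diagnose(patient):
    """Apply the rules in order; return the first conclusion whose
    condition the input satisfies."""
    for condition, conclusion in RULES:
        if condition(patient):
            return conclusion

print(diagnose({"temperature": 38.6, "cough": True}))   # suspect flu
print(diagnose({"temperature": 36.8, "cough": False}))  # no action
```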

Advantages

More efficient − Since deductive learning begins with broad principles and applies them to particular cases, it is frequently quicker than inductive learning.

Potentially more accurate − Deductive learning can sometimes yield more accurate findings than inductive learning, since it starts from known principles and applies them to the data.

Needs less data − Deductive learning requires less data than inductive learning, making it more practical when data is sparse or challenging to collect.

Disadvantages

Constrained by existing rules − Deductive learning is limited by the rules that are currently in place, which may be insufficient or obsolete.

Unsuited to ill-defined problems − Deductive learning is not appropriate for complicated problems that lack precise rules or clear relationships between variables, nor for ambiguous problems.

Biased results − The accuracy of deductive learning depends on the quality of the rules and knowledge base, which can introduce biases and mistakes into the results.

The main distinctions between inductive and deductive learning in machine learning are outlined in the following table:

| Aspect | Inductive learning | Deductive learning |
| --- | --- | --- |
| Starting point | Specific labeled examples | General rules and procedures |
| How it predicts | Generalizes learned patterns to new data | Applies predefined rules to new data |
| Data requirements | Typically needs large amounts of data | Works with less data |
| Typical techniques | Decision trees, k-nearest neighbors, neural networks | Rule-based, expert, and knowledge-based systems |
| Interpretability | Often limited | High, since the rules are explicit |
| Main risks | Overfitting; dependence on data quality | Insufficient or outdated rules; bias from the knowledge base |

Inductive learning is an important method for machine learning, as it enables models that can accurately predict new data even when the underlying patterns and relationships are complicated and poorly understood; its main drawbacks are computational cost, limited interpretability, and dependence on the quality of the training data. Deductive learning, by contrast, is a key machine learning strategy that enables precise predictions in accordance with predetermined rules and guidelines, at the price of being only as good as those rules.



3.6: Mathematical Induction - An Introduction

Harris Kwong, State University of New York at Fredonia (via OpenSUNY)

Mathematical induction can be used to prove that an identity is valid for all integers \(n\geq1\). Here is a typical example of such an identity: \[1+2+3+\cdots+n = \frac{n(n+1)}{2}.\] More generally, we can use mathematical induction to prove that a propositional function \(P(n)\) is true for all integers \(n\geq a\).

Principle of Mathematical Induction (PMI)

Given a propositional function \(P(n)\) defined for integers \(n\), and a fixed integer \(a\).

Then, if these two conditions are true:

  • \(P(a)\) is true.
  • if \(P(k)\) is true for some integer \(k\geq a\), then \(P(k+1)\) is also true.

then \(P(n)\) is true for all integers \(n\geq a\).

Outline for Mathematical Induction

To show that a propositional function \(P(n)\) is true for all integers \(n\geq a\), follow these steps:

  • Base Step: Verify that \(P(a)\) is true.
  • Assume \(P(k)\) is true for an arbitrary integer \(k\geq a\). This is the inductive hypothesis.
  • With this assumption (the inductive hypothesis), show \(P(k+1)\) is true.
  • Conclude, by the Principle of Mathematical Induction (PMI), that \(P(n)\) is true for all integers \(n\geq a\).

The base step is also called the basis step or the  anchor step or the initial step . 

The base step and the inductive step, together, prove that \[P(a) \Rightarrow P(a+1) \Rightarrow P(a+2) \Rightarrow \cdots\,.\] Therefore, \(P(n)\) is true for all integers \(n\geq a\). Compare induction to falling dominoes. When the first domino falls, it knocks down the next domino. The second domino in turn knocks down the third domino. Eventually, all the dominoes will be knocked down. But it will not happen unless these conditions are met:

  • The first domino must fall to start the motion. If it does not fall, no chain reaction will occur. This is the base step.
  • The distance between adjacent dominoes must be set up correctly. Otherwise, a certain domino may fall down without knocking over the next. Then the chain reaction will stop, and will never be completed. Maintaining the right inter-domino distance ensures that \(P(k)\Rightarrow P(k+1)\) for each integer \(k\geq a\).

To prove the implication \[P(k) \Rightarrow P(k+1)\] in the inductive step, we need to carry out two steps: assuming that \(P(k)\) is true, then using it to prove \(P(k+1)\) is also true. So we can refine an induction proof into a 3-step procedure:

  • Verify that \(P(a)\) is true.
  • Assume that \(P(k)\) is true for some integer \(k\geq a\).
  • Show that \(P(k+1)\) is also true.

The second step, the assumption that \(P(k)\) is true, is referred to as the inductive hypothesis. In outline, a mathematical induction proof may look like this:

Proof. (Base step) When \(n=a\), … ; hence \(P(a)\) is true. (Inductive hypothesis) Assume \(P(k)\) is true for some integer \(k\geq a\). (Inductive step) Then … ; hence \(P(k+1)\) is also true. Therefore, by PMI, \(P(n)\) is true for all integers \(n\geq a\).

The idea behind mathematical induction is rather simple. However, it must be delivered with precision.

  • Be sure to say “Assume \(P(k)\) holds for some integer \(k\geq a\).” Do not say “Assume it holds for all integers \(k\geq a\).” If we already know the result holds for all \(k\geq a\), then there is no need to prove anything at all.
  • Be sure to specify the requirement \(k\geq a\). This ensures that the chain reaction of the falling dominoes starts with the first one.
  • Do not say “let \(n=k\)” or “let \(n=k+1\).” The point is, you are not assigning the values \(k\) and \(k+1\) to \(n\). Rather, you are assuming that the statement is true when \(n\) equals \(k\), and using it to show that the statement also holds when \(n\) equals \(k+1\).

Some proofs by induction 

Example \(\PageIndex{1}\): \(1+2+3+\cdots+n\)

Use mathematical induction to show proposition \(P(n)\) :   \[1+2+3+\cdots+n = \frac{n(n+1)}{2}\] for all integers \(n\geq1\).

Base Step: Consider \(n = 1\).

On the Left-Hand Side (LHS) we get 1. On the Right-Hand Side (RHS) we get \(\frac{1(1+1)}{2}=\frac{2}{2}=1\). Thus \(P(n)\) is true for \(n = 1\).

Inductive Step: Assume \(P(n)\) is true for \(n = k\), \(k \geq 1\). In other words, \(P(k)\) is true, so our inductive hypothesis is \[1+2+3+\cdots+k = \frac{k(k+1)}{2}.\]

Consider the left-hand side of \(P(k+1)\).   \[1+2+3+\cdots+(k+1) = 1+2+\cdots+k+(k+1),\]

we can regroup this as

\[1+2+3+\cdots+(k+1) = [1+2+\cdots+k]+(k+1),\]

so that \(1+2+\cdots+k\) can be replaced by \(\frac{k(k+1)}{2}\), by the inductive hypothesis.

Using the inductive hypothesis, we find

\[\begin{aligned} 1+2+3+\cdots+(k+1) &=& 1+2+3+\cdots+k+(k+1) \\ &=& \frac{k(k+1)}{2}+(k+1) \\ &=& (k+1)\left(\frac{k}{2}+1\right) \\ &=& (k+1)\cdot\frac{k+2}{2}\\ &=& \frac{(k+1)(k+2)}{2}. \end{aligned}\]

Therefore, the identity also holds when \(n=k+1\).

Thus, by the Principle of Mathematical Induction (PMI),  \[1+2+3+\cdots+n = \frac{n(n+1)}{2}\] for all integers \(n\geq1\).

We can use the summation notation (also called the sigma notation ) to abbreviate a sum. For example, the sum in the last example can be written as

\[\sum_{i=1}^n i.\]

The letter \(i\) is the index of summation . By putting \(i=1\) under \(\sum\) and \(n\) above, we declare that the sum starts with \(i=1\), and ranges through \(i=2\), \(i=3\), and so on, until \(i=n\). The quantity that follows \(\sum\) describes the pattern of the terms that we are adding in the summation. Accordingly,

\[\sum_{i=1}^{10} i^2 = 1^2+2^2+3^2+\cdots+10^2.\]

In general, the sum of the first \(n\) terms in a sequence \(\{a_1,a_2,a_3,\ldots\,\}\) is denoted \(\sum_{i=1}^n a_i\). Observe that

\[\sum_{i=1}^{k+1} a_i = \left(\sum_{i=1}^k a_i\right) + a_{k+1},\]

which provides the link between \(P(k+1)\) and \(P(k)\) in an induction proof.

\(\sum_{i=1}^n i^2\)

Example \(\PageIndex{2}\)

Use mathematical induction to show that, for all integers \(n\geq1\), \[\sum_{i=1}^n i^2 = 1^2+2^2+3^2+\cdots+n^2 = \frac{n(n+1)(2n+1)}{6}.\]

Base Step: When \(n=1\), the left-hand side reduces to \(1^2=1\), and the right-hand side becomes \(\frac{1\cdot2\cdot3}{6}=1\); hence, the identity holds when \(n=1\).

Inductive Step: Assume the identity holds when \(n=k\) for some integer \(k\geq1\); that is, assume \[\sum_{i=1}^k i^2 = \frac{k(k+1)(2k+1)}{6}.\] Consider \(n=k+1\). From the inductive hypothesis, we find \[\begin{aligned} \sum_{i=1}^{k+1} i^2 &= \sum_{i=1}^k i^2 + (k+1)^2 \\ &= \frac{k(k+1)(2k+1)}{6}+(k+1)^2 \\ &= \frac{k(k+1)(2k+1)+6(k+1)^2}{6} \\ &= \frac{(k+1)[k(2k+1)+6(k+1)]}{6} \\ &= \frac{(k+1)(2k^2+7k+6)}{6} \\ &= \frac{(k+1)(k+2)(2k+3)}{6} \\ &= \frac{(k+1)(k+2)(2(k+1)+1)}{6}. \end{aligned}\] Therefore, the identity also holds when \(n=k+1\). Thus, by PMI, for all integers \(n\geq1\), \[\sum_{i=1}^n i^2 = 1^2+2^2+3^2+\cdots+n^2 = \frac{n(n+1)(2n+1)}{6}.\]

hands-on exercise \(\PageIndex{1}\label{he:induct1-01}\)

It is time for you to write your own induction proof. Prove that \[1\cdot2 + 2\cdot3 + 3\cdot4 + \cdots + n(n+1) = \frac{n(n+1)(n+2)}{3}\] for all integers \(n\geq1\).

hands-on exercise \(\PageIndex{2}\label{he:induct1-02}\)

Use induction to prove that, for all positive integers \(n\), \[1\cdot2\cdot3 + 2\cdot3\cdot4 + \cdots + n(n+1)(n+2) = \frac{n(n+1)(n+2)(n+3)}{4}.\]

hands-on exercise \(\PageIndex{3}\label{he:sumfourn}\)

Use induction to prove that, for all positive integers \(n\), \[1+4^1+4^2+\cdots+4^n = \frac{4^{n+1}-1}{3}.\]

All three steps in an induction proof must be completed; otherwise, the proof may not be correct.

Example \(\PageIndex{3}\label{eg:induct1-03}\)

Can we just use examples?

Never attempt to prove \(P(k)\Rightarrow P(k+1)\) by examples alone . Consider \[P(n): \qquad n^2+n+11 \mbox{ is prime}.\] In the inductive step, we want to prove that \[P(k) \Rightarrow P(k+1) \qquad\mbox{ for ANY } k\geq1.\] The following table verifies that it is true for \(1\leq k\leq 9\): \[\begin{array}{|*{10}{c|}} \hline n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline n^2+n+11 & 13 & 17 & 23 & 31 & 41 & 53 & 67 & 83 & 101 \\ \hline \end{array}\] Nonetheless, when \(n=10\), \(n^2+n+11=121\) is composite. So \(P(9) \Rightarrow P(10)\) is false. The inductive step breaks down when \(k=9\).
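A quick numerical check (an illustration, not a proof) confirms the pattern and its failure at \(n=10\):

```python
# n^2 + n + 11 is prime for n = 1..9 but composite at n = 10 (121 = 11^2),
# so examples alone never establish P(k) => P(k+1).
def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))

for n in range(1, 11):
    v = n * n + n + 11
    print(n, v, "prime" if is_prime(v) else "composite")
```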

Example \(\PageIndex{4}\label{eg:induct1-04}\)

The base step is equally important . Consider proving \[P(n): \qquad 3n+2 = 3q \mbox{ for some integer $q$}\] for all \(n\in\mathbb{N}\). Assume \(P(k)\) is true for some integer \(k\geq1\); that is, assume \(3k+2=3q\) for some integer \(q\). Then \[3(k+1)+2 = 3k+3+2 = 3+3q = 3(1+q).\] Therefore, \(3(k+1)+2\) can be written in the same form. This proves that \(P(k+1)\) is also true. Does it follow that \(P(n)\) is true for all integers \(n\geq1\)? We know that \(3n+2\) cannot be written as a multiple of 3. What is the problem?

The problem is: we need \(P(k)\) to be true for at least one value of \(k\) so as to start the sequence of implications \[P(1) \Rightarrow P(2), \qquad P(2) \Rightarrow P(3), \qquad P(3) \Rightarrow P(4), \qquad\ldots\] The induction fails because we have not established the basis step. In fact, \(P(1)\) is false. Since the first domino does not fall, we cannot even start the chain reaction.

Thus far, we have learned how to use mathematical induction to prove identities. In general, we can use mathematical induction to prove a statement about \(n\). This statement can take the form of an identity, an inequality, or simply a verbal statement about \(n\). We shall learn more about mathematical induction in the next few sections.

Summary and Review

  • Mathematical induction can be used to prove that a statement about \(n\) is true for all integers \(n\geq a\).
  • We have to complete three steps.
  • In the base step, verify the statement for \(n=a\).
  • In the inductive hypothesis, assume that the statement holds when \(n=k\) for some integer \(k\geq a\).
  • In the inductive step, use the information gathered from the inductive hypothesis to prove that the statement also holds when \(n=k+1\).
  • Be sure to complete all three steps.
  • Pay attention to the wording. At the beginning, follow the template closely. When you feel comfortable with the whole process, you can start venturing out on your own.

Exercises 

Exercise \(\PageIndex{1}\label{ex:induct1-01}\)

Use induction to prove that \[1^3+2^3+3^3+\cdots+n^3 = \frac{n^2(n+1)^2}{4}\] for all integers \(n\geq1\).

Exercise \(\PageIndex{2}\)

Use induction to prove that the following identity holds for all integers \(n\geq1\): \[1+3+5+\cdots+(2n-1) = n^2.\]

Base Case: consider \(n=1\).  \(2(1)-1=1\) and \(1^2=1\) so the LHS & RHS are both 1. This works for  \(n=1\).

Inductive Step: Assume this works for some integer, \(k \geq 1.\) In other words,  \(1+3+5+\cdots+(2k-1) = k^2.\)  ( Inductive Hypothesis )

Consider the case of  \(n=k+1.\)   \(1+3+5+\cdots +(2k-1)+(2(k+1)-1)\)

 \[=k^2+(2(k+1)-1) \text{   by inductive hypothesis}\] \[=k^2+2k+2-1=k^2+2k+1=(k+1)^2 \text{   by algebra} \]

\(1+3+5+\cdots+(2(k+1)-1)=(k+1)^2\); assuming our proposition works for \(k\) it will also work for \(k+1.\)

By PMI, \(1+3+5+\cdots+(2n-1) = n^2\)  for all integers,  \(n\geq1\).

Exercise \(\PageIndex{3}\label{ex:induct1-03}\)

Use induction to show that \[1+\frac{1}{3}+\frac{1}{3^2}+\cdots+\frac{1}{3^n} = \frac{3}{2}\left(1-\frac{1}{3^{n+1}}\right)\] for all positive integers \(n\).

Exercise \(\PageIndex{4}\label{ex:induct1-04}\)

Use induction to establish the following identity for any integer \(n\geq1\): \[1-3+9-\cdots+(-3)^n = \frac{1-(-3)^{n+1}}{4}.\]

Exercise \(\PageIndex{5}\label{ex:induct1-05}\)

Use induction to show that, for any integer \(n\geq1\): \[\sum_{i=1}^n i\cdot i! = (n+1)!-1.\]

Exercise \(\PageIndex{6}\label{ex:induct1-06}\)

Use induction to prove the following identity for integers \(n\geq1\): \[\sum_{i=1}^n \frac{1}{(2i-1)(2i+1)} = \frac{n}{2n+1}.\]

Exercise \(\PageIndex{7}\)

Prove \(2^{2n}-1\) is divisible by 3, for all integers \(n\geq0.\)

Base Case: consider \(n=0\).  \(2^{2(0)}-1=1-1=0.\)  \(0\) is divisible by 3 because 0 = 0(3).

Inductive Step: Assume this works for some integer, \(k \geq 0.\) In other words, \(2^{2k}-1\) is divisible by 3. ( Inductive Hypothesis )

Since \(2^{2k}-1\) is divisible by 3, there exists some integer, m such that \(2^{2k}-1=3m,\)  by definition of divides.

Consider the case of  \(n=k+1.\)  By algebra: \[2^{2(k+1)}-1=2^{2k+2}-1=2^{2k}\cdot 2^2-1=2^{2k}\cdot 4 -1=2^{2k}\cdot (3+1)-1=3 \cdot 2^{2k}+2^{2k}-1\] \[=3 \cdot 2^{2k}+3m \text{   by inductive hypothesis}\]

\[=3(2^{2k}+m) \text{   by algebra}\]

\(2^{2(k+1)}-1=3(2^{2k}+m)\) and  \((2^{2k}+m)\in \mathbb{Z}\) since the integers are closed under addition and multiplication.  

So, \(2^{2(k+1)}-1\) is divisible by 3 by the definition of divisible.

Thus assuming our proposition works for \(k\) it will also work for \(k+1.\)

By PMI,  \(2^{2n}-1\) is divisible by 3, for all integers \(n\geq0.\)

Exercise \(\PageIndex{8}\label{ex:induct1-08}\)

Evaluate \(\sum_{i=1}^n \frac{1}{i(i+1)}\) for a few values of \(n\). What do you think the result should be? Use induction to prove your conjecture.
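
A sketch of the computation the exercise asks for, using exact fractions so the pattern is visible (forming the conjecture, and the induction proof, are left to the reader):

```python
from fractions import Fraction

# Print the partial sums sum_{i=1}^{n} 1/(i(i+1)) for n = 1..6 as exact fractions.
for n in range(1, 7):
    s = sum(Fraction(1, i * (i + 1)) for i in range(1, n + 1))
    print(n, s)
```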

Exercise \(\PageIndex{9}\label{ex:induct1-09}\)

Use induction to prove that \[\sum_{i=1}^n (2i-1)^3 = n^2(2n^2-1)\] whenever \(n\) is a positive integer.
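
A one-line spot check, as before:

```python
# Spot-check sum_{i=1}^{n} (2i-1)^3 = n^2 (2n^2 - 1) for n = 1..50.
assert all(sum((2*i - 1)**3 for i in range(1, n + 1)) == n**2 * (2*n**2 - 1)
           for n in range(1, 51))
```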

Exercise \(\PageIndex{10}\label{ex:induct1-10}\)

Use induction to show that, for any integer \(n\geq1\): \[1^2-2^2+3^2-\cdots+(-1)^{n-1}n^2 = (-1)^{n-1}\,\frac{n(n+1)}{2}.\]
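
The alternating signs make this identity easy to mistranscribe, so a numeric spot check is worthwhile; a sketch (\(n(n+1)\) is even, so the floor division is exact even for negative values):

```python
# Spot-check 1^2 - 2^2 + 3^2 - ... + (-1)^(n-1) n^2 = (-1)^(n-1) n(n+1)/2, n = 1..50.
assert all(sum((-1)**(i - 1) * i**2 for i in range(1, n + 1))
           == (-1)**(n - 1) * n * (n + 1) // 2 for n in range(1, 51))
```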

Exercise \(\PageIndex{11}\label{ex:induct1-11}\)

Use mathematical induction to show that \[\sum_{i=1}^n \frac{i+4}{i(i+1)(i+2)} = \frac{n(3n+7)}{2(n+1)(n+2)}\] for all integers \(n\geq1\).
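
An exact spot check with `Fraction`; a sketch:

```python
from fractions import Fraction

# Spot-check sum_{i=1}^{n} (i+4)/(i(i+1)(i+2)) = n(3n+7)/(2(n+1)(n+2)), n = 1..30.
assert all(sum(Fraction(i + 4, i * (i + 1) * (i + 2)) for i in range(1, n + 1))
           == Fraction(n * (3*n + 7), 2 * (n + 1) * (n + 2)) for n in range(1, 31))
```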

Exercise \(\PageIndex{12}\)

Use mathematical induction to show that \[3+\sum_{i=1}^n (3+5i) = \frac{(n+1)(5n+6)}{2}\] for all integers \(n\geq1\).
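
A final spot check; note the leading 3 outside the summation, and that \((n+1)(5n+6)\) is always even, so integer division is exact:

```python
# Spot-check 3 + sum_{i=1}^{n} (3 + 5i) = (n+1)(5n+6)/2 for n = 1..50.
assert all(3 + sum(3 + 5*i for i in range(1, n + 1)) == (n + 1) * (5*n + 6) // 2
           for n in range(1, 51))
```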

