
Hypothesis in Machine Learning


The concept of a hypothesis is fundamental in Machine Learning and data science endeavours. In the realm of machine learning, a hypothesis serves as an initial assumption made by data scientists and ML professionals when attempting to address a problem. Machine learning involves conducting experiments based on past experiences, and these hypotheses are crucial in formulating potential solutions.

It’s important to note that in machine learning discussions, the terms “hypothesis” and “model” are sometimes used interchangeably. However, a hypothesis represents an assumption, while a model is a mathematical representation employed to test that hypothesis. This section on “Hypothesis in Machine Learning” explores key aspects related to hypotheses in machine learning and their significance.

Table of Contents

  • How does a Hypothesis work?
  • Hypothesis Space and Representation in Machine Learning
  • Hypothesis in Statistics
  • FAQs on Hypothesis in Machine Learning

How does a Hypothesis work?

A hypothesis in machine learning is the model’s presumption regarding the connection between the input features and the result. It is an illustration of the mapping function that the algorithm is attempting to discover using the training set. To minimize the discrepancy between the expected and actual outputs, the learning process involves modifying the weights that parameterize the hypothesis. The objective is to optimize the model’s parameters to achieve the best predictive performance on new, unseen data, and a cost function is used to assess the hypothesis’ accuracy.

In most supervised machine learning algorithms, our main goal is to find a possible hypothesis from the hypothesis space that could map out the inputs to the proper outputs. The following figure shows the common method to find out the possible hypothesis from the Hypothesis space:

[Figure: finding a possible hypothesis from the hypothesis space]

Hypothesis Space (H)

The hypothesis space is the set of all possible legal hypotheses. This is the set from which the machine learning algorithm determines the single best hypothesis that describes the target function or the outputs.

Hypothesis (h)

A hypothesis is a function that best describes the target in supervised machine learning. The hypothesis that an algorithm comes up with depends upon the data and also upon the restrictions and bias that we have imposed on the data.

For a simple linear case, the hypothesis can be written as:

    y = mx + b

where:

  • m = slope of the line
  • b = intercept
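
To make this concrete, here is a minimal Python sketch of fitting such a linear hypothesis. The synthetic data and variable names are illustrative only, and np.polyfit stands in for whatever learning algorithm actually selects the hypothesis from the hypothesis space.

    import numpy as np

    # Toy data that roughly follows a line; m and b are the hypothesis parameters
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

    # Least-squares estimates of slope (m) and intercept (b)
    m, b = np.polyfit(x, y, deg=1)

    def hypothesis(x_new):
        """The learned hypothesis h(x) = m*x + b."""
        return m * x_new + b

    print(f"learned hypothesis: y = {m:.2f}x + {b:.2f}")
    print("prediction at x = 4:", hypothesis(4.0))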

To better understand the hypothesis space and the hypothesis, consider the following coordinate plane showing the distribution of some data:

[Figure: distribution of the sample data on the coordinate plane]

Say suppose we have test data for which we have to determine the outputs or results. The test data is as shown below:

[Figure: test data points]

We can predict the outcomes by dividing the coordinate as shown below:

[Figure: one way of dividing the coordinate plane]

So the test data would yield the following result:

[Figure: predicted outputs for the test data]

But note here that we could have divided the coordinate plane as:

[Figure: an alternative way of dividing the coordinate plane]

The way in which the coordinate would be divided depends on the data, algorithm and constraints.

  • All the legal possible ways in which we can divide the coordinate plane to predict the outcome of the test data together compose the hypothesis space.
  • Each individual possible way is known as a hypothesis.

Hence, in this example the hypothesis space would be like:

[Figure: the possible hypotheses for this example]

The hypothesis space comprises all possible legal hypotheses that a machine learning algorithm can consider. Hypotheses are formulated based on various algorithms and techniques, including linear regression, decision trees, and neural networks. These hypotheses capture the mapping function transforming input data into predictions.

Hypothesis Formulation and Representation in Machine Learning

Hypotheses in machine learning are formulated based on various algorithms and techniques, each with its representation. For example:

  • Linear Regression: h(X) = θ₀ + θ₁X₁ + θ₂X₂ + … + θₙXₙ
  • Decision Trees: h(X) = Tree(X)
  • Neural Networks: h(X) = NN(X)

In the case of complex models like neural networks, the hypothesis may involve multiple layers of interconnected nodes, each performing a specific computation.
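
As a rough illustration (not taken from the original article), the sketch below fits three scikit-learn estimators to the same synthetic data; each searches a different hypothesis space and therefore returns a different kind of hypothesis h(X). The data, layer sizes, and hyperparameters are placeholders.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neural_network import MLPRegressor

    # Synthetic stand-in for input features X and target y
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(200, 2))
    y = 3 * X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

    # Each estimator encodes a different hypothesis representation
    hypotheses = {
        "linear regression  h(X) = θ0 + θ1·X1 + θ2·X2": LinearRegression(),
        "decision tree      h(X) = Tree(X)": DecisionTreeRegressor(max_depth=4, random_state=0),
        "neural network     h(X) = NN(X)": MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0),
    }

    for name, model in hypotheses.items():
        model.fit(X, y)  # selects one hypothesis from the model's hypothesis space
        print(f"{name}: training R^2 = {model.score(X, y):.3f}")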

Hypothesis Evaluation:

The process of machine learning involves not only formulating hypotheses but also evaluating their performance. This evaluation is typically done using a loss function or an evaluation metric that quantifies the disparity between predicted outputs and ground truth labels. Common evaluation metrics include mean squared error (MSE), accuracy, precision, recall, F1-score, and others. By comparing the predictions of the hypothesis with the actual outcomes on a validation or test dataset, one can assess the effectiveness of the model.
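
A minimal sketch of such an evaluation, assuming hypothetical predictions and ground-truth labels and using the scikit-learn implementations of the metrics named above:

    import numpy as np
    from sklearn.metrics import (mean_squared_error, accuracy_score,
                                 precision_score, recall_score, f1_score)

    # Hypothetical regression hypothesis: predictions vs. ground truth
    y_true_reg = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred_reg = np.array([2.5, 0.0, 2.1, 7.8])
    print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))

    # Hypothetical classification hypothesis: predictions vs. ground truth
    y_true_clf = [0, 1, 1, 0, 1, 1]
    y_pred_clf = [0, 1, 0, 0, 1, 1]
    print("accuracy :", accuracy_score(y_true_clf, y_pred_clf))
    print("precision:", precision_score(y_true_clf, y_pred_clf))
    print("recall   :", recall_score(y_true_clf, y_pred_clf))
    print("F1 score :", f1_score(y_true_clf, y_pred_clf))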

Hypothesis Testing and Generalization:

Once a hypothesis is formulated and evaluated, the next step is to test its generalization capabilities. Generalization refers to the ability of a model to make accurate predictions on unseen data. A hypothesis that performs well on the training dataset but fails to generalize to new instances is said to suffer from overfitting. Conversely, a hypothesis that generalizes well to unseen data is deemed robust and reliable.

The process of hypothesis formulation, evaluation, testing, and generalization is often iterative in nature. It involves refining the hypothesis based on insights gained from model performance, feature importance, and domain knowledge. Techniques such as hyperparameter tuning, feature engineering, and model selection play a crucial role in this iterative refinement process.
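
As one possible illustration of this iterative refinement, the sketch below tunes a single hyperparameter (tree depth) with cross-validated grid search and then checks generalization on a held-out set; the data and the parameter grid are made up for the example.

    import numpy as np
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Refine the hypothesis by searching over model complexity (max_depth)
    search = GridSearchCV(
        DecisionTreeRegressor(random_state=0),
        param_grid={"max_depth": [2, 3, 5, 8, None]},
        cv=5,
        scoring="neg_mean_squared_error",
    )
    search.fit(X_train, y_train)

    print("best hyperparameters:", search.best_params_)
    print("held-out R^2:", search.best_estimator_.score(X_test, y_test))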

Hypothesis in Statistics

In statistics, a hypothesis refers to a statement or assumption about a population parameter. It is a proposition or educated guess that helps guide statistical analyses. There are two types of hypotheses: the null hypothesis (H₀) and the alternative hypothesis (H₁ or Hₐ).

  • Null Hypothesis (H₀): This hypothesis suggests that there is no significant difference or effect, and any observed results are due to chance. It often represents the status quo or a baseline assumption.
  • Alternative Hypothesis (H₁ or Hₐ): This hypothesis contradicts the null hypothesis, proposing that there is a significant difference or effect in the population. It is what researchers aim to support with evidence.

FAQs on Hypothesis in Machine Learning

Q. How does the training process use the hypothesis?

The learning algorithm uses the hypothesis as a guide to minimise the discrepancy between expected and actual outputs by adjusting its parameters during training.

Q. How is the hypothesis’s accuracy assessed?

Usually, a cost function that measures the difference between predicted and actual values is used to assess accuracy. The aim is to optimise the model so that this cost is minimised.

Q. What is Hypothesis testing?

Hypothesis testing is a statistical method for determining whether or not a hypothesis is correct. The hypothesis can be about two variables in a dataset, about an association between two groups, or about a situation.

Q. What distinguishes the null hypothesis from the alternative hypothesis in machine learning experiments?

The null hypothesis (H0) assumes no significant effect, while the alternative hypothesis (H1 or Ha) contradicts H0, suggesting a meaningful impact. Statistical testing is employed to decide between these hypotheses.


Programmathically

Introduction to the Hypothesis Space and the Bias-Variance Tradeoff in Machine Learning


In this post, we introduce the hypothesis space and discuss how machine learning models function as hypotheses. Furthermore, we discuss the challenges encountered when choosing an appropriate machine learning hypothesis and building a model, such as overfitting, underfitting, and the bias-variance tradeoff.

The hypothesis space in machine learning is a set of all possible models that can be used to explain a data distribution given the limitations of that space. A linear hypothesis space is limited to the set of all linear models. If the data distribution follows a non-linear distribution, the linear hypothesis space might not contain a model that is appropriate for our needs.

To understand the concept of a hypothesis space, we need to learn to think of machine learning models as hypotheses.

The Machine Learning Model as Hypothesis

Generally speaking, a hypothesis is a potential explanation for an outcome or a phenomenon. In scientific inquiry, we test hypotheses to figure out how well and if at all they explain an outcome. In supervised machine learning, we are concerned with finding a function that maps from inputs to outputs.

But machine learning is inherently probabilistic. It is the art and science of deriving useful hypotheses from limited or incomplete data. Our functions are not axioms that explain the data perfectly, and for most real-life problems, we will never have all the data that exists. Accordingly, we will not find the one true function that perfectly describes the data. Instead, we find a function through training a model to map from known training input to known training output. This way, the model gradually approximates the assumed true function that describes the distribution of the data. So we treat our model as a hypothesis that needs to be tested as to how well it explains the output from a given input. We do this using a test or validation data set.

The Hypothesis Space

During the training process, we select a model from a hypothesis space that is subject to our constraints. For example, a linear hypothesis space only provides linear models. We can approximate data that follows a quadratic distribution using a model from the linear hypothesis space.

model from a linear hypothesis space

Of course, a linear model will never have the same predictive performance as a quadratic model, so we can adjust our hypothesis space to also include non-linear models or at least quadratic models.

model from a quadratic hypothesis space
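
A small numerical sketch of the same idea, assuming synthetic quadratic data: the best model available in the linear hypothesis space leaves a much larger error than the best model in the quadratic hypothesis space.

    import numpy as np

    # Data generated by a quadratic process with a little noise
    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 60)
    y = 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

    for degree, label in [(1, "linear hypothesis space"), (2, "quadratic hypothesis space")]:
        coeffs = np.polyfit(x, y, degree)            # best model available in that space
        residuals = y - np.polyval(coeffs, x)
        print(f"{label}: mean squared error = {np.mean(residuals**2):.3f}")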

The Data Generating Process

The data generating process describes a hypothetical process subject to some assumptions that make training a machine learning model possible. We need to assume that the data points are from the same distribution but are independent of each other. When these requirements are met, we say that the data is independent and identically distributed (i.i.d.).

Independent and Identically Distributed Data

How can we assume that a model trained on a training set will perform better than random guessing on new and previously unseen data? First of all, the training data needs to come from the same or at least a similar problem domain. If you want your model to predict stock prices, you need to train the model on stock price data or data that is similarly distributed. It wouldn’t make much sense to train it on weather data. Statistically, this means the data is identically distributed. But if data comes from the same problem, training data and test data might not be completely independent. To account for this, we need to make sure that the test data is not in any way influenced by the training data or vice versa. If you use a subset of the training data as your test set, the test data evidently is not independent of the training data. Statistically, we say the data must be independently distributed.

Overfitting and Underfitting

We want to select a model from the hypothesis space that explains the data sufficiently well. During training, we can make a model so complex that it perfectly fits every data point in the training dataset. But ultimately, the model should be able to predict outputs on previously unseen input data. The ability to do well when predicting outputs on previously unseen data is also known as generalization. There is an inherent conflict between those two requirements.

If we make the model so complex that it fits every point in the training data, it will pick up lots of noise and random variation specific to the training set, which might obscure the larger underlying patterns. As a result, it will be more sensitive to random fluctuations in new data and predict values that are far off. A model with this problem is said to overfit the training data and, as a result, to suffer from high variance .

a model that overfits the data

To avoid the problem of overfitting, we can choose a simpler model or use regularization techniques to prevent the model from fitting the training data too closely. The model should then be less influenced by random fluctuations and instead, focus on the larger underlying patterns in the data. The patterns are expected to be found in any dataset that comes from the same distribution. As a consequence, the model should generalize better on previously unseen data.

a model that underfits the data

But if we go too far, the model might become too simple or too constrained by regularization to accurately capture the patterns in the data. Then the model will neither generalize well nor fit the training data well. A model that exhibits this problem is said to underfit the data and to suffer from high bias . If the model is too simple to accurately capture the patterns in the data (for example, when using a linear model to fit non-linear data), its capacity is insufficient for the task at hand.

When training neural networks, for example, we go through multiple iterations of training in which the model learns to fit an increasingly complex function to the data. Typically, your training error will decrease during learning the more complex your model becomes and the better it learns to fit the data. In the beginning, the training error decreases rapidly. In later training iterations, it typically flattens out as it approaches the minimum possible error. Your test or generalization error should initially decrease as well, albeit likely at a slower pace than the training error. As long as the generalization error is decreasing, your model is underfitting because it doesn’t live up to its full capacity. After a number of training iterations, the generalization error will likely reach a trough and start to increase again. Once it starts to increase, your model is overfitting, and it is time to stop training.

overfitting vs underfitting

Ideally, you should stop training once your model reaches the lowest point of the generalization error. The gap between the minimum generalization error and no error at all is an irreducible error term known as the Bayes error that we won’t be able to completely get rid of in a probabilistic setting. But if the error term seems too large, you might be able to reduce it further by collecting more data, manipulating your model’s hyperparameters, or altogether picking a different model.
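
The early-stopping logic described above can be sketched as follows. The learner (a simple SGDRegressor), the synthetic data, and the patience value are stand-ins; with such a simple model the overfitting effect is mild, but the stopping mechanics are the same.

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(400, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=400)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

    model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
    best_val, best_epoch, patience, wait = np.inf, 0, 10, 0

    for epoch in range(500):
        model.partial_fit(X_train, y_train)               # one more pass over the training data
        val_error = mean_squared_error(y_val, model.predict(X_val))
        if val_error < best_val:                          # generalization error still decreasing
            best_val, best_epoch, wait = val_error, epoch, 0
        else:                                             # error has stopped improving
            wait += 1
            if wait >= patience:                          # time to stop training
                break

    print(f"stopped after epoch {epoch}; best validation MSE {best_val:.4f} at epoch {best_epoch}")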

Bias Variance Tradeoff

We’ve talked about bias and variance in the previous section. Now it is time to clarify what we actually mean by these terms.

Understanding Bias and Variance

In a nutshell, bias measures if there is any systematic deviation from the correct value in a specific direction. If we could repeat the same process of constructing a model several times over, and the results predicted by our model always deviate in a certain direction, we would call the result biased.

Variance measures how much the results vary between model predictions. If you repeat the modeling process several times over and the results are scattered all across the board, the model exhibits high variance.

In their book “Noise”, Daniel Kahneman and his co-authors provide an intuitive example that helps in understanding the concepts of bias and variance. Imagine you have four teams at the shooting range.

bias and variance

Team B is biased because the shots of its team members all deviate in a certain direction from the center. Team B also exhibits low variance because the shots of all the team members are relatively concentrated in one location. Team C has the opposite problem. The shots are scattered across the target with no discernible bias in a certain direction. Team D is both biased and has high variance. Team A would be the equivalent of a good model. The shots are in the center with little bias in one direction and little variance between the team members.

Generally speaking, linear models such as linear regression exhibit high bias and low variance. Nonlinear algorithms such as decision trees are more prone to overfitting the training data and thus exhibit high variance and low bias.

A linear model used with non-linear data would exhibit a bias to predict data points along a straight line instead of accommodating the curves. But it is not as susceptible to random fluctuations in the data. A nonlinear algorithm that is trained on noisy data with lots of deviations would be more capable of avoiding bias but more prone to incorporating the noise into its predictions. As a result, a small deviation in the test data might lead to very different predictions.

To get our model to learn the patterns in the data, we need to reduce the training error while at the same time reducing the gap between the training and the testing error. In other words, we want to reduce both bias and variance. To a certain extent, we can reduce both by picking an appropriate model, collecting enough training data, and selecting appropriate training features and hyperparameter values. At some point, we have to trade off between minimizing bias and minimizing variance. How you balance this trade-off is up to you.

bias variance trade-off

The Bias Variance Decomposition

Mathematically, the total error can be decomposed into the bias and the variance according to the following formula.
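
In its standard form (stated here for reference, and consistent with the discussion below), the decomposition of the expected squared error reads:

    E[(Y - \hat{f}(X))^2] = (E[\hat{f}(X)] - f(X))^2 + E[(\hat{f}(X) - E[\hat{f}(X)])^2] + \sigma^2

where the first term is the squared bias, the second is the variance, and \sigma^2 is the irreducible (Bayes) error.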

Remember that Bayes’ error is an error that cannot be eliminated.

Our machine learning model represents an estimating function \hat{f}(X) for the true data-generating function f(X), where X represents the predictors and Y the output values.

Now the mean squared error of our model is the expected value of the squared difference between the output produced by the estimating function \hat{f}(X) and the true output Y.

The bias is a systematic deviation from the true value. We can measure it as the squared difference between the expected value produced by the estimating function (the model) and the values produced by the true data-generating function.

Of course, we don’t know the true data generating function, but we do know the observed outputs Y, which correspond to the values generated by f(x) plus an error term.

The variance of the model is the expected squared difference between the actual values produced by the model and their expected value.
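
Written out with the notation used above (these are the standard definitions, with the expectations taken over repeated draws of the training data):

    MSE      = E[(Y - \hat{f}(X))^2]
    Bias^2   = (E[\hat{f}(X)] - f(X))^2
    Variance = E[(\hat{f}(X) - E[\hat{f}(X)])^2]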

Now that we have the bias and the variance, we can add them up along with the irreducible error to get the total error.

A machine learning model represents an approximation to the hypothesized function that generated the data. The chosen model is a hypothesis since we hypothesize that this model represents the true data generating function.

We choose the hypothesis from a hypothesis space that may be subject to certain constraints. For example, we can constrain the hypothesis space to the set of linear models.

When choosing a model, we aim to reduce the bias and the variance to prevent our model from either overfitting or underfitting the data. In the real world, we cannot completely eliminate bias and variance, and we have to trade-off between them. The total error produced by a model can be decomposed into the bias, the variance, and irreducible (Bayes) error.
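
The decomposition can also be estimated empirically. The sketch below, a toy simulation not taken from the original post, repeatedly refits a polynomial model on fresh training sets drawn from an assumed true function and measures the squared bias and the variance of its predictions; raising the degree from 1 to a large value shifts the error from the bias term to the variance term.

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(x)                        # assumed true data-generating function
    x_test = np.linspace(-3, 3, 50)
    degree, n_repeats, noise = 1, 200, 0.3         # try degree=1 (high bias) vs degree=9 (high variance)

    predictions = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        x_train = rng.uniform(-3, 3, 30)
        y_train = f(x_train) + rng.normal(scale=noise, size=30)
        coeffs = np.polyfit(x_train, y_train, degree)
        predictions[i] = np.polyval(coeffs, x_test)

    mean_pred = predictions.mean(axis=0)
    bias_sq = np.mean((mean_pred - f(x_test)) ** 2)    # squared systematic deviation
    variance = np.mean(predictions.var(axis=0))        # spread across refits
    print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")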


Best Guesses: Understanding The Hypothesis in Machine Learning


Machine learning is a vast and complex field that has inherited many terms from other places all over the mathematical domain.

It can sometimes be challenging to get your head around all the different terminologies, never mind trying to understand how everything comes together.

In this blog post, we will focus on one particular concept: the hypothesis.

While you may think this is simple, there is a little caveat regarding machine learning: it has both a statistics side and a learning side.

Don’t worry; we’ll do a full breakdown below.

You’ll learn the following:

  • What is a hypothesis in machine learning?
  • Is this any different than the hypothesis in statistics?
  • What is the difference between the alternative hypothesis and the null?
  • Why do we restrict hypothesis space in artificial intelligence?
  • Example code performing hypothesis testing in machine learning

What Is a Hypothesis in Machine Learning?


In machine learning, the term ‘hypothesis’ can refer to two things.

First, it can refer to the hypothesis space, the set of all possible training examples that could be used to predict or answer a new instance.

Second, it can refer to the traditional null and alternative hypotheses from statistics.

Since machine learning works so closely with statistics, 90% of the time, when someone is referencing the hypothesis, they’re referencing hypothesis tests from statistics.

Is This Any Different Than The Hypothesis In Statistics?

In statistics, the hypothesis is an assumption made about a population parameter.

The statistician’s goal is to gather evidence that either supports or rejects it.


This will take the form of two different hypotheses, one called the null, and one called the alternative.

Usually, you’ll establish your null hypothesis as an assumption that it equals some value.

For example, in Welch’s T-Test Of Unequal Variance, our null hypothesis is that the two means we are testing (population parameter) are equal.

This means our null hypothesis is that the two population means are the same.

We run our statistical tests, and if our p-value is significant (very low), we reject the null hypothesis.

This would mean that the population means of the two samples you are testing are unequal.

Usually, statisticians will use the significance level of .05 (a 5% risk of being wrong) when deciding what to use as the p-value cut-off.

What Is The Difference Between The Alternative Hypothesis And The Null?

The null hypothesis is our default assumption, which we provisionally treat as true and then try to find evidence against.

The alternate hypothesis is usually the opposite of our null and is much broader in scope.

For most statistical tests, the null and alternative hypotheses are already defined.

You are then just trying to find “significant” evidence we can use to reject our null hypothesis.


These two hypotheses are easy to spot by their specific notation. The null hypothesis is usually denoted by H₀, while H₁ denotes the alternative hypothesis.

Example Code Performing Hypothesis Testing In Machine Learning

Since there are many different hypothesis tests in machine learning and data science, we will focus on one of my favorites.

This test is Welch’s T-Test Of Unequal Variance, where we are trying to determine if the population means of these two samples are different.

There are a couple of assumptions for this test, but we will ignore those for now and show the code.

You can read more about this here in our other post, Welch’s T-Test of Unequal Variance .
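
As a minimal sketch of such a test on two made-up samples, scipy.stats.ttest_ind with equal_var=False performs Welch's t-test:

    import numpy as np
    from scipy import stats

    # Two hypothetical samples drawn from populations with different means
    rng = np.random.default_rng(42)
    sample_a = rng.normal(loc=50.0, scale=5.0, size=30)
    sample_b = rng.normal(loc=55.0, scale=8.0, size=30)

    # Welch's t-test: does not assume equal population variances
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.4f}")

    # Decide at the usual 5% significance level
    if p_value < 0.05:
        print("Reject the null hypothesis: the population means differ.")
    else:
        print("Fail to reject the null hypothesis.")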

We see that our p-value is very low, and we reject the null hypothesis.


What Is The Difference Between The Biased And Unbiased Hypothesis Spaces?

The difference between the biased and the unbiased hypothesis space is how many possible training examples your algorithm considers when making predictions.

The unbiased space has all of them, and the biased space only has the training examples you’ve supplied.

Since neither of these is optimal (one is too small, one is much too big), your algorithm creates generalized rules (inductive learning) to be able to handle examples it hasn’t seen before.

Here’s an example of each:

Example of The Biased Hypothesis Space In Machine Learning

The Biased Hypothesis space in machine learning is a biased subspace where your algorithm does not consider all training examples to make predictions.

This is easiest to see with an example.

Let’s say you have the following data:

Happy  and  Sunny  and  Stomach Full  = True

Whenever your algorithm sees those three together in the biased hypothesis space, it’ll automatically default to true.

This means when your algorithm sees:

Sad  and  Sunny  And  Stomach Full  = False

It’ll automatically default to False since it didn’t appear in our subspace.

This is a greedy approach, but it has some practical applications.
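
A tiny sketch of this greedy, lookup-table view of the biased hypothesis space; the attribute names mirror the toy example above and are purely illustrative.

    # Only the supplied training examples are stored; anything unseen defaults to False.
    training_examples = {
        ("Happy", "Sunny", "Stomach Full"): True,
    }

    def predict(mood: str, weather: str, stomach: str) -> bool:
        # Greedy rule: return the memorized label for an exact match,
        # otherwise fall back to False.
        return training_examples.get((mood, weather, stomach), False)

    print(predict("Happy", "Sunny", "Stomach Full"))  # True  (memorized)
    print(predict("Sad", "Sunny", "Stomach Full"))    # False (unseen, defaults to False)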


Example of the Unbiased Hypothesis Space In Machine Learning

The unbiased hypothesis space is a space where all combinations are stored.

We can re-use our example from above.

This would start to break down as:

Happy  = True

Happy  and  Sunny  = True

Happy  and  Stomach Full  = True

Let’s say you have four options for each of the three choices.

This would mean our subspace would need 2^12 = 4,096 instances (three attributes with four options each gives twelve attribute values to combine) just for our little three-word problem.

This is practically impossible; the space would become huge.


So while it would be highly accurate, this has no scalability.

More reading on this idea can be found in our post, Inductive Bias In Machine Learning .

Why Do We Restrict Hypothesis Space In Artificial Intelligence?

We have to restrict the hypothesis space in machine learning. Without any restrictions, our domain becomes much too large, and we lose any form of scalability.

This is why our algorithm creates rules to handle examples that are seen in production. 

This gives our algorithms a generalized approach that will be able to handle all new examples that are in the same format.


Hypothesis Space

In machine learning, the goal of a supervised learning algorithm is to perform induction, i.e., to generalize a (finite) set of observations (the training data) into a general model of the domain. In this regard, the hypothesis space is defined as the set of candidate models considered by the algorithm.

More specifically, consider the problem of learning a mapping (model) f ∈ F = Y^X from an input space X to an output space Y, given a set of training data D = {(x₁, y₁), ..., (xₙ, yₙ)} ⊂ X × Y. A learning algorithm A takes D as an input and produces a function (model, hypothesis) f ∈ H ⊂ F as an output, where H is the hypothesis space. This subset is determined by the formalism used to represent models (e.g., as logical formulas, linear functions, or non-linear functions implemented as artificial neural networks or decision trees). Thus, the choice of the hypothesis space produces a representation...


Source: Hüllermeier, E., Fober, T., Mernberger, M. (2013). Hypothesis Space. In: Dubitzky, W., Wolkenhauer, O., Cho, K.-H., Yokota, H. (eds) Encyclopedia of Systems Biology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9863-7_926

Hypothesis Spaces for Deep Learning

This paper introduces a hypothesis space for deep learning that employs deep neural networks (DNNs). By treating a DNN as a function of two variables, the physical variable and the parameter variable, we consider the primitive set of the DNNs for the parameter variable located in a set of the weight matrices and biases determined by a prescribed depth and widths of the DNNs. We then complete the linear span of the primitive DNN set in a weak* topology to construct a Banach space of functions of the physical variable. We prove that the Banach space so constructed is a reproducing kernel Banach space (RKBS) and construct its reproducing kernel. We investigate two learning models, regularized learning and the minimum interpolation problem, in the resulting RKBS, by establishing representer theorems for solutions of the learning models. The representer theorems reveal that solutions of these learning models can be expressed as a linear combination of a finite number of kernel sessions determined by the given data and the reproducing kernel.

Key words: Reproducing kernel Banach space, deep learning, deep neural network, representer theorem for deep learning

1 Introduction

Deep learning has been a huge success in applications. Mathematically, its success is due to the use of deep neural networks (DNNs), neural networks of multiple layers, to describe decision functions. Various mathematical aspects of DNNs as an approximation tool were investigated recently in a number of studies [ 9 , 11 , 13 , 16 , 20 , 27 , 28 , 31 ] . As pointed out in [ 8 ] , learning processes do not take place in a vacuum. Classical learning methods took place in a reproducing kernel Hilbert space (RKHS) [ 1 ] , which leads to representation of learning solutions in terms of a combination of a finite number of kernel sessions [ 19 ] of a universal kernel [ 17 ] . Reproducing kernel Hilbert spaces as appropriate hypothesis spaces for classical learning methods provide a foundation for mathematical analysis of the learning methods. A natural and imperative question is what are appropriate hypothesis spaces for deep learning. Although hypothesis spaces for learning with shallow neural networks (networks of one hidden layer) were investigated recently in a number of studies, (e.g. [ 2 , 6 , 18 , 21 ] ), appropriate hypothesis spaces for deep learning are still absent. The goal of the present study is to understand this imperative theoretical issue.

The road map for constructing the hypothesis space for deep learning may be described as follows. We treat a DNN as a function of two variables, one being the physical variable and the other being the parameter variable. We then consider the set of the DNNs as functions of the physical variable for the parameter variable taking all elements of the set of the weight matrices and biases determined by a prescribed depth and widths of the DNNs. Upon completing the linear span of the DNN set in a weak* topology, we construct a Banach space of functions of the physical variable. We establish that the resulting Banach space is a reproducing kernel Banach space (RKBS), on which point-evaluation functionals are continuous, and construct an asymmetric reproducing kernel for the space, which is a function of the two variables: the physical variable and the parameter variable. We regard the constructed RKBS as the hypothesis space for deep learning. We remark that when deep neural networks reduce to shallow networks (having only one hidden layer), our hypothesis space coincides with the space for shallow learning studied in [ 2 ].

Upon introducing the hypothesis space for deep learning, we investigate two learning models, the regularized learning and the minimum interpolation problem, in the resulting RKBS. We establish representer theorems for solutions of the learning models by employing the theory of reproducing kernel Banach spaces developed in [ 25 , 26 , 29 ] and the representer theorems for solutions of learning in a general RKBS established in [ 4 , 23 , 24 ] . Like the representer theorems for classical learning in RKHSs, the resulting representer theorems for the two deep learning models in the RKBS reveal that although the learning models are of infinite dimension, their solutions lie in finite-dimensional manifolds. More specifically, they can be expressed as a linear combination of a finite number of kernel sessions, the reproducing kernel evaluated in the parameter variable at points determined by the given data. The representer theorems established in this paper are data-dependent. Even when deep neural networks reduce to a shallow network, the corresponding representer theorem is, to the best of our knowledge, still new. The hypothesis space and the representer theorems for the two deep learning models in it provide rich insights into deep learning and supply deep learning with a sound mathematical foundation for further investigation.

We organize this paper in six sections. We describe in Section 2 an innate deep learning model with DNNs. Aiming at formulating reproducing kernel Banach spaces as hypothesis spaces for deep learning, in Section 3 we elucidate the notion of vector-valued reproducing kernel Banach spaces. Section 4 is entirely devoted to the development of the hypothesis space for deep learning. We specifically show that the completion of the linear span of the primitive DNN set, pertaining to the innate learning model, in a weak* topology is an RKBS, which constitutes the hypothesis space for deep learning. In Section 5, we study learning models in the RKBS, establishing representer theorems for solutions of two learning models (regularized learning and minimum norm interpolation) in the hypothesis space. We conclude this paper in Section 6 with remarks on advantages of learning in the proposed hypothesis space.

2 Learning with Deep Neural Networks

We describe in this section an innate learning model with DNNs, considered widely in the machine learning community.

We first recall the notation of DNNs. Let s and t be positive integers. A DNN is a vector-valued function from ℝ^s to ℝ^t formed by compositions of functions, each of which is defined by an activation function applied to an affine map. Specifically, for a given univariate function σ: ℝ → ℝ, we define a vector-valued function, also denoted by σ, by applying σ componentwise:

    σ(x) := [σ(x_1), σ(x_2), …, σ(x_m)]ᵀ,  for x ∈ ℝ^m.

For vector-valued functions f_j, j ∈ ℕ_k, such that the range of f_j is contained in the domain of f_{j+1}, for j ∈ ℕ_{k−1}, we denote the consecutive composition of f_j, j ∈ ℕ_k, by

    f_k ∘ f_{k−1} ∘ ⋯ ∘ f_1,

whose domain is that of f_1. Suppose that D ∈ ℕ is prescribed and fixed. Throughout this paper, we always let m_0 := s and m_D := t. We specify positive integers m_j, j ∈ ℕ_{D−1}. For 𝐖_j ∈ ℝ^{m_j × m_{j−1}} and 𝐛_j ∈ ℝ^{m_j}, j ∈ ℕ_D, a DNN is a function defined by

    𝒩^D(x) := 𝐖_D σ(𝐖_{D−1} σ(⋯ σ(𝐖_1 x + 𝐛_1) ⋯) + 𝐛_{D−1}) + 𝐛_D.    (1)

Note that x is the input vector and 𝒩^D has D − 1 hidden layers and an output layer, which is the D-th layer.

A DNN may be represented in a recursive manner. From definition (1), a DNN can be defined recursively by

    𝒩^1(x) := 𝐖_1 x + 𝐛_1,    𝒩^j(x) := 𝐖_j σ(𝒩^{j−1}(x)) + 𝐛_j,  j = 2, …, D.

We write 𝒩^D as 𝒩^D(·, {𝐖_j, 𝐛_j}_{j=1}^D) when it is necessary to indicate the dependence of DNNs on the parameters. In this paper, when we write the set {𝐖_j, 𝐛_j}_{j=1}^D associated with the neural network 𝒩^D, we implicitly give it the order inherited from the definition of 𝒩^D. Throughout this paper, we assume that the activation function σ is continuous.

It is advantageous to consider the DNN 𝒩^D defined above as a function of two variables, one being the physical variable x ∈ ℝ^s and the other being the parameter variable θ := {𝐖_j, 𝐛_j}_{j=1}^D. Given positive integers m_j, j ∈ ℕ_{D−1}, we let

    𝕎 := {m_1, m_2, …, m_{D−1}}    (2)

denote the width set and define the primitive set of the DNNs of D layers by

    𝒜_𝕎 := {𝒩^D(·, {𝐖_j, 𝐛_j}_{j=1}^D) : 𝐖_j ∈ ℝ^{m_j × m_{j−1}}, 𝐛_j ∈ ℝ^{m_j}, j ∈ ℕ_D}.    (3)

Clearly, the set 𝒜_𝕎 defined by (3) depends not only on 𝕎 but also on D. For the sake of simplicity, we will not indicate the dependence on D in our notation when ambiguity is not caused. For example, we will use 𝒩 for 𝒩^D. Moreover, an element of 𝒜_𝕎 is a vector-valued function mapping from ℝ^s to ℝ^t. We shall understand the set 𝒜_𝕎. To this end, we define the parameter space Θ by letting

    Θ := {θ = {𝐖_j, 𝐛_j}_{j=1}^D : 𝐖_j ∈ ℝ^{m_j × m_{j−1}}, 𝐛_j ∈ ℝ^{m_j}, j ∈ ℕ_D}.    (4)

Note that Θ is measurable. For x ∈ ℝ^s and θ ∈ Θ, we define

    𝒩(x, θ) := 𝒩^D(x, {𝐖_j, 𝐛_j}_{j=1}^D).    (5)

For x ∈ ℝ^s and θ ∈ Θ, there holds 𝒩(x, θ) ∈ ℝ^t. In this notation, the set 𝒜_𝕎 may be written as

    𝒜_𝕎 = {𝒩(·, θ) : θ ∈ Θ}.

We now describe the innate learning model with DNNs. Suppose that a training dataset

    𝔻_m := {(x_i, y_i) : i ∈ ℕ_m} ⊂ ℝ^s × ℝ^t

is given and we would like to train a neural network from the dataset. We denote by ℒ(𝒩, 𝔻_m): Θ → ℝ a loss function determined by the dataset 𝔻_m. For example, a loss function may take the form

    ℒ(𝒩, 𝔻_m)(θ) := ∑_{i ∈ ℕ_m} ‖𝒩(x_i, θ) − y_i‖,

where ‖·‖ is a norm of ℝ^t. Given a loss function, a typical deep learning model is to train the parameters θ ∈ Θ_𝕎 from the training dataset 𝔻_m by solving the optimization problem

    min{ℒ(𝒩, 𝔻_m)(θ) : θ ∈ Θ_𝕎},    (7)

where 𝒩 has the form in equation (5). Equivalently, optimization problem (7) may be written as

    min{ℒ(f, 𝔻_m) : f ∈ 𝒜_𝕎}.    (8)

Model (8) is an innate learning model considered widely in the machine learning community. Note that the set 𝒜_𝕎 lacks either algebraic or topological structure. It is difficult to conduct mathematical analysis for learning model (8). Even the existence of its solution is not guaranteed.

We introduce a vector space that contains 𝒜_𝕎 and consider learning in the vector space. For this purpose, given a set 𝕎 of weight widths defined by (2), we define the set

    ℬ_𝕎 := {∑_{l ∈ ℕ_n} c_l 𝒩(·, θ_l) : c_l ∈ ℝ, θ_l ∈ Θ_𝕎, l ∈ ℕ_n, n ∈ ℕ}.    (9)

In the next proposition, we present properties of ℬ_𝕎.

Proposition 1.

If 𝕎 is the width set defined by (2), then

(i) ℬ_𝕎 defined by (9) is the smallest vector space on ℝ that contains the set 𝒜_𝕎,

(ii) ℬ_𝕎 is of infinite dimension,

(iii) ℬ_𝕎 ⊂ ⋃_{n ∈ ℕ} 𝒜_{n𝕎}.

Proof. It is clear that ℬ_𝕎 may be identified with the linear span of 𝒜_𝕎, that is,

    ℬ_𝕎 = span 𝒜_𝕎.

Thus, ℬ_𝕎 is the smallest vector space containing 𝒜_𝕎. Item (ii) follows directly from the definition (9) of ℬ_𝕎.

It remains to prove Item (iii). To this end, we let f ∈ ℬ_𝕎. By the definition (9) of ℬ_𝕎, there exist n′ ∈ ℕ, c_l ∈ ℝ, θ_l ∈ Θ_𝕎, for l ∈ ℕ_{n′}, such that

    f = ∑_{l ∈ ℕ_{n′}} c_l 𝒩(·, θ_l).

It suffices to show that f ∈ 𝒜_{n′𝕎}. Noting that θ_l := {𝐖_j^l, 𝐛_j^l}_{j=1}^D, for l ∈ ℕ_{n′}, we set (here [·; ·] stacks blocks vertically and diag(·) places blocks on the diagonal)

    W̃_1 := [𝐖_1^1; …; 𝐖_1^{n′}],    b̃_j := [𝐛_j^1; …; 𝐛_j^{n′}] for j ∈ ℕ_{D−1},
    W̃_j := diag(𝐖_j^1, …, 𝐖_j^{n′}) for j ∈ ℕ_{D−1}\{1},
    W̃_D := [c_1 𝐖_D^1, …, c_{n′} 𝐖_D^{n′}],    b̃_D := ∑_{l ∈ ℕ_{n′}} c_l 𝐛_D^l.

Clearly, we have that W̃_1 ∈ ℝ^{(n′m_1) × m_0}, b̃_j ∈ ℝ^{n′m_j}, j ∈ ℕ_{D−1}, W̃_j ∈ ℝ^{(n′m_j) × (n′m_{j−1})}, j ∈ ℕ_{D−1}\{1}, W̃_D ∈ ℝ^{m_D × (n′m_{D−1})}, and b̃_D ∈ ℝ^{m_D}. Direct computation confirms that f(·) = 𝒩(·, θ̃) with θ̃ := {W̃_j, b̃_j}_{j=1}^D. By definition (3), f ∈ 𝒜_{n′𝕎}. ∎

Proposition 1 reveals that ℬ_𝕎 is the smallest vector space that contains 𝒜_𝕎. Hence, it is a reasonable substitute for 𝒜_𝕎. Motivated by Proposition 1, we propose the following alternative learning model:

    min{ℒ(f, 𝔻_m) : f ∈ ℬ_𝕎}.    (10)

For a given width set 𝕎, unlike learning model (8), which searches for a minimizer in the set 𝒜_𝕎, learning model (10) seeks a minimizer in the vector space ℬ_𝕎, which contains 𝒜_𝕎 and is contained in 𝒜 := ⋃_{n ∈ ℕ} 𝒜_{n𝕎}. According to Proposition 1, learning model (10) is “semi-equivalent” to learning model (8) in the sense that

    ℒ(𝒩_𝒜, 𝔻_m) ≤ ℒ(𝒩_{ℬ_𝕎}, 𝔻_m) ≤ ℒ(𝒩_{𝒜_𝕎}, 𝔻_m),

where 𝒩_{ℬ_𝕎} is a minimizer of model (10), and 𝒩_{𝒜_𝕎} and 𝒩_𝒜 are the minimizers of model (8) and of model (8) with the set 𝒜_𝕎 replaced by 𝒜, respectively. One might argue that since model (8) is a finite-dimensional optimization problem while model (10) is an infinite-dimensional one, the alternative model (10) may add unnecessary complexity to the original model. Although model (10) is of infinite dimension, the algebraic structure of the vector space ℬ_𝕎 and the topological structure with which it will be equipped later provide us with great advantages for mathematical analysis of learning on the space. As a matter of fact, the vector-valued RKBS to be obtained by completing the vector space ℬ_𝕎 in a weak* topology will lead to the miraculous representer theorem for the learned solution, which reduces the infinite-dimensional optimization problem to a finite-dimensional one. This addresses the challenges caused by the infinite dimension of the space ℬ_𝕎.

3 Vector-Valued Reproducing Kernel Banach Space

It was proved in the last section that for a given width set $\mathbb{W}$, the set $\mathcal{B}_{\mathbb{W}}$ defined by (9) is the smallest vector space that contains the primitive set $\mathcal{A}_{\mathbb{W}}$. One of the aims of this paper is to establish that the vector space $\mathcal{B}_{\mathbb{W}}$ is dense, in a weak* topology, in a vector-valued RKBS. For this purpose, in this section we describe the notion of vector-valued RKBSs.

A Banach space $\mathcal{B}$ with norm $\|\cdot\|_{\mathcal{B}}$ is called a space of vector-valued functions on a prescribed set $X$ if $\mathcal{B}$ is composed of vector-valued functions defined on $X$ and, for each $f\in\mathcal{B}$, $\|f\|_{\mathcal{B}}=0$ implies that $f(x)=\mathbf{0}$ for all $x\in X$. For each $x\in X$, we define the point evaluation operator $\delta_{x}:\mathcal{B}\to\mathbb{R}^{n}$ as $\delta_{x}(f):=f(x)$ for all $f\in\mathcal{B}$.

We provide the definition of vector-valued RKBSs below.

Definition 2.

A Banach space $\mathcal{B}$ of vector-valued functions from $X$ to $\mathbb{R}^{n}$ is called a vector-valued RKBS if there exists a norm $\|\cdot\|$ on $\mathbb{R}^{n}$ such that for each $x\in X$ the point evaluation operator $\delta_{x}$ is continuous with respect to the norm $\|\cdot\|$ of $\mathbb{R}^{n}$ on $\mathcal{B}$; that is, for each $x\in X$ there exists a constant $C_{x}>0$ such that $\|\delta_{x}(f)\|\le C_{x}\|f\|_{\mathcal{B}}$ for all $f\in\mathcal{B}$.
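For intuition, here is the familiar scalar special case (recalled as background, not taken from the excerpt): in an RKHS $\mathcal{H}_K$ with reproducing kernel $K$, the reproducing property and the Cauchy–Schwarz inequality give

$$
\delta_{x}(f)=f(x)=\langle f,K(x,\cdot)\rangle_{\mathcal{H}_K},
\qquad
|\delta_{x}(f)|\le\sqrt{K(x,x)}\,\|f\|_{\mathcal{H}_K},
$$

so $C_{x}=\sqrt{K(x,x)}$ witnesses the continuity required in Definition 2 with $n=1$.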

Note that since all norms on $\mathbb{R}^{n}$ are equivalent, if a Banach space $\mathcal{B}$ of vector-valued functions from $X$ to $\mathbb{R}^{n}$ is a vector-valued RKBS with respect to one norm on $\mathbb{R}^{n}$, then it must be a vector-valued RKBS with respect to any other norm on $\mathbb{R}^{n}$. Thus, the property of point evaluation operators being continuous on the space $\mathcal{B}$ is independent of the choice of norm on the output space $\mathbb{R}^{n}$.

The notion of RKBSs was originally introduced in [29] to guarantee the stability of the sampling process and to serve as a hypothesis space for sparse machine learning. Vector-valued RKBSs were studied in [14, 30], in which the definition of the vector-valued RKBS involves an abstract Banach space, with a specific norm, as the output space of the functions. In Definition 2, we limit the output space to the Euclidean space $\mathbb{R}^{n}$ without specifying a norm, owing to the special property that all norms on $\mathbb{R}^{n}$ are equivalent.

We reveal in the next proposition that point evaluation operators are continuous if and only if the component-wise point evaluation functionals are continuous. To this end, for a vector-valued function $f:X\to\mathbb{R}^{n}$ and each $j\in\mathbb{N}_{n}$, we denote by $f_{j}:X\to\mathbb{R}$ the $j$-th component of $f$; that is, $f(x)=(f_{1}(x),f_{2}(x),\ldots,f_{n}(x))^{\top}$ for all $x\in X$.

Proposition 3.

We next identify a reproducing kernel for a vector-valued RKBS. We need the notion of the $\delta$-dual space of a vector-valued RKBS. For a Banach space $B$ with norm $\|\cdot\|_{B}$, we denote by $B^{*}$ the dual space of $B$, which is composed of all continuous linear functionals on $B$, endowed with the norm $\|\nu\|_{B^{*}}:=\sup\{|\nu(f)|:f\in B,\ \|f\|_{B}\le 1\}$ for $\nu\in B^{*}$.

Suppose that $\mathcal{B}$ is a vector-valued RKBS of functions from $X$ to $\mathbb{R}^{n}$, with dual space $\mathcal{B}^{*}$. We set


Further Reading on Hypothesis Spaces

  1. Hypothesis in Machine Learning

A hypothesis is a function that best describes the target in supervised machine learning. The hypothesis that an algorithm comes up with depends upon the data and also upon the restrictions and bias that we have imposed on the data. A simple linear hypothesis can be written as $y = mx + b$, where $x$ is the input (domain), $y$ is the output (range), $m$ is the slope of the line, and $b$ is the intercept.

  2. What's a Hypothesis Space?

    Our goal is to find a model that classifies objects as positive or negative. Applying Logistic Regression, we can get the models of the form: (1) which estimate the probability that the object at hand is positive. Each such model is called a hypothesis, while the set of all the hypotheses an algorithm can learn is known as its hypothesis space ...

  3. What is a Hypothesis in Machine Learning?

    There is a tradeoff between the expressiveness of a hypothesis space and the complexity of finding a good hypothesis within that space. — Page 697, Artificial Intelligence: A Modern Approach, Second Edition, 2009. Hypothesis in Machine Learning: Candidate model that approximates a target function for mapping examples of inputs to outputs.

  4. What exactly is a hypothesis space in machine learning?

With 4 binary input features there are $2^4=16$ possible inputs, and each input can be labeled 0 or 1, so the hypothesis space contains $2^{2^4}=65536$ distinct Boolean functions (see the counting sketch after this list). The ML algorithm helps us find one function, sometimes also referred to as a hypothesis, from this relatively large hypothesis space.

  5. Introduction to the Hypothesis Space and the Bias-Variance Tradeoff in

    The hypothesis space in machine learning is a set of all possible models that can be used to explain a data distribution given the limitations of that space. A linear hypothesis space is limited to the set of all linear models. If the data distribution follows a non-linear distribution, the linear hypothesis space might not contain a model that ...

  6. On the scope of scientific hypotheses

Example of hypothesis space. The hypothesis scope is expressed as cuboids in three dimensions (relationship (R), variable (XY), pipeline (P)). The hypothesis space is the entire possible space within the three dimensions. Three hypotheses are shown in the hypothesis space ($H_1$, $H_2$, $H_3$). $H_2$ and $H_3$ are subsets of $H_1$.

  7. Best Guesses: Understanding The Hypothesis in Machine Learning

    In machine learning, the term 'hypothesis' can refer to two things. First, it can refer to the hypothesis space, the set of all possible training examples that could be used to predict or answer a new instance. Second, it can refer to the traditional null and alternative hypotheses from statistics. Since machine learning works so closely ...

  8. Hypothesis Space

    The hypothesis space is the set of hypotheses that can be described using this hypothesis language. Often, a learner has an implicit, built-in, hypothesis language, but in addition the set of hypotheses that can be produced can be restricted further by the user by specifying a language bias. This language bias defines a subset of the hypothesis ...

  9. Machine Learning: The Basics

A hypothesis map reads in low-level properties (referred to as features) of a data point and delivers the prediction for the label of that data point. ML methods choose or learn a hypothesis map from a (typically very) large set of candidate maps. We refer to this set of candidate maps as the hypothesis space or model underlying an ML method.

  10. Hypothesis Space

    A learning algorithm A takes D as an input and produces a function (model, hypothesis) f ∈ H ⊂ F as an output, where H is the hypothesis space. This subset is determined by the formalism used to represent models (e.g., as logical formulas, linear functions, or non-linear functions implemented as artificial neural networks or decision trees ).

  11. PDF CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning

of this hypothesis space. In this case, the hypothesis space is given by $2^{(2n)^k}$, corresponding to the number of ways to choose subsets from among the $(2n)^k$ possible clauses of $k$ literals, including negations. Thus, the sample complexity is given by $\ln(|k\text{-CNF}|) = O(n^k)$. Since $k$ is fixed, we have an order polynomial in the number of examples, and thus $h$ is guaranteed to be PAC ...

  12. Hypothesis Space

    This benefit is especially evident as the hypothesis space increases. Particularly, in the seven-object task (the category of mapping problems with the largest hypothesis space), the MbD algorithm correctly solves significantly more problems within the first 1-4 assists than either the hypothesis pruning or random mapping baselines.

  13. machine learning

A hypothesis space/class is the set of functions that the learning algorithm considers when picking one function to minimize some risk/loss functional. The capacity of a hypothesis space is a number or bound that quantifies the size (or richness) of the hypothesis space, i.e. the number (and type) of functions that can be represented by the hypothesis space.

  14. Machine Learning 1.1: Hypothesis Spaces

    This video introduces the concept of a hypothesis space which is a restricted set of predictor functions that can be computed and manipulated efficiently giv...

  15. Searching the hypothesis space (Chapter 6)

In Chapter 5 we introduced the main notions of machine learning, with particular regard to hypothesis and data representation, and we saw that concept learning can be formulated in terms of a search problem in the hypothesis space H. As H is in general very large, or even infinite, well-designed strategies are required in order to perform efficiently the search for good hypotheses.

  16. Hypothesis Spaces for Deep Learning

The hypothesis space and the representer theorems for the two deep learning models in it provide rich insights into deep learning and supply deep learning with a sound mathematical foundation for further investigation. We organize this paper in six sections. We describe in Section 2 an innate deep learning model with DNNs.

  17. Hypothesis in Machine Learning

Hypothesis space (H) is the collection of all legal possible ways to divide the coordinate plane so that it best maps inputs to their proper outputs; each individual possible way is called a hypothesis (h).

  18. PDF Computational Learning Theory

The complexity of a hypothesis space is measured not by the number of distinct hypotheses |H| but by the number of distinct instances from X that can be completely discriminated using H. Definition: a set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with ...

  19. Hypothesis Space

    The hypothesis space is a mathematical construct within which a solution is sought. But this space of possible solutions may be highly constrained by the linear functions in classical statistical analysis and machine learning techniques. Complex problems in the real world may require much more expressive hypothesis spaces than can be provided ...

  20. What is the hypothesis space used by this AND gate Perceptron?

The hypothesis space used by a machine learning system is the set of all hypotheses that might possibly be returned by it. Per this post, the Perceptron algorithm makes the prediction $$\hat y = \begin{cases} 1 & w\cdot x+b \ge 0\\ 0 & w\cdot x+b < 0 \end{cases}$$ ... (this hypothesis family is made concrete in the sketch after this list).

  21. Hypothesis spaces for learning

The hypothesis space H is said to be algorithmically optimal for a criterion I if, given any learner M using a hypothesis space H′, one can algorithmically find a learner M′ using H as a hypothesis space (for the class of languages which was I-learnt by M using H′ as a hypothesis space). S. Jain / Information and Computation ...

  22. hypothesis space

A hypothesis space refers to the set of possible approximations that an algorithm can create for f. The hypothesis space consists of the set of functions the model is limited to learn. For instance, linear regression can be limited to linear functions as its hypothesis space.

  23. Unit 1 || Lec 5 : Hypothesis Space

    Contents - Concept Learning, Hypothesis Space, Inductive Learning Hypothesis
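
The counting argument in item 4 and the perceptron rule in item 20 can be made concrete with a short, self-contained sketch. Everything below (the function names, the choice $n=2$, and the AND-gate weights) is illustrative and not taken from any of the sources above.

```python
from itertools import product

# Count the hypothesis space of Boolean functions on n binary features:
# there are 2**n possible inputs and each can be labeled 0 or 1,
# giving 2**(2**n) distinct hypotheses (65536 when n = 4, as in item 4).
def boolean_hypothesis_space_size(n: int) -> int:
    return 2 ** (2 ** n)

print(boolean_hypothesis_space_size(4))   # 65536

# Explicitly enumerate the hypothesis space for n = 2: every hypothesis is a
# tuple of labels, one per input, so there are 2**(2**2) = 16 of them.
n = 2
inputs = list(product([0, 1], repeat=n))
hypotheses = list(product([0, 1], repeat=len(inputs)))
assert len(hypotheses) == boolean_hypothesis_space_size(n)

# A perceptron searches a much smaller, structured hypothesis space:
# the family h_{w,b}(x) = 1 if w.x + b >= 0 else 0, indexed by (w, b).
def perceptron_hypothesis(w, b):
    def h(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
    return h

# The AND gate of item 20 corresponds to one point (w, b) in that space.
h_and = perceptron_hypothesis(w=(1.0, 1.0), b=-1.5)
print([h_and(x) for x in inputs])         # [0, 0, 0, 1]
```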

  24. 10 Unsolved Space Mysteries That Scientists Are Baffled By

    Space is a vast and mysterious place, filled with countless wonders that continue to captivate our imaginations. ... The Lithopanspermia Hypothesis suggests that life on Earth may have originated ...

  25. Dark forest hypothesis

    The dark forest hypothesis is the conjecture that many alien civilizations exist throughout the universe, but they are both silent and hostile, maintaining their undetectability for fear of being destroyed by another hostile and undetected civilization. It is one of many possible explanations of the Fermi paradox, which contrasts the lack of contact with alien life with the potential for such ...