
Transfer Learning Guide: A Practical Tutorial With Examples for Images and Text in Keras

It can take weeks to train a neural network on a large dataset. Luckily, this time can be shortened thanks to model weights from pre-trained models – in other words, by applying transfer learning.

Transfer learning is a technique that works in image classification tasks and natural language processing tasks. In this article, you’ll dive into:

  • what transfer learning is,
  • how to implement transfer learning (in Keras),
  • transfer learning for image classification,
  • transfer learning for natural language processing

Well then, let’s start learning! (no pun intended… ok, maybe a little) 

What is transfer learning?

Transfer learning is about leveraging feature representations from a pre-trained model, so you don’t have to train a new model from scratch.

The pre-trained models are usually trained on massive datasets that are standard benchmarks in the computer vision field. The weights obtained from these models can be reused in other computer vision tasks.

These models can be used directly in making predictions on new tasks or integrated into the process of training a new model. Including the pre-trained models in a new model leads to lower training time and lower generalization error.  

Transfer learning is particularly useful when you have a small training dataset. In this case, you can, for example, use the weights from a pre-trained model to initialize the weights of the new model. As you will see later, transfer learning can also be applied to natural language processing problems.

[Image: The transfer learning idea]

The advantage of pre-trained models is that they are generic enough for use in other real-world applications. For example:

  • models trained on ImageNet can be used in real-world image classification problems. This is because the dataset contains 1,000 diverse classes. Let’s say you are an insect researcher. You can use these models and fine-tune them to classify insects. 
  • classifying text requires knowledge of word representations in some vector space. You can train vector representations yourself. The challenge here is that you might not have enough data to train the embeddings. Furthermore, training will take a long time. In this case, you can use a pre-trained word embedding like GloVe to hasten your development process.  

You will explore these use cases in a moment.

What is the difference between transfer learning and fine-tuning?

Fine-tuning is an optional step in transfer learning that will usually improve the performance of the model. However, since you have to retrain the entire model, or a large part of it, you risk overfitting.


Overfitting is avoidable. Just retrain the model or part of it using a low learning rate. This is important because it prevents large updates to the weights, which would destroy the pre-trained features and result in poor performance. Using a callback to stop the training process when the model has stopped improving is also helpful.

Why use transfer learning?

Assume you have 100 images of cats and 100 of dogs and want to build a model to classify the images. How would you train a model using this small dataset? You can train your model from scratch, but it will most likely overfit horribly. Enter transfer learning. Generally speaking, there are a few big reasons why you want to use transfer learning:

  • training models with high accuracy requires a lot of data. For example, the ImageNet dataset contains over 1 million images. In the real world, you are unlikely to have such a large dataset. 
  • assuming that you had that kind of dataset, you might still not have the resources required to train a model on it. Hence transfer learning makes a lot of sense if you don’t have the compute resources needed to train models on huge datasets. 
  • even if you had the compute resources at your disposal, you would still have to wait for days or weeks to train such a model. Therefore, using a pre-trained model will save you precious time. 

When does transfer learning not work?

Transfer learning will not work when the high-level features learned by the top layers of the pre-trained model are not sufficient to differentiate the classes in your problem. For example, a pre-trained model may be very good at identifying a door, but not at telling whether a door is closed or open. In that case, you can use the low-level features of the pre-trained network instead of the high-level ones, which means retraining more layers of the model or using features from earlier layers.

When datasets are not similar, features transfer poorly. This paper investigates the similarity of datasets in more detail. That said, as shown in the paper, initializing the network with pre-trained weights results in better performance than using random weights. 

You might find yourself in a situation where you consider removing some layers from the pre-trained model. Transfer learning is unlikely to work in such an event. This is because removing layers reduces the number of trainable parameters and thus the model’s capacity. Furthermore, determining the correct number of layers to remove without hurting performance is a cumbersome and time-consuming process.


How to implement transfer learning?

Let’s now take a moment and look at how you can implement transfer learning. 

Transfer learning in 6 steps

You can implement transfer learning in these six general steps. 

[Image: Transfer learning in six steps]

Obtain the pre-trained model

The first step is to get the pre-trained model that you would like to use for your problem. The various sources of pre-trained models are covered in a separate section. 

Create a base model

Usually, you start by instantiating the base model using one of the architectures such as ResNet or Xception. You can also optionally download the pre-trained weights. If you don’t download the weights, you will have to use the architecture to train your model from scratch. Recall that the base model will usually have more units in the final output layer than you require. When creating the base model, you therefore have to remove the final output layer. Later on, you will add a final output layer that is compatible with your problem.

[Image: Creating the base model]

Freeze layers so they don’t change during training

Freezing the layers from the pre-trained model is vital. This is because you don’t want the weights in those layers to be re-initialized. If they are, then you will lose all the learning that has already taken place. This will be no different from training the model from scratch.

[Image: Fine-tuning a pre-trained network]

Add new trainable layers 

The next step is to add new trainable layers that will turn old features into predictions on the new dataset. This is important because the pre-trained model is loaded without the final output layer. 

[Image: Adding new trainable layers]

Train the new layers on the dataset

Remember that the pre-trained model’s final output will most likely be different from the output that you want for your model. For example, pre-trained models trained on the ImageNet dataset will output 1000 classes. However, your model might just have two classes. In this case, you have to train the model with a new output layer in place. 

Therefore, you will add some new dense layers as you please, but most importantly, a final dense layer with units corresponding to the number of outputs expected by your model.

Improve the model via fine-tuning

Once you have done the previous step, you will have a model that can make predictions on your dataset. Optionally, you can improve its performance through fine-tuning. Fine-tuning is done by unfreezing the base model or part of it and training the entire model again on the whole dataset at a very low learning rate. The low learning rate will increase the performance of the model on the new dataset while preventing overfitting.

The learning rate has to be low because the model is quite large while the dataset is small. This is a recipe for overfitting, hence the low learning rate. Recompile the model once you have made these changes so that they can take effect. This is because the behavior of a model is frozen whenever you call the compile function. That means that you have to call the compile function again whenever you want to change the model’s behavior. The next step will be to train the model again while monitoring it via callbacks to ensure it does not overfit. 

[Image: Freezing and unfreezing layers]

Pretty straightforward, eh?

Where to find pre-trained models?

Let’s now talk about where you can find pre-trained models to use in your applications. 

Keras pre-trained models

There are more than two dozen pre-trained models available from Keras. They’re served via Keras Applications. You get pre-trained weights alongside each model. When you download a model, the weights are downloaded automatically. They will be stored in `~/.keras/models/`. All the Keras applications are used for image tasks. For instance, here is how you can initialize the MobileNet architecture trained on ImageNet.
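The original snippet isn’t preserved here; a minimal sketch, assuming TensorFlow 2.x, looks like this:

```python
import tensorflow as tf

# Downloads and caches the ImageNet weights under ~/.keras/models/
model = tf.keras.applications.MobileNet(weights="imagenet")
model.summary()
```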

Transfer learning using TensorFlow Hub

It’s worth mentioning that Keras applications are not your only option for transfer learning tasks. You can also use models from TensorFlow Hub.

See how you can track Keras model training with Neptune’s integration with TensorFlow/Keras.

Pretrained word embeddings

Word embeddings are usually used for text classification problems. While you can train your own word embeddings, using pre-trained ones is much quicker. Here are a couple of word embeddings that you can consider for your natural language processing problems:

  • GloVe (Global Vectors for Word Representation) by Stanford
  • Google’s Word2vec, trained on around 100 billion words from Google News
  • fastText English vectors 

Training, Visualizing, and Understanding Word Embeddings: Deep Dive Into Custom Datasets

Hugging Face

Hugging Face provides thousands of pre-trained models for performing tasks on text. Some of the supported tasks include:

  • question answering 
  • summarization 
  • translation, and 
  • text generation, to mention a few.

Over 100 languages are supported by Hugging Face.

Here’s an example of how you can use Hugging face to classify negative and positive sentences. 
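The original listing isn’t preserved; with the `transformers` library installed, a minimal sketch uses the `pipeline` API (the example sentences are arbitrary):

```python
from transformers import pipeline

# Downloads a default pre-trained sentiment model on first use
classifier = pipeline("sentiment-analysis")

print(classifier("I love using pre-trained models!"))   # e.g. [{'label': 'POSITIVE', 'score': ...}]
print(classifier("This movie was a waste of time."))    # e.g. [{'label': 'NEGATIVE', 'score': ...}]
```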

How you can use pre-trained models

There are three ways to use a pre-trained model:

  • prediction,
  • feature extraction,
  • fine-tuning.

Prediction

Here, you download the model and immediately use it to classify new images. Here is an example of ResNet50 used to classify ImageNet classes. 
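The code isn’t preserved here; the following sketch mirrors the Keras Applications documentation (the file name `elephant.jpg` is a placeholder for any local image):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions,
)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")

# Load and preprocess an image to the 224x224 input ResNet50 expects
img = image.load_img("elephant.jpg", target_size=(224, 224))
x = image.img_to_array(img)
x = preprocess_input(np.expand_dims(x, axis=0))

# Decode the top-3 predicted ImageNet classes
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```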

ImageNet is an extensive collection of images that have been used to train models, including ResNet50. There are over 1 million images and 1000 classes in this dataset.

Feature extraction

In this case, the output of the layer before the final layer is fed as input to a new model. The goal is to use the pre-trained model, or a part of it, to pre-process images and get essential features. 

Then, you pass these features to a new classifier—no need to retrain the base model. The pre-trained convolutional neural network already has features that are important to the task at hand. 

[Image: Feature extraction]

However, the pre-trained model’s final part doesn’t transfer over because it’s specific to its dataset. So, you have to build the last part of your model to fit your dataset.

In the natural language processing realm, pre-trained word embeddings can be used for feature extraction. Word embeddings help to place words in their right position in a vector space. They provide relevant information to a model because they can contextualize words in a sentence. The main objective of word embeddings is to capture semantic meaning and the relationships between words. As a result, these word embeddings are task-agnostic for natural language problems.

Fine-tuning

When your new classifier is ready, you can use fine-tuning to improve its accuracy. To do this, you unfreeze the base model, or part of it, and retrain it on new data with a low learning rate. Fine-tuning is critical if you want to make the feature representations from the base model (obtained from the pre-trained model) more relevant to your specific task.

You can also use weights from the pre-trained model to initialize weights in a new model. The best choice here depends on your problem, and you might need to experiment a bit before you get it right. 

Still, there is a standard workflow you can use to apply transfer learning. 

Let’s check it out. 

Example of transfer learning for images with Keras 

With that background in place, let’s look at how you can use pre-trained models to solve image and text problems. Whereas there are many steps involved in training a model, the focus will be on those six steps specific to transfer learning. 


Neptune’s Integration With Keras

Transfer learning with image data

In this illustration, let’s take a look at how you can use a pre-trained model to build and fine-tune an image classifier. Let’s assume that you are a pet lover and you would like to create a machine learning model to classify your favorite pets: cats and dogs. Unfortunately, you don’t have enough data to do this. Fortunately, you are familiar with Kaggle and can get a small dataset. With that in place, you can now select a pre-trained model to use. Once you have chosen your pre-trained model, you can start training it with Keras. To illustrate, let’s use the Xception architecture, trained on the ImageNet dataset.

If you’re coding along, follow this section step-by-step to apply transfer learning properly.

Getting the dataset

I recommend using Google Colab because you get free GPU computing. 

First, download the dataset into Colab’s virtual machine. 

After that, unzip the dataset and set the path to the training and validation set. 
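The download commands aren’t preserved here. As a stand-in, here is a sketch that fetches a small, pre-split cats-vs-dogs dataset hosted by TensorFlow (the URL and dataset are assumptions; any similarly structured dataset works, and exact paths may vary slightly across TF versions):

```python
import os
import tensorflow as tf

URL = "https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"
zip_path = tf.keras.utils.get_file("cats_and_dogs_filtered.zip", origin=URL, extract=True)

# The archive is extracted next to the download cache
base_dir = os.path.join(os.path.dirname(zip_path), "cats_and_dogs_filtered")
train_dir = os.path.join(base_dir, "train")
validation_dir = os.path.join(base_dir, "validation")
```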

Loading the dataset from a directory

Let’s now load the images from their location. The `image_dataset_from_directory` function can be used because it infers class labels from the folder names.

The function will create a `tf.data.Dataset` from the directory. Note that for this to work, the directory structure should look like this:

train/
    cats/
        cat.1.jpg
        ...
    dogs/
        dog.1.jpg
        ...

Import the required modules and load the training and validation set. 
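A sketch of this step (150×150 images match the input shape used for Xception later; the batch size is an assumption):

```python
from tensorflow.keras.preprocessing import image_dataset_from_directory

BATCH_SIZE = 32
IMG_SIZE = (150, 150)

training_set = image_dataset_from_directory(
    train_dir, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE
)
validation_set = image_dataset_from_directory(
    validation_dir, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE
)
```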

Data pre-processing

Whereas data pre-processing isn’t a specific step in transfer learning, it is an important step in training machine learning models in general. Let’s, therefore, apply some augmentation to the images. Applying augmentation to a training set helps prevent overfitting, because augmentation exposes different aspects of the image to the model.

You especially want to augment the data when there’s not a lot of data for training. You can augment it using various transformations, like:

  • random rotations,
  • horizontal flipping.

You can apply these transformations when loading the data. Alternatively, as you can see below, you can augment by introducing dedicated augmentation layers.
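A sketch of such augmentation layers (in older TensorFlow versions these live under `tf.keras.layers.experimental.preprocessing`):

```python
from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),  # random horizontal flipping
        layers.RandomRotation(0.1),       # random rotations up to ±10% of a full turn
    ]
)
```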

These layers will only be applied during the training process.

You can see the result of the above transformations by applying the layers to the same image. Here’s the code:
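The original listing isn’t preserved; the following sketch reproduces the idea by running the augmentation pipeline nine times over one image:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

for images, _ in training_set.take(1):
    plt.figure(figsize=(10, 10))
    first_image = images[0]
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        # training=True makes the random augmentation layers active
        augmented = data_augmentation(tf.expand_dims(first_image, 0), training=True)
        plt.imshow(augmented[0].numpy().astype("uint8"))
        plt.axis("off")
    plt.show()
```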

And here’s what the result would look like (since the images are shuffled, you might get a different result): 

[Image: Augmented versions of the same dog image]

Create a base model from the pre-trained Xception model

Let’s load the model with the weights trained on ImageNet. When that’s done, the desired input shape is defined.
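A sketch of this step, with a 150×150×3 input shape assumed to match the loaded images:

```python
from tensorflow import keras

base_model = keras.applications.Xception(
    weights="imagenet",        # load weights pre-trained on ImageNet
    input_shape=(150, 150, 3),
    include_top=False,         # leave out the ImageNet-specific top layer
)
```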

`include_top=False` means that you’re not interested in the last layer of the model. Since models are visualized from bottom to top, that layer is referred to as the top layer. Excluding the top layers is important for feature extraction.

Next, freeze the base model layers so that they’re not updated during the training process. 

Since many pre-trained models have a `tf.keras.layers.BatchNormalization` layer, it’s important to freeze those layers. Otherwise, the layer mean and variance will be updated, which will destroy what the model has already learned. Let’s freeze all the layers in this case.
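In Keras, freezing the whole base model is a single attribute assignment:

```python
# Setting trainable to False freezes every layer, including BatchNormalization
base_model.trainable = False
```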

Create the final dense layer

When loading the model, you used `include_top=False`, meaning that the final dense layer of the pre-trained model wasn’t included. Now it’s time to define a final output layer for this model.

Let’s start by standardizing the size of the input images.

After this, apply the data augmentation. 

This model expects pixel values in the range of (-1, 1) and not (0, 255). So, you have to process the data.

Luckily, most pre-trained models provide a function for doing that. 

Let’s now define the model as follows (see the sketch after this list):

  • ensure that the base model is running in inference mode so that batch normalization layers are not updated during the fine-tuning stage (set `training=False`);
  • convert features from the base model to vectors, using `GlobalAveragePooling2D`;
  • apply dropout regularization;
  • add a final dense layer (when you used `include_top=False`, the final output layer was not included, so you have to define your own).
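A sketch of the full assembly (the dropout rate and the single-unit output head are assumptions for the binary cats-vs-dogs setup):

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(150, 150, 3))
x = data_augmentation(inputs)                        # augmentation (active only in training)
x = keras.applications.xception.preprocess_input(x)  # scale pixels into the (-1, 1) range
x = base_model(x, training=False)                    # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)               # feature maps -> a single vector
x = layers.Dropout(0.2)(x)                           # regularization
outputs = layers.Dense(1)(x)                         # one logit: cat vs. dog
model = keras.Model(inputs, outputs)
```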

Train the model

You can now train the top layer. Notice that since you’re using a pre-trained model, validation accuracy starts at an already high value.
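A sketch of compiling and training the top layer (the epoch count is an assumption):

```python
from tensorflow import keras

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),  # the Dense(1) head outputs logits
    metrics=[keras.metrics.BinaryAccuracy()],
)
model.fit(training_set, epochs=20, validation_data=validation_set)
```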

[Image: Training output per epoch]

Fine-tuning the model

The model can be improved by unfreezing the base model and retraining it at a very low learning rate.

You need to monitor this step because the wrong implementation can lead to overfitting. First, unfreeze the base model. 

After updating the trainable attribute, the model has to be compiled again to implement the change.
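A sketch of both steps, with `1e-5` as an assumed “very low” learning rate:

```python
from tensorflow import keras

base_model.trainable = True  # unfreeze the base model

# Recompile so the change takes effect, this time with a very low learning rate
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)
```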

To prevent overfitting, let’s monitor training loss via a callback. Keras will stop training when the model doesn’t improve for five consecutive epochs. Let’s also use TensorBoard to monitor loss and accuracy. 
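A sketch of the two callbacks (the log directory is arbitrary):

```python
from tensorflow import keras

callbacks = [
    # Stop if the validation loss doesn't improve for 5 consecutive epochs
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
    # Write loss/accuracy logs that TensorBoard can visualize
    keras.callbacks.TensorBoard(log_dir="./logs"),
]
```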

  • How to Make Your TensorBoard Projects Easy to Share and Collaborate On
  • Deep Dive Into TensorBoard: Tutorial With Examples

OK, time to retrain the model. When it’s finished, you’ll notice a slight improvement from the previous model.
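A sketch of the fine-tuning run (the epoch count is an assumption; EarlyStopping may end it sooner):

```python
model.fit(
    training_set,
    epochs=15,
    validation_data=validation_set,
    callbacks=callbacks,
)
```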

At this point, you have a working model for the cats and dogs classification dataset. 

If you were tracking this using an experimentation platform, you can now save the model and send it to your model registry. 

Example of transfer learning with natural language processing

In the natural language processing realm, you can use pre-trained word embeddings to solve text classification problems. Let’s take an example.

A word embedding is a dense vector that represents a word. In the vector space, words with similar meanings appear closer together. You can use the embedding layer in Keras to learn word embeddings. Training word embeddings takes a lot of time, especially on large datasets, so let’s use word embeddings that have already been trained.

A couple of popular pre-trained word embeddings are Word2vec and GloVe.

[Image: Word embeddings visualization]

Let’s walk through a complete example using GloVe word embeddings in transfer learning. 

Loading the dataset

A sentiment analysis dataset will be used for this illustration. Before loading it, let’s import all the modules that are needed for this task. 
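A sketch of the imports this walkthrough needs:

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
```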

Next, download the dataset and load it in using Pandas.
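The exact dataset isn’t preserved here; as a sketch, assume a CSV with a `text` column and a binary `sentiment` column:

```python
# "sentiment.csv" is a placeholder for any sentiment dataset with
# a 'text' column and a 0/1 'sentiment' column
df = pd.read_csv("sentiment.csv")
df.head()
```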

[Image: A sample of the sentiment dataset]

The goal is to predict the sentiment column above. Since this is text data, it has to be converted into numerical form because that’s what the deep learning model expects. 

Select the features and the target, then split the data into a training and testing set.
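A sketch, using the column names from the assumed CSV above:

```python
X = df["text"]
y = df["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```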

Data Pre-processing

Since this is text data, it has to be processed to make it ready for the models. This is not specific to transfer learning in text classification, but to machine learning models in general. 

Tokenizing the words

To convert sentences into numerical representations, use `Tokenizer`. The tokenizer removes punctuation marks and special characters and converts the sentences to lowercase.

Just create an instance of `Tokenizer` and fit it to the training set. You have to define the vocabulary size you want. An out-of-vocabulary token is also defined to represent words in the testing set that won’t be found in the vocabulary.
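A sketch with an assumed vocabulary size of 10,000:

```python
vocab_size = 10000
oov_token = "<OOV>"

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_token)
tokenizer.fit_on_texts(X_train)
```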

You can use the word index to see how words are mapped to numbers.
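For example:

```python
word_index = tokenizer.word_index
print(list(word_index.items())[:10])  # the first few word-to-index mappings
```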

[Image: A sample of the word index]

Let’s convert the words to sequences so that a complete sequence of numbers can represent every sentence. This is done using `texts_to_sequences` from the tokenizer.
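A sketch:

```python
X_train_sequences = tokenizer.texts_to_sequences(X_train)
X_test_sequences = tokenizer.texts_to_sequences(X_test)
```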

[Image: The training sequences]

Since the sentences have different lengths, the sequences will also have different lengths. But, the sequences need to have an equal length for the machine learning model. This can be achieved by truncating longer sentences and padding shorter ones with zeros. 

Using `post` for padding will add zeros at the end of the sequences. Using `post` for the truncation type will cut off sentences longer than 100 tokens at the end.
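A sketch with the maximum length set to 100:

```python
max_length = 100

X_train_padded = pad_sequences(
    X_train_sequences, maxlen=max_length, padding="post", truncating="post"
)
X_test_padded = pad_sequences(
    X_test_sequences, maxlen=max_length, padding="post", truncating="post"
)
```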

[Image: The padded training sequences]

Using GloVe Embeddings

Now, this is specific to transfer learning in natural language processing. First, let’s download the pre-trained word embeddings.

Next, extract them into a temporary folder.
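A sketch of both steps (the 6B-token GloVe release; `/tmp/glove` is an arbitrary temporary folder):

```python
import urllib.request
import zipfile

urllib.request.urlretrieve(
    "http://nlp.stanford.edu/data/glove.6B.zip", "glove.6B.zip"
)

with zipfile.ZipFile("glove.6B.zip", "r") as zip_ref:
    zip_ref.extractall("/tmp/glove")
```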

Now, use these word embeddings to create your own embedding layer. Load the GloVe embeddings, and append them to a dictionary.
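A sketch using the 100-dimensional vectors (the dimensionality is an assumption):

```python
import numpy as np

embedding_index = {}
with open("/tmp/glove/glove.6B.100d.txt") as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        # Map each word to its 100-dimensional GloVe vector
        embedding_index[word] = np.asarray(coefs.split(), dtype="float32")
```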

Use this dictionary to create an embedding matrix for each word in the training set. To do this, get the embedding vector for each word using `embedding_index`.
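A sketch:

```python
embedding_dim = 100

# Row i holds the GloVe vector for the word with index i;
# index 0 is reserved for padding, and words missing from
# the GloVe index stay all-zeros
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embedding_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
```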

In case a word isn’t found, zero will represent it. For example, here is the embedding vector for the word bakery.

[Image: The embedding vector for the word “bakery”]

Create the embedding layer

At this point, you can create the embedding layer (see the sketch after this list). Here are a couple of things to note:

  • setting `trainable` to False is crucial because you want to make sure that this layer isn’t re-trained;
  • weights are set to the embedding matrix you just created;
  • `len(word_index) + 1` is the size of the vocabulary, with one added because zero is reserved for padding;
  • `input_length` is the length of the input sequences.
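A sketch matching those notes (newer Keras versions prefer `embeddings_initializer=keras.initializers.Constant(embedding_matrix)` over the `weights` argument):

```python
from tensorflow.keras.layers import Embedding

embedding_layer = Embedding(
    len(word_index) + 1,         # vocabulary size; +1 because 0 is reserved for padding
    embedding_dim,
    weights=[embedding_matrix],  # initialize with the GloVe matrix built above
    input_length=max_length,     # length of the padded input sequences
    trainable=False,             # keep the pre-trained embeddings frozen
)
```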

Create the model 

You can now create the model using this embedding layer. Bidirectional LSTMs are used so that information flows both forward and backward.
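A sketch of such a model (the layer sizes are assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

model = Sequential([
    embedding_layer,
    Bidirectional(LSTM(64, return_sequences=True)),  # pass information both ways
    Bidirectional(LSTM(32)),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),                  # binary sentiment output
])
```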

Training the model 

You can now compile and train the model. 

The early stopping callback can be used to stop the training process when the model training stops improving. You can monitor model loss and accuracy using the TensorBoard callback. 
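A sketch combining all three pieces (the patience and epoch count are assumptions):

```python
import tensorflow as tf

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),
    tf.keras.callbacks.TensorBoard(log_dir="./logs"),
]

history = model.fit(
    X_train_padded,
    y_train,
    epochs=20,
    validation_data=(X_test_padded, y_test),
    callbacks=callbacks,
)
```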

The performance of the model can be evaluated using the `evaluate` function.
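For example:

```python
loss, accuracy = model.evaluate(X_test_padded, y_test)
print(f"Test loss: {loss:.3f}, test accuracy: {accuracy:.3f}")
```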

Nice! You have trained and tested a natural language processing model using pre-trained word embeddings. 

That’s all, folks!

In this article, you explored transfer learning, with examples of how to use it to develop models faster. You used pre-trained models in image classification and natural language processing tasks. I hope you enjoyed it, thank you for reading!

If you want to read more about Transfer Learning feel free to check other sources:

  • https://keras.io/guides/transfer_learning/
  • https://builtin.com/data-science/transfer-learning
  • https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a
  • https://www.tensorflow.org/tutorials/images/transfer_learning
  • https://machinelearningmastery.com/transfer-learning-for-deep-learning/
  • https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
  • https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
  • https://www.researchgate.net/post/What-is-the-difference-between-Transfer-Learning-vs-Fine-Tuning-vs-Learning-from-scratch
  • https://arxiv.org/pdf/1411.1792.pdf


  • Survey paper
  • Open access
  • Published: 22 October 2022

Transfer learning: a friendly introduction

  • Asmaul Hosna,
  • Ethel Merry,
  • Jigmey Gyalmo,
  • Zulfikar Alom,
  • Zeyar Aung &
  • Mohammad Abdul Azim (ORCID: orcid.org/0000-0001-5529-9482)

Journal of Big Data, volume 9, Article number: 102 (2022)


Abstract

Countless real-world applications use Machine Learning (ML) techniques to make the best possible use of the data available to users. Transfer learning (TL), one of the categories under ML, has received much attention from the research communities in the past few years. Traditional ML algorithms perform under the assumption that a model is trained and tested on samples drawn from the same data distribution; these conventional methods predict target tasks straightforwardly and are applied to small data distributions. However, this issue can be resolved using TL. TL is acknowledged for the connectivity it establishes between additional testing and training samples, resulting in faster output with efficient results. This paper contributes to the domain and scope of TL, citing situational use based on their periods and a few of its applications. The paper provides an in-depth focus on the techniques of Inductive TL, Transductive TL, and Unsupervised TL, which include sample selection and domain adaptation, followed by contributions and future directions.

Introduction

People can hardly afford the luxury of investing resources in data gathering in today’s world, since data is rare, inaccessible, often expensive, and difficult to compile. As a result, most people found a better means of data collection: one of the ways is to transfer knowledge between tasks [1]. This philosophy has inspired Transfer Learning (TL): to improve data gathering and learning in machine learning (ML) using data compiled before the task was introduced. Most ML algorithms are designed to predict future outcomes and traditionally address tasks in isolation [2]. TL does otherwise: it bridges the data from the source and the target task to find a solution, perhaps a better one.

TL aims to improve understanding of the current task by relating it to other tasks performed at different periods but through a related source domain. Figure 1 explains the improvement brought by using the TL strategy in ML. It enhances learning by creating a relation between previous tasks and the target task, providing logical, faster, and better solutions. TL attempts to provide an efficient manner of learning and communication between the source task and the target task [3]. In addition, TL is most applicable when there is a limited supply of target training data. The strategic value of TL lies not only in the task being performed itself but extends beyond and across other tasks [4]. However, the relationship between the source and target task is sometimes not compatible. If transferring the testing and training samples decreases the target task’s performance, the situation is called negative transfer, and vice versa.

Figure 1: Traditional/Classical ML vs. TL [3]

This paper introduces the traditional approach to TL, improvements in the modern approach, techniques, applications of TL, data gathering, challenges, and the future scope of TL. Although TL is used in numerous areas in many varieties, this paper focuses on a few areas in depth to provide brief insights and appreciation. The remainder of this paper is organized as follows. The “Related work” section provides background information about TL, definitions, and notations. The “Techniques of TL” section describes the three settings of TL strategies: Inductive TL (with case studies on multi-task learning and self-taught learning), Transductive TL, and Unsupervised TL, including sample selection, its applications, and domain adaptation in TL. The “Domain adaptation” section describes numerous TL applications in different domains. The “Contributions of TL” section addresses some of the contributions made by TL in medical and related fields. The “Future directions of TL” section provides the future directions of TL techniques and conclusions, respectively.

Related work

To date, the disciplines of traditional ML and data mining have been extensively applied in many areas, such as retrieving patterns from existing records obtained from labeled or unlabeled datasets (for instance, training data) to predict future occurrences [5]. Traditional ML uses training and testing data with similar data distributions and input features. Depending on the difference in data distribution between the training and testing sets, the outcome or prediction can either deteriorate or improve [6]. In some cases, acquiring training data that fits the testing data’s input feature set, as well as the anticipated distribution of the output, can be quite challenging and very costly [2]. As a result, a top-level learner is required for the target domain, one which has previously learned and improved from a related field. This innovation drives the backbone of how TL is being adopted today.

TL focuses on wide domains, tasks, and patterns in both training and testing datasets [3]. Multiple instances of TL can be seen in the real world, such as the ability to distinguish between objects like cars and bikes. Another real-life example is two individuals learning how to ride a bike. Assume that one person has no prior biking experience, while the other has some practice riding a bicycle. In that situation, the person with the bicycle background will learn to ride a bike comparatively faster than the other person, since his prior understanding of riding a bicycle will aid him in learning the new task effectively. Likewise, TL operates on the premise of storing information from a previously learned task and applying it to a new one. The idea of TL is driven by the fact that humans can effectively reuse previously acquired skills to solve contemporary challenges more quickly and accurately [3].

Since 1995, a variety of terms have been used to describe TL studies, some of which include “learning to learn,” “transfer of knowledge,” “multi-task learning,” “inductive transfer,” “knowledge integration,” “knowledge-based inductive bias learning,” “supervised learning,” “meta-learning,” and “semi-supervised learning” [3, 7]. Among these, the multi-task learning model has a learning strategy similar to TL, because both learning models strive to learn multiple tasks at the same time, despite the fact that they are different [8]. A detailed and insightful approach to multi-task learning is explored under the TL techniques in the latter part of this study. Figure 1 depicts the differences between traditional ML and modern TL strategies in effective learning. As shown, classical ML only attempts to learn from scratch, whereas TL aims to transfer information from primary tasks to a new task with a top-quality training dataset.

Additionally, as mentioned above, TL is required when the target training set is scarce. This can happen due to data problems: data can be rare, costly to acquire and evaluate, or even unavailable. However, TL strategies have become more appealing as large-scale data sources grow increasingly accessible. Utilizing available datasets that are somehow related but not the same makes this learning approach a viable method [2]. Some ML applications where TL has proved to be successful, and which are discussed in this paper, are TL in real-world simulation, sentiment classification, gaming, image classification, and zero-shot translation.

Overall, the paper aims to provide a generic appreciation of TL, an ML technique that enhances performance on the target training dataset through the acknowledgment of previously trained datasets. Unlike other articles, this paper brings in the background, ongoing work, and future scope of TL, emphasizing multi-task learning, sample selection, and domain adaptation.

Definitions of TL

According to Matt, TL, a category under ML, is the reuse of pre-existing models to solve current challenges. He also acknowledges that TL is a technique employed to train models together: the concepts from pre-existing training data are utilized to enhance the performance on the ongoing challenge, so the solution need not be developed from scratch [9]. Daipanja also aligns with the above definition of TL. He further draws a comparison with the traditional ML approach, where data were isolated based on specific tasks and each challenge was solved from scratch, with limited knowledge shared between tasks; with TL, by contrast, the acknowledgment of previous data and previously trained models when training current models is comparatively enhanced and emphasized [10]. An article by Yoshua et al. defines TL as the technique that trains current models with trained models of previous, similar, related tasks. The explanations are wide and varied; however, most of them align with one another. Lastly, Jason writes that TL is an optimization tool that escalates the performance of modeling a second task [11].

Relationship between ML and TL

The relationship between TL and ML can be understood through the way TL improves model development by means of pre-trained models and increases their efficiency. A few of the benefits of TL include: not having to build every task or current challenge from scratch with a new model to train and test; improving the efficiency of ML techniques and model progression; understanding relations between datasets from different points of view rather than in isolated terms; and training models on the required simulations rather than on natural-world environments. In times when resources are limited and observations of the models are required, TL is one of the tools that helps in learning and generating more accurate results so the assigned target domain functions [9].

Techniques of TL

This section covers the techniques used in TL’s core approaches to three major research questions: What, How and When to transfer.

The question of “what to transfer” entails determining which aspects of information or knowledge will be shared or transferred between domains or tasks. Sometimes, some information may be specific to certain domains, while others are exchanged across common domains, which may increase performance in the target domain.

Following the discovery of which information may be transmitted, learning algorithms to transmit the information must be devised, which now correlates to the questions of “How to transfer.”

The “When to transfer” question asks when it is appropriate to transfer the available information. In some cases, when the source and target domains are unrelated, the transfer may fail outright. Alternatively, in a worst-case scenario, it may even harm the learning performance of the target domain, a circumstance termed “negative transfer.” For this reason, avoiding negative transfer remains a critical unanswered question to this day.

Based on distinct conditions between the source domain, target domain, and the tasks, there are three sub-settings of TL strategies, categorized as: inductive TL, transductive TL and unsupervised TL (see Table  1 ).

Inductive TL

In this case, the target task differs from the source task, even though the source and target domains are similar. With traditional learning, the focus is usually on the target domain or task; however, in multi-task learning (a subset of inductive TL), the goal is to excel at every task available [3]. Inductive TL is further classified into two cases based on whether labeled data is available in the source domain:

Case 1—Multi-task learning

Here, the source and the target domain are the same, and a ton of labeled information in the source domain is accessible. In this situation, the inductive TL setting is similar to multi-task learning, since the source and target are the same. Notwithstanding, the inductive TL setting only targets high performance in the target task by transferring information from the source, while multi-task learning attempts to learn the target and the source tasks simultaneously.

Case 2—Self-taught learning

Here, the source and target domains are different but somehow related, and no labeled information in the source domain is accessible. In this situation, inductive TL is similar to a self-taught learning setting, which implies that the label spaces between the source and target domains materialize to be distinct yet somehow related (first reported by Raina et al. [12]). Note that here the labeled information in the source domain is inaccessible [3].

Transductive TL

Here, both tasks (source and target) are identical; however, the domains are distinct. As shown in Table 1, no labeled data is available in the target domain, although a lot of labeled data is available in the source domain.

Unsupervised TL

In this TL scenario, the target and the source task are different but somehow related, similar to the inductive TL. Unsupervised TL, on the other hand, focuses more on completing unsupervised tasks, such as clustering and dimension reduction [ 13 , 14 ]. In this situation, both the domains, i.e., source and target, have no labeled data.

  • Sample selection

Sample selection in TL is one of the most critical areas of the model-building workflow. It is where selecting variables and source tasks takes place with respect to the target task’s requirements. There are primarily two factors to consider when beginning sample selection. First, as a matter of common sense, the user of sample selection should have an intuition that there is a relation between the metrics of the source variables and the target task, i.e., that they match or are similar. The metrics that add value and quantify the target task are then chosen to begin the problem-solving tasks.

Second, as a matter of caution, although there might be many promising source variables and domains available in the source task, the user should be aware of which metrics the target task values. While selecting the most relatable and efficient data for the target task, an important issue is not adding too many parameters from the source task, which would eventually cause overfitting. When overfitting does occur and sample selection negatively affects the target task, a penalty over the data is introduced. Penalty criteria narrow the parameters to incorporate only the most helpful information the target task requires from the source task [3].

Sample selection bias

Sample selection bias has been acknowledged as one of the most complicated issues in practical applications. The future (test) data and the current training data differ in constraints and distribution [15, 16], causing the side effects of sample selection [17]. In some cases, small sampled groups duplicate or create a pattern for bigger groups of the sample, eventually leading the datasets to suffer from sample selection bias. Nevertheless, some measures have been applied to reduce sample selection bias, especially between the target and source task. Figure 2 demonstrates the changes brought by sample selection in TL.

Figure 2: Changes brought by sample selection in TL

Many ML algorithms use training data that is almost the same as the test data on which predictions will soon be made. However, this fails to recognize that practical applications use data that sometimes differ from one another, creating variation between the testing and training data. Conventionally, a dataset is trained following the distribution \(Q(x, y)\), and similarly, the distribution \(P(x, y) = Q(x, y)\) is employed on a dataset for testing. In the last few years, strategies to improve the sample selection area have been constantly enforced, and several articles have been published focusing on this matter. Note that a compact overview of the TL strategies and their settings is given in Table 1.

Regardless of interventions in this area, sample selection has been susceptible to bias, including choices that inappropriately select the control groups in case-control studies, bias in loss-to-follow-up cases, and others. Like the other TL features, sample selection has received massive attention from communities such as ML, statistics, economics, bioinformatics, epidemiology, medicine, and many others. Sample selection uses source data to build predictions and alters the source data. Such action crosses the limits of a single data distribution to a broader range, giving the user higher scope to build efficient models. It is one of the TL areas that has received much attention from ML and research groups in the past few years.

Brief analysis of sample selection: kernel mean matching algorithm

Estimation models for \(\beta(x)\), such as kernel density estimation, a naive approach, are used to measure the training sample and minimize selection bias from the external data [18]. Regardless of their use for the labeled data distribution, some models have been inferior or less effective; being less effective includes failing to estimate data with high density or data carrying heavy information. Also, if the model makes a small estimation error, especially when the training \(P(x \mid s = 1)\) and testing \(P(x)\) distributions take small values, it disturbs the whole performance of the data, causing relatively worse performance on the target task compared to the source task. This incident was noticed in several cases of performance evaluation. Estimating \(\beta(x)\) directly was considered more logical than evaluating these models while working with a target task having huge data density and few training and testing samples (Fig. 3).

Figure 3: Symbols and abbreviations [16]

Gradual improvements have been made in the estimation models for \(\beta(x)\) present in the covariate shift setting; two suggested algorithms are kernel mean matching (KMM) and unconstrained least-squares importance fitting (uLSIF) [1]. However, only the KMM model will be elaborated in this paper, although we acknowledge uLSIF and its uses.

Both are better measures and versions of kernel density estimation. The KMM model takes the classical statistics perspective, denoted by \(P_{\theta}(y \mid x)\). It is a parametric model that organizes the labeled data distribution, mainly for logistic regression models, and it applies to other models as well. It estimates the prediction loss from the source data to the target data and reduces its overfitting. A form of penalty criterion narrows the parameters to incorporate only the most helpful information the target task requires from the source task.

The KMM also contradicts the primary assumption of ML, which states that the testing and training data come from one data distribution. The model relates the testing samples \(\{(x_i, y_i)\}_{i=1}^{n} \sim P_{te}(x, y) = P(x, y)\) and training samples \(\{(x_i, y_i)\}_{i=1}^{n} \sim P_{tr}(x, y) = P(x, y \mid s = 1)\) from multiple sources and ultimately predicts how \(X\) (the variables) maps to \(Y\) (the output).

Data sampling falls under transductive TL, which helps to learn an optimal model workflow for the target domain and task by minimizing the expected risk. Concepts of empirical risk minimization (ERM) are some of the measures that help simulate data and its risk towards the target problem. The optimal parameters \(\theta^{*}\) are thus \(\theta^{*} = \arg\min_{\theta \in \Theta} \mathbb{E}_{(x, y) \sim P}\,[l(x, y, \theta)]\), where \(l(x, y, \theta)\) is the loss function of the parameters. Since estimating the probability distribution of the data is difficult, the ERM concept is utilized to train the optimal workflow: \(\theta^{*} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} l(x_i, y_i, \theta)\), where \(n\) is the size of the training data [19].
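For reference, the importance-weighted form of ERM that covariate-shift methods such as KMM target is usually written as follows (this standard formulation is added here for clarity; it is not spelled out in the original text):

```latex
\theta^{*} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \beta(x_i)\, l(x_i, y_i, \theta),
\qquad
\beta(x) = \frac{P_{te}(x)}{P_{tr}(x)}
```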

The above workflow models are used in source data selection, but the target domain does not remain ideal. The optimal parameters become \(\theta^{*} = \arg\min_{\theta \in \Theta} \sum_{(x, y) \in D_T} P(D_T)\, l(x, y, \theta)\). However, the training dataset has to be obtained from the source domain to structure the target domain. In the case \(P(D_S) = P(D_T)\), we can solve the optimization for the target domain using the source data: \(\theta^{*} = \arg\min_{\theta \in \Theta} \sum_{(x, y) \in D_S} P(D_S)\, l(x, y, \theta)\).

In contrast, if \(P(D_S)\) is not equal to \(P(D_T)\), the above optimization problem is modified so that a model with high generalization ability for the target domain can be learned. Several methods are available to predict the values of \(P(x_{S_i})\) and \(P(x_{T_i})\). Zadrozny [20] proposed estimating the numerical values of \(P(x_{S_i})\) and \(P(x_{T_i})\) independently, without depending on other classification problems. Similarly, an article by Fan et al. [21] elaborated on this concept and the idea of solving selection problems by using multiple classifiers to predict the probability ratio.

Lastly, correcting covariate shift or sample selection bias offers a considerable advantage and greatly increases the quality of the data used to process the target data.

Applications of sample selection

In TL, sample selection has been used in several areas and models, in different ways. It has been used to study drugs in medical clinics, to choose patients from the general population of a given demographic, and in many other settings. The selected data thus differs from the general population based on gender, race, and patient characteristics. It has also been used in detection software built many years ago to improve its predictive methods and capabilities.

Before sample selection was applied, the old systems lacked organization against modern attack and spam patterns; applying it improved these challenges. Surveys based on particular religions, which did not relate to others due to their differences, have also used sample selection to link, bridge, and apply one type of research to another where this would otherwise not be possible. Overall, sample selection has been used to establish relations between such surveys and other categories of beliefs. None of the given examples follow the primary assumption of ML, which states that training and testing data have to come from one data distribution. The data come from more than one distribution, where testing and training occur in different domains. Such selection makes sample selection users thoughtfully choose the common features of the previous source data and the current target data.

Considering the above applications of sample selection, it is clear that many datasets in real-world applications are potentially biased. Further research has been done on sample selection bias. The proposed approaches do not assume the exact type of bias or formal models that quantify the distribution of the bias to be rectified [20]. Reducing sample selection bias remains an open problem.

  • Domain adaptation

Domain adaptation is a type of TL in which the task remains the same but the source and destination have different domains or distributions. Consider a model that has been trained on X-rays of many patients to determine whether or not a patient is infected with COVID-19 [22]. However, the best generalized systems depend on appropriate datasets [3]. If the data is biased, the system cannot generalize to accurate outputs; this is the problem setting that domain adaptation addresses. Domain adaptation applies an algorithm trained on one or more source domains to improve the target domain. The domain adaptation process tries to alter the source domain to bring its distribution closer to that of the target domain [23].

Mathematical explanation of the domain adaptation

Let’s denote the domain as \(D\), which contains two components: a feature space \(X\) and a marginal probability distribution \(P(x)\). The feature space \(X\) can be \(X_1, X_2, \ldots, X_n, \ldots\)

So, a supervised learning task on a specific domain is defined over \(D = \{X, P(x)\}\). A task, in turn, consists of a label space \(Y\) and an objective predictive function \(f(\cdot)\); it is denoted as \(T = \{Y, P(y \mid x)\}\). The predictive function \(f(\cdot)\) may be trained on data consisting of pairs \((x_i, y_i)\), where \(y_i \in Y\), and the function \(f\) is used to predict the label of a new instance \(x\). However, domain adaptation involves two domains and two tasks. Given a source domain [4] \(D_S = \{X_S, P(x_S)\}\), where \(X_S = \{x_{S_1}, \ldots, x_{S_n}\}\); a task on the source domain \(T_S = \{Y_S, P(y_S \mid x_S)\}\), where \(Y_S = \{y_{S_1}, \ldots, y_{S_n}\}\); a target domain \(D_T = \{X_T, P(x_T)\}\), where \(X_T = \{x_{T_1}, \ldots, x_{T_n}\}\); and a task on the target domain \(T_T = \{Y_T, P(y_T \mid x_T)\}\), where \(Y_T = \{y_{T_1}, \ldots, y_{T_n}\}\) [24], these components are combined to improve the learning of the target predictive function \(f_T(\cdot)\). The target domain uses the information of the source domain and of the task on the source domain. A predictive model \(f_{ST}(\cdot)\) is initialized and trained on the source domain \(D_S\) so that it can be adapted to the target domain \(D_T\) [23].

TL applications

Real-world simulations.

Large production companies face enormous challenges in staying flexible under production fluctuations while providing higher product quality at significantly lower manufacturing and processing costs with a dynamic workforce [25]. The central objective of this process’s underlying setup is to identify optimal parameters that fulfill high product quality and efficiency.

One approach to defeating these difficulties is exploiting the techniques of artificial intelligence (AI), mainly supervised learning models. Supervised learning trains models using appropriately categorized or labeled information. Every ML or AI application that depends on gathering information or preparing a real-world model is costly, tedious, or even hazardous for users or the environment. Therefore, robots are being trained using simulation results to advance the technology and limit development costs. Consequently, with these advancements, the systems become more practical and ideal. Furthermore, one can train, test, recreate, and program robots to train themselves so that real-world robots can transfer and use all the information learned in the process. These kinds of transfers are done using progressive networks, an ideal platform for real-world robot simulations. Conversely, sometimes not all simulation features are effectively reproduced when applied in the actual world because of their complex interactions.

Considering the enhancement of performance, TL techniques have also been emphasized and used with real-world simulation datasets. Datasets and research articles based on real-world simulation include: Policy transfer from simulations to the real world by transfer component analysis [26], Simulations, learning and real world capabilities [27], Real-world reinforcement learning via multifidelity simulators [28], Knowledge-aided Convolutional Neural Network for Small Organ Segmentation [29], Adaptive Fusion and Category-Level Dictionary Learning Model for Multi-View Human Action Recognition [30], and Stimulus-driven and concept-driven analysis for image caption generation [31], to name a few.

The rise of ML and other AI applications has made an enormous impact on gaming advancements so far. Today, one of the finest examples of this is AlphaGo, one of the first ML programs to defeat an expert human Go player, developed with DeepMind’s neural networks. AlphaGo is an ace at this game. However, it is incompetent at other games and fails to win when entrusted to play them. This failure happens because it is only programmed, designed, and fitted to play ‘Go,’ which reveals the ultimate drawback of utilizing artificial neural networks (ANN) in gaming: they can neither be as fast nor master all games like a human brain. Therefore, in order to play and win other games, AlphaGo needs to thoroughly forget the algorithms of ‘Go’ and learn to adapt to an entirely new program [32].

Consequently, with the help of TL, new games can now be played by re-applying the strategies learned in a previous game, as the definition of TL states. For instance, an application of TL in gaming can be seen in MadRTS [33], a real-time strategy game that includes ongoing competing players. In this game, the application uses CARL (case-based reinforcement learning) [34], a multi-layered design that joins case-based reasoning (CBR) and reinforcement learning (RL) and permits us to acquire as well as separate the key strategies of our tasks, applying the particular idea of TL [33].

  • Image classification

Multiple models for image classification have been developed to help resolve the most pressing issue: identification and recognition accuracy [35]. Image classification is a significant subject in computer vision, with many applications. Object identification for robotic handling and human or object tracking for autonomous cars are a few examples [35]. Today, convolutional neural networks (CNN) show reliable outcomes on image and object detection and recognition that are helpful in real-world applications [36]. The architecture of CNN models performs training and prediction at a high level of abstraction. One of the best tendencies of neural networks is the ability to perceive things inside an image once they are trained on labeled pictures from massive datasets, which is very challenging in terms of time. Several computer vision and ML problems have demonstrated that the CNN framework performs effectively in terms of accuracy.

Convolutional neural networks have influenced and dominated the ML vision field. A CNN comprises three kinds of layers: an input layer, an output layer, and several hidden layers, which include deep networks, pooling layers, fully linked layers, and normalization layers (ReLU). For example, a VGG-16 [37] ConvNet [38], illustrated in Fig. 4, consists of different layers containing unique collections of image features. The network must be prepared to perceive images inside a dataset; in doing so, it is first pre-trained utilizing ImageNet, layer-wise, beginning from the SoftMax layer, followed by the dense layers. However, these models rapidly strain battery power, limiting smaller gadgets, storage devices, and inexpensive phones [36, 39]. To reduce such burdens, TL helps prepare the models through pre-training using ImageNet, which consists of many pictures from various sources, saving time. As another example, if a facial image is set as the input to a CNN structure, the system will start to learn basic properties in its training stage, such as the lines and edges of the face, bright and dark areas, contours, and so on [35].

Figure 4: The architecture of VGG-16 ConvNet [39]

  • Zero shot translation

One of the popular procedures for machine translation is neural machine translation (NMT) [40], which is achieved by a colossal artificial neural network. It has displayed promising results and has indicated tremendous potential in unraveling machine interpretation and translation. Machine translation into a new language can be exercised with only a touch of planning data using zero-shot translation [41]. In this regard, zero-shot learning is considered one of the most promising learning strategies, where the input sources and the classes we intend to portray are disjoint. Accordingly, zero-shot learning is connected to using supervised learning (similar to the applications in gaming) to assess its accuracy and the training data. One famous example is Google’s Neural Machine Translation model (GNMT), which enables powerful cross-lingual interpretations. For instance, to translate between two different languages, say Korean and Spanish, we need a pivot language (an intermediary language) bridging the two. Korean must first be translated into English and later into Spanish. Here, English is the intermediary between Korean and Spanish, known as the pivot language.

Zero-shot translation avoids this detour from one language to the other: it utilizes all the available data to learn the translational patterns involved and decipher text directly into a new target language [42].

  • Sentiment classification

Understanding the hidden or visible feelings conveyed online or in social media is essential to customers and users [43]. Sentiment classification is acknowledged as one of the most significant areas in natural language processing (NLP) research. Social media has become the primary source of opinion data, and because of its domain diversity, applying sentiment classification to social media holds a great deal of potential but also poses many challenges [44]. One of the most common sub-tasks in sentiment classification is interpreting an individual's feelings as conveyed through media content. Sentiment classification of social media data is unquestionably a big data problem. Earlier research on sentiment classification analyzed texts within a single linguistic expression.

Sentiment classification is also a helpful tool that allows a user or business organization to identify clients' choices and reviews by classifying their sentiment as negative, positive, or neutral (as in Fig. 5), which may also be labeled as good/bad or satisfactory/not satisfactory. Building an entirely new corpus of texts for sentiment analysis is difficult, since it is not easy to prepare models that identify feelings from scratch. TL offers a solution to this problem. For instance, suppose x is the input text and y is the feeling or opinion we need to predict for a film review. A deep learning model is first trained on the x input via sentiment analysis of a content corpus, identifying every statement's polarity. Once the model has learned to recognize feelings through the polarity of the x input, its underlying language model and learned knowledge are transferred to a model assigned the task of analyzing sentiment for y, i.e., film reviews. Additionally, techniques such as embedding are used for various sentiment analysis tasks by transferring information from one source (x) and re-applying the same algorithms in the target area (y) to fulfill the prediction task [45]. A hedged Keras sketch of this embedding-transfer idea follows Fig. 5.

Figure 5 shows the polarity scale of sentiment analysis, ranging from "Unhappy with the service" through "Neither happy nor sad" and "Happy with the service" to "Very happy and totally in love with the service".

Figure 5: Polarities of sentiment analysis [44]
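The embedding-transfer idea from the paragraph above can be sketched in Keras as follows. A pre-trained word-embedding matrix (e.g., GloVe vectors; a random stand-in is used here) initializes a frozen Embedding layer, and only the classifier on top is trained for the target polarity task. Vocabulary size, dimensions, and architecture are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size, embed_dim = 20_000, 100                       # illustrative sizes
embedding_matrix = np.random.rand(vocab_size, embed_dim)  # stand-in for loaded GloVe vectors

model = models.Sequential([
    # Transferred knowledge: the embedding weights are fixed, not learned anew
    layers.Embedding(vocab_size, embed_dim,
                     embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
                     trainable=False),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1, activation="sigmoid"),  # polarity: positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Setting trainable=True instead would fine-tune the embeddings on the target reviews, at the cost of a higher risk of overfitting on small corpora.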

Contributions of TL

Contributions in medical sciences

Medical imaging and MRI play essential roles in routine clinical diagnosis and treatment. In MRI, contrast variation reveals the difference between normal and diseased tissue. With ML and TL knowledge, medical scientists can detect disease more easily. However, the vast training data required can be expensive to obtain and utilize. Convolutional neural networks (CNNs) have been a great success in analyzing medical images across varying imaging protocols, and TL brings that success to medical imaging settings with limited data. TL shows outstanding performance in white matter hyperintensity (WMH) segmentation (lesions of vascular origin, visible on brain MRI or computed tomography), tumor segmentation (detecting the location of tumor cells), and microbleed segmentation (detecting traumatic brain injury). In many cases, a network trained on MRI data acquired under one protocol does not work efficiently on other protocols [46]; domain adaptation helps ensure the usability of trained models in actual practice. This limitation can be addressed by adapting models trained on large annotated legacy datasets to new datasets from different domains, so that the trained models suit the clinical setting [47]. MRI shows high variation among soft tissues and contrasts. For scale, the image database ImageNet contains more than fourteen million annotated images in more than twenty thousand categories; fine-tuning then relies on instances from the target domain [48].

Contributions in bioinformatics

In the bioinformatics field, analyzing biological sequences plays an important role: to understand an organism's function, biologists must analyze the gene sequences of the particular microorganisms. TL and domain adaptation have shown outstanding performance in bioinformatics tasks such as gene expression analysis, sequence classification, and network reconstruction. Domains here correspond to different model organisms or different data-collection settings that underlie the predicted sequences. If an error arises while predicting a sequence, a prognosis system raises an alarm to replace the affected component, and it keeps alarming until the system setting or component is changed [46]. Bioinformatics applications nonetheless involve many distribution problems. For example, two organisms' functions can be the same while their substance differs, leading to marginal distribution differences; on the other hand, if two species descend from a common ancestor after a long evolutionary history, the conditional distribution difference becomes significant. TL can be used to predict mRNA splicing: in this case, the source domain can be the organism C. elegans, and the target domains the organisms C. remanei and P. pacificus. Several TL approaches have been compared in this setting; for instance, FAM and the KMM variant can improve classification performance. Gene expression analysis, in turn, predicts associations between genotypes and phenotypes; here, data sparsity problems can arise (too few observed datasets), as the nucleotide sequence provides minimal data. To ensure outstanding performance, TL can be used to bring in additional information [48].

There have been numerous uses of TL in bioinformatics; each of the following studies applies TL algorithms to a dataset and showcases TL's contribution in its respective area: TL for biomedical named entity recognition (BNER), Bioinformatics 2018 [49]; exploiting TL for the reconstruction of the human gene regulatory network [50]; parasitologist-level classification of apicomplexan parasites and host cell with deep cycle TL (DCTL) [51]; AITL: adversarial inductive TL with input and output space adaptation for pharmacogenomics [52]; improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling, and two-dimensional TL, where the computational prediction of RNA secondary structure is performed with TL [53]; optimized hybrid investigative based dimensionality reduction methods for malaria vector using a KNN classifier [54]; a survey of dimension reduction and classification methods for RNA-Seq data on malaria vector [55]; and a hybrid heuristic dimensionality reduction method for classifying malaria vector gene expression data [56], among others.

Contributions in transportation

TL has transportation applications, e.g., understanding traffic scenes and target driver behavior. Here, images are taken at specific locations [57], but the outputs can vary incorrectly under different weather conditions. TL offers a strong solution using pictures taken at the exact same location under different weather conditions [23]. First, the system trains a network to extract features from the pictures. Second, new features are built via a feature transformation strategy. A dimension-reduction algorithm then generates low-dimensional features. In the last stage, a Markov model transfers the cross-domain sets, matching each tested image with the best-suited recovered image [58]. TL can also be applied to target driver behavior given sufficient personalized data on each target driver; it can demonstrate results even when target-domain data are limited, whether very small or very large [48].

Contributions in the recommendation system

In this contemporary world, one of the hottest topics for many industries is building an automatic question-answering system that works much like a recommendation system. Recommender systems are widely used in the e-commerce field; they help people answer questions related to the merchandise [59]. E-commerce has become an essential, familiar part of everyday life. Based on the products or services offered, e-commerce websites can be divided into vertical websites, where people shop for the same sort of product, and integrated websites, which sell multiple products including food, clothing, research services, and more. E-commerce recommender systems use three techniques to recommend products: collaborative filtering, content-based filtering, and hybrid recommendation. Following information-retrieval practice, content-based filtering first analyzes product content to obtain a set of features and build product feature vectors, then calculates the similarity between users and products and recommends accordingly; in ML terms, clustering is used for content-based filtering (a minimal similarity sketch follows this paragraph). Collaborative filtering follows two techniques: memory-based filtering and model-based filtering. Memory-based algorithms work on users' ratings of and preferences for particular products, predicting a user's rating of a target product; both memory-based and model-based algorithms learn from rating records as well as from the target user's ratings. Hybrid recommendation works like content-based recommendation but can outperform collaborative filtering. Although recommendation applications have achieved much in e-commerce, they face serious data sparsity problems that lead to poor recommendations [60]. TL yields a better recommendation system by combining collaborative filtering proposals to alleviate sparsity problems: the method improves recommendation accuracy by transferring the knowledge learned from dense datasets to sparse ones. TL thereby provides a new framework for e-commerce recommender systems in which knowledge flows from the source domain and source task to the target domain and target task, letting people use that knowledge to solve their problems faster [61].
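A minimal sketch of the content-based step described above: build product feature vectors, form a user profile from liked items, and recommend by cosine similarity. All feature values are invented for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows: products; columns: hypothetical content attributes
product_features = np.array([
    [1.0, 0.0, 0.3],
    [0.9, 0.1, 0.2],
    [0.0, 1.0, 0.8],
])

# User profile: mean feature vector of previously liked products (items 0 and 1)
user_profile = product_features[:2].mean(axis=0, keepdims=True)

scores = cosine_similarity(user_profile, product_features).ravel()
ranking = np.argsort(scores)[::-1]  # most similar products first
```

In the TL setting sketched in the text, feature vectors or latent factors learned on a dense source domain would be reused here in place of hand-built attributes for the sparse target domain.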

Future directions of TL

A great future awaits further advancements in TL research. Many modern visual learning algorithms depend on data labeled with the desired object categories; for instance, object-oriented algorithms detect, recognize, and describe previously unseen images [48]. Executing these modern visual learning algorithms requires new data collections with precise labels. Meanwhile, many large pre-existing datasets, such as ImageNet with its more than fourteen million images, form a massive pool of information.

TL aims to use prior knowledge and related source tasks, drawing on extra source data to boost a poorly resourced target set [2]. TL problems can also be addressed through dynamic settings for online learning and self-labeling data. Too often, pre-existing resources are ignored because they do not overlap with the target task, yet they can still be used for classification and localization; no past knowledge or dataset is useless. We can therefore take steps to ensure the best use of pre-existing data. First, we can revisit past knowledge and generalize it, which may create research potential and relevance for practical purposes and applications [46]. Secondly, we may improve previous TL methods for settings where only a few annotated samples of the new target domain are available. Zero-shot classification is an advanced step in this direction, because it obtains classifiers for novel categories from an arbitrary basis even when little data is available; zero-shot methods are also reliable for textual embeddings of image datasets and are faster, more accurate, and more economical. Finally, we may combine zero-shot and active learning in support vector machines with optimal query conditions. Future study in the TL domain can proceed in a variety of directions:

To begin, TL techniques can be investigated further and applied to a broader range of applications. New methods are required to overcome knowledge-transfer challenges in more complicated circumstances. For example, in real-world settings the source-domain data may come from a different organization; in this scenario, the question of how to transfer knowledge from the source domain while maintaining user privacy is crucial.

Secondly, determining how to quantify the transfer of information across domains while avoiding negative transfer is crucial. Although a few studies have examined negative transfer, more systematic research is still needed [62].

Thirdly, the validity of TL requires further research [ 63 ].

The figure below depicts the associated challenges and gaps (Fig. 6).

Figure 6: Challenges and gaps in the TL literature

Furthermore, theoretical research can be performed to establish theoretical evidence for TL's potency and validity. As a prominent and promising field of ML, TL has several advantages over classical/traditional ML, including reduced data requirements and less reliance on labels.

TL builds on data distributions, reusing what is learned on one task for another: it exploits existing data while relating the source task to the target task, following specific strategies based on data- and model-level interpretation. This paper discussed the goals and strategies of TL by introducing its objectives and some of its learning approaches. We also briefly covered model-level TL techniques along with their applications. Several TL applications were presented, including medicine, bioinformatics, transportation, recommendation, and e-commerce. The application of TL in numerous fields indicates that it is an essential research topic that can pave the way for the future technological era, even though it may prove difficult in practice.

Availability of data and materials

N/A (no data used).

Torrey L, Shavlik J. Transfer learning. In: Handbook of research on machine learning applications and trends: algorithms, methods, and techniques, 2010; 242–264. IGI global.

Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. J Big Data. 2016;3(1):1–40.


Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.

Taylor ME, Stone P. Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 2009;10(7).

Witten IH, Frank E. Data mining: practical machine learning tools and techniques with java implementations. Acm Sigmod Record. 2002;31(1):76–7.

Shimodaira H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J Stat Plan Inference. 2000;90(2):227–44.


Thrun S, Pratt L. Learning to learn: introduction and overview. In: Learning to Learn. Springer, 1998;p. 3–17

Caruana R. Multitask learning. Mach Learn. 1997;28(1):41–75.

Trotter M. Machine learning deployment for enterprise 2021. https://www.seldon.io/transfer-learning/

Sarkar DD. A comprehensive hands-on guide to transfer learning with real-world applications in deep learning. Towards Data Sci. 2018;20:2020.


Brownlee J. A Gentle introduction to transfer learning for deep learning 2019. https://machinelearningmastery.com/transfer-learning-for-deep-learning/ .

Raina R, Battle A, Lee H, Packer B, Ng AY. Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 759–766 2007.

Dai W, Yang Q, Xue G-R, Yu Y. Self-taught clustering. In: Proceedings of the 25th International Conference on Machine Learning, 2008;200–207.

Wang Z, Song Y, Zhang C. Transferred dimensionality reduction. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2008:550–565. Springer

Ren J, Shi X, Fan W, Yu PS. Type independent correction of sample selection bias via structural discovery and re-balancing. In: Proceedings of the 2008 SIAM International Conference on Data Mining, 2008;565–576. SIAM

Huang J, Gretton A, Borgwardt K, Schölkopf B, Smola A. Correcting sample selection bias by unlabeled data. Adv Neural Inf Process Syst. 2006;19:601–8.

Tran V-T. Selection bias correction in supervised learning with importance weight. PhD thesis 2017.

Liu A, Ziebart B. Robust classification under sample selection bias. In: Advances in neural information processing systems, 2014;37–45.

Liu Z, Yang J-A, Liu H, Wang W. Transfer learning by sample selection bias correction and its application in communication specific emitter identification. JCM. 2016;11(4):417–27.

Zadrozny B. Learning and evaluating classifiers under sample selection bias. In: Proceedings of the Twenty-first International Conference on Machine Learning, p. 114 2004.

Fan W, Davidson I, Zadrozny B, Philip SY. An improved categorization of classifier’s sensitivity on sample selection bias. In: ICDM, 2005;5:605–608. Citeseer

Kamath U, Liu J, Whitaker J. Transfer learning: Domain adaptation. In: Deep Learning for NLP and Speech Recognition. Springer; 2019, p. 495–535.

Ghafoorian M, Mehrtash A, Kapur T, Karssemeijer N, Marchiori E, Pesteie M, Guttmann CR, de Leeuw F-E, Tempany CM, Van Ginneken B. Transfer learning for domain adaptation in mri: Application in brain lesion segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, 2017;516–524. Springer

Steinwart I. On the influence of the kernel on the consistency of support vector machines. J Mach Learn Res. 2001;2:67–93.


Tercan H, Guajardo A, Heinisch J, Thiele T, Hopmann C, Meisen T. Transfer-learning: bridging the gap between real and simulation data for machine learning in injection molding. Procedia CIRP. 2018;72:185–90.

Matsubara T, Norinaga Y, Ozawa Y, Cui Y. Policy transfer from simulations to real world by transfer component analysis. In: 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), 2018;264–269. IEEE

Wood RE, Beckmann JF, Birney DP. Simulations, learning and real world capabilities. Education+ Training 2009.

Cutler M, Walsh TJ, How JP. Real-world reinforcement learning via multifidelity simulators. IEEE Trans Robot. 2015;31(3):655–71.

Zhao Y, Li H, Wan S, Sekuboyina A, Hu X, Tetteh G, Piraud M, Menze B. Knowledge-aided convolutional neural network for small organ segmentation. IEEE J Biomed Health Inform. 2019;23(4):1363–73.

Gao Z, Xuan H-Z, Zhang H, Wan S, Choo K-KR. Adaptive fusion and category-level dictionary learning model for multiview human action recognition. IEEE Internet Things J. 2019;6(6):9280–93.

Ding S, Qu S, Xi Y, Wan S. Stimulus-driven and concept-driven analysis for image caption generation. Neurocomputing. 2020;398:520–30.

Chen JX. The evolution of computing: Alphago. Comput Sci Eng. 2016;18(4):4–7.

Sharma M, Holmes MP, Santamaría JC, Irani A, Isbell Jr CL, Ram A. Transfer learning in real-time strategy games using hybrid cbr/rl. In: IJCAI, 2007;7:1041–1046.

Jiang C, Sheng Z. Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system. Expert Syst Appl. 2009;36(3):6520–6.

Hussain M, Bird JJ, Faria DR. A study on cnn transfer learning for image classification. In: UK Workshop on Computational Intelligence, 2018;191–202. Springer

Rastegari M, Ordonez V, Redmon J, Farhadi A. Xnor-net: Imagenet classification using binary convolutional neural networks. In: European Conference on Computer Vision, 2016;525–542. Springer

Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.

Neurohive: Convolutional Network for Classification and Detection (2018). https://neurohive.io/en/popular-networks/vgg16/ .

Chu C, Wang R. A survey of domain adaptation for neural machine translation. arXiv preprint arXiv:1806.00258 2018.

Kumar R, Jha P, Sahula V. An augmented translation technique for low resource language pair: Sanskrit to hindi translation. In: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, pp. 377–383 2019.

Lobo S. 5 cool ways transfer learning is being used today 2018. https://hub.packtpub.com/5-cool-ways-transfer-learning-used-today/

Tao J, Fang X. Toward multi-label sentiment analysis: a transfer learning based approach. J Big Data. 2020;7(1):1–26.

Fang X, Zhan J. Sentiment analysis using product review data. J Big Data. 2015;2(1):1–14.

Nabi J. Machine learning: word embedding & sentiment classification using Keras. Towards Data Science 2018. https://towardsdatascience.com/machine-learning-word-embedding-sentiment-classification-using-keras-b83c28087456 .

Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, Xiong H, He Q. A comprehensive survey on transfer learning. Proc IEEE. 2020;109(1):43–76.

Shao L, Zhu F, Li X. Transfer learning for visual categorization: a survey. IEEE Trans Neural Netw Learn Syst. 2014;26(5):1019–34.

Kouw WM, Loog M. A review of domain adaptation without target labels. IEEE Trans Pattern Anal Mach Intell. 2019;43(3):766–85.

BaderLab: BaderLab/Transfer-Learning-BNER-Bioinformatics-2018. Repository containing supplementary data and links to the model and corpora used for the paper: Transfer learning for biomedical named entity recognition with neural networks.

Mignone P, Pio G, D’Elia D, Ceci M. Exploiting transfer learning for the reconstruction of the human gene regulatory network. Bioinformatics. 2020;36(5):1553–61.

Li S, Yang Q, Jiang H, Cortés-Vecino JA, Zhang Y. Parasitologist-level classification of apicomplexan parasites and host cell with deep cycle transfer learning (dctl). Bioinformatics. 2020;36(16):4498–505.

Sharifi-Noghabi H, Peng S, Zolotareva O, Collins CC, Ester M. Aitl: adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics. 2020;36(Supplement–1):380–8.

Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y. Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning. Bioinformatics. 2021;37:2589–600.

Arowolo MO, Adebiyi MO, Adebiyi AA, Olugbara O. Optimized hybrid investigative based dimensionality reduction methods for malaria vector using knn classifier. J Big Data. 2021;8(1):1–14.

Arowolo MO, Adebiyi MO, Aremu C, Adebiyi AA. A survey of dimension reduction and classification methods for RNA-seq data on malaria vector. J Big Data. 2021;8(1):1–17.

Arowolo MO, Adebiyi MO, Adebiyi AA, Okesola OJ. A hybrid heuristic dimensionality reduction methods for classifying malaria vector gene expression data. IEEE Access. 2020;8:182422–30.

Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):1–48.

Wang M, Deng W. Deep visual domain adaptation: a survey. Neurocomputing. 2018;312:135–53.

Adão T, Hruška J, Pádua L, Bessa J, Peres E, Morais R, Sousa JJ. Hyperspectral imaging: a review on uav-based sensors, data processing and applications for agriculture and forestry. Remote Sensing. 2017;9(11):1110.

Gao Y, Mosalam KM. Deep transfer learning for image-based structural damage recognition. Comput Aided Civil Infrastruct Eng. 2018;33(9):748–68.

Tang J, Zhao Z, Bei J, Wang W. The application of transfer learning on e-commerce recommender systems. In: 2013 10th Web Information System and Application Conference, 2013;479–482. IEEE

Wang Z, Dai Z, Poczos B, Carbonell J. Characterizing and avoiding negative transfer. in 2019 IEEE. In: CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019;11285–11294.

Lipton ZC. The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue. 2018;16(3):31–57.


Acknowledgements

Partially supported by Khalifa University, UAE.

Author information

Asmaul Hosna, Ethel Merry, and Jigmey Gyalmo contributed equally.

Authors and Affiliations

Department of Computer Science, Asian University for Women, 20/A M. M. Ali Road, Chittogram, Bangladesh

Asmaul Hosna, Ethel Merry, Jigmey Gyalmo, Zulfikar Alom & Mohammad Abdul Azim

Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates


Contributions

All authors contributed equally. Research was led by MAA. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mohammad Abdul Azim .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Hosna, A., Merry, E., Gyalmo, J. et al. Transfer learning: a friendly introduction. J Big Data 9, 102 (2022). https://doi.org/10.1186/s40537-022-00652-w


Received : 04 September 2021

Accepted : 19 September 2022

Published : 22 October 2022

DOI : https://doi.org/10.1186/s40537-022-00652-w


  • Machine learning
  • Transfer learning
  • Multi-task learning


Digital Discovery

Tackling data scarcity with transfer learning: a case study of thickness characterization from optical spectra of perovskite thin films


* Corresponding authors

a Low Energy Electronic Systems (LEES), Singapore-MIT Alliance for Research and Technology (SMART), 1 Create Way, Singapore 138602, Singapore E-mail: [email protected] , [email protected]

b Solar Energy Research Institute of Singapore (SERIS), National University of Singapore, 7 Engineering Drive, Singapore 117574, Singapore

c Institute of Materials Research and Engineering (IMRE), Agency for Science, Technology and Research (A*STAR), 2 Fusionopolis Way, Singapore 138634, Singapore

d Department of Mechanical Engineering, Massachusetts Institute of Technology (MIT), 77 Massachusetts Ave., Cambridge, MA 02139, USA

e Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore

f Institute of Sustainability for Chemicals, Energy and Environment, Agency for Science, Technology and Research (A*STAR), 1 Pesek Rd, Singapore 627833, Singapore

g Department of Chemistry, The University of British Columbia (UBC), 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada

h Department of Mathematics, National University of Singapore (NUS), 21 Lower Kent Ridge Rd, Singapore 119077, Singapore

Transfer learning (TL) is increasingly becoming an important tool for handling data scarcity, especially when applying machine learning (ML) to novel materials science problems. In autonomous workflows to optimize optoelectronic thin films, high-throughput thickness characterization is often required as a downstream process. To surmount data scarcity and enable high-throughput thickness characterization, we propose a transfer learning workflow centered on an ML model called thicknessML that predicts thickness from UV-Vis spectrophotometry. We demonstrate the transfer learning workflow from a generic source domain (of materials with various bandgaps) to a specific target domain (of perovskite materials), where the target-domain data comprise just 18 refractive indices from the literature. While this study features perovskite materials, the target domain easily extends to other material classes given a few corresponding literature refractive indices. With accuracy defined as a prediction within 10%, the accuracy rate of perovskite thickness prediction reaches 92.2 ± 3.6% (mean ± standard deviation) with TL compared to 81.8 ± 11.7% without. As an experimental validation, thicknessML with TL yields a 10.5% mean absolute percentage error (MAPE) for six deposited perovskite films.


Associated articles

  • Correction: Tackling data scarcity with transfer learning: a case study of thick…

Supplementary files

  • Supplementary information PDF (348K)

Article information


Tackling data scarcity with transfer learning: a case study of thickness characterization from optical spectra of perovskite thin films

S. I. P. Tian, Z. Ren, S. Venkataraj, Y. Cheng, D. Bash, F. Oviedo, J. Senthilnath, V. Chellappan, Y. Lim, A. G. Aberle, B. P. MacLeod, F. G. L. Parlane, C. P. Berlinguette, Q. Li, T. Buonassisi and Z. Liu, Digital Discovery, 2023, 2, 1334. DOI: 10.1039/D2DD00149G

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence . You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.



A case study on transfer learning in convolutional neural networks


  • Open access
  • Published: 26 April 2024

Exploring combinations of dimensionality reduction, transfer learning, and regularization methods for predicting binary phenotypes with transcriptomic data

  • S. R. Oshternian,
  • S. Loipfinger,
  • A. Bhattacharya &
  • R. S. N. Fehrmann

BMC Bioinformatics volume 25, Article number: 167 (2024)


Numerous transcriptomic-based models have been developed to predict or understand the fundamental mechanisms driving biological phenotypes. However, few models have successfully transitioned into clinical practice due to challenges associated with generalizability and interpretability. To address these issues, researchers have turned to dimensionality reduction methods and have begun implementing transfer learning approaches.

In this study, we aimed to determine the optimal combination of dimensionality reduction and regularization methods for predictive modeling. We applied seven dimensionality reduction methods to various datasets, including two supervised methods (linear optimal low-rank projection and low-rank canonical correlation analysis), two unsupervised methods [principal component analysis and consensus independent component analysis (c-ICA)], and three methods [autoencoder (AE), adversarial variational autoencoder, and c-ICA] within a transfer learning framework, trained on > 140,000 transcriptomic profiles. To assess the performance of the different combinations, we used a cross-validation setup encapsulated within a permutation testing framework, analyzing 30 different transcriptomic datasets with binary phenotypes. Furthermore, we included datasets with small sample sizes and phenotypes of varying degrees of predictability, and we employed independent datasets for validation.

Our findings revealed that regularized models without dimensionality reduction achieved the highest predictive performance, challenging the necessity of dimensionality reduction when the primary goal is to achieve optimal predictive performance. However, models using AE and c-ICA with transfer learning for dimensionality reduction showed comparable performance, with enhanced interpretability and robustness of predictors, compared to models using non-dimensionality-reduced data.

These findings offer valuable insights into the optimal combination of strategies for enhancing the predictive performance, interpretability, and generalizability of transcriptomic-based models.


Introduction

There is an ongoing effort to develop predictive models that utilize transcriptomic profiles. Clinicians can use such models to select, for example, the optimal treatment for each patient, i.e., precision medicine. Unfortunately, only a few transcriptomic-based predictive models have reached clinical practice; in oncology, these include Oncotype DX and MammaPrint [1]. One reason for this limited adoption is that the sample size is often too small compared to the numerous potential input predictors (in this case, gene expression levels) used to train the predictive model. As a result, overfitting can occur when a model learns the details and noise in the training dataset to such an extent that it negatively impacts the model's performance on new data.

Dimensionality reduction or regularization techniques can be employed to mitigate overfitting [ 2 ]. Well-known unsupervised dimensionality reduction methods include principal component analysis (PCA), consensus independent component analysis (c-ICA), and autoencoders (AE) [ 3 , 4 , 5 ]. Both PCA and c-ICA linearly transform a training dataset with many predictors into a new set—comprising fewer predictors—that still retains most of the information in the original dataset. In PCA and c-ICA, the new predictors are the activity scores (loading factors or mixing matrix weights, respectively) of each component in each sample. An AE is a type of deep neural network that consists of an encoder and a decoder network. The encoder network learns to reduce the data's dimensionality, transforming numerous predictors into a limited set of new predictors (i.e., latent representation). The decoder network learns to reconstruct the input data from these latent representations with minimal loss of information. Supervised dimensionality reduction methods, such as linear optimal low-rank projection (LOL) and low-rank canonical correlation analysis (CCA), effectively reduce data to a lower-dimensional representation, maintaining class-related information [ 6 , 7 ]. These methods differ from unsupervised approaches by utilizing both predictor variables and class labels to inform the reduction process.

Unsupervised dimensionality reduction methods could benefit from transfer learning, a technique that enables them to draw on insights gained from more extensive and diverse datasets [ 8 ]. In our approach, we leveraged a comprehensive set of transcriptomic profiles—beyond those used for predictive model training—to refine the c-ICA and AE dimensionality reduction methods. The refined c-ICA linear transformation, or the AE's encoder network, is subsequently applied to a specific dataset to generate a new set of predictors. These predictors are then used as input to train the predictive model. Employing transfer learning in this manner has the potential to increase the robustness of these new predictors, thereby mitigating overfitting and enhancing the model's predictive performance.

Lasso, Ridge, and Elastic Net are popular regularization techniques used to mitigate overfitting [ 9 , 10 , 11 ]. During the training phase, these techniques aim to minimize the impact of predictors that are highly correlated with each other, thereby reducing model complexity. Generally, these regularization techniques add a penalty to the model as its complexity increases during training. Combining dimensionality reduction methods with these regularization techniques might even further improve the model's predictive performance.

In this study, our aim was to determine which dimensionality reduction method across supervised approaches (LOL, CCA), unsupervised approaches (PCA, c-ICA), and transfer learning approaches (AE, AVAE, and c-ICA) can enhance predictive performance of models. We investigated their impact on predictive performance with and without the application of regularization techniques. For this, we trained predictive models on 30 different transcriptomic datasets and their dimensionality-reduced counterparts. We then evaluated the models' performance and the robustness of predictor selection using a cross-validation setup encapsulated in a permutation testing framework (Fig.  1 ).

Figure 1: Study overview. Preprocessed gene expression datasets served as input for two supervised dimensionality reduction methods: linear optimal low-rank projection (LOL) and low-rank canonical correlation analysis (CCA), alongside two unsupervised methods: principal component analysis (PCA) and consensus independent component analysis (c-ICA). Furthermore, latent representations were generated using a transfer learning approach with an autoencoder (AE), adversarial variational autoencoder (AVAE), and c-ICA, all trained on the GPL570 dataset. These gene-level and latent representations were then individually utilized in the predictive modeling pipeline, employing a cross-validation strategy with and without three different regularization techniques to evaluate predictive performance. The statistical significance of model performance was determined using a permutation test

Data acquisition of an extensive compendium of transcriptomic profiles (GPL570 dataset)

Publicly available raw gene expression data generated using the Affymetrix HG-U133 Plus 2.0 microarray platform was obtained from the Gene Expression Omnibus (GEO; accession number GPL570) [ 12 ]. Preprocessing and aggregation of raw gene expression data were conducted using the robust multi-array average algorithm available in Analysis Power Tools (release 2.11.3). The mapping of probesets to genes and quality control procedures are described in Additional file 1 : Supplementary Note. This comprehensive collection of transcriptomic profiles is referred to as the GPL570 dataset.

For predictive analysis, we selected transcriptomic datasets with available biological phenotypes from GEO. All selected datasets were preprocessed in the same manner as the GPL570 dataset.

Training of the autoencoder

Autoencoder (AE) methods are a type of unsupervised deep neural network utilized for various tasks, including feature learning, dimensionality reduction, data denoising, and classification [ 5 ]. The encoder network learns to reduce the dimensionality of the input data by mapping it to a latent space, thereby generating a new representation with a limited number of variables, referred to as the latent representation. The decoder network learns to reconstruct the samples in the input data from their latent representations with minimal loss of information.

In our study, the encoder maps the GPL570 dataset, consisting of 139,786 samples and 19,863 genes, to a latent representation with 1024 latent variables. The gene expression levels serve as the input variables for the encoder. The decoder attempts to reconstruct the samples' gene expression levels from their latent representations. A schematic representation of the AE can be found in Additional file 1 : Fig. S1, and a detailed description of hyperparameters and layers is provided in Additional file 1 : Table S1.

After randomly shuffling the samples in the GPL570 dataset, we divided the dataset into training (70%, n = 97,850), validation (15%, n = 20,968), and test sets (15%, n = 20,968). The training of AEs is focused on minimizing the difference between the input and reconstructed data. We employed the Mean Squared Error (MSE) loss function to train the encoder and the decoder parameters. AE training was conducted using the Ranger optimizer [ 13 ]. The MSE was used to train and evaluate the model's performance from a sample perspective. In addition, we calculated a metric to gauge the reconstruction performance from the gene perspective. First, we calculated the Pearson correlation between the genes in the input data, resulting in a triangular correlation matrix with dimension p genes by p genes. Second, we calculated the same triangular correlation matrix using the reconstructed data. We then computed the absolute difference between the correlations obtained with the input and the reconstructed data for each gene pair. After sorting the absolute correlation differences in ascending order, we obtained the 95th percentile, referred to as the R-difference 95th , as a reconstruction performance metric from the gene perspective. The closer the R-difference 95th is to zero, the better the reconstructed data captures the gene-by-gene correlation structure present in the input data.
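A sketch of this gene-perspective metric, assuming expression matrices of shape (samples × genes); the function name is ours, not from the paper, and for ~20,000 genes the correlation matrices would be very large in practice.

```python
import numpy as np

def r_difference_95th(x, x_hat):
    """x, x_hat: arrays of shape (n_samples, p_genes)."""
    r_in = np.corrcoef(x, rowvar=False)       # p x p gene-by-gene correlations, input data
    r_out = np.corrcoef(x_hat, rowvar=False)  # same correlations for the reconstruction
    iu = np.triu_indices_from(r_in, k=1)      # upper triangle: each gene pair once
    return np.percentile(np.abs(r_in[iu] - r_out[iu]), 95)
```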

Training of the adversarial variational autoencoder

An adversarial variational autoencoder (AVAE) is a type of deep neural network that enhances the capabilities of a standard AE by incorporating adversarial training and imposing a constraint on the latent distribution [ 14 ]. This approach can result in more biologically meaningful representations in the latent space and may allow, for example, the generation of new transcriptomic profiles with similar properties by adding noise to the latent representation [ 15 ]. A schematic representation of the AVAE and a detailed description of its hyperparameters and layers is provided in Additional file 1 : Fig. S2 and Table S2.

To train and evaluate the AVAE, we used the same GPL570 training, validation, and test sets as with the AE. During the training process, the AVAE aims to encode the samples' gene expression levels into a latent representation and to reconstruct them from this latent representation. In our AVAE, we imposed constraints that force the latent variables to conform to a Gaussian prior distribution. We employed the Ranger optimizer, using parameters identical to those in the AE training, to minimize both the MSE for reconstruction performance and the Kullback–Leibler divergence (KL) to regulate the distribution of the latent variables [ 16 ].

The calculation of loss for the encoder and decoder is as follows: The encoder loss is the sum of the MSE loss of the reconstructed data and the KL loss of the generated latent representation. The decoder loss is the sum of the discriminator loss for the reconstructed data, the discriminator loss for the generated data, and the MSE loss of the reconstructed data.
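A hedged PyTorch-style sketch of this loss composition, assuming the standard Gaussian-prior KL term; the discriminator losses are passed in precomputed, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def kl_gaussian(mu, logvar):
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def avae_losses(x, x_hat, mu, logvar, d_recon_loss, d_gen_loss):
    mse = F.mse_loss(x_hat, x)
    encoder_loss = mse + kl_gaussian(mu, logvar)    # reconstruction + latent prior
    decoder_loss = d_recon_loss + d_gen_loss + mse  # adversarial terms + reconstruction
    return encoder_loss, decoder_loss
```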

Consensus independent component analysis on the GPL570 dataset

Consensus independent component analysis (c-ICA) was performed to segregate bulk transcriptomic profiles into statistically independent transcriptional components (TCs), as previously described in more detail [ 17 ].

In brief, applying c-ICA to a transcriptomic dataset with p genes and n samples yields i transcriptional components, each of dimension 1 × p. Each TC captures the transcriptional pattern of an underlying process (e.g., a biological process or cellular state). A TC comprises p scalars, representing both the direction and magnitude of the effect of the underlying process on a gene's expression level. In addition to the TCs, c-ICA also outputs a mixing matrix (MM) of dimension i × n, containing coefficients for each TC-sample pair. The inner product between an individual sample's vector of coefficients in the MM and the vector of scalars for each gene across all TCs results in reconstructed transcriptomic profiles that closely resemble the input profiles. Thus, the coefficients in the MM can be interpreted in a similar manner to the latent representations derived from the AE and AVAE.

Initially, a preprocessing technique called whitening is applied to the input dataset to accelerate the convergence rate of the ICA algorithm. This involves conducting PCA on the sample-by-sample covariance matrix. The number of principal components capturing at least 90% of the total variance in the GPL570 dataset serves as the input for the c-ICA. Due to the random initialization of the optimization algorithm, 25 ICA runs were conducted, and only TCs that could be identified consistently across multiple runs were selected using a consensus approach. For parameters and details on performing the c-ICA algorithm on the GPL570 dataset, please refer to Additional file 1: Supplementary Note. The c-ICA yields TCs, each representing a robust, statistically independent, and distinct transcriptional pattern of an underlying process, along with a mixing matrix that describes the latent representations of the transcriptomes.

Dimensionality reduction methods applied to datasets

Various dimensionality reduction methods were applied to each transcriptomic dataset. For the purpose of reference comparison, all datasets were also included without any dimensionality reduction. These datasets, which contain a high number of genes and thus feature complexity, are hereafter referred to as gene-level datasets. The gene-level datasets served as input for several dimensionality reduction methods, which are described below.

Supervised dimensionality reduction

Supervised dimensionality reduction effectively reduces data to a lower-dimensional representation, maintaining class-related information. We applied two supervised methods, linear optimal low-rank projection (LOL) and low-rank canonical correlation analysis (CCA), to gene-level datasets with their corresponding phenotype classes [ 6 , 7 ]. While LOL uses class-conditional means and class-centered covariance to optimize data representation for improved classification, CCA focuses on identifying correlated patterns between samples from specific phenotype classes. The resulting representations are referred to as LOL and CCA latent representations, respectively.

Principal component analysis

Principal component analysis (PCA) is a widely used method for reducing the dimensionality of transcriptomic data [3]. In each gene-level dataset, every individual gene's expression levels were standardized to have a mean of 0 and a standard deviation of 1. The sample correlation matrix was then calculated. Subsequently, the eigenvectors and eigenvalues of this matrix were determined. The pseudo-inverse of the eigenvector matrix yields activity scores for each principal component (PC) in each sample. These activity scores, capturing 100% of the variance observed in the gene-level dataset, are referred to as the single PCA latent representation.
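A NumPy sketch following this description (an approximation of the procedure, not the paper's actual code):

```python
import numpy as np

def pca_latent(x):
    """x: (n_samples, p_genes) gene-level expression matrix."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each gene to mean 0, sd 1
    c = np.corrcoef(z)                        # n x n sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(c)      # eigenvectors and eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort PCs by explained variance
    eigvecs = eigvecs[:, order]
    # Pseudo-inverse of the eigenvector matrix: PC activity scores per sample
    return np.linalg.pinv(eigvecs)
```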

Consensus independent component analysis

c-ICA was trained independently on each of the gene-level datasets. For every dataset, the number of principal components that accounted for 100% of the dataset variance during the whitening step was used as the input for ICA. Only TCs that were identified multiple times across 50 runs were selected, following a consensus approach (see Additional file 1 : Supplementary Note for details). The resulting mixing matrix, which comprises the activity scores for each TC across all samples, is referred to as the single ICA latent representation.

Autoencoder

We utilized each gene-level dataset as input for the AE network, which had been pre-trained on the GPL570 dataset. Our objective in applying transfer learning was focused on dimensionality reduction, rather than transferring specific phenotypic information. To achieve a compact representation, we passed the datasets through the encoder layers of the AE. This resulting representation, encompassed by 1024 latent variables, is referred to as the AE latent representation.

Adversarial variational autoencoder

Each gene-level dataset was individually used as input for the trained AVAE network. The datasets were processed through both the shared and the specific encoder networks, resulting in a representation with 256 latent variables (refer to Additional file 1 : Fig. S2 for details). This latent representation contains vectors representing the mean and standard deviation. This mean vector consists of 128 latent variables for each sample and is referred to as the AVAE latent representation.

c-ICA transformation obtained with the GPL570 dataset

Each gene-level dataset was projected into its latent representation using the c-ICA model that had been trained on the GPL570 dataset. This involved calculating the pseudo-inverse of the independent component matrix and then multiplying this by the matrix of the gene-level dataset [ 18 ]. The resulting projected mixing matrix for each dataset included activity scores for 3286 independent components per sample, captured as latent variables, and is referred to as the GPL570 ICA latent representation.
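A minimal NumPy sketch of this projection, with S denoting the (components × genes) matrix of transcriptional components from the GPL570 model and X a (genes × samples) gene-level dataset:

```python
import numpy as np

def project_to_gpl570_ica(S, X):
    """S: (i_components, p_genes) TC matrix; X: (p_genes, n_samples) dataset."""
    # X is modeled as S.T @ MM, so the projected mixing matrix is obtained by
    # applying the pseudo-inverse of the component matrix to the dataset
    return np.linalg.pinv(S.T) @ X  # (i_components, n_samples) activity scores
```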

In summary, we generated seven different representations for each gene-level dataset (see Fig.  1 ). First, we applied dimensionality reduction methods on each dataset individually using two supervised methods to obtain 'LOL' and 'CCA' latent representations, and two unsupervised methods to construct 'single PCA' and 'single ICA' latent representations. Secondly, we leveraged unsupervised transfer learning approaches, using three models trained on the GPL570 dataset: an AE, an AVAE, and a c-ICA model, yielding three additional latent representations: 'AE', 'AVAE', and 'GPL570 ICA'. The seven representations, alongside the original gene-level data, formed the basis for our subsequent predictive modeling.

Predictive modeling

Predictive models without regularization

In our predictive modeling, logistic regression without regularization was utilized for phenotype prediction. The disproportionately large number of predictors compared to the sample size rendered the direct application of a logistic regression model infeasible. To circumvent this issue, we employed the median absolute deviation (MAD) to identify the most variable predictors, thereby capping the number of predictors to match the sample size. Predictors exhibiting the highest MAD were deemed to indicate the most significant variation among samples and were thus more likely to discern differences in the data. These specifically chosen predictors were subsequently used to train the logistic regression model.
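A sketch of this MAD-based capping with SciPy and scikit-learn (assuming scikit-learn ≥ 1.2 for penalty=None; names are illustrative):

```python
import numpy as np
from scipy.stats import median_abs_deviation
from sklearn.linear_model import LogisticRegression

def fit_mad_capped_logreg(X, y):
    """X: (n_samples, n_predictors); y: binary phenotype labels."""
    mad = median_abs_deviation(X, axis=0)
    keep = np.argsort(mad)[::-1][: X.shape[0]]  # cap predictors at the sample size
    model = LogisticRegression(penalty=None, max_iter=1000)
    model.fit(X[:, keep], y)
    return model, keep
```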

Predictive models using regularization techniques

We also incorporated logistic regression combined with three regularization techniques: Lasso, Ridge, and Elastic Net, for phenotype prediction [ 7 , 8 , 9 , 19 ]. These techniques are designed to deal with correlated predictors and provide more stable models. Lasso produces sparse models by setting the coefficients of certain predictors to zero. Ridge assigns coefficients close to zero to reduce multicollinearity. Elastic Net combines the strengths of both Lasso and Ridge. These techniques help us identify informative input predictors with minimal inter-correlation for effective prediction. Detailed information with all parameters can be found in Additional file 1 : Supplementary Note.
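For reference, the three penalties map onto scikit-learn's LogisticRegression roughly as follows (a sketch; the hyperparameters actually used are in the paper's supplementary material):

```python
from sklearn.linear_model import LogisticRegression

lasso = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000)
ridge = LogisticRegression(penalty="l2", solver="saga", C=1.0, max_iter=5000)
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000)
# Smaller C means stronger regularization; l1_ratio balances the L1 and L2 terms
```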

Evaluation metrics for performance

To assess the predictive performance of the models, the Matthews correlation coefficient (MCC) was selected as the primary evaluation metric [20]. In the binary classification context, the MCC functions as the discrete equivalent of the Pearson correlation coefficient and is interpreted in a similar manner [21]. In addition, to draw more robust conclusions on predictive performance, we also included the adjusted Rand index (ARI) and the 1-Brier score [22, 23]. Note that the MCC is considered more informative than both the ARI and the Brier score in binary classification evaluations [24]. When regularization techniques were applied, the definitive metric score was established by identifying the highest performance value among the three regularization techniques: Lasso, Ridge, and Elastic Net.

Cross-validation

To validate the performance of the predictive models, a k -fold cross-validation (CV) was used. Initially, the samples in a dataset were divided into k folds in a stratified manner based on the phenotypic labels. Subsequently, k  − 1 of these folds were used for training, while the remaining k th fold served as the test set. After completing the CV, the overall model performance was calculated using the aggregated predictive labels from all folds. CV was conducted in two different ways: first with k  = 10, resulting in a 90% training and 10% testing split; and second, in a reverse scheme with k  = 5, leading to a 20% training and 80% testing split, to examine the model's predictive behavior when trained on a smaller sample size.
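A sketch of this aggregated-fold evaluation with scikit-learn, using the MCC as the performance metric; the inner model is a placeholder for any of the models described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold

def cv_mcc(X, y, k=10):
    """y: integer-encoded binary labels, shape (n_samples,)."""
    y_pred = np.empty_like(y)
    for train_idx, test_idx in StratifiedKFold(n_splits=k, shuffle=True).split(X, y):
        model = LogisticRegression(penalty="l2", max_iter=5000)
        model.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = model.predict(X[test_idx])
    # One overall score from the aggregated predictions of all folds
    return matthews_corrcoef(y, y_pred)
```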

Permutation test

A phenotype permutation test was conducted using 200 permutations to evaluate the significance of the predictive model's performance. Prior to carrying out the CV, the phenotypic labels were randomly shuffled. For each of these random reshufflings, a cross-validated performance metric was calculated using the predictive model, thereby generating a null distribution of the model's predictive performance. The statistical significance for testing the null hypothesis (that there is no association between the input predictors and the phenotypic label) was indicated by a p-value. This p-value is defined as the proportion of permutations yielding a performance metric equal to or better than the metric obtained with the non-permuted phenotypic labels, relative to the total number of permutations. The final reported performance metric is the mean metric value across the 200 CV runs with the non-permuted phenotypic labels. We refer to this combination of CV and permutation testing as the CV-permutation test.
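A sketch of this permutation scheme, reusing a cross-validated metric function such as cv_mcc from the previous sketch; the p-value follows the proportion definition given above.

```python
import numpy as np

def permutation_p_value(X, y, metric_fn, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = metric_fn(X, y)
    # Null distribution: the same CV metric under randomly shuffled labels
    null = np.array([metric_fn(X, rng.permutation(y)) for _ in range(n_perm)])
    return np.sum(null >= observed) / n_perm
```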

Regularization technique comparison

We conducted a paired-samples Wilcoxon test to assess whether one regularization technique significantly outperforms the others when applied with specific dimensionality reduction methods. For each dimensionality reduction method, the performance difference between regularization techniques was tested across the 30 datasets. The resulting p-values were subsequently adjusted using the Bonferroni correction for multiple testing.

Robustness of input predictors

To evaluate the robustness of predictor selection within the prediction models, we used a cross-validation approach focused specifically on Lasso regularization. Unlike Ridge and Elastic Net, Lasso can zero out predictors that do not increase predictive performance. For each dataset, we conducted 20 CV runs, using stratified sampling to divide the data into ten folds for each run. During each CV run, we trained ten separate Lasso regression models on unique combinations of nine folds. We then identified predictors with non-zero coefficients in these Lasso models for further analysis. The robustness of these predictors was evaluated by examining all ten models from each of the 20 CV runs, culminating in a comprehensive assessment over 200 individual models. Because the number of predictors varies across representations, we used the proportion of occurrence for each predictor, calculated as the number of times a predictor had a non-zero coefficient divided by the total number of predictors that had a non-zero coefficient in at least one CV run. The robustness of the predictor selection was then quantified by calculating the area under the curve for the proportion of predictors relative to the number of runs (AUC of proportion). A higher AUC of proportion indicates greater robustness in predictor selection. Furthermore, robustness was also assessed using the pairs of independent datasets with identical phenotype classes. A Lasso model was trained separately on each dataset within a pair. We then calculated the robustness by determining the proportion of predictors selected by both Lasso models in the pair relative to all selected predictors.
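The bookkeeping behind this analysis can be sketched as follows: count, across repeated stratified CV runs, how often each predictor receives a non-zero Lasso coefficient. The data, regularization strength, and other settings are illustrative; the paper's AUC-of-proportion summary is then computed from per-predictor counts like these.

```python
# Sketch of the stability bookkeeping: count how often each predictor gets a
# non-zero Lasso coefficient across repeated stratified CV runs. All data and
# settings are illustrative; 20 runs x 10 folds = 200 models, as in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 100))
y = rng.integers(0, 2, size=80)

n_runs, k = 20, 10
counts = np.zeros(X.shape[1], dtype=int)   # non-zero occurrences per predictor

for run in range(n_runs):
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=run)
    for train_idx, _ in skf.split(X, y):   # each model sees nine of ten folds
        lasso = LogisticRegression(penalty="l1", solver="saga", C=0.5,
                                   max_iter=3000).fit(X[train_idx], y[train_idx])
        counts += (lasso.coef_.ravel() != 0)

ever_selected = int(np.sum(counts > 0))
always_selected = int(np.sum(counts == n_runs * k))
print(f"{ever_selected} predictors ever selected, "
      f"{always_selected} selected in all {n_runs * k} models")
```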

Extensive compendium of transcriptomic profiles

To implement the transfer learning approach in our dimensionality reduction methods, we collected 139,786 transcriptomic profiles, which encompass measurements for 19,863 unique genes (GPL570 dataset). From this comprehensive dataset, we selected 30 studies that featured binary phenotypes to conduct predictive modeling. Among these 30 studies are five pairs of independent datasets, each investigating the same phenotype. The sample size in these selected studies ranged from 46 to 437, and they spanned a diverse array of phenotypes (see Additional file 1 : Table S3).

AE, AVAE, and c-ICA can effectively reduce ~140K transcriptomic profiles to latent representations

We aimed to evaluate the efficacy of AE, AVAE, and c-ICA as dimensionality reduction methods in transforming the GPL570 dataset's transcriptomic profiles into lower-dimensional latent representations while minimizing information loss.
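For readers unfamiliar with autoencoders, here is a minimal Keras sketch of AE-based dimensionality reduction; the layer sizes, activations, and training settings are illustrative assumptions and do not reproduce the network configuration used in the paper.

```python
# A minimal Keras sketch of autoencoder-based dimensionality reduction;
# layer sizes, activations, and training settings are illustrative only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_genes, latent_dim = 1000, 64
inputs = keras.Input(shape=(n_genes,))
h = layers.Dense(256, activation="relu")(inputs)
latent = layers.Dense(latent_dim, activation="relu", name="latent")(h)
h = layers.Dense(256, activation="relu")(latent)
outputs = layers.Dense(n_genes, activation="linear")(h)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")     # reconstruction MSE

X = np.random.randn(500, n_genes).astype("float32")  # toy expression profiles
autoencoder.fit(X, X, epochs=3, batch_size=64, verbose=0)

# The encoder alone maps profiles to the latent representation that serves
# as the predictor matrix for downstream models.
encoder = keras.Model(inputs, latent)
Z = encoder.predict(X, verbose=0)
print(Z.shape)   # (500, 64)
```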

For the AE method, we chose the network configuration at epoch 540, based on its performance metrics on the validation set (see Additional file 1: Fig. S3 for learning curves). When applied to the test set, the AE network displayed an MSE of 0.0965 and an R-difference 95th of 0.0310. These low values indicate both accurate reconstruction of the transcriptomic profiles and retention of the gene-by-gene correlation structure. Validation further confirmed that the AE network was not prone to overfitting (refer to Table 1). The trained AE network is available in Supplementary Data 1.

In the case of AVAE, we selected the network at epoch 7500. Compared to the AE network, the AVAE network displayed a slightly lower reconstruction performance, with an MSE of 0.1968 and an R-difference 95th of 0.0922. Also, for this network, no overfitting was observed (Table  1 ; Additional file 1 : Fig. S3). This relatively lower performance is attributed to the network's design, which aims to normalize the latent variables to a normal distribution, as indicated by a KL value of 0.1071. The performance metric values for the generator network showed that the latent space could be successfully used to generate new profiles with a gene-by-gene correlation structure similar to that observed in the profiles of the validation and test sets (validation R-difference 95th  = 0.1266). In this study, we only used the decoder network of the AVAE. The trained AVAE network is provided as Supplementary Data 2.

As for c-ICA, the method generated latent representations with 3286 latent variables, where a latent variable corresponds to the activity score of a TC. Its reconstruction performance, measured by an MSE of 0.3578 and an R-difference 95th of 0.2490, was lower compared to both AE and AVAE. This decrease in performance is due to c-ICA's stricter constraint of enforcing statistical independence among the latent variables. The c-ICA latent variables can be found in Supplementary Data 3.

In summary, our results indicate that AE, AVAE, and c-ICA are effective at reducing the dimensionality of transcriptomic profiles to latent representations, albeit with varying degrees of information loss.

A broad range of predictability of phenotypes across 30 datasets

We explored the predictive capabilities of our predictive models across a variety of biological phenotypes. In comparisons of predictive models with and without regularization, we observed mostly superior performance from models employing regularization, as evidenced by their MCC, ARI, and Brier score metrics (see Fig. 2A; Additional file 1: Fig. S4). This effect was especially marked in datasets with smaller sample sizes, where models without regularization were more prone to overfitting (Fig. 2B). Consequently, we direct our focus toward the performance of prediction models that incorporate regularization. Our models' performance using the gene-level representations, as primarily assessed by the MCC, indicated a broad spectrum of phenotype predictability. For example, certain phenotypes like leukemia type (GSE131184 with an MCC of 0.97) and differentiation between normal and cancer tissues (GSE35570 with an MCC of 1.0 and GSE53757 with an MCC of 0.94) were highly predictable using gene-level representations. Conversely, other phenotypes such as renal transplant rejection (GSE36059 with an MCC of 0.54 and GSE48581 with an MCC of 0.45) and detection of Parkinson's disease from blood samples (GSE99039 with an MCC of 0.29) exhibited lower predictability. Additional details of latent representations and predictive performances can be found in Supplementary Data 4 and Supplementary Data 5, respectively. The additional performance metrics we considered, ARI and Brier scores, demonstrated a strong correlation with MCC (Spearman's rho: ARI = 0.94, Brier score = 0.88), as shown in Additional file 1: Fig. S5. They also mirrored the MCC in terms of variability in predictive performance. This variation in predictive performance metrics allows us to further investigate how different modeling strategies affect performance across various levels of phenotype predictability.

Figure 2. Comparative analysis of predictive model performance with and without regularization. The performance of predictive models applied to gene-level data and their latent representations across 30 datasets is shown. The performance is displayed as the test-data Matthews correlation coefficient (MCC) obtained through the CV-permutation test for both the predictive model with the best regularization technique and the model without regularization. The CV was executed with two distinct settings: A using 90% of the data samples for training and B using 20% for training.

Supervised dimensionality reduction: Promising predictive performance on training data but less reliable on truly independent datasets

To evaluate the capacity of latent representations from supervised dimensionality reduction methods to enhance predictive performance, we applied LOL and CCA to each dataset. These methods produced new predictors by incorporating the phenotype labels during the reduction process, which significantly improved the performance of subsequent regression models, as shown in Additional file 1: Figs. S6–S8A, B. However, this apparent high performance may be due to information leakage during the dimensionality reduction phase. Since phenotype labels were used to guide the reduction across all samples, the cross-validation results may be biased. This 'bleeding' of information between the training and test folds could lead to an overestimation of the true predictive performance of the models. This concern was substantiated when we evaluated the models on a truly independent dataset with the same phenotype. While a performance drop was generally observed in the independent dataset, CCA managed to yield predictive performance similar to gene-level data (mean MCC difference of −0.009). In contrast, the LOL latent representation underperformed, with a mean MCC difference of −0.189 compared to the gene-level representation, suggesting a greater tendency for overfitting, as evidenced by its lower performance on a truly independent dataset (as detailed in Fig. 3B). This pattern was consistent for both the ARI and the Brier score (as detailed in Additional file 1: Figs. S6–S8C). In essence, although predictive models utilizing these supervised dimensionality reduction techniques show promise on the datasets they were trained on, their performance may not be as reliable when applied to novel, independent datasets.

Figure 3. Comparative predictive performance of latent representations and gene-level representation. A The Matthews correlation coefficient (MCC) for the most effective regularization technique across all 30 datasets is presented. The top row shows cross-validation performance with 90% of the data used for training, while the bottom row shows performance with only 20% used for training. The mean MCC difference (Δ) between gene-level and latent representations is indicated, with a negative Δ value signifying better performance of the gene-level representation. B The predictive performance of models with regularization trained on latent representations is displayed across five pairs of independent datasets. Within each pair, one dataset was used for training predictive models, and the paired dataset served as the test set for performance assessment.

Dimensionality reduction methods PCA and c-ICA led to lower predictive performance

We examined the impact of using latent representations with fewer predictors generated through PCA or c-ICA on the performance of predictive models, compared to using gene-level representations. The mean MCC differences when using single PCA and single ICA latent representations compared to gene-level representations were −0.135 and −0.067, respectively (refer to Fig. 3A). This indicated diminished performance for both methods, with PCA showing a more substantial decrease relative to gene-level representations across almost all datasets. This pattern was similar for both the ARI and the Brier score (refer to Additional file 1: Figs. S6–S8). To ascertain the statistical significance of these performances, we performed a CV-permutation test. The MCC scores were significant for most datasets across all three methods; however, PCA's MCC scores were not significant for 10 of the 30 datasets (as shown in Fig. 3A). In summary, our results suggest that employing PCA and c-ICA for dimensionality reduction did not enhance the predictive models' performance when used in conjunction with regularization techniques, and in some instances led to reduced performance.
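The overall reduce-then-classify pipeline evaluated here can be sketched as follows; scikit-learn's FastICA is used as a stand-in for the paper's consensus ICA (c-ICA) procedure, which additionally involves repeated ICA runs, and all dimensions and hyperparameters are illustrative.

```python
# Sketch of the reduce-then-classify pipeline: PCA or ICA compresses a toy
# expression matrix, and a regularized logistic regression predicts the
# phenotype. FastICA stands in for the paper's consensus ICA (c-ICA).
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 1000))
y = rng.integers(0, 2, size=100)

for reducer in (PCA(n_components=20),
                FastICA(n_components=20, max_iter=1000, random_state=0)):
    pipe = make_pipeline(reducer, LogisticRegression(penalty="l2", max_iter=2000))
    preds = cross_val_predict(pipe, X, y, cv=5)   # reduction refit inside each fold
    print(type(reducer).__name__, "MCC:", round(matthews_corrcoef(y, preds), 3))
```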

Dimensionality reduction methods AE and c-ICA combined with transfer learning showed predictive performance comparable to models using gene-level data

We investigated the potential for enhanced predictive performance when employing latent representations derived from transfer learning approaches using AE, AVAE, and c-ICA, in comparison to using gene-level representations. Our analysis revealed that the mean MCC differences between the latent representations obtained through AE, AVAE, and GPL570 ICA and the gene-level representations were −0.027, −0.109, and −0.025, respectively (as shown in Fig. 3A). While AE and GPL570 ICA yielded slightly diminished performance relative to gene-level representations, the data suggest that these transfer learning approaches effectively preserve the essential information even when the number of predictors is reduced. Conversely, the AVAE latent representation captured less relevant information than its gene-level, AE latent, and GPL570 ICA latent counterparts. Significance results from the CV-permutation test further confirmed that the observed performances were not due to chance. Consistent with our CV performance findings, we noted parallel outcomes across the five pairs of independent datasets (see Fig. 3B). This consistency was also evident for both the ARI and Brier score metrics (see Additional file 1: Figs. S6–S8). Details on all dataset representations and CV-permutation test results can be found in Supplementary Data 4 and Supplementary Data 5, respectively. A pairwise comparison between all representations is illustrated in Additional file 1: Fig. S9.

In summary, our results demonstrate that employing transfer learning approaches to project transcriptomic profiles into more compact, lower-dimensional representations succeeds in preserving the biological information relevant to specific phenotypes, all while reducing the number of predictors required in the models.

Dimensionality reduction methods are more effective for datasets with small sample sizes

We explored the efficacy of predictive models on datasets with smaller sample sizes, utilizing only 20% of the available samples, which ranged from 9 to 87 in number. In general, a notable decrease in predictive performance was observed, and a higher proportion of datasets failed to show significant predictability according to the CV-permutation test. Nevertheless, predictive models with datasets showcasing a more marked phenotypic divergence, such as comparisons between cancerous and healthy tissues (GSE35570, GSE53757), demonstrated more resilient performance (see Additional file 1 : Table S4). Importantly, the mean MCC differences between the gene-level and latent representations became more modest. This trend suggests that dimensionality reduction methods tend to perform comparably to gene-level representations in scenarios involving smaller sample sizes. Hence, our results imply that dimensionality reduction methods are more beneficial when dealing with datasets that have limited sample sizes compared to those with more extensive sample collections.

The predictive performance is more dependent on phenotype than on regularization techniques

In order to investigate whether there exists a 'best-fit' regularization technique for each type of data representation, we assessed the MCC difference between various regularization techniques across all 30 datasets. Our analyses revealed that, for most data representations, no single regularization technique consistently outperformed the others within each dataset (refer to Additional file 1 : Table S5 and Fig. S10). However, there were some specific cases: for the AE latent representation, Ridge was frequently the optimal choice; for PCA, both Lasso and Elastic Net tended to be more effective than Ridge; and for gene-level data, Elastic Net often surpassed Lasso. Importantly, we found that the effect of choosing a specific regularization technique was comparatively minor when set against the inherent predictability of the phenotype in each dataset.

Dimensionality reduction leads to more stable selection of input predictors

To assess whether dimensionality reduction methods lead to a more stable selection of input predictors, we calculated the AUC of proportion for each dataset across various reduction methods (refer to Fig.  4 A; Additional file 1 : Fig. S11 and Data S6). For example, in the GSE64951 dataset with the single ICA latent representation, the AUC of proportion was 0.97, suggesting that most input predictors were consistently selected across all runs. In contrast, the same dataset's gene-level representation had an AUC of proportion of only 0.15, despite having comparable MCC values (gene-level MCC = 0.43, single ICA latent MCC = 0.45). While variability existed across datasets, the single ICA latent representation generally exhibited the most robust selection of predictors (see Additional file 1 : Table S6). Conversely, the gene-level representation often demonstrated less consistency in predictor selection.

Figure 4. Robustness of predictor selection across various representations. A The area under the curve (AUC) of proportion of selected predictors is displayed for each representation across all 30 datasets. The hinges of the boxes denote the second and third quartiles, and the whiskers extend by half of that interquartile range. The center of each box represents the median value. B The proportion of selected predictors for five pairs of independent datasets is displayed for each representation.

Overall, as the number of runs increased, the proportion of selected predictors decreased. Nevertheless, certain key predictors were consistently chosen across the 200 models, highlighting their significance in the predictive models.

In analyses of predictor selection robustness for models trained on pairs of independent datasets, we noted a similar trend as described above (refer to Fig.  4 B). Models based on the supervised methods CCA and LOL exhibited low robustness in predictor selection. When these methods were used to train predictive models on new datasets, multiple predictors, including the primary discriminative ones, were frequently selected. This indicates that CCA and LOL might preferentially capture dataset-specific information as the main discriminative predictors and distribute phenotype-specific information among various predictors.

Discussion

In this study, we utilized 30 transcriptomic datasets to determine the optimal combination of dimensionality reduction methods, transfer learning approaches, and regularization techniques to achieve the most effective predictive models. We found that predictive models trained directly on transcriptomic data, without employing dimensionality reduction (i.e., using the gene-level representation), yielded the highest predictive performance across multiple metrics. However, the models utilizing the AE and GPL570 ICA latent representations of the datasets achieved almost similar levels of performance while exhibiting improved interpretability and robustness in predictor selection compared to models using gene-level representations.

Dimensionality reduction methods can effectively mitigate overfitting of predictive models by reducing the complexity of input data, eliminating noise and irrelevant information, and focusing on the most informative aspects [ 25 , 26 ]. Moreover, dimensionality reduction methods improve generalization by capturing the underlying structure or patterns in the data, enabling the model to better generalize to unseen instances and reducing the likelihood of overfitting to specific training examples [ 26 ].

While the abundance of predictors in gene-level representations presents a modeling challenge (i.e. overfitting), they inherently encapsulate the complete spectrum of phenotypic information necessary for accurate prediction. Our results indicate that predictive models trained with regularization techniques can already effectively extract the phenotypic information and mitigate overfitting, even when dealing with many potential predictors as in the gene-level representations. The combination of dimensionality reduction methods and regularization techniques did not yield further improvements in predictive performance. While both methods independently reduce the risk of overfitting, our results show that they do not necessarily enhance each other's ability to boost predictive performance.

The comparable performance observed between predictive models utilizing the gene-level representation and the latent representations obtained through GPL570 ICA or AE highlights the effectiveness of these transfer learning approaches combined with dimensionality reduction methods in capturing the phenotype-relevant information inherent in the gene-level data. This finding is consistent across a broad range of phenotype predictability, further emphasizing the robustness and utility of these methods. It is worth noting that GPL570 ICA exhibited a lower reconstruction performance compared to AE, which can be attributed to the less than 100% explained variance and the imposed statistical independence restriction on the TCs in GPL570 ICA. This suggests that excellent reconstruction performance is not necessarily a prerequisite for effectively capturing phenotype-relevant information in a latent representation.

While models using GPL570 ICA and AE representations have already demonstrated high predictive performance, there remains room for potential improvement. For instance, optimizing the AE network structure or increasing the explained variance to obtain more TCs from c-ICA could enhance performance, provided that computational resources permit. These optimizations could capture additional phenotype-relevant information within the latent representation. Such enhancements have the potential to further elevate the performance of models using these methods' representations, making their performance even more comparable to models using the gene-level representation.

Differences in predictive performance between dimensionality reduction methods may be due to the inherent characteristics and limitations of each method, which affect how the data is reduced in dimensionality. PCA reduces data complexity by creating new orthogonal axes that spread the data as much as possible, thereby capturing most of the observed variance with fewer new predictors. c-ICA reduces dimensionality with the constraint of statistical independence, stricter than orthogonality, which minimizes any shared information between new predictors. This minimized shared information may explain the higher predictive performance observed when c-ICA is used compared to PCA. Furthermore, c-ICA has been shown to reveal more subtle and biologically relevant patterns in the data compared to PCA, which is particularly useful in scenarios involving complex signals or mixtures (i.e., the bulk transcriptomic profiles generated from complex tissue biopsies used in this study) [27]. In the context of AEs, the AVAE regularizes the latent distribution to follow a Gaussian distribution, which has been shown to improve generative ability and interpretability but diminish predictive performance compared to an AE without this regularization. Techniques like AEs and c-ICA can benefit from transfer learning, utilizing a diverse collection of samples to effectively uncover complex biological patterns, typically leading to more robust new predictors. Conversely, training AEs and c-ICA on a single dataset may only reflect the unique biological patterns within that dataset, yielding new predictors that are more specific to that dataset but potentially less generalizable to other datasets. It is also important to recognize that transfer learning approaches, while robust, are more demanding in terms of resources and might overlook unique biological details not represented in their extensive training sets. Finally, transfer learning is not feasible for supervised dimensionality reduction on the same scale as for unsupervised methods, because the labels required during the reduction process are unavailable for the majority of samples.

In addition to predictive performance, the interpretability of predictive models plays a crucial role in understanding the underlying phenotype-relevant biological processes. When dealing with phenotypes primarily driven by a limited number of genes, gene-level representation is appropriate as all the phenotype-relevant genes have a high chance of being selected as input predictors [ 28 ]. However, for phenotypes involving intricate gene interactions and diverse biological processes, the complexity can lead to variability in input predictor selection. In such cases, capturing this complexity with dimensionality reduction methods in a latent representation with a lower number of predictors offers the advantage of a more robust selection, resulting in higher generalizability and interpretability. For AE, various network interpretation methods can aid in identifying which genes have the most influence on each latent variable selected as an input predictor [ 29 ]. In c-ICA, each TC captures a statistically independent transcriptional pattern, often associated with a specific biological process, which can be identified using gene set enrichment analysis [ 30 ]. Thus, AE and c-ICA have the advantage of providing a more robust and interpretable latent representation, enabling a deeper understanding of the underlying biological mechanisms.

To our knowledge, our study is the most comprehensive comparative analysis aimed at determining which combination of dimensionality reduction methods (supervised and unsupervised), transfer learning approaches, and regularization techniques best enhances the predictive performance of models. Two other studies used transcriptomic data to investigate the impact of dimensionality reduction on the performance of predictive models [31, 32]. One study used a single breast cancer dataset to show that the classification accuracy of a support vector machine (SVM) for estrogen receptor status decreased to varying degrees for several dimensionality reduction methods, including PCA [31]. The other study used a highly imbalanced dataset (498 cancer samples and 52 non-cancer samples) to analyze the impact of different dimensionality reduction methods (PCA, kernel PCA, and autoencoder) on machine learning models (neural network and SVM) used for cancer prediction (non-cancer versus cancer). The F-measures reported in their study reveal only marginal differences in performance [32]. In contrast to these studies, we used a cross-validation setup encapsulated within a permutation testing framework, analyzed 30 different transcriptomic datasets, included datasets with small sample sizes and phenotypes of varying degrees of predictability, and utilized truly independent datasets for validation.

In conclusion, the results from this comprehensive comparative study indicate that when prioritizing predictive performance, utilizing gene-level data in predictive modeling with regularization techniques yields the best results. Dimensionality reduction with PCA or c-ICA on the dataset itself yielded suboptimal predictive performance. However, when combined with transfer learning, dimensionality reduction methods like c-ICA and AE showed predictive performance comparable to that of gene-level data. Additionally, these methods offered advantages in terms of the reproducibility and interpretability of predictor selection.

Availability of data and materials

Microarray expression data were collected from the public Gene Expression Omnibus repository under the platform accession number GPL570 (generated using the Affymetrix HG-U133 Plus 2.0 platform). All Supplementary Data and code required to reproduce the results are available at the following link: https://zenodo.org/records/10404690.

References

1. Supplitt S, Karpinski P, Sasiadek M, et al. Current achievements and applications of transcriptomics in personalized cancer medicine. Int J Mol Sci. 2021;22:1422.
2. Sirimongkolkasem T, Drikvandi R. On regularisation methods for analysis of high dimensional data. Ann Data Sci. 2019;6:737–63.
3. Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol. 1933;24:417–41.
4. Hyvärinen A, Oja E. Independent component analysis: algorithms and applications. Neural Netw. 2000;13:411–30.
5. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313:504–7.
6. Vogelstein JT, Bridgeford EW, et al. Supervised dimensionality reduction for big data. Nat Commun. 2021;12(1):2872.
7. Shin H, Eubank RL. Unit canonical correlations and high-dimensional discriminant analysis. J Stat Comput Simul. 2011;81:167–78.
8. Hanczar B, Bourgeais V, Zehraoui F. Assessment of deep learning and transfer learning for cancer prediction based on gene expression data. BMC Bioinform. 2022;23:262.
9. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B Stat Methodol. 1996;58:267–88.
10. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12:55–67.
11. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc B Stat Methodol. 2005;67:301–20.
12. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013;41:D991–5.
13. Wright L. Ranger: a synergistic optimizer. GitHub repository; 2019.
14. Munjal P, Paul A, Krishnan N. Implicit discriminator in variational autoencoder. IJCNN. 2020;2020:1–8.
15. Way GP, Greene CS. Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. Pac Symp Biocomput. 2018;23:80–91.
16. Hershey JR, Olsen PA. Approximating the Kullback Leibler divergence between Gaussian mixture models. In: ICASSP'07; 2007. p. IV-317–20.
17. Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, et al. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun. 2020;11:715.
18. Chiappetta P, Roubaud MC, Torrésani B. Blind source separation and the analysis of microarray data. J Comput Biol. 2004;11:1090–109.
19. Friedman JH, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.
20. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020;21:6.
21. Powers DMW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation; 2020.
22. Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2:193–218.
23. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78(1):1–3.
24. Chicco D, Warrens MJ, Jurman G. The Matthews correlation coefficient (MCC) is more informative than Cohen's kappa and Brier score in binary classification assessment. IEEE Access. 2021;9:78368–81.
25. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35:1798–828.
26. Jia W, Sun M, Lian J, Hou S. Feature dimensionality reduction: a review. Complex Intell Syst. 2022;8(3):2663–93.
27. Urzúa-Traslaviña CG, Leeuwenburgh VC, Bhattacharya A, et al. Improving gene function predictions using independent transcriptional components. Nat Commun. 2021;12:1464.
28. Ghosh D, Chinnaiyan AM. Classification and selection of biomarkers in genomic data using lasso. J Biomed Biotechnol. 2005;2005:147–54.
29. Hanczar B, Zehraoui F, Issa T, et al. Biological interpretation of deep neural network for phenotype prediction based on gene expression. BMC Bioinform. 2020;21:501.
30. Zhou W, Altman RB. Data-driven human transcriptomic modules determined by independent component analysis. BMC Bioinform. 2018;19:327.
31. Bartenhagen C, Klein HU, Ruckert C, et al. Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data. BMC Bioinform. 2010;11:567.
32. Kabir MF, Chen T, Ludwig SA. A performance analysis of dimensionality reduction algorithms in machine learning models for cancer prediction. Healthc Anal. 2023;3:100125.


Acknowledgements

We would like to express our thanks to Jeffrey Shu and Dylan Mezach for their contributions to the initial work of this project. We also thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Hábrók high performance computing cluster.

This research was supported by a Hanarth Fund grant to R.S.N.F.

Author information

S. Rezaee Oshternian and S. Loipfinger have contributed equally to this work.

Authors and Affiliations

Department of Medical Oncology, University Medical Center Groningen, University of Groningen, P.O. Box 30.001, 9700 RB, Groningen, The Netherlands

S. R. Oshternian, S. Loipfinger, A. Bhattacharya & R. S. N. Fehrmann


Contributions

A.B. and R.S.N.F. conceived this study. S.R., S.L., A.B., and R.S.N.F. collected and assembled the data. S.R., S.L., A.B., and R.S.N.F. performed the data analysis, contributed to the interpretation of the data, wrote the paper, and made the final decision to submit the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to R. S. N. Fehrmann.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

None of the authors have any conflict of interest.


Supplementary Information

Additional file 1. Supplementary Tables, Supplementary Figures, and Supplementary Note.


About this article

Cite this article

Oshternian, S.R., Loipfinger, S., Bhattacharya, A. et al. Exploring combinations of dimensionality reduction, transfer learning, and regularization methods for predicting binary phenotypes with transcriptomic data. BMC Bioinformatics 25, 167 (2024). https://doi.org/10.1186/s12859-024-05795-6


Received: 29 September 2023

Accepted: 22 April 2024

Published: 26 April 2024

DOI: https://doi.org/10.1186/s12859-024-05795-6


Keywords: Dimensionality reduction, Transfer learning, Transcriptomic data, Independent component analysis


Transfer of Training, Trainee Attitudes and Best Practices in Training Design: a Multiple-Case Study

  • Organizational Training and Performance
  • Published: 05 February 2020
  • Volume 64, pages 280–301 (2020)


  • Mohan Yang (ORCID: orcid.org/0000-0003-0856-0814) 1
  • Victoria L. Lowell 1
  • Ahmad M. Talafha 2
  • Jonathan Harbor 3


Transfer of training has often not been perceived as successful: training programs frequently fail to bring about the targeted knowledge, skills, and attitudes (KSA). A review of the current literature revealed a lack of consensus on the nature of transfer and ambiguity about the role an individual's attitude plays in the learning and transfer process. To provide insight into trainees' attitudes from a holistic view, we conducted a multiple-case study to investigate trainees' learning and transfer experience in depth. The findings from five independent cases revealed high transfer rates of newly acquired KSA to teaching, and that trainees' affective, cognitive, and behavioral attitudes are perceived to be closely related to their learning and transfer. Recommendations for training design practice, such as experiential learning design and signature pedagogies across different disciplines, are provided based on the training program design and the triangulated data sources.




Author information

Authors and Affiliations

Purdue University, BRNG 3134, 100 N. University Street, West Lafayette, IN, 47907, USA

Mohan Yang & Victoria L. Lowell

Western Michigan University, 3304 Everett Tower, Mail Stop 5152, Kalamazoo, MI, 49008, USA

Ahmad M. Talafha

University of Montana, Office of the Provost, 125 University Hall, 32 Campus Drive, Missoula, MT, 59812, USA

Jonathan Harbor


Corresponding author

Correspondence to Mohan Yang.

Ethics declarations

Disclosure of potential conflicts of interest

This study was not funded. The authors declare that they have no conflict of interest.

Ethical Approval

All procedures performed in the study involving human participants were in accordance with the ethical standards of the Institutional Review Board (IRB) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Signed informed consent was waived; however, a written statement about the research was provided to the participants before the interviews.


Rights and permissions

Reprints and permissions

About this article

Yang, M., Lowell, V.L., Talafha, A. et al. Transfer of Training, Trainee Attitudes and Best Practices in Training Design: a Multiple-Case Study. TechTrends 64 , 280–301 (2020). https://doi.org/10.1007/s11528-019-00456-5

Download citation

Published : 05 February 2020

Issue Date : March 2020

DOI : https://doi.org/10.1007/s11528-019-00456-5

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Experiential learning
  • Cognitive apprenticeship
  • Learning transfer
  • Signature pedagogies
  • Training design
  • Training effectiveness
  • Transfer of training
  • Find a journal
  • Publish with us
  • Track your research


Computer Vision: A Case Study in Transfer Learning

This concluding article in the computer vision series looks at the benefits of transfer learning and shows how anyone can train networks to reasonable accuracy. Articles and tutorials on the web rarely include the methods and hacks needed to improve accuracy; the aim of this article is to gather that information in one source. Stick around till the end to build your own classifier.

The ImageNet moment was remarkable in computer vision and deep learning: it created opportunities for people to reuse the knowledge procured through hours or days of training on high-end GPUs. Architectures trained on the full ImageNet database can recognise over 20,000 classes of objects, and on the standard benchmark some have achieved better accuracy than humans. How do we use this knowledge that scientists across the globe have gathered? The solution is transfer learning. Just as a teacher can teach class 8 mathematics by building on concepts learnt in classes 1-7, we can build on existing knowledge to suit our own needs. In this article, we discuss transfer learning end to end, along with some common hacks for increasing the accuracy of your models. Also, check out this computer vision essentials course and equip yourself with a hands-on set of skills.

We will take an experimental approach to data, hyper-parameters, and loss functions. Through experimentation, we will discover the various techniques, concepts, and hacks that are helpful during transfer learning. We will work with the Food-101 dataset, which comprises 101 classes of food with 1,000 images per class.

We performed a series of experiments at every step of training to identify the loss function and hyper-parameters that achieve the best results. The role of experimentation is to find out what works best for your dataset, since not all datasets have the same features and type of data. The common approach is to split the dataset into training, validation, and test sets. The model is trained on the training set and then evaluated on the validation set to make sure overfitting or underfitting has not occurred; only once we have a good score on both the training and validation sets do we expose the model to the test set. The validation set can thus be thought of as the part of the dataset used to find the optimal conditions for best performance.

Before we look at the parameters that need to be adjusted, let's dive deeper into transfer learning. You can revise your concepts with Introduction to Transfer Learning.

There are three ways to apply transfer learning here:

  • Type 1: Freeze the convolutional base model
  • Type 2: Train selected top layers in the base model
  • Type 3: A combination of Types 1 and 2

The convolutional base model refers to the original model architecture we will use. We can either use the entire model along with its weights, or freeze the model partially. In the first case, the initial weights are the model's pre-trained weights, and we fine-tune all the layers on our dataset. In the second case, the initial weights are again the pre-trained weights, but the initial layers of the model are frozen. Freezing a layer means its weights are not updated during training, which keeps the number of trainable parameters small. We freeze the initial layers because they identify low-level features such as edges and corners, and these features are largely independent of the dataset.

The parameters that need to be adjusted to ensure optimal performance are:

  • Learning rate
  • Model architecture
  • Type of transfer learning
  • Optimisation technique
  • Regularisation

We will consider a variety of experiments regarding the choice of optimiser, learning rate values, and so on; we encourage readers to think of more ways to understand and implement them. The experiments performed are as follows:

1. Choice of optimiser

  • SGD with momentum update
  • SGD with Nesterov momentum update

2. Learning Rate Scheduling

  • Same learning rate
  • Polynomial decay # works well initially
  • Cyclical learning rate # used this finally

3. Model Selection

  • ResNet50 – tried, but it took massive amounts of time per epoch, so we didn't proceed further
  • InceptionV3 – stuck with this model and decreased the image size to 96×96×3

4. Transfer Learning Type

  • Freeze the convolutional base model (Type 1)
  • Train selected top layers in the base model (Type 2)
  • Combination of Types 1 and 2 (Type 3) # this worked well in increasing validation accuracy

5. Number of neurons and dropout values

  • 128 neurons + 0.5 dropout probability
  • 128 neurons + 0.25 dropout probability # used this combination, as the others increased the number of parameters massively
  • 256 neurons + 0.25 dropout probability
  • 256 neurons + 0.5 dropout probability
  • 512 neurons + 0.5 dropout probability
  • 512 neurons + 0.25 dropout probability

6. GlobalAveragePooling2D vs GlobalMaxPooling2D

We compared the two pooling techniques to study their role as regularisation agents. GlobalMaxPooling2D worked better as a regularisation agent and also improved training accuracy compared to GlobalAveragePooling2D.

Before starting a project, we should outline the expected deliverables and outcomes. Based on these, we can list the logical steps needed to complete the task:

  • Define a model
  • Find the ideal initial learning rate
  • Create a module for scheduling the learning rate
  • Augment the images
  • Apply the transformation (mean subtraction) for better fine-tuning
  • Test on a smaller set
  • Fit the model
  • Test the model on random images
  • Visualise the kernels to validate that training has been successful

We will begin coding right away; we suggest you open your text editor or IDE and code along as you read. You can download the Food-101 dataset from its official website, which is easy to find with a quick search.

In lines 1-32, we import all the libraries that will be required.

In lines 33-37, we define the parameters that will be used frequently within the article.

Line 38 loads the Inception model with ImageNet weights; the include_top=False argument excludes the final classification layers, since the original model predicts 1,000 classes and we only have 101.

Line 52 creates an ImageDataGenerator object, which is used to obtain images directly from a directory and performs various operations on all the images in it. The operation specified here is normalisation, via the argument rescale = 1.0/255.0. We also add augmentation, because CNNs are not invariant to rotation: if we rotate an image and send it to the network for prediction, the chances of misclassification are high, since the network never saw such variants during training. Augmentation therefore leads to better generalisation in learning.

Lines 53 and 54 similarly create ImageDataGenerator objects for loading images from the test and validation directories, respectively.

In lines 55-57, we specify the mean used for pre-processing the images; mean subtraction ensures the model learns better. In lines 58-61, we load the data into the respective variables. The next step is to find the ideal learning rate.

Let’s find the initial learning rate 

Model checkpointing refers to saving the model (or just its weights) after each epoch of training, so the best version seen so far is never lost.

Early stopping is a technique that stops training when the decrease in the loss value becomes negligible: we wait for a certain patience period, and if the loss still doesn't decrease, we stop the training process.
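A sketch of the two callbacks just described; the filename and patience value are assumptions.

```python
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Save the weights with the best validation loss seen so far after each epoch
checkpoint = ModelCheckpoint("food101_best.h5",  # hypothetical filename
                             monitor="val_loss",
                             save_best_only=True,
                             save_weights_only=True)

# Stop training when val_loss hasn't improved for `patience` epochs
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)
```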

The next piece of the setup deals with scheduling the learning rate. Let's talk about what that involves:

Learning Rate Scheduling

Learning rate scheduling refers to making the learning rate adapt to the change in the loss values. Usually, the loss decreases until a certain epoch and then stagnates, because the learning rate at that point is comparatively too large for the optimisation to settle into a good minimum. The learning rate therefore needs to be decreased, and this tuning is necessary to reach the lowest error percentage.

We have experimented with three types of learning rate scheduling techniques:

  • Step decay
  • Polynomial decay
  • Cyclical learning rate scheduler

Polynomial decay, as the name suggests, decays the learning rate (or step size) polynomially, while step decay reduces it by a fixed amount at regular intervals. A cyclical learning rate scheduler varies the learning rate between a minimum and a maximum value during training. The idea is to avoid getting stuck in local minima: cost functions are usually non-convex, and we want to end up as close to the global minimum as possible.

We perform this in lines 62-88. To find the initial learning rate, we used Adrian Rosebrock's module from his tutorial on learning rate scheduling; for further insights into the topic, we suggest going through his blog on the same.
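The scheduling module itself isn't shown in this excerpt; here is a minimal sketch of a polynomial decay schedule and a simple triangular cyclical schedule, either of which can be passed to Keras through a LearningRateScheduler callback. All the constants are assumptions.

```python
import math
from tensorflow.keras.callbacks import LearningRateScheduler

INIT_LR = 1e-2   # assumed initial learning rate found with the LR finder
EPOCHS = 100
POWER = 1.0      # 1.0 = linear decay; higher powers decay faster

def poly_decay(epoch):
    """Polynomial decay: lr shrinks from INIT_LR towards 0 over EPOCHS."""
    return INIT_LR * (1 - epoch / float(EPOCHS)) ** POWER

MIN_LR, MAX_LR = 1e-5, 1e-2   # assumed cyclical range
CYCLE_LEN = 8                 # epochs per half-cycle

def triangular_clr(epoch):
    """Triangular cyclical schedule: lr bounces between MIN_LR and MAX_LR."""
    cycle = math.floor(1 + epoch / (2 * CYCLE_LEN))
    x = abs(epoch / CYCLE_LEN - 2 * cycle + 1)
    return MIN_LR + (MAX_LR - MIN_LR) * max(0.0, 1 - x)

# Pick one of the two schedules and pass the callback to model.fit(...)
lr_callback = LearningRateScheduler(triangular_clr)
```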

Sanity Checks:

Overfit a tiny subset of the data to make sure the model can fit the data at all, and check that the loss after the first epoch is around -ln(1/n), which is what random guessing over n classes would give. In this case n = 101, so the initial loss should be about 4.62.
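A quick way to verify that number:

```python
import numpy as np

# Cross-entropy of a uniform prediction over n classes: -ln(1/n) = ln(n)
n = 101
print(-np.log(1.0 / n))  # ~4.615
```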

Since the loss value on the tiny subset is nearly zero without any regularisation method, the model is suitable for fitting to the larger dataset. Overfitting occurs in the latter case, which can be mitigated by using dropout and regularisers in the final and penultimate layers.

As mentioned earlier, we freeze the first few layers to keep the number of trainable parameters small.

fit_generator (replaced by model.fit in recent Keras versions) trains the model on the batches produced by the data generators.

In lines 110-130, we re-define our model, because this time we freeze the first few layers before proceeding with training. Lines 131-141 check whether the model is overfitting.
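Putting the pieces together, here is a sketch of that re-defined model, assuming the base_model, generators, callbacks, and learning-rate constants from the earlier sketches. The cut-off of 249 frozen layers is a common choice for InceptionV3, and SGD with Nesterov momentum is one of the optimisers from the experiments above; treat both as assumptions.

```python
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Dropout, GlobalMaxPooling2D
from tensorflow.keras.optimizers import SGD

# Freeze the first layers (low-level features); 249 keeps everything
# below the last two inception blocks fixed, but the number is an assumption
for layer in base_model.layers[:249]:
    layer.trainable = False
for layer in base_model.layers[249:]:
    layer.trainable = True

# Head chosen in the experiments: GlobalMaxPooling2D + 128 neurons + 0.25 dropout
x = GlobalMaxPooling2D()(base_model.output)
x = Dense(128, activation="relu")(x)
x = Dropout(0.25)(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs=base_model.input, outputs=outputs)

model.compile(optimizer=SGD(learning_rate=INIT_LR, momentum=0.9, nesterov=True),
              loss="categorical_crossentropy", metrics=["accuracy"])

# fit accepts the directory iterators directly (fit_generator in older Keras)
history = model.fit(train_data, validation_data=val_data, epochs=EPOCHS,
                    callbacks=[checkpoint, early_stop, lr_callback])
```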

The figure shows that training accuracy is high while validation accuracy is low: the model is overfitting, so applying regularisation techniques is necessary. We apply dropout to manage this.

Type 3 refers to the combination of both the other types of transfer learning: initially fine-tuning the entire network for a few epochs, and then freezing all but the top layers for the next N epochs.

Cyclical Learning Rate

During training, the validation loss did not decrease regardless of the initial learning rate we chose. The logical assumption is that the cost function had hit a local minimum; to get it out of there, we switched to a cyclical learning rate, which performed much better than before.

Type of Transfer Learning Used

  • Type 1: 180 epochs, accuracy 58.07
  • Type 2: 100 epochs, accuracy 58.62
  • Type 3: 150 epochs, accuracy 58.05

Thus, Type 2 is the most suitable type of transfer learning for this problem.

Learning Rate Scheduling Used

  • Polynomial decay # works well initially
  • Cyclical learning rate # used this finally

Model Selection

InceptionV3 – used this model, with the image size decreased to 96×96×3

Transfer Learning Type

Combining the Type 1 and Type 2 models of transfer learning increases the validation accuracy. The way to experiment with this is to train the model with Type 1 for 50 epochs and then re-train with Type 2.

Number of neurons and Dropout values

  • 128 neurons + 0.25 dropout probability # used this combination, as the others increased the number of parameters massively

An additional experiment you can try is adding noise to the images during the data augmentation phase, to make the model robust to noise.

We suggest going through the entire article at least twice to get a thorough understanding of how deep learning and computer vision are implemented and used here. You can go a step further and visualise the kernels to understand what is happening at a basic level. How do networks learn? Kernels tend to be smooth when the network has learned the classification well, and noisy or blurry when it hasn't. We suggest you figure out ways to visualise the kernels; it will deepen your understanding.

Please go through the entire series once and then come back to this article; it will give you a head start in computer vision, and we hope it helps you understand and comprehend research papers in the field.

If you wish to learn more about transfer learning and other computer vision concepts, upskill with Great Learning's PG program in Artificial Intelligence and Machine Learning. If you want a shorter course covering only machine learning concepts, join Great Learning's PG program in Machine Learning.


What is Transfer Learning?


We humans are very good at transferring knowledge between tasks: whenever we encounter a new problem or task, we recognize it and apply relevant knowledge from our previous learning experiences, which makes our work easier and faster to finish. For instance, if you know how to ride a bicycle and are asked to ride a motorbike you have never ridden before, your experience with the bicycle comes into play for tasks like balancing and steering, making things much easier than they would be for a complete beginner. Following the same idea, the term transfer learning was introduced in the field of machine learning: knowledge learned on one task is applied to solve a related target task. While most machine learning is designed to address a single task, the development of algorithms that facilitate transfer learning is a topic of ongoing interest in the machine-learning community.

Transfer learning is a technique in machine learning where a model trained on one task is used as the starting point for a model on a second task. This can be useful when the second task is similar to the first task, or when there is limited data available for the second task. By using the learned features from the first task as a starting point, the model can learn more quickly and effectively on the second task. This can also help to prevent overfitting , as the model will have already learned general features that are likely to be useful in the second task.

Why do we need Transfer Learning?

Many deep neural networks trained on images share a curious phenomenon: in their early layers, the model learns low-level features, like edges, colours, and variations of intensity. Such features appear not to be specific to a particular dataset or task: whether we are detecting a lion or a car, the same low-level features must be detected, regardless of the exact cost function or image dataset. Thus, features learned in one task, such as detecting lions, can be reused in another, such as detecting humans.

How does Transfer Learning work?

This is a general summary of how transfer learning works:

  • Pre-trained model: Start with a model that has previously been trained for a certain task using a large dataset. Trained on extensive data, such a model has identified general features and patterns relevant to numerous related tasks.
  • Base model: The pre-trained model is known as the base model. It is made up of layers that have learned hierarchical feature representations from the incoming data.
  • Transfer layers: In the pre-trained model, find the set of layers that capture generic information relevant to the new task as well as the previous one. Because they learn low-level information, these layers are frequently the early layers of the network.
  • Fine-tuning: Retrain the chosen layers on the dataset from the new task. We call this procedure fine-tuning. The goal is to preserve the knowledge from pre-training while enabling the model to adjust its parameters to better suit the demands of the new task.

The block diagram is shown below:

Figure: transfer learning block diagram

Low-level features learned for task A should be beneficial when learning a model for task B.

This is what transfer learning is. Nowadays, it is rare to see people train whole convolutional neural networks from scratch; it is far more common to take a model pre-trained on a large dataset for a similar task, e.g., a model trained on ImageNet (1.2 million images with 1,000 categories), and use its features to solve a new task.

When dealing with transfer learning, we come across the notion of freezing layers. A layer, whether a CNN layer, a hidden layer, a block of layers, or any subset of all the layers, is said to be frozen when it is no longer trained: the weights of frozen layers are not updated during training, while layers that are not frozen follow the regular training procedure.

When we use transfer learning to solve a problem, we select a pre-trained model as our base model. There are then two possible approaches. The first is to freeze a few layers of the pre-trained model and train the remaining layers on our new dataset for the new task. The second is to make a new model, taking out some features from layers of the pre-trained model and using them in the newly created model. In both cases, we keep only the features likely to be shared between the two tasks and change the rest of the model by training it to fit the new dataset.

Frozen and Trainable Layers

Now, one may ask how to determine which layers to freeze and which to train. The answer is simple: the more you want to inherit from the pre-trained model, the more layers you should freeze. For instance, if the pre-trained model detects certain flower species and we need to detect new species, the new dataset will share many features with the original one, so we freeze most of the layers and reuse as much of the model's knowledge as possible. Now consider a different case: a model pre-trained to detect humans in images that we want to reuse to detect cars. Here the dataset is entirely different, so it is not a good idea to freeze many layers, because deep frozen layers would supply not just low-level features but also high-level ones, like noses and eyes, which are useless for the new dataset (car detection). Instead, we copy only the low-level features from the base network and train the rest of the network on the new dataset.

Let's consider the situations where the size of the target dataset and its similarity to the base network's dataset vary.

  • The target dataset is small and similar to the base network dataset: Since the target dataset is small, fine-tuning the pre-trained network on it may lead to overfitting. There may also be a change in the number of classes in the target task. So, in such a case we remove the fully connected layers from the end, maybe one or two, add a new fully connected layer matching the number of new classes, freeze the rest of the model, and train only the newly added layers.
  • The target dataset is large and similar to the base training dataset: When the dataset is large enough to support the pre-trained model, there is little risk of overfitting. Here, too, the last fully-connected layer is removed and a new fully-connected layer is added with the proper number of classes. The entire model is then trained on the new dataset, tuning it to the large dataset while keeping the architecture the same.
  • The target dataset is small and different from the base network dataset: Since the target dataset is different, using the high-level features of the pre-trained model will not be useful. In such a case, remove most of the layers from the end of the pre-trained model and add new layers matching the number of classes in the new dataset. This way we can reuse the low-level features and train the rest of the layers to fit the new dataset. Sometimes it is beneficial to train the entire network after adding the new layers at the end.
  • The target dataset is large and different from the base network dataset: Since the target dataset is large and different, the best approach is to remove the last layers from the pre-trained network, add layers matching the number of classes, and then train the entire network without freezing any layer.

Transfer learning is a very effective and fast way to begin working on a problem. It gives you a direction to move in, and most of the time the best results are also obtained by transfer learning.

Below is sample code using Keras for transfer learning and fine-tuning with a custom training loop.

Transfer Learning Implementations

Prerequisites for implementing the code

Before implementing the code, you have to install the libraries described below.

TensorFlow is an open-source framework used for machine learning. It provides a range of functions that achieve complex functionality with single lines of code.

Import the necessary libraries and functions

Import required libraries and the MNIST dataset, a dataset of handwritten digits often used for training and testing machine learning models.

Load and unpack the MNIST dataset into training and testing sets for images (x) and labels (y).

Convert class labels to one-hot encoded vectors for both training and testing sets.
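A minimal sketch of these three steps (the exact snippet isn't reproduced here):

```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load MNIST and unpack it into train/test images (x) and labels (y)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Add a channel axis: (28, 28) -> (28, 28, 1)
x_train = x_train[..., tf.newaxis].astype("float32")
x_test = x_test[..., tf.newaxis].astype("float32")

# One-hot encode the 10 digit classes
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```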

Load a pretrained MobileNetV2 model without the classification layer

The code initializes a MobileNetV2 model using TensorFlow/Keras. It has an input shape of (224, 224, 3), excludes the top layers so the network acts as a feature extractor, and uses pre-trained ImageNet weights, which makes it suitable for transfer learning in image classification.
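A sketch matching that description:

```python
from tensorflow.keras.applications import MobileNetV2

# Pre-trained feature extractor: ImageNet weights, no classification head
base_model = MobileNetV2(input_shape=(224, 224, 3),
                         include_top=False,
                         weights="imagenet")
```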

Add custom layers on top of the pre-trained model

Using TensorFlow's Keras API, this code builds the classification head on top of the pre-trained base. Layers for reshaping, pooling/flattening, and fully connected operations are included, and dropout provides regularisation. With a softmax activation, the model outputs class probabilities, which suits image classification tasks like MNIST. The design balances feature extraction against classification, allowing effective learning and generalization.
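The original head isn't shown in this excerpt; one plausible head matching that description (reshaping MNIST digits to MobileNetV2's expected input, then flattening, a dense layer, dropout, and a softmax output) might look like this, assuming base_model from the previous sketch. The layer sizes are assumptions.

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(28, 28, 1))
# Reshape MNIST digits to the size and channel count MobileNetV2 expects
x = layers.Resizing(224, 224)(inputs)
x = layers.Concatenate()([x, x, x])               # grayscale -> 3 channels
x = layers.Rescaling(1.0 / 127.5, offset=-1)(x)   # MobileNetV2 expects [-1, 1]
x = base_model(x, training=False)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)                        # regularisation
outputs = layers.Dense(10, activation="softmax")(x)
model = models.Model(inputs, outputs)
```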

Freeze the pretrained layers

It is essential that the convolutional base be frozen prior to model compilation and training. Freezing (by setting layer.trainable = False) stops a layer's weights from changing while it is being trained. Since there are numerous layers in MobileNetV2, setting the model's trainable flag to False freezes all of them at once.
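A sketch of the freezing step:

```python
# Freeze the convolutional base so its ImageNet weights are not updated;
# setting trainable=False on the nested model freezes all its layers at once
base_model.trainable = False

# Only the new head's weights should remain trainable
print(len(model.trainable_variables))
```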

Compile the model

It is essential to use a lower learning rate at this point, because you are training on top of a much larger model and want to only gently readjust the pre-trained weights. If not, your model may rapidly overfit.
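A sketch of the compile step; the exact learning rate is an assumption.

```python
from tensorflow.keras.optimizers import Adam

# A low learning rate protects the pre-trained weights from large updates
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```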

Custom training loop

This TensorFlow script trains the network over multiple epochs, using a training dataset ( train_dataset ) and a validation dataset ( val_dataset ). The training loop computes gradients of the categorical cross-entropy loss and updates the model's weights using the Adam optimizer; after each epoch, a validation loop calculates and prints the validation accuracy. One minor correction to the original snippet: the optimizer must be initialized outside the training loop for the updates to work properly.
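A sketch of such a loop, with the optimizer created once outside the loop as that correction requires. Here the MNIST test split stands in for val_dataset, which is an assumption.

```python
import tensorflow as tf

EPOCHS = 3
BATCH = 64

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(1024).batch(BATCH)
val_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BATCH)

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(1e-4)   # created once, outside the loop
val_acc = tf.keras.metrics.CategoricalAccuracy()

for epoch in range(EPOCHS):
    # Training loop: compute gradients of the loss and update the weights
    for images, labels in train_ds:
        with tf.GradientTape() as tape:
            preds = model(images, training=True)
            loss = loss_fn(labels, preds)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Validation loop: accuracy on held-out data after each epoch
    val_acc.reset_state()
    for images, labels in val_ds:
        val_acc.update_state(labels, model(images, training=False))
    print(f"epoch {epoch + 1}: val_accuracy = {float(val_acc.result()):.4f}")
```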

Evaluate the model's performance on the test set
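A sketch of the final evaluation:

```python
# model.evaluate works here because the model was compiled earlier
test_loss, test_acc = model.evaluate(
    tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(64))
print(f"test accuracy: {test_acc:.4f}")
```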

Advantages of transfer learning:

  • Speed up the training process: By using a pre-trained model, the model can learn more quickly and effectively on the second task, as it already has a good understanding of the features and patterns in the data.
  • Better performance: Transfer learning can lead to better performance on the second task, as the model can leverage the knowledge it has gained from the first task.
  • Handling small datasets: When there is limited data available for the second task, transfer learning can help to prevent overfitting, as the model will have already learned general features that are likely to be useful in the second task.

Disadvantages of transfer learning:

  • Domain mismatch: The pre-trained model may not be well-suited to the second task if the two tasks are vastly different or the data distribution between the two tasks is very different.
  • Overfitting : Transfer learning can lead to overfitting if the model is fine-tuned too much on the second task, as it may learn task-specific features that do not generalize well to new data.
  • Complexity : The pre-trained model and the fine-tuning process can be computationally expensive and may require specialized hardware.

Frequently Asked Questions:

1. What is transfer learning?

In simple terms, transfer learning is a technique in which a model pre-trained on a sufficiently large dataset is reused for a similar task related to the one it was originally trained on.

2. What is the role of transfer learning in natural language processing?

In Natural Language Processing (NLP), transfer learning refers to using huge text corpora with pre-trained language models. Subsequent NLP tasks benefit from these acquired characteristics by employing models such as BERT or GPT, which learn contextual representations. This method speeds up model training, decreases the requirement for large task-specific datasets, and enhances performance. In NLP, transfer learning has emerged as a key component that supports advances in sentiment analysis, text categorization, and language understanding, among other language-related tasks.

3. Define transfer learning in layman's terms.

In layman's terms: if you learn to ride a bicycle, you can apply some of that skill when learning to ride a new type of bike. Similarly, in transfer learning, a pre-trained model is used as the starting point for a new task that shares similarities with the task it was originally trained on.

4. What is fine-tuning in Transfer Learning?

Fine-tuning in transfer learning refers to the process of taking a pre-trained model on one task and further training it on a new, specific task. Initially, the model is trained on a large dataset for a general task, learning useful features. In fine-tuning, some layers of the pre-trained model may be kept frozen (not updated) to retain previously learned knowledge, while others are adjusted to adapt to the nuances of the new task. This approach is particularly beneficial when the new task has a smaller dataset, allowing the model to specialize without requiring extensive training from scratch.

5. What are frozen and trainable layers in transfer learning?

In transfer learning, “frozen” layers refer to pre-trained layers whose weights remain fixed during subsequent training, preserving learned features. These layers are not updated to prevent loss of valuable knowledge. In contrast, “trainable” layers are those modified and fine-tuned on the new task, allowing the model to adapt to task-specific patterns. Freezing early layers, where generic features are learned, is common, while later layers may be fine-tuned. This strategy balances leveraging pre-existing knowledge and tailoring the model for specific tasks, optimizing performance when labeled data for the new task is limited.


Figure: the emergency preparedness framework. The framework includes four actions, (1) mitigate, (2) prepare, (3) respond, and (4) recover, which can be repeated; recommendations for how each action can be applied to a portal transition are included in each blue quadrant of the circle.

Sites could mitigate issues by first understanding which patients will be most affected by the transition, such as those who rely heavily on secure messaging. Reliable use of secure messaging within the VA facilitates positive patient-clinician relationships by providing a mechanism for efficient between-visit communication [ 20 , 21 , 22 , 23 ]. During the EHR transition, clinicians and staff became concerned about the well-being of patients from whom they weren’t receiving messages and those who depended on the portal to complete certain tasks. Since secure messaging is often initiated by patients to clinicians [ 23 ], clinicians will likely be unaware that messages are being missed. Understanding how and which patients currently use the portal and anticipating potential portal needs is a first step toward mitigating potential issues.

Despite efforts to inform Veterans of the EHR transition and patient portal [ 24 ], including information sent by email and direct mail, postings on VA websites, and a town hall, our findings agree with those of Fix and colleagues [ 10 ] and suggest that many Veterans were unprepared for the transition. They also suggest that more needs to be done to disseminate knowledge about the transition, and about how to navigate the new patient portal, to both VA employees and the patients they serve.

Preparations for the transition should prioritize providing VA clinicians and staff with updated information and resources on how to access and use the new portal [ 25 ]. VA clinicians deliver quality care to veterans, and many VA employees are proud to serve the nation's veterans and willing to go the extra mile to support their patients' needs [ 26 ]. In this study, participants expressed feeling unprepared to assist with, or even respond to, their patients' questions and concerns about using the new portal. This unpreparedness contributed to increased clinician and staff stress, as they felt ill-equipped to help their patients with portal issues, and such experiences can negatively affect the patient-clinician relationship. Preparing clinicians and patients for an upcoming transition, including technical support for both, may help minimize these potential issues [ 10 , 27 ]. Specialized training about an impending transition, detailed instructions on how to gain access to the new system, and a dedicated portal helpline may be necessary to help patients better navigate the transition [ 23 , 28 ].

In addition to a dedicated helpline, our recommendations include responding to potential changes in needed veteran services during the transition. In our study, participants observed more veteran walk-ins due to challenges with the patient portal. Health systems need to anticipate and address this demand by expanding access to in-person services and fortifying other communication channels. For example, sites could use nurses to staff a walk-in clinic to handle increases in walk-in traffic and increase call center capacity to handle increases in telephone calls [ 29 ]. Increased use of walk-in clinics has received heightened attention as a promising strategy for meeting healthcare demands during the COVID-19 pandemic [ 30 ] and can potentially be adapted for meeting care-related needs during an EHR transition. These strategies can fill a gap in communication between clinicians and their patients while patients are learning to access and navigate a new electronic portal.

Finally, there is a need for a recovery mechanism to restore confidence in the reliability of the EHR and the well-being of clinicians and staff. Healthcare workers are experiencing unprecedented levels of stress [ 31 ]. A plan must be in place to improve and monitor the accuracy of data migrated, populated, and processed within the new system [ 2 ]. Knowing that portal function is monitored could help ease clinician and staff concerns and mitigate stress related to the transition.

Limitations

This study has several limitations. First, data collection relied on voluntary participation, which may introduce self-selection response bias. Second, this work was completed at one VA medical center that was the first site in the larger enterprise-wide transition, and experiences at other VAs or healthcare systems might differ substantially. Third, we did not interview veterans and relied entirely on secondhand accounts of patient experiences with the patient portal. Future research should include interviews with veterans during the transition and compare veteran and VA employee experiences.

Despite a current delay in the deployment of the new EHR at additional VA medical centers, findings from this study offer timely lessons that can ensure clinicians and staff are equipped to navigate challenges during the transition. The strategies presented in this paper could help maintain patient-clinician communication and improve veteran experience. Guided by the emergency preparedness framework, recommended strategies to address issues presented here include alerting those patients most affected by the EHR transition, being prepared to address patients’ concerns, increasing staffing for the help desk and walk-in care clinics, and monitoring the accuracy and reliability of the portal to provide assurance to healthcare workers that patients’ needs are being met. These strategies can inform change management at other VA medical centers that will soon undergo EHR transition and may have implications for other healthcare systems undergoing patient portal changes. Further work is needed to directly examine the perspectives of veterans using the portals, as well as the perspectives of both staff and patients in the growing number of healthcare systems beyond VA that are preparing for an EHR-to-EHR transition.

Data availability

Deidentified data analyzed for this study are available from the corresponding author on reasonable request.

Abbreviations

EHR: Electronic health record

VA: Department of Veterans Affairs

VAMC: VA Medical Center

DOD: Department of Defense

Huang C, Koppel R, McGreevey JD 3rd, Craven CK, Schreiber R. Transitions from one Electronic Health record to another: challenges, pitfalls, and recommendations. Appl Clin Inf. 2020;11(5):742–54.


Penrod LE. Electronic Health Record Transition considerations. PM R. 2017;9(5S):S13–8.


Cogan AM, Haltom TM, Shimada SL, Davila JA, McGinn BP, Fix GM. Understanding patients’ experiences during transitions from one electronic health record to another: a scoping review. PEC Innov. 2024;4:100258. https://doi.org/10.1016/j.pecinn.2024.100258 . PMID: 38327990; PMCID: PMC10847675.


Powell KR. Patient-perceived facilitators of and barriers to Electronic Portal Use: a systematic review. Comput Inf Nurs. 2017;35(11):565–73.


Wilson-Stronks A, Lee KK, Cordero CL, et al. One size does not fit all: meeting the Health Care needs of diverse populations. Oakbrook Terrace, IL: The Joint Commission; 2008.

Carini E, Villani L, Pezzullo AM, Gentili A, Barbara A, Ricciardi W, Boccia S. The Impact of Digital Patient Portals on Health outcomes, System Efficiency, and patient attitudes: updated systematic literature review. J Med Internet Res. 2021;23(9):e26189.

My HealtheVet home page. https://www.myhealth.va.gov/ .

Nazi KM, Turvey CL, Klein DM, Hogan TP. A decade of veteran voices: examining patient Portal Enhancements through the Lens of user-centered design. J Med Internet Res. 2018;20(7):e10413. https://doi.org/10.2196/10413 .

Tapuria A, Porat T, Kalra D, Dsouza G, Xiaohui S, Curcin V. Impact of patient access to their electronic health record: systematic review. Inf Health Soc Care. 2021;46:2.

Fix GM, Haltom TM, Cogan AM, et al. Understanding patients’ preferences and experiences during an Electronic Health Record Transition. J GEN INTERN MED. 2023. https://doi.org/10.1007/s11606-023-08338-6 .

Monturo C, Brockway C, Ginev A. Electronic Health Record Transition: the patient experience. CIN: Computers Inf Nurs. 2022;40:1.

Tian D, Hoehner CM, Woeltje KF, Luong L, Lane MA. Disrupted and restored patient experience with transition to New Electronic Health Record System. J Patient Exp. 2021;18:8.

Emergency management programs for healthcare facilities: the four phases of emergency management. US Department of Homeland Security website: https://www.hsdl.org/?view&did=765520 . Accessed 28 Aug 2023.

Ahlness EA, Orlander J, Brunner J, Cutrona SL, Kim B, Molloy-Paolillo BK, Rinne ST, Rucci J, Sayre G, Anderson E. Everything’s so Role-Specific: VA Employee Perspectives’ on Electronic Health Record (EHR) transition implications for roles and responsibilities. J Gen Intern Med. 2023;38(Suppl 4):991–8. Epub 2023 Oct 5. PMID: 37798577; PMCID: PMC10593626.

Rucci JM, Ball S, Brunner J, Moldestad M, Cutrona SL, Sayre G, Rinne S. Like one long battle: employee perspectives of the simultaneous impact of COVID-19 and an Electronic Health Record Transition. J Gen Intern Med. 2023;38(Suppl 4):1040–8. https://doi.org/10.1007/s11606-023-08284-3 . Epub 2023 Oct 5. PMID: 37798583; PMCID: PMC10593661.

Sayre G, Young J. Beyond open-ended questions: purposeful interview guide development to elicit rich, trustworthy data [videorecording]. Seattle (WA): VA Health Services Research & Development HSR&D Cyberseminars; 2018.

Averill JB. Matrix analysis as a complementary analytic strategy in qualitative inquiry. Qual Health Res. 2002;12:6855–66.

Elo S, Kyngäs H. The qualitative content analysis process. J Adv Nurs. 2008;62:1107–15.

Haun JN, Lind JD, Shimada SL, Simon SR. Evaluating Secure Messaging from the veteran perspective: informing the adoption and sustained use of a patient-driven communication platform. Ann Anthropol Pract. 2013;372:57–74.

Kittler AF, Carlson GL, Harris C, Lippincott M, Pizziferri L, Volk LA, et al. Primary care physician attitudes toward using a secure web-based portal designed to facilitate electronic communication with patients. Inf Prim Care. 2004;123:129–38.

Shimada SL, Petrakis BA, Rothendler JA, Zirkle M, Zhao S, Feng H, Fix GM, Ozkaynak M, Martin T, Johnson SA, Tulu B, Gordon HS, Simon SR, Woods SS. An analysis of patient-provider secure messaging at two Veterans Health Administration medical centers: message content and resolution through secure messaging. J Am Med Inf Assoc. 2017;24:5.

Jha AK, Perlin JB, Kizer KW, Dudley RA. Effect of the transformation of the Veterans Affairs Health Care System on the quality of care. N Engl J Med. 2003;348:22.

McAlearney AS, Walker DM, Gaughan A, Moffatt-Bruce S, Huerta TR. Helping patients be better patients: a qualitative study of perceptions about Inpatient Portal Use. Telemed J E Health. 2020;26:9.

https://www.myhealth.va.gov/mhv-portal-web/transitioning-to-my-va-health-learn-more .

Beagley L. Educating patients: understanding barriers, learning styles, and teaching techniques. J Perianesth Nurs. 2011;26:5.

Moldestad M, Stryczek KC, Haverhals L, Kenney R, Lee M, Ball S, et al. Competing demands: Scheduling challenges in being veteran-centric in the setting of Health System initiatives to Improve Access. Mil Med. 2021;186:11–2.

Adusumalli J, Bhagra A, Vitek S, Clark SD, Chon TY. Stress management in staff supporting electronic health record transitions: a novel approach. Explore (NY). 2021;17:6.

Heponiemi T, Gluschkoff K, Vehko T, Kaihlanen AM, Saranto K, Nissinen S, et al. Electronic Health Record implementations and Insufficient Training Endanger nurses' Well-being: cross-sectional survey study. J Med Internet Res. 2021;23(12):e27096.

Laurant M, van der Biezen M, Wijers N, Watananirun K, Kontopantelis E, van Vught AJ. Nurses as substitutes for doctors in primary care. Cochrane Database Syst Rev. 2018;7(7):CD001271.

Elnahal S, Kadakia KT, Gondi S, How, U.S. Health systems Can Build Capacity to Handle Demand Surges. Harvard Business Review. 2021. https://hbr.org/2021/10/how-u-s-health-systems-can-build-capacity-to-handle-demand-surges/ Accessed 25 Nov 2022.

George RE, Lowe WA. Well-being and uncertainty in health care practice. Clin Teach. 2019;16:4.


Acknowledgments

We acknowledge and thank members of the EMPIRIC Evaluation qualitative and supporting team for their contributions to this work, including Ellen Ahlness, PhD, Julian Brunner, PhD, Adena Cohen-Bearak, MPH, M.Ed, Leah Cubanski, BA, Christine Firestone, Bo Kim, PhD, Megan Moldestad, MS, and Rachel Smith. We greatly appreciate the staff at the Mann-Grandstaff VA Medical Center and associated community-based outpatient clinics for generously sharing their time and experiences by participating in this study during this challenging time.

The “EHRM Partnership Integrating Rapid Cycle Evaluation to Improve Cerner Implementation (EMPIRIC)” (PEC 20–168) work was supported by funding from the US Department of Veterans Affairs, Veterans Health Administration, Health Services Research & Development Quality Enhancement Research Initiative (QUERI) (PEC 20–168). The findings and conclusions in this article are those of the authors and do not necessarily reflect the views of the Veterans Health Administration, Veterans Affairs, or any participating health agency or funder.

Author information

Authors and Affiliations

VA Northeast Ohio Healthcare System, 10701 East Blvd., Research Service 151, 44106, Cleveland, OH, USA

Sherry L. Ball

Center for Healthcare Organization and Implementation Research, VA Boston Healthcare System, Boston, MA, USA

Bo Kim & Seppo T. Rinne

Department of Psychiatry, Harvard Medical School, Boston, MA, USA

Center for Healthcare Organization and Implementation Research, VA Bedford Healthcare System, Bedford, MA, USA

Sarah L. Cutrona & Brianne K. Molloy-Paolillo

Division of Health Informatics & Implementation Science, Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA, USA

Sarah L. Cutrona

Seattle-Denver Center of Innovation for Veteran-Centered and Value-Driven Care, VHA Puget Sound Health Care System, Seattle, WA, USA

Ellen Ahlness, Megan Moldestad & George Sayre

University of Washington School of Public Health, Seattle, WA, USA

George Sayre

Geisel School of Medicine at Dartmouth, Hannover, NH, USA

Seppo T. Rinne


Contributions

S.R. designed the larger study. G.S. was the qualitative methodologist who led the qualitative team. S.B., E.A., and M.M. created the interview guides and completed the interviews; Data analysis, data interpretation, and the initial manuscript draft were completed by S.B. and B.K. S.C. and B.M. worked with the qualitative team to finalize the analysis and edit and finalize the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sherry L. Ball .

Ethics declarations

Ethics approval and consent to participate

This evaluation was designated as non-research/quality improvement by the VA Bedford Healthcare System Institutional Review Board. All methods were carried out in accordance with local and national VA guidelines and regulations for quality improvement activities. This study included virtual interviews with participants via MS Teams. Employees volunteered to participate in interviews and verbal consent was obtained to record interviews. Study materials, including interview guides with verbal consent procedures, were reviewed and approved by labor unions and determined as non-research by the VA Bedford Healthcare System Institutional Review Board.

Consent for publication

Not applicable.

The findings and conclusions in this paper are those of the authors and do not necessarily represent the official position of the Department of Veterans Affairs.

Prior presentations

Ball S, Kim B, Moldestad M, Molloy-Paolillo B, Cubanski L, Cutrona S, Sayre G, and Rinne S. (2022, June). Electronic Health Record Transition: Providers’ Experiences with Frustrated Patients. Poster presentation at the 2022 AcademyHealth Annual Research Meeting. June 2022.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Cite this article

Ball, S.L., Kim, B., Cutrona, S.L. et al. Clinician and staff experiences with frustrated patients during an electronic health record transition: a qualitative case study. BMC Health Serv Res 24, 535 (2024). https://doi.org/10.1186/s12913-024-10974-5


Received: 29 August 2023

Accepted: 09 April 2024

Published: 26 April 2024

DOI: https://doi.org/10.1186/s12913-024-10974-5


Keywords

  • EHR transition
  • Patient experience
  • Clinician experience
  • Qualitative analysis


Get the Facts about Transgender & Non-Binary Athletes

Transgender and non-binary people, in particular trans and non-binary student athletes, are under attack by politicians at all levels of government, as well as in the media. 2021 officially surpassed 2015 as the worst year for anti-LGBTQ+ legislation in recent history, with 27 anti-LGBTQ+ bills enacted. 2022 looks to be continuing this trend, with the first weeks of the year seeing anti-LGBTQ+ bills under consideration in state legislatures across the country. More than 55 directly target transgender girls and women in sports and would prevent them from playing on the team that aligns with their gender identity.

When reporting on issues related to transgender and gender non-conforming people, please use the Human Rights Campaign’s Brief Guide to Getting Transgender Coverage Right to ensure inclusive and accurate coverage. In addition, if you’re reporting on trans youth in sports, please find some facts and resources below.

Transgender and non-binary people have been under attack for years - the pivot to sports is yet another front in this fight.

For the last two decades, anti-LGBTQ+ politicians have attempted to sow disinformation about LGBTQ+ people’s rights to score cheap political points with their base. Transgender and non-binary people have been the target of many of these attacks, especially since the 2015 state legislative sessions. Then, so-called bathroom bills were a major focus and politicians lied about threats to women and girls’ safety that never materialized. Today, trans and non-binary youth are the target of these attacks, with baseless arguments about sports participation and misinformation about affirming healthcare access.

At least 35 of the more than 250 bills proposed in 2021 would unnecessarily regulate or prohibit transgender youth from being able to access best-practice, age-appropriate, gender-affirming medical care. So far, two bills of this kind have become law: in Arkansas, when the state House and Senate overrode Governor Asa Hutchinson’s veto of House Bill 1570, which is aimed at denying medically-necessary, gender-affirming services to transgender youth, and in Tennessee, when Republican Governor Bill Lee signed SB 126, which unnecessarily regulates life-saving, best-practice medical care for transgender youth. Such bills have proliferated in the last two years, despite no evidence that there has been any issue with youth receiving inappropriate care. In fact, these bills are opposed by organizations dedicated to children’s health, including the American Academy of Pediatrics, the American Association of Child and Adolescent Psychiatry, the National Association of Social Workers, and more.

These attacks on transgender youth generally, and trans athletes specifically, are fueled by discrimination, not facts . These bills represent a cruel effort to further stigmatize and discriminate against LGBTQ+ people across the country, specifically trans youth who simply want to live as their true selves and grow into who they are. After failing to prohibit trans and non-binary people’s access to restrooms, legislators have pivoted to using misinformation about sports as the next way to score political points.

Proponents of anti-trans sports bans are relying on stereotypes that have sexist implications.

While it may be true that a particular transgender youth has physical abilities that help them in the sport of their choice (like height, which is helpful in volleyball for instance), natural variations in physical characteristics are part of sports, especially at younger ages. Many of these bills would govern play at elementary and middle school as well as high school, when all youth’s bodies are undergoing tremendous change at significantly varying speeds. In other sports, a smaller physique might be to an athlete’s advantage. And, like all other youth, trans youth are short and tall, strong and not, fast and slow.

Breaking down these stereotypes also breaks down these arguments. Transgender girls are not new, and they’re not taking over girls’ sports. In fact, transgender youth:

are a small part of the overall population in schools, and only about half of trans youth identify as girls (opponents don’t seem as interested in trans boys, whom they assume will not be able to compete with cisgender boys - a sexist assumption),

just like other youth, have varying interest in playing sports,

just like other youth, will have varying degrees of physical ability and attributes that may/may not lend themselves to success in the sport of their choice,

just like all other youth, will have varying degrees of seriousness and commitment to sports.

Transgender youth already face very high levels of discrimination, including in school.

Transgender youth experience all kinds of mistreatment (such as harassment, harsher discipline, or physical or sexual assault) because of their gender identity. There are many very real challenges that face transgender youth, including mistreatment in schools, family rejection, threats of physical violence, and other discrimination. Anti-trans sports bans risk further marginalizing young people who already face tremendous challenges in school. Proponents of these bans suggest that trans athletes are pretending to be trans in order to do well at sports - ignoring entirely the incredible stigma trans youth face.

Twenty states, the IOC, and the NCAA have allowed trans athletes to play sports for decades, with no problems.

If there was truly an existential issue with transgender athletes competing in sports, these bodies would be taking more specific steps to address it. What we’re actually seeing is the opposite - legislators using transgender youth as a culture war talking point are attempting to put into place bans that nobody is asking for. The NCAA’s recent change to remove non-discrimination protections from its constitution and defer to IOC guidelines was a political decision that abdicates their responsibility to protect student athletes, but it does not meaningfully change the current state of play: transgender students can play sports, with specific requirements in place for each sport.

Playing sports comes with well-known academic, emotional, mental, and social benefits. Transgender youth should not be shut off from these opportunities.

Playing sports helps young people maintain good physical health, build self-confidence and self-esteem, grow leadership skills, understand the value of teamwork, and much more, according to the President’s Council on Sports, Fitness & Nutrition.

Numerous athletes at both the amateur and professional level have spoken out in support of their transgender teammates and competitors.

These athletes include Women’s World Cup champion soccer player Megan Rapinoe, tennis icon Billie Jean King, Stanford swimmer Brooke Forde, NBA star Dwyane Wade, Canadian soccer phenom Erin McLeod, WNBA star Napheesa Collier, and many more. Additionally, sports organizations like the Ivy League, the College Swimming & Diving Coaches Association of America (CSCAA), and others have spoken out publicly to defend trans people’s presence in sports.

For more information, please visit hrc.org/transgender, HRC’s Transgender and Non-Binary People FAQ, and HRC’s Brief Guide to Getting Transgender Coverage Right.



5 Free Digital Marketing Courses To Study In 2024


A career in digital marketing is extremely lucrative, and demand is expected to remain high over the next few years.

If you have expertise in digital and social media marketing, your skills could be worth $1.5 trillion by 2030, according to Coursera's Job Skills of 2024 report. In fact, U.S. Bureau of Labor Statistics projections show demand for marketing occupations growing by about 6%, faster than the average for all occupations.

Digital marketing has rapidly accelerated the success of marketing and advertising campaigns. With billions of social media users, and billions more using the web to search for topics, find answers to questions, shop, and carry out countless other everyday activities, digital marketing is full of potential and is certainly an in-demand skill worth investing in.

The investment in developing this skill, whether financial or in time, is proving to be well worth it. With a career in digital marketing, such as digital marketing manager, you can expect to earn as much as $124,000. And of course, the more you learn and develop professionally, and the more senior your role, the higher your earning potential.

Here are five free digital marketing courses you should seriously consider embarking on so you can launch a successful digital marketing career in 2024:

1. LinkedIn Learning

LinkedIn Learning has a learning path called "Master Digital Marketing," which contains courses taught and led by industry experts handpicked by LinkedIn. The course content within the learning path ranges from mastering SEO, to marketing on LinkedIn and other social media platforms such as TikTok, to even learning marketing with AR (augmented reality).


Although not 100% free in every sense of the word, LinkedIn Learning courses are free with a 30-day trial of LinkedIn Premium; with an ongoing Premium subscription, you can access all the courses at no extra cost.

2. Google Digital Garage

Google Digital Garage has now moved to Google Skillshop, part of the Grow With Google initiative, and it hosts a range of courses including Fundamentals of Digital Marketing, which are totally free of charge to you. After completing the course, you're able to gain a certificate which you can showcase on your LinkedIn profile and to employers.

3. PPC University

This website hosts a range of informative content via courses and guides, such as PPC 101, totally free of charge. These cover topics such as PPC (pay-per-click advertising) and Facebook ads. Although it does not cover all aspects of digital marketing, this can be a helpful resource if you're seeking to zone in on PPC as a digital marketing skill.

4. HubSpot Academy

Another free digital marketing course provider is HubSpot, already known as a leader in the digital marketing industry. Its learning arm, HubSpot Academy, provides several courses for all levels, from beginner to advanced, such as its Digital Marketing Course: Get Certified in Digital Marketing, which also comes with a free certificate.

5. Meta Blueprint

Meta offers a range of courses and certifications, paid and free, for those aspiring to build digital marketing skills. One free course is the Meta Certified Digital Marketing Associate, offered via Meta Blueprint. Bear in mind that although the study materials are free, obtaining the certification may cost you.

Undertaking a digital marketing course, and even paying for the certificate if necessary, is well worth the time and financial expenditure.

All five of these are examples of the courses available to you if you are passionate about upskilling (or reskilling, as the case may be) and taking your professional development to the next level by pursuing a digital marketing career. Learning something new doesn't need to cost much. Through these five free digital marketing courses, you can prove your value to employers, become a trusted expert, and even work as a freelance digital marketing manager.

Rachel Wells


IMAGES

  1. A Case Study of Transfer of Learning

  2. Case study: transfer learning

  3. Transfer Learning in 2024: What It Is & How It Works

  4. What is Transfer Learning in Deep Learning? [Examples & Application]

  5. An Introduction to Transfer Learning in Machine Learning

  6. Transfer Learning with TensorFlow: Feature Extraction

VIDEO

  1. Webinar : Transfer Learning for Image and Text Classification

  2. Webinar : Transfer Learning for Image and Text Classification session2

  3. LECTURE 12

  4. EfficientML.ai Lecture 19: On-Device Training and Transfer Learning (MIT 6.5940, Fall 2023)

  5. 12 - Transfer Learning with TensorFlow

  6. CS 285: Lecture 22, Part 1: Transfer Learning & Meta-Learning

COMMENTS

  1. A Comprehensive Hands-on Guide to Transfer Learning with Real-World

    Let's explore some real-world case studies now and build some deep transfer learning models! Case Study 1: Image Classification with a Data Availability Constraint. In this simple case study, we will be working on an image categorization problem with the constraint of having a very small number of training samples per category. The dataset for ... (a minimal Keras sketch of this setup appears after this list)

  2. PDF Transfer learning: a friendly introduction

    transfer," "knowledge integration," "knowledge-based inductive bias learning," "super-vised learning," "meta-learning," and "semi-supervised learning" [7]. Among such, the 3, multi-task learning model is seen to have a strong learning strategy that is similar to TL because both learning models strive to learn multiple ...

  3. Transfer Learning Guide: A Practical Tutorial With Examples for Images

    Now, this is specific to transfer learning in natural language processing. First, let's download the pre-trained word embeddings. ... (a sketch of loading pre-trained GloVe vectors into a Keras Embedding layer appears after this list)

  4. Transfer learning: a friendly introduction

    Transfer learning (TL), one of the categories under ML, has received much attention from the research communities in the past few years. Traditional ML algorithms perform under the assumption that a model uses limited data distribution to train and test samples. ... Inductive learning—case studies on multi-task learning and self-learning ...

  5. PDF 1 A Comprehensive Survey on Transfer Learning

    domains, transfer learning can be further divided into two categories, i.e., homogeneous and heterogeneous transfer learning [4]. Homogeneous transfer learning approaches are developed and proposed for handling the situations where the domains are of the same feature space. In homogeneous transfer learning, some studies assume that domains differ

  6. [2103.03166] Contrastive Learning Meets Transfer Learning: A Case Study

    Contrastive Learning Meets Transfer Learning: A Case Study In Medical Image Analysis. Yuzhe Lu, Aadarsh Jha, Yuankai Huo. Annotated medical images are typically rarer than labeled natural images since they are limited by domain knowledge and privacy constraints. Recent advances in transfer and contrastive learning have provided effective ...

  7. PDF Federated Transfer Learning: concept and applications

    Federated transfer learning is a special case of federated learning and different from both horizontal and vertical federated learning. In federated transfer learning, two datasets differ in the feature space. This applies to datasets collected from enterprises of different but similar nature.

  8. A conceptual study of transfer learning with linear models for data

    3.1.2. Case 2: transfer learning assuming P₁ ⊇ P₂ but P₁ is unknown. We demonstrated the effectiveness of transfer learning where we know the important features a priori. In reality, such information may not be readily available and P₁ needs to be estimated via unsupervised or supervised learning. Here, we use LASSO on ρ₁ to generate a model M₁ and thereby identify the important ... (a conceptual sketch of this select-then-transfer idea appears after this list)

  9. Transfer Learning: Leveraging Trained Models on Novel Tasks

    According to the availability of labeled and unlabeled data in the source domain, two case studies arise in the inductive transfer learning setting. Case 1. In this case, no labeled data are available in the source domain, so the inductive transfer learning setting works like a self-taught learning setting, as addressed by Raina et al. As the ...

  10. Tackling data scarcity with transfer learning: a case study of

    Transfer learning (TL) is increasingly becoming an important tool in handling data scarcity, especially when applying machine learning (ML) to novel materials science problems. In autonomous workflows to optimize optoelectronic thin films, high-throughput thickness characterization is often required as a downstream ...

  11. PDF Case-Based Reasoning in Transfer Learning

    Transfer Learning and Case-Based Reasoning. Research on machine learning has traditionally focused on tasks in which examples are repeatedly and independently drawn from an identical distribution (i.i.d.) (Simon, 1983). This simplifying assumption is the basis of most ML research to date, as well as most research on case-based learning.

  12. Attitudinal Influences on Transfer of Training: A Systematic Literature

    A multiple-case study showed trainees' attitudes, including their affective reaction, cognitive perception, and behavioral response, ... transfer model, and Holton et al.'s Learning Transfer System Inventory. Among those existing studies, most focused on singular aspects of attitude, rather than from a comprehensive perspective, incorporating ...

  13. A case study on transfer learning in convolutional neural networks

    In this work, a case study is performed on the transfer learning approach in convolutional neural networks. Transfer learning parameters are examined on AlexNet, VGGNet and ResNet architectures for a marine vessel classification task on the MARVEL dataset. The results confirmed that transferring the parameter values of the first layers and fine-tuning the other layers, whose weights are initialized from ... (a minimal Keras sketch of this freeze-early, fine-tune-late strategy appears after this list)

  14. Transfer learning in environmental remote sensing

    Prior shift: in the case of prior shift, the conditional distributions have high similarity but the prior distributions of the label space in the source domain and target domain are different, i.e., pₛ(y|x) ≈ pₜ(y|x) and pₛ(y) ≠ pₜ(y) (Fig. 2). Prior shift happens when the label distributions differ between the source and target domains. For instance, in landcover classification, the source ...

  15. Exploring combinations of dimensionality reduction, transfer learning

    To our knowledge, our study is the most comprehensive comparative analysis with the aim to determine which optimal combination of dimensionality reduction method across supervised approaches, unsupervised approaches, transfer learning, and regularization techniques can enhance the predictive performance of models.

  16. Towards a Better Understanding of Transfer Learning for Medical Imaging

    One of the main challenges of employing deep learning models in the field of medicine is a lack of training data due to difficulty in collecting and labeling data, which needs to be performed by experts. To overcome this drawback, transfer learning (TL) has been utilized to solve several medical imaging tasks using pre-trained state-of-the-art models from the ImageNet dataset. However, there ...

  17. PDF A Study of Transfer Learning Methods within Natural Language Processing

    Transfer learning is a set of methods used to overcome the isolated learning paradigm by utilizing knowledge acquired for one task to solve related ones. By leveraging data from additional domains or tasks, models are able to generalize better and transfer knowledge between tasks.

  18. PDF Transfer of Training, Trainee Attitudes and Best Practices ...

    To provide insight into trainees' attitudes from a holistic view, we conducted a multiple-case study to investigate trainees' learning and transfer experience in depth. The findings from five ... behavioral attitudes are perceived to be closely related to their learning and transfer.

  19. Transfer Learning in Computer Vision a case Study

    Computer Vision: A Case Study- Transfer Learning. The conclusion to the series on computer vision talks about the benefits of transfer learning and how anyone can train networks with reasonable accuracy. Usually, articles and tutorials on the web don't include methods and hacks to improve accuracy.

  20. What is Transfer Learning?

    Transfer learning is a very effective and fast way to begin working on a problem. It gives you a direction to move in, and most of the time the best results are also obtained by transfer learning. Below is sample code using Keras for transfer learning & fine-tuning with a custom training loop (a hedged sketch of such a loop appears after this list). Transfer Learning Implementations

  21. Case Studies

    Request a demo. We have a number of other case studies created through our partnerships with learning providers. Our partners prefer that we don't share these widely, however if you want to see a case study more relatable to your industry, then contact us so we can share them with you.

  22. Sustainability

    To further address the aforementioned issues, this study proposes a sustainable strategy for the cross-cultural transfer and adaptation of active learning teaching methods, within the context of a degree-level Sino-foreign cooperative education program. ... , and taking its 'Electro-Mechanical Systems' course as a case study, a PBL-centered ...

  23. Modeling User Engagement on Online Social Platforms

    This dissertation examines the predictability of user engagement on online social platforms by integrating theoretical perspectives from the literature on media and technology habits with principles of context-aware computing. It presents three studies, each targeting a different facet of technology-mediated communication, from social media use in general to more granular behaviors like active ...

  24. A Cross-City Federated Transfer Learning Framework: A Case Study on

    Data insufficiency problems (i.e., data missing and label scarcity) caused by inadequate services and infrastructures or imbalanced development levels of cities have seriously affected the urban computing tasks in real scenarios. Prior transfer learning methods inspire an elegant solution to the data insufficiency, but are only concerned with one kind of insufficiency issue and fail to give ...

  25. Clinician and staff experiences with frustrated patients during an

    Through the lens of an emergency preparedness framework, we examined clinician and staff reactions to and perceptions of their patients' experiences with the portal during an EHR transition at the Department of Veterans Affairs (VA). This qualitative case study was situated within a larger multi-methods evaluation of the EHR transition.

  26. Land

    As a case study, the automated machine learning method was applied to predict the spatial distribution of soil subgroups in Heshan farm. A total of 110 soil samples and 10 terrain variables were utilized in the designed experiments. To evaluate the performance, the proposed method was compared to each machine learning method with default ...

  27. Get the Facts about Transgender & Non-Binary Athletes

    Transgender and non-binary people, in particular trans and non-binary student athletes, are under attack by politicians at all levels of government, as well as in the media. 2021 officially surpassed 2015 as the worst year for anti-LGBTQ+ legislation in recent history, with 27 anti-LGBTQ+ bills enacted. 2022 looks to be continuing this trend ...

  28. [2312.11880] Point Cloud Segmentation Using Transfer Learning with

    Urban environments are characterized by complex structures and diverse features, making accurate segmentation of point cloud data a challenging task. This paper presents a comprehensive study on the application of RandLA-Net, a state-of-the-art neural network architecture, for the 3D segmentation of large-scale point cloud data in urban areas. The study focuses on three major Chinese cities ...

  29. Novel Automatic Classification of Human Adult Lung Alveolar ...

    Unlike previous studies, this novel study aims to automatically differentiate between healthy and infected AT2 cells with SARS-CoV-2 through using efficient AI-based models, which can aid in disease control and treatment. Therefore, we introduce a highly accurate deep transfer learning (DTL) approach that works as follows.

  30. 5 Free Digital Marketing Courses To Study In 2024

    Here are five free digital marketing courses you should seriously consider embarking on so you can launch a successful digital marketing career in 2024: 1. LinkedIn Learning. LinkedIn Learning has ...
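
CODE SKETCHES

Several of the entries above describe concrete techniques; the sketches below illustrate them in Keras/Python, the stack used throughout this guide. They are minimal, hedged examples under stated assumptions, not the cited works' actual code.

For entry 1 (image classification with a data availability constraint): a frozen ImageNet-trained VGG16 base reused as a feature extractor, with only a small head trained on the scarce data. The input size, head layout, and two-class task are illustrative assumptions.

```python
# Hedged sketch: transfer learning with very few training samples.
# A frozen VGG16 base extracts features; only the small head is trained.
import tensorflow as tf

base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base.trainable = False  # keep the transferred convolutional features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),                    # guards against overfitting on few samples
    tf.keras.layers.Dense(1, activation="sigmoid"),  # assumed two-class problem
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```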
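For entry 3 (pre-trained word embeddings for NLP transfer learning): a sketch of loading GloVe vectors into a frozen Keras Embedding layer. The file name and the tiny word_index are placeholder assumptions; in practice the index comes from a fitted tokenizer.

```python
# Sketch: initialise a Keras Embedding layer from pre-trained GloVe vectors.
import numpy as np
import tensorflow as tf

embedding_dim = 100
word_index = {"movie": 1, "great": 2}   # hypothetical vocabulary mapping
num_tokens = len(word_index) + 1        # +1 for the padding index 0

# Parse the GloVe text file: one word followed by its vector per line.
embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.split()
        embeddings[word] = np.asarray(vec, dtype="float32")

# Rows for known words get their GloVe vector; the rest stay zero.
matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    if word in embeddings:
        matrix[i] = embeddings[word]

embedding_layer = tf.keras.layers.Embedding(
    num_tokens, embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(matrix),
    trainable=False)  # freeze the transferred embeddings
```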
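For entry 8 (linear-model transfer when the important features P₁ must be estimated): a conceptual sketch, not the paper's procedure, of fitting LASSO on synthetic source data, keeping the features with nonzero coefficients, and reusing only those on a small target set. All data and the regularisation strength are made up for illustration.

```python
# Conceptual sketch: estimate important features with LASSO on the source
# task, then fit the target model on those features only.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)

# Synthetic source task: 50 features, only the first 5 matter.
coef = np.zeros(50)
coef[:5] = 1.0
X_source = rng.normal(size=(500, 50))
y_source = X_source @ coef + rng.normal(scale=0.1, size=500)

m1 = Lasso(alpha=0.05).fit(X_source, y_source)  # "M1" fitted on the source
selected = np.flatnonzero(m1.coef_)             # estimated important features

# Small target task assumed to share the same important features.
X_target = rng.normal(size=(30, 50))
y_target = X_target @ coef + rng.normal(scale=0.1, size=30)
m2 = LinearRegression().fit(X_target[:, selected], y_target)
print(selected, m2.score(X_target[:, selected], y_target))
```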
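For entry 13 (transferring the first layers and fine-tuning the rest): a minimal Keras sketch that freezes the early layers of an ImageNet-trained ResNet50 and fine-tunes the later ones with a low learning rate. The layer cut-off and the 26-class head are assumptions, not values from the cited paper.

```python
# Minimal sketch: keep early transferred layers frozen, fine-tune the rest.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False,
    pooling="avg", input_shape=(224, 224, 3))

for layer in base.layers[:100]:   # early layers: transferred weights stay fixed
    layer.trainable = False
for layer in base.layers[100:]:   # later layers: fine-tuned on the new task
    layer.trainable = True

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(26, activation="softmax"),  # hypothetical class count
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # low LR protects transferred weights
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
```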
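For entry 20 (Keras transfer learning with a custom training loop): a hedged sketch of such a loop around a frozen MobileNetV2 base. The dataset pipeline, input size, and binary head are placeholder assumptions.

```python
# Hedged sketch: custom training loop around a frozen pre-trained base.
# `train_dataset` is an assumed tf.data pipeline of (images, labels) batches.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False,
    pooling="avg", input_shape=(160, 160, 3))
base.trainable = False  # feature extraction first; unfreeze later to fine-tune

model = tf.keras.Sequential([base, tf.keras.layers.Dense(1)])  # logits, binary task
optimizer = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# for epoch in range(5):
#     for images, labels in train_dataset:
#         loss = train_step(images, labels)
```

Writing the loop by hand, rather than calling model.fit, makes it easy to add per-step logic such as gradual unfreezing or layer-wise learning rates during fine-tuning.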