• Open access
  • Published: 12 November 2020

Deep learning accelerators: a case study with MAESTRO

  • Hamidreza Bolhasani (ORCID: orcid.org/0000-0003-0698-6141) &
  • Somayyeh Jafarali Jassbi

Journal of Big Data, volume 7, Article number: 100 (2020)


In recent years, deep learning has become one of the most important topics in computer science. Deep learning is a growing trend at the cutting edge of technology, and its applications now appear in many aspects of our lives, such as object detection, speech recognition and natural language processing. Currently, almost all major sciences and technologies benefit from the advantages of deep learning, such as high accuracy, speed and flexibility. Therefore, any effort to improve the performance of related techniques is valuable. Deep learning accelerators are hardware architectures designed and optimized to increase the speed, efficiency and accuracy of computers running deep learning algorithms. In this paper, after reviewing some background on deep learning, a well-known accelerator architecture named MAERI (Multiply-Accumulate Engine with Reconfigurable Interconnects) is investigated. The performance of a deep learning task is measured and compared under two different data flow strategies, NLR (No Local Reuse) and NVDLA (NVIDIA Deep Learning Accelerator), using an open-source tool called MAESTRO (Modeling Accelerator Efficiency via Spatio-Temporal Resource Occupancy). The measured performance indicators show that the novel optimized architecture, NVDLA, achieves higher L1 and L2 computation reuse and lower total runtime (cycles) than NLR.

Introduction

The main idea of neural networks (NN) is based on the structure of the biological neural system, which consists of several connected elements named neurons [ 1 ]. In biological systems, neurons receive signals from dendrites and pass them to the next neurons via the axon, as shown in Fig. 1.

Fig. 1. Typical biological neurons [ 20 ]

Neural networks are made up of artificial neurons for handling brain tasks like learning, recognition and optimization. In this structure, the nodes are neurons, links can be considered as synapses and biases as activation thresholds [ 2 ]. Each layer extracts some information related to the features and forwards them with a weight to the next layer. Output is the sum of all these information gains multiplied by their related weights. Figure  2 represents a simple artificial neural network structure.

Fig. 2. Simple artificial neural network structure

Deep neural networks are complex artificial neural networks with more than two layers. Nowadays, these networks are widely used for several scientific and industrial purposes such as visual object detection, segmentation, image classification, speech recognition, natural language processing, genomics, drug discovery, and many other areas [ 3 ].

Deep learning is a new subset of machine learning including algorithms that are used for learning concepts in different levels, utilizing artificial neural networks [ 4 ].

As Fig. 3 shows, if each input and its weight are represented by \(X_i\) and \(W_{ij}\) respectively, the output result \(Y_j\) would be:

\(Y_j = \sigma\left(\sum_{i} X_i W_{ij}\right)\)  (1)

Fig. 3. A typical deep neural network structure

where \(\sigma\) is the activation function. A popular activation function in deep neural networks is ReLU (Rectified Linear Unit), defined in Eq. ( 2 ):

\(\mathrm{ReLU}(x) = \max(0, x)\)  (2)

Leaky ReLU, tanh and sigmoid are other activation functions with less frequent usage [ 5 ].
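To make Eqs. (1) and (2) concrete, the following is a minimal pure-Python sketch of a single neuron's forward pass with a ReLU activation; the input and weight values are illustrative, not taken from any network discussed in this paper.

```python
def relu(x):
    # Eq. (2): ReLU(x) = max(0, x)
    return max(0.0, x)

def neuron_output(inputs, weights, activation=relu):
    # Eq. (1): Y_j = activation(sum_i X_i * W_ij)
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

inputs = [0.5, -1.0, 2.0]   # X_i (illustrative values)
weights = [0.4, 0.3, 0.1]   # W_ij (illustrative values)
# 0.5*0.4 - 1.0*0.3 + 2.0*0.1 = 0.1, which ReLU passes through unchanged
print(neuron_output(inputs, weights))
```

In a full network, this computation is repeated for every neuron in every layer, which is exactly the multiply-accumulate workload that the accelerators discussed later are built to speed up.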

As shown in Fig. 4, the role of each layer of a deep neural network is to extract some features and send them to the next layer with their corresponding weights. For example, in the first layer, color properties (green, red, blue) are extracted; in the next layer, the edges of objects are determined, and so on.

Fig. 4. Deep learning setup for object detection [ 21 ]

Convolutional neural networks (CNNs) are a type of deep neural network mostly used for recognition, mining and synthesis applications such as face detection, handwriting recognition and natural language processing [ 6 ]. Since parallel computation is an unavoidable part of CNNs, several research efforts have aimed at designing optimized hardware for them. As a result, many application-specific integrated circuits (ASICs) serving as hardware accelerators have been introduced and evaluated in the recent decade [ 7 ]. In the next section, some of the most successful and influential works on CNN accelerators are introduced.
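Since convolution is the core operation that CNN accelerators optimize, a short illustration may help. The following is a minimal pure-Python sketch of a 2D convolution (stride 1, no padding); the 3×3 input and 2×2 filter values are illustrative only.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1), the core CNN-layer operation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply-accumulate over the kernel window: the MAC operation
            # that accelerator architectures are built around.
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

Every output element is an independent sum of products, which is why parallel hardware and data-reuse strategies pay off so well for CNNs.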

Related works

Tianshi et al. [ 8 ] proposed DianNao, a hardware accelerator for large-scale convolutional neural networks (CNNs) and deep neural networks (DNNs). The main focus of the proposed model is a memory structure optimized for big neural network computations. The experimental results showed a speedup in computation and reductions in performance overhead and energy. This research also demonstrated that the accelerator can be implemented in a very small area, on the order of 3 mm², with 485 mW power.

Zidong et al. [ 9 ] suggested ShiDianNao, a CNN accelerator for image processing placed close to a CMOS or CCD sensor. The performance and energy of this architecture were compared to a CPU, a GPU and DianNao, discussed in the previous work [ 8 ]. Utilizing SRAM instead of DRAM made it 60 times more energy efficient than DianNao. It is also 50×, 30× and 1.87× faster than a mainstream CPU, GPU and DianNao respectively, implemented in a 65 nm process with 320 mW power.

Wenyan et al. [ 6 ] offered a flexible dataflow accelerator for convolutional neural networks called FlexFlow. Working on different types of parallelism is the substantial contribution of this model. Test results showed a 2–10× performance speedup and 2.5–10× better power efficiency compared with the three investigated baseline architectures.

Eyeriss is a spatial architecture for energy-efficient dataflow in CNNs, presented by Yu-Hsin et al. [ 10 ]. This hardware model is based on a dataflow named row stationary (RS), which minimizes energy consumption by maximizing reuse of filter weights and intermediate computations. The proposed RS dataflow was evaluated on the AlexNet CNN configuration, demonstrating improved energy efficiency.

Morph is a flexible accelerator for 3D CNN-based video processing, offered by Kartik et al. [ 7 ]. Since previously proposed architectures did not specifically focus on video processing, this model can be considered a novelty in this area. Comparing energy consumption with the earlier Eyeriss architecture [ 10 ] showed a high level of reduction, i.e., energy saving. The main reason for this improvement is effective data reuse, which reduces accesses to higher-level buffers and high-cost off-chip memory.

Michael et al. [ 11 ] described Buffets, an efficient and composable storage idiom that is independent of any particular accelerator design. In this research, explicit decoupled data orchestration (EDDO) is introduced, which allows evaluation of energy efficiency in accelerators. Results of this work showed that higher energy efficiency and lower control overhead are achieved with a smaller usage area.

Deep learning applications

Deep learning has a wide range of applications in recognition, classification and prediction, and since it tends to work like the human brain, doing human jobs more accurately and at lower cost, its usage is increasing dramatically. A review of more than 100 papers published from 2015 to 2020 helped categorize the main applications as follows:

  • Computer vision
  • Translation
  • Health monitoring
  • Disease prediction
  • Medical image analysis
  • Drug discovery
  • Biomedicine
  • Bioinformatics
  • Smart clothing
  • Personal health advisors
  • Pixel restoration for photos
  • Sound restoration in videos
  • Describing photos
  • Handwriting recognition
  • Predicting natural disasters
  • Cyber physical security systems [ 12 ]
  • Intelligent transportation systems [ 13 ]
  • Computed tomography image reconstruction [ 14 ]

As mentioned previously, artificial intelligence and deep learning applications are growing drastically, but they involve high computational complexity, energy consumption, cost and memory bandwidth. All these factors were major motivations for developing deep learning accelerators (DLA) [ 15 ]. A DLA is a hardware architecture that is specially designed and optimized for deep learning purposes. Recent DLA architectures, such as the OpenCL-based accelerator on Arria 10 [ 16 ], have mainly focused on maximizing computation reuse and minimizing memory bandwidth, which leads to higher speed and performance.

Generally, most accelerators support only a fixed data flow and are not reconfigurable, but for large deployments they need to be programmable. Hyoukjun et al. [ 15 ] proposed a novel architecture named MAERI (Multiply-Accumulate Engine with Reconfigurable Interconnects), which is reconfigurable and employs an ART (Augmented Reduction Tree); it showed 8–459% better utilization across different data flows over a strict network-on-chip (NoC) fabric. Figure 5 shows the overall structure of the MAERI DLA.

Fig. 5. MAERI microarchitecture [ 15 ]

In other research, Hyoukjun et al. offered a framework called MAESTRO (Modeling Accelerator Efficiency via Spatio-Temporal Resource Occupancy) for predicting performance and energy efficiency in DLAs [ 17 ]. MAESTRO is an open-source tool capable of computing many NoC parameters for a proposed accelerator and its data flow, such as maximum performance (roofline throughput), compute runtime, total runtime, NoC analysis, L1-to-L2 and L2-to-L1 NoC bandwidth, buffer analysis, and L1 and L2 computation, weight and input reuse. The topology, tool flow and relationships between the blocks of this framework are presented in Fig. 6.

Fig. 6. MAESTRO topology [ 15 ]

Results and discussion

In this paper, we used MAESTRO to investigate buffer, NoC, and performance parameters of a DLA in comparison to a classical architecture for a specific deep learning data flow. For running MAESTRO and getting the related analysis, some parameters should be configured, as follows:

  • LayerFile: information related to the layers of the neural network.
  • DataFlowFile: information related to the data flow.
  • VectorWidth: width of the vectors.
  • NoCBandwidth: bandwidth of the NoC.
  • MulticastSupported: a logical indicator (True/False) defining whether the NoC supports multicast.
  • NumAverageHopsinNoC: average number of hops in the NoC.
  • NumPEs: number of processing elements.

For the simulation of this paper, we configured the mentioned parameters as presented in Table 1 .
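As an illustration of how such a parameter set might look in practice, the sketch below collects the inputs listed above into a Python dictionary. Except for the LayerFile (Vgg16_conv11, as stated in the text), every value is an illustrative assumption, not the actual setting from Table 1.

```python
# Hypothetical MAESTRO-style run configuration. Parameter names follow
# the list above; all values except LayerFile are illustrative
# assumptions, not the paper's actual Table 1 settings.
config = {
    "LayerFile": "Vgg16_conv11",    # the layer studied in this paper
    "DataFlowFile": "NVDLA",        # or "NLR" for the baseline strategy
    "VectorWidth": 8,
    "NoCBandwidth": 64,
    "MulticastSupported": True,
    "NumAverageHopsinNoC": 2,
    "NumPEs": 128,
}

# Both data flows would be run with otherwise identical settings so that
# reuse and runtime numbers are directly comparable.
baseline = dict(config, DataFlowFile="NLR")
print(baseline["DataFlowFile"])  # NLR
```

Holding every hardware parameter fixed while swapping only the data flow file is what makes the NLR-vs-NVDLA comparison in the next section meaningful.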

As presented in Table 1, we selected Vgg16_conv11 as the LayerFile. VGG-16 is a deep convolutional neural network proposed by K. Simonyan and A. Zisserman for image recognition, achieving 92.7% accuracy on the ImageNet dataset [ 18 ].

Two different data flow strategies are investigated and compared in this study: NLR and NVDLA. NLR stands for "No Local Reuse", which expresses its specific strategy, and NVDLA is a novel DLA designed by NVIDIA [ 19 ].

Other parameters, such as vector width, NoC bandwidth, multicast support capability, average number of hops and number of processing elements in the NoC, were selected based on realistic hardware conditions.

Simulation results demonstrated that NVDLA has better performance and runtime, higher computation reuse and lower memory bandwidth requirements than NLR, as presented in Table 2 and Figs. 7, 8, and 9.

Fig. 7. Comparing L1 weight and input reuse

Fig. 8. Comparing L2 weight and input reuse

Fig. 9. Total runtime comparison

Conclusion

Artificial intelligence, machine learning and deep learning are growing trends affecting almost all aspects of human life. These technologies make life easier by assigning routine human tasks to machines that are much more accurate and fast. Therefore, any effort to optimize the performance, speed and accuracy of these technologies is valuable. In this research, we focused on performance improvements of the hardware used for deep learning, namely deep learning accelerators. Recent research on these hardware accelerators shows that they can improve cost, energy consumption and runtime by about 8–459% (based on the MAERI investigation) by minimizing memory bandwidth and maximizing computation reuse. Utilizing an open-source tool named MAESTRO, we compared buffer, NoC and performance parameters of the NLR and NVDLA data flows. The results showed higher computation reuse in both L1 and L2 for the NVDLA data flow, which is designed and optimized for deep learning purposes and was studied here as the deep learning accelerator. The customized hardware accelerator for deep learning (NVDLA) also had a much shorter total runtime than NLR.

Availability of data and materials

Abbreviations

MAERI: Multiply-accumulate engine with reconfigurable interconnects

NLR: No local reuse

NVDLA: NVIDIA deep learning accelerator

MAESTRO: Modeling accelerator efficiency via spatio-temporal resource occupancy

ReLU: Rectified linear unit

DLA: Deep learning accelerator

NN: Neural network

CNN: Convolutional neural network

DNN: Deep neural network

RS: Row stationary

ASIC: Application-specific integrated circuit

ART: Augmented reduction tree

NoC: Network on chip

L1 read sum; L1 write sum; L2 read sum; L2 write sum

References

1. Jurgen S. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117.

2. Muller B, Reinhardt J, Strickland MT. Neural networks: an introduction. Berlin: Springer; 2012. p. 14–5.

3. Yann L, Yoshua B, Geoffrey H. Deep learning. Nature. 2015;521:436–44.

4. Li D, Dong Y. Deep learning: methods and applications. Found Trends Signal Process. 2014;7:3–4.

5. Jianqing F, Cong M, Yiqiao Z. A selective overview of deep learning. arXiv:1904.05526 [stat.ML]. 2019.

6. Wenyan L, et al. FlexFlow: a flexible dataflow accelerator architecture for convolutional neural networks. In: IEEE international symposium on high performance computer architecture (HPCA). 2017.

7. Kartik H, et al. Morph: flexible acceleration for 3D CNN-based video understanding. In: 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). 2018.

8. Tianshi C, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine learning. In: ACM SIGARCH Computer Architecture News. 2014.

9. Zidong D, et al. ShiDianNao: shifting vision processing closer to the sensor. In: ACM/IEEE 42nd annual international symposium on computer architecture (ISCA). 2015.

10. Chen Y-H, Emer J, Sze V. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Computer Architecture News. 2016;44:367–79.

11. Michael P, et al. Buffets: an efficient and composable storage idiom for explicit decoupled data orchestration. In: ASPLOS '19: proceedings of the twenty-fourth international conference on architectural support for programming languages and operating systems. 2019. p. 137.

12. Xia X, Marcin W, Fan X, Damasevicius R, Li Y. Multi-sink distributed power control algorithm for cyber-physical systems in coal mine tunnels. Comput Netw. 2019;161:210–9.

13. Song H, Li W, Shen P, Vasilakos A. Gradient-driven parking navigation using a continuous information potential field based on wireless sensor network. Inf Sci. 2017;408(2):100–14.

14. Bin Z, Dawid P, Marcin W. A regional adaptive variational PDE model for computed tomography image reconstruction. Pattern Recogn. 2019;92:64–81. https://doi.org/10.1016/j.patcog.2019.03.009

15. Hyoukjun K, Ananda S, Tushar K. MAERI: enabling flexible dataflow mapping over DNN accelerators via reconfigurable interconnects. In: ASPLOS '18: proceedings of the twenty-third international conference on architectural support for programming languages and operating systems. 2018.

16. Utku A, Shane O, Davor C, Andrew CL, Gordon RC. An OpenCL deep learning accelerator on Arria 10. In: FPGA '17: proceedings of the 2017 ACM/SIGDA international symposium on field-programmable gate arrays. 2017. p. 55–64.

17. Hyoukjun K, Michael P, Tushar K. MAESTRO: an open-source infrastructure for modeling dataflows within deep learning accelerators. arXiv:1805.02566. 2018.

18. Karen S, Andrew Z. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. 2015.

19. NVDLA deep learning accelerator. https://nvdla.org. 2017.

20. George SE. The anatomy and physiology of the human stress response. In: A clinical guide to the treatment of the human stress response. Berlin: Springer. p. 19–56.

21. Christian S, Alexander T, Dumitru E. Deep neural networks for object detection. In: Advances in neural information processing systems 26 (NIPS 2013).


Acknowledgements

Not applicable.

Author information

Authors and affiliations.

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Hamidreza Bolhasani & Somayyeh Jafarali Jassbi


Contributions

Investigating deep learning accelerators functionality. Analyzing a deep learning accelerator’s architecture. Performance measurement of NVIDIA deep learning accelerator as a case study. Higher computation reuse and lower total runtime for the studied deep learning accelerator in comparison with non-optimized architecture.

Corresponding author

Correspondence to Hamidreza Bolhasani .

Ethics declarations

Competing interests.


Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Bolhasani, H., Jassbi, S.J. Deep learning accelerators: a case study with MAESTRO. J Big Data 7 , 100 (2020). https://doi.org/10.1186/s40537-020-00377-8


Received : 13 May 2020

Accepted : 01 November 2020

Published : 12 November 2020

DOI : https://doi.org/10.1186/s40537-020-00377-8


Keywords: Deep learning, Convolutional neural networks, Deep neural networks, Hardware accelerator


The Open Science of Deep Learning: Three Case Studies


Objective: An area of research in which open science may have particularly high impact is deep learning (DL), where researchers have developed many algorithms to solve challenging problems, but others may have difficulty replicating results and applying these algorithms. In response, some researchers have begun to open up DL research by making their resources (e.g., code, datasets and/or pre-trained models) available to the research community. This article describes three case studies in DL where openly available resources are used; we investigate the impact on the projects and their outcomes, and make recommendations for what to focus on when making DL resources available.

Methods : Each case study represents a single project using openly available DL resources for a research project. The process and progress of each case study is recorded along with aspects such as approaches taken, documentation of openly available resources, and researchers' experience with the openly available resources. The case studies are in multiple-document text summarization, optical character recognition (OCR) of thousands of text documents, and identifying unique language descriptors for sensory science.

Results: Each case study was a success but had its own hurdles. A key takeaway is that well-structured and clear documentation, code examples and demos, and pre-trained models were at the core of the success of these case studies.

Conclusions : Openly available DL resources were the core of the success of our case studies. The authors encourage DL researchers to continue to make their data, code, and pre-trained models openly available where appropriate.

Keywords: Open Science, Deep Learning, Machine Learning, Open Source, Text Summarization, Optical Character Recognition, RDAP

Miller, C. & Hamilton, L. & Lahne, J., (2023) “The Open Science of Deep Learning: Three Case Studies”, Journal of eScience Librarianship 12(1), e626. doi: https://doi.org/10.7191/jeslib.626

Rights: Copyright © 2023 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Published on 16 Feb 2023 · Peer reviewed · Creative Commons Attribution 4.0

Introduction

In the field of machine learning (ML), deep learning (DL) has become a very powerful tool that has been incorporated into many new technologies, e.g., self-driving cars (Daily et al. 2017) and natural language processing (NLP) (Vaswani et al. 2017; Devlin et al. 2019; Sutskever, Vinyals, and Le 2014). DL refers to a subset of ML accomplished using deep neural networks. Many materials and codebases for DL are openly available through platforms like GitHub 1 . This allows those interested to use, improve, and build upon existing work to tackle new problems without starting from scratch. These openly available resources are at the heart of open science, which furthers research by allowing researchers to directly and indirectly collaborate to accomplish new goals. DL has proven to be a great collection of tools for addressing challenging problems. Putting the two together has great potential.

One common task in open science using DL is the process of one researcher taking another researcher’s freely available DL code and applying it to a new problem space with different data. This is different from using a flexible open source framework or library (Brownlee 2019; “Machine Learning Education” 2022; “Free Online Course to Learn the Basics of Deep Learning with Keras” 2022), because the initial codebase is usually developed for a specific task (e.g., to replicate results from a paper). We aim to compare multiple examples of using openly available materials for DL with varying degrees of documentation and extensibility.

This paper presents three case studies in which a Data and Informatics Consultant (DIC) embedded within University Libraries Data Services identified and adapted existing DL codebases to support research groups with new research goals. The case studies are presented in the order of increasing effort and complexity for the researchers to achieve their goals from the openly available resources. The involvement and support of the DIC will be described for each case study in later sections, along with the details, implications, and applications towards the library itself.

In the first case study, we present a workflow for adapting an open source tool to summarize thousands of news articles. The outcome, generalizable to many situations, is a pipeline that can concisely report key facts and events from the articles. In the second case study, we describe the development of an Optical Character Recognition (OCR) pipeline for archival research of typed notecards, here documenting a curated collection of thousands of clothing items. In the last case study, we describe the process of applying an NLP tool for resumé skill extraction to save time on a novel task: identifying descriptive language for whiskies from thousands of free-form text reviews. These case studies resulted in working solutions to challenging problems, thanks to researchers embracing open science.

Literature Review

The idea of open science is not new. Woelfle et al. explained it in 2011 as analogous to asking a colleague for help: openly sharing research between researchers accelerates the research process and makes science more transparent (Woelfle, Olliaro, and Todd 2011). Fecher and Friesike (2014) discussed five schools of thought on open science. The one most pertinent to our case studies is the democratic school of thought, focusing on "principal access to the products of research." However, Heise and Pearce (2020) point out the challenges of making things open, including the slight differences between open access ("pure access to published knowledge") and open science ("complete access to the entire scientific process").

The ML and DL communities, inside and outside formal research groups, have a wide variety of open resources. Erickson et al. (2017) provide a review of freely available ML/DL libraries. There are many websites dedicated to providing pre-trained models and datasets (e.g., huggingface 2 , Kaggle 3 ). On these sites, especially Kaggle, other researchers provide not only datasets but also code notebooks to run the experiments and explain the pipeline.

In the academic sphere, computer scientists have long made their research openly available online, independent of academic journals, a practice already widespread by 1998 (Giles, Bollacker, and Lawrence 1998). Today, there are prominent ML/DL journals using both open access and subscription-based publishing models. The Journal of Machine Learning Research, one of the highest-impact ML/DL journals ("Journal Rankings on Artificial Intelligence" 2022; Google Scholar 2022), is an open access journal whose establishment in 2001 led many editors to resign from the subscription-based Machine Learning Journal (Journal of Machine Learning Research 2001; Lewis 2001). Many ML/DL researchers continue to prioritize open access (Hutson 2018), and when ML/DL research is published in subscription-based journals, it is often still freely available as a pre-print. Since 2018, Computer Science has been the most-published category on ArXiv.org ("Submissions by Category since 2009+ | ArXiv E-Print Repository" 2022).

As for librarian activities around these computational resources, there are several examples of such work and partnership. Lamba and Madhusudhan (2021a) describe the use of text mining in various contexts, most notably within libraries, as presented in Chapter 1, "The Computational Library" (Lamba and Madhusudhan 2021b). Some Computer Science researchers work in the area of digital libraries in close partnership with their respective academic libraries. One example is the Digital Library Research Laboratory ("Digital Library Research Laboratory" 2022), a lab dedicated to partnering with academic libraries for research in information retrieval. A notable member of the lab is the Assistant Dean and Director of Information Technology within the Virginia Tech University Libraries (Ingram 2022). Another entity supporting computation in libraries is the Online Computer Library Center ("OCLC Research" 2022), whose Research Library Partnership (RLP) connects research libraries and supports them with 21st-century challenges.

Case study partnerships with libraries have occurred for multiple kinds of events and projects. Hackathons, which entail creating a prototype using hardware and/or software within a short amount of time (generally around 24 hours), have become popular; The Ohio State University library hosted such an event (Longmeier, Dotson, and Armstrong 2022). Librarians and a Computer Science faculty member also partnered to design and teach a class on algorithm bias (Ramachandran, Cutchin, and Fu 2021).

Such partnerships between a university library and external entities, such as faculty members and students, demonstrate the positive outcomes these collaborations create. In our case, the openly available DL resources, along with the DIC's skillset in using them, provided an excellent opportunity for partnership and support for the university. Our case studies not only exemplify this, but also show the positive outcomes of openly available resources becoming increasingly accessible.

Case Studies

Text summarization

The first case study is a semester-long project in a Computer Science class offered simultaneously at the graduate and senior-capstone levels. The focus of the project was to summarize thousands of news articles into a coherent, short summary. This "multi-document summarization" was a novel task at the time of the case study, as opposed to single-document summarization. The professor gave suggestions on how to process the articles, with the ultimate goal being an abstractive summary as opposed to an extractive one. An extractive summary takes pieces of the articles and uses them verbatim to create the summary. An abstractive summary can use words not present in the initial text to aid summarization, more closely mimicking how humans produce a summary. The goal was to use one of the many available DL algorithms to produce an abstractive summary.
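To make the extractive/abstractive distinction concrete, the sketch below is a deliberately simple extractive summarizer that ranks sentences by average word frequency and returns the top one verbatim. It is only a toy illustration of the extractive idea, not the PGN approach the group ultimately used; the example document is made up.

```python
from collections import Counter

def extractive_summary(text, n_sentences=1):
    # Split into sentences (naively, on periods) and drop empties.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Score words by how often they occur in the whole document.
    word_freq = Counter(text.lower().replace(".", " ").split())
    def score(sentence):
        words = sentence.lower().split()
        # Average document-frequency of the sentence's words.
        return sum(word_freq[w] for w in words) / len(words)
    # Keep the highest-scoring sentences verbatim: "extractive" summarization.
    ranked = sorted(sentences, key=score, reverse=True)
    return ". ".join(ranked[:n_sentences]) + "."

doc = ("The council approved the new park. The park will open in June. "
       "Local residents welcomed the park decision.")
print(extractive_summary(doc))  # The council approved the new park.
```

An abstractive system, by contrast, would be free to generate a sentence like "A new park was approved and welcomed," which appears nowhere in the source text; that generative step is what requires a trained model such as a PGN.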

For this case study, the research group was interested in off-the-shelf, ready-to-use solutions. The single-semester time limit made developing something from scratch impractical. Various techniques for automated summarization were introduced to the class. The need for text summarization is supported by the challenge of having vast volumes of text available from various sources in which it is impractical to expect an individual to read and synthesize from many sources (Allahyari et al. 2017).

The library support for this project was provided by the DIC as the DIC was taking the class at the same time. During the case study, the DIC was able to utilize their training and resources within the library to support the research group.

When this case study took place in fall 2018, the state of the art in summarization using DL was sequence-to-sequence (Sutskever, Vinyals, and Le 2014). Another recent and successful deep learning approach was the Pointer-Generator Network (PGN) (Abigail See, Liu, and Manning 2017). Fall 2018 also saw Google's release of BERT ("BERT" 2018), built on the transformer architecture published shortly before (Vaswani et al. 2017), which was a game changer for language models. However, this was released partway through the semester, well into the researchers' development phase, so it was not used. The group researched and tried several open source implementations of sequence-to-sequence and PGN algorithms, with the PGN being the most successful for the task. A PGN is a hybrid between abstractive and extractive text summarization and provided an end-to-end solution for the research group. A main reason for this choice is that the researchers were unable to get the identified sequence-to-sequence repositories working within the tight deadlines of the class.

The corpus to be summarized consisted of approximately 12,000 articles scraped from the internet related to #NeverAgain, a movement seeking to end gun violence in schools. The Graduate Teaching Assistant (GTA) scraped the articles by using the Twitter API to collect tentatively relevant URLs, followed by a web crawler. After duplicate, empty, or irrelevant articles were removed, around 3,600 articles remained. Article relevance was determined using Latent Dirichlet Allocation (LDA) topic modeling (Blei, Ng, and Jordan 2003).
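The relevance-filtering step above can be sketched as follows. This is a minimal illustration using scikit-learn's LDA implementation; the case study does not state which implementation, topic count, or threshold was used, so the corpus, topic count, and keep/discard criterion below are all invented for illustration.

```python
# Hedged sketch of LDA-based relevance filtering; the articles, topic
# count, and threshold are illustrative, not the case study's values.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = [
    "students march to end gun violence in schools",
    "new recipe for chocolate cake with frosting",
    "lawmakers debate school safety legislation after protest",
    "soccer team wins the championship final",
]

# Bag-of-words term counts feed the LDA model.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(articles)

# Fit a small topic model; a real corpus would use many more topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape: (n_docs, n_topics)

# An article could be kept if its weight on the movement-related topic
# (identified by inspecting each topic's top words) exceeds a threshold.
for doc, weights in zip(articles, doc_topics):
    print(f"{weights.round(2)}  {doc[:40]}")
```

Each row of `doc_topics` is a probability distribution over topics, so irrelevant articles show low weight on the topic of interest.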

Rather than training a model from scratch, the researchers downloaded a PGN model pre-trained on the CNN/Daily Mail dataset (Hermann et al. 2015; Nallapati et al. 2016), linked from the project's GitHub repository (Abi See 2017). This pre-trained model saved the case study researchers days of training on a computing cluster: they tried training themselves at first, and it took 3 to 4 days, and each retraining to tweak parameters would have taken another 3 to 4 days. Hence, the pre-trained model saved much time.

In order to perform summarization, the articles needed to be converted into a specific binary format. The code to perform this conversion was open source but assumed the user was converting the CNN/Daily Mail dataset. Within a few days, the researchers updated the code so the user could specify which set of articles to convert, making it more extensible. This produced the input format for the PGN. The updated code is openly available (Miller 2018).
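The actual binary format is defined by the open source conversion code and is not reproduced here. As a generic illustration of the kind of conversion involved, the sketch below writes and reads length-prefixed binary records, a common layout for such files; the record contents and field sizes are assumptions for illustration only.

```python
# Generic sketch of a length-prefixed binary record layout, assuming
# each record is one UTF-8 article. The real PGN input format differs
# and is defined by the open source conversion code (Miller 2018).
import io
import struct

def write_records(stream, texts):
    """Write each text as an 8-byte little-endian length, then the bytes."""
    for text in texts:
        data = text.encode("utf-8")
        stream.write(struct.pack("<q", len(data)))
        stream.write(data)

def read_records(stream):
    """Read records back until the stream is exhausted."""
    texts = []
    while True:
        header = stream.read(8)
        if not header:
            break
        (length,) = struct.unpack("<q", header)
        texts.append(stream.read(length).decode("utf-8"))
    return texts

buffer = io.BytesIO()
write_records(buffer, ["article one", "article two"])
buffer.seek(0)
print(read_records(buffer))  # round-trips the two articles
```

The key property, shared with the real format, is that a consumer can stream records one at a time without loading the whole corpus.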

The abstractive summary was made using code from a GitHub repository (Abi See 2017) implementing the PGN DL algorithm to perform multi-document summarization. Minimal coding was needed to adapt this open source code to the #NeverAgain corpus.

Documentation

This project was successful partly because the PGN documentation was very thorough, including required library version numbers, links to pre-trained models, and a full set of example Python commands to train and evaluate the model. The version numbers are very important, as some libraries, such as TensorFlow, as well as Python itself, are not always backwards-compatible. The availability of pre-trained models and an example script ensured the project was completed on time.


Figure 1: Text summarization pipeline overview.

The researchers started with a pre-trained model and source code and applied them to a collection of documents, producing a summary. The resulting data pipeline can be seen in Figure 1, including a portion of the final summary. Generally, the quality of a summary is measured with a ROUGE score (Lin 2004). The ROUGE scores were low for this summary, with entity coverage of only 7.25%. Entity coverage is the proportion of entities, e.g., people, places, names, identified in comparison to a “gold standard” summary, here created by human classmates. Even though the scores were low, the summary excerpt in Figure 1 shows that the summary was coherent and on-topic. Further details can be found in the final report (Arora et al. 2018). Regardless, the openly available pre-trained model and source code allowed this pipeline to be created and resulted in a successful project for the class.
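The two metrics mentioned above can be illustrated with a minimal sketch. The texts and entity list below are invented, and real evaluations use established ROUGE tooling (Lin 2004); this only shows what ROUGE-1 recall and entity coverage each measure.

```python
# Illustrative sketch of ROUGE-1 recall and entity coverage; the
# summary, gold reference, and entity list are made up for this example.
def rouge1_recall(candidate, reference):
    """Fraction of unique reference unigrams that appear in the candidate."""
    cand = candidate.lower().split()
    ref_words = set(reference.lower().split())
    overlap = sum(1 for word in ref_words if word in cand)
    return overlap / len(ref_words)

def entity_coverage(candidate, gold_entities):
    """Fraction of gold-standard entities mentioned in the candidate."""
    text = candidate.lower()
    found = sum(1 for entity in gold_entities if entity.lower() in text)
    return found / len(gold_entities)

summary = "students organized marches demanding safer schools"
gold = "students led nationwide marches demanding safer schools"
entities = ["Parkland", "students", "Congress", "NRA"]

print(round(rouge1_recall(summary, gold), 2))  # -> 0.71
print(entity_coverage(summary, entities))      # -> 0.25 (1 of 4 entities)
```

A low entity coverage, as in the case study, means the generated summary mentions few of the people and places the human-written gold summary names.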

In terms of open science and open access, the challenges were minimal. There were some technical challenges in providing a processing environment for computationally performing the summarization. This was overcome by a computer made available to the DIC within the library that could be dedicated to the computational processing required by the project.

Separate from that, the code for converting the articles into the binary format required by the PGN took a couple of days to understand and adjust for the researchers' purpose. This was less a challenge with open resources and more a time commitment to learning another researcher's code. The original conversion code could be improved to be more extensible, which is exactly what the DIC was able to accomplish.

Another challenge was the quality and conciseness of documentation. One repository identified was not used because its documentation contained excess information, making it harder to understand. The examples were too specific and did not document the general use case.

The researchers of this case study explored several approaches for text summarization with mixed success. They judged each on two criteria: the quality of its benchmark summaries and the clarity of its documentation. Poor documentation ruled out several summarization algorithms; only the PGN could be adapted within the semester timeline.

In the end, they selected the TensorFlow version of the PGN for reasons directly related to the key principles of open science. The directions were clear and concise, and the commands to run the PGN were simple. A pre-trained model was provided. In DL, training a model can take days or weeks, so having an openly available model saves valuable time. It also provided more assurance that the model was correctly tested and verified, as it was created by the researchers who designed the algorithm. Having the code, model, and documentation all openly available made it possible for a group still learning about DL to successfully apply a DL approach in a constrained time frame. Even the data conversion was possible due to clearly written code.

An interesting point is that the tools for performing the summarization were openly available resources, but the article data was only open within the institution. This limits who could have possibly done this particular research project; however, the main focus in this project was the openly available DL resources that made the project a success. Hence, with the given open resources, other researchers could conduct a similar project with other available data.

Several DL frameworks were available to perform text summarization, with varying quality of documentation. The availability of pre-trained models was also critical, since training a model could take days. Thankfully, the authors of the PGN made a pre-trained model openly available, saving time and allowing the researchers to focus on the problems specific to the project. For more details, the reader is directed to the full project report (Arora et al. 2018).

This project built expertise within the library to support other researchers at the university with similar needs. Tools to summarize vast amounts of documents could also positively impact library services by facilitating faster understanding of large archival collections and published text provided by the library.

Optical Character Recognition

The second case study is an optical character recognition (OCR) project focused on extracting text from scanned images of notecards describing a curated costume collection (~5100 notecards). OCR is the process of identifying text within an image and converting it into accessible text. The notecards contain different pieces of information of interest, such as the donor and description of an item. This information is organized in different layouts on the notecards. In order to identify the information, the researchers needed to automatically identify the layout, i.e., where this information is on the notecards.

To perform this task, the researchers used an open source implementation of the Mask Region-based Convolutional Neural Network (Mask R-CNN) (“Mask R-CNN for Object Detection and Segmentation” 2017), a DL approach to identifying objects within an image. In this case, the objects were the individual pieces of information on the notecards. This was done for two different layouts, which together comprised approximately 80% of the entire notecard collection; the rest of the notecards were left to be processed at a later date. This layout segmentation provided the input to the OCR algorithm chosen (discussed next).

Performing accurate OCR has been a challenge for many years (Hamad and Kaya 2016; Ahmed and Abidi 2019). One area with specific needs for OCR support is the humanities (Henry 2014), and this case study is an example of how one can address this need. At the time of this case study (Spring/Summer 2020), there were only a handful of open source OCR solutions. While exploring solutions such as Google's open source Tesseract (“Tesseract OCR” 2014), the team discovered the open source OCR provided by Clova, which had placed high in multiple competitions (“Focused Scene Text - Robust Reading Competition” 2019; “ICDAR2017 Robust Reading Challenge on COCO-Text” 2017; “ICDAR2019 Robust Reading Competition” 2019; “ICDAR2019 - ReCTS - Robust Reading Competition” 2019). Clova provides two tools for an OCR pipeline: text identification and text extraction. First, it identifies all of the individual pieces of text (normally single words) within an image using a DL approach (Y. Baek et al. 2019). Text extraction, or traditional OCR, is then performed with a pre-trained DL model (J. Baek et al. 2019) applied to each identified piece of text. This approach is more efficient, as it performs OCR on single “patches” of an image instead of processing the entire image through the OCR engine, reducing the resources needed per image since only a subset of each image is processed. A major reason Clova OCR was chosen is that the Clova authors developed a framework to test different DL algorithms in combination to identify the best pairing. This was unique to the Clova OCR solution and allowed the use of state-of-the-art solutions.

A faculty member within the Fashion Merchandising and Design Department approached the library with the need to OCR thousands of descriptive scanned notecards in JPEG format. Each notecard describes a particular accessory or garment within the curated collection (“The Oris Glisson Historic Costume and Textile Collection” 2022). An example can be seen in Figure 2. The text is typeset and generally clean with some potential blurring and extra lines (e.g., “Identification No.” is underlined).


Figure 2: Example notecard from the garment collection.

The faculty member sought the expertise of the library, specifically the DIC, to aid in performing this task. The DIC worked alongside the faculty member throughout the entire project to support and aid where necessary.

The text in the notecard photos was digitized using code from two GitHub repositories (“Clovaai/CRAFT-Pytorch” 2019; “Clovaai/Deep-Text-Recognition-Benchmark” 2019). One codebase identifies where text is in an image, and the second converts the identified text areas to machine-readable text (performs the OCR). The researchers of this case study wrote “glue” code to integrate the two repositories and create a working OCR pipeline from input to final output. The OCR solution supports GPU acceleration through PyTorch.
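The shape of that “glue” code can be sketched as follows. The `detect_boxes` and `recognize_patch` functions below are invented stand-ins for the two repositories, whose real APIs differ and are not reproduced here; the sketch only shows the detect-crop-recognize flow the researchers had to wire together.

```python
# Hedged sketch of glue code between a text detector and a recognizer.
# detect_boxes and recognize_patch are hypothetical stand-ins for the
# two Clova repositories; images are row-major pixel grids for brevity.
def detect_boxes(image):
    """Stand-in detector: return (row, col, height, width) word boxes."""
    return [(0, 0, 2, 3), (2, 1, 1, 2)]

def recognize_patch(patch):
    """Stand-in recognizer: return (text, confidence) for one patch."""
    return ("word", 0.9)

def crop(image, box):
    """Extract one rectangular patch from the pixel grid."""
    row, col, height, width = box
    return [r[col:col + width] for r in image[row:row + height]]

def ocr_pipeline(image):
    """Detect word regions, then run recognition on each patch only."""
    results = []
    for box in detect_boxes(image):
        text, confidence = recognize_patch(crop(image, box))
        results.append((box, text, confidence))
    return results

image = [[0] * 4 for _ in range(4)]  # toy 4x4 grayscale "scan"
for box, text, confidence in ocr_pipeline(image):
    print(box, text, confidence)
```

Because recognition runs per patch rather than on the whole page, this structure mirrors the efficiency property of the Clova tools described above.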

The documentation helped the research team piece everything together. Each repository had a well-organized README and demo code with a low learning curve. The researchers started with the demo code for each and connected them together to create the final OCR pipeline.


Figure 3: OCR pipeline overview.

The resulting data pipeline developed for this case study can be seen in Figure 3. The researchers “glued” together the two repositories and used an available pre-trained model to first identify where text was, and then perform the OCR.

An example of the pipeline's output can be seen in Figure 4. The text identification code provided the location of each word on the JPEG, i.e., the location and dimensions of each identified word's bounding box. Given this information, the researchers used the Python library FPDF (Reingart 2013) to create a PDF and place the identified words at the specified locations. The font size is based on the dimensions of each word's bounding box, and the shade of blue represents the confidence level (probability) that the OCR is correct: a darker shade means higher confidence, a lighter shade less. As can be seen, the formatting is not ideal, and fixing this is an area of future research for the team.


Figure 4: Clova OCR output for one notecard. The shade of blue represents the confidence level (probability) that the OCR is correct. Darker shades mean higher confidence while lighter shades mean less confidence.
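The layout logic described above, font size from bounding-box height and a blue shade from confidence, can be sketched as below. The scaling constants are assumptions for illustration; the FPDF calls in the trailing comment are the kind of drawing calls the researchers would have used, not their actual code.

```python
# Sketch of the PDF layout logic: the px_per_point constant and the
# shade mapping are illustrative assumptions, not the case study's values.
def font_size_for_box(box_height_px, px_per_point=1.5):
    """Approximate a font size (points) from the bounding-box height."""
    return max(6, round(box_height_px / px_per_point))

def blue_shade(confidence):
    """Map confidence in [0, 1] to an RGB blue: darker = more confident."""
    value = round(255 * (1 - confidence))
    return (value, value, 255)

print(font_size_for_box(18))   # -> 12
print(blue_shade(1.0))         # -> (0, 0, 255): fully confident, dark blue
print(blue_shade(0.0))         # -> (255, 255, 255): no confidence, near white

# With FPDF, each word would then be drawn at its box position, e.g.:
#   pdf.set_text_color(*blue_shade(conf))
#   pdf.set_font_size(font_size_for_box(h))
#   pdf.text(x, y, word)
```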

Given the openly available code repositories and a pre-trained model, the researchers were able to accomplish OCR on thousands of scanned notecards describing a physical collection. The creation of the pipeline took roughly two weeks to complete and resulted in an end-to-end solution.

In terms of open science and open access, the challenges were somewhat greater than in the first case study, as the researchers had to work with two repositories and write new code to connect them. As the repository authors did not provide a way for the text identification and OCR repositories to work together, it was up to the researchers to create a solution. The repositories could be improved by providing a demo that ties them together, which would have been very beneficial.

To meet the computational resource requirements, a specialized computer was made available to the DIC through the library. With this computer, the potential challenges of performing the OCR, which is very computationally intensive, were solved.

The search for an OCR solution was short, ending with a GitHub repository that had ranked high in OCR competitions and linked to another open code repository able to identify where text is in an image. In the context of this paper, having such high quality open resources provided a robust OCR solution for the project.

It took mere minutes to test each repository separately, and two weeks to integrate the tools. Given the complexity of DL code, this is a testament to the quality of the documentation. Each repository included well-documented pre-trained models and demo code requiring minimal setup. The initial results using the two libraries and the “glue” code were very promising, with the speed of implementation being a major advantage.

The openness of these resources allowed the researchers to complete a successful project and exemplifies what is possible when such resources are made freely accessible.

The knowledge and skills learned and developed during this case study strengthened the OCR support the library could offer. This is a crucial service that can be utilized within the library when collections require OCR. Having a framework already in place makes digitization through OCR more readily available.

Descriptor Identification

The final case study involves extracting flavor descriptors from reviews of beverages. Flavor descriptors are words that describe the sensory profile of a food item, specifically whiskey for this project. The set of possible flavor descriptors for a food product (its “flavor lexicon”) can be difficult to extract due to the unique nature of words used as descriptors. They are commonly, but not always, adjectives, and not all adjectives are useful descriptors. Frequency is not always key, as some useful descriptors are infrequent, and the most common adjectives often describe color (e.g., “red”, “black”, “brown”), intensity (e.g., “very”), or liking (e.g., “nice”, “pleasant”) (Bécue-Bertaut, Álvarez-Esteban, and Pagès 2008).

The descriptor extraction was the most challenging project, as only the core code was available online, through a blog post (Intuition Engineering 2018). The researchers had to study the problem more deeply to understand what “glue” code and other pieces needed to be developed. The work took months, as the researchers were newer to developing DL algorithms. For example, the available code required a more involved understanding of DL models in order to define the model and format the input data (text). Whole books have been written on text data preparation for ML and DL algorithms, such as (Brownlee 2020a, 2020b), which exemplifies the need to study techniques for this preparation. Even so, the project would likely have taken longer still without the available open resources.

As in the OCR project, a faculty member approached the library looking for a collaborator with the skillset to aid in this project. Fortunately, the DIC within the library was able to partner with the faculty member and support the DL part of the project.

The research team was not able to use another researcher's pre-trained model, as this case study had a unique challenge: to the authors' knowledge, at the time of the project (Summer 2019 - Fall 2021), there were no other trained models for identifying unique sensory terms. In the domain of sensory science, flavor lexicons are made using an experimental methodology called Descriptive Analysis (DA). DA uses a trained human panel that tastes a variety of products within a category and provides descriptive sensory terms for each product (Heymann, King, and Hopfer 2014). There are a few examples, however, of corpora of food descriptions being used to identify flavor descriptors for a lexicon (Ickes, Lee, and Cadwallader 2017).

The second, less common method of extracting descriptors is more similar to other keyword extraction problems that have NLP solutions such as RAKE (Rose et al. 2010) and YAKE (Campos et al. 2020). The blog post (Intuition Engineering 2018) described an analogous problem: identifying skills in resumés. Terms describing skills can likewise be very unique to the domain, such as words containing all capital letters or symbols, e.g., SQL or C++; hence, this analogous problem was used as a template. Only the core code for the resumé skill extraction tool was available, with functions to define the DL architecture in Python using Keras along with a few other helper functions.

The data set consisted of more than 8,000 whiskey reviews for training and testing. The data was scraped from WhiskyAdvocate (4,288 reviews), WhiskyCast (2,309 reviews), The Whiskey Jug (1,095 reviews), and Breaking Bourbon (344 reviews). WhiskyAdvocate and WhiskyCast reviewers are professionals, with whiskey writing being a primary income source, while Breaking Bourbon and The Whiskey Jug are hobbyist blogs run by “semi-professional” reviewers.

Figure 5: DL architecture used for descriptor identification.

Figure 6: Format described by the blog post for input data for training the model.

This case study required the most custom coding by far. The code snippets publicly available on the blog post were primarily focused on defining the DL neural network architecture (Figure 5) and the format of the input data for training (Figure 6). The research team needed to fill in the rest, which included adjusting the model architecture to use their feature set, pre-processing the data into the desired format using GloVe word embeddings (Pennington, Socher, and Manning 2014), and developing code to train the model and evaluate the results.
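One of the pre-processing steps the team had to implement, turning GloVe-format text into an embedding matrix for the model, can be sketched as below. The three-dimensional vectors and vocabulary are tiny fakes for illustration; real GloVe files (Pennington, Socher, and Manning 2014) have 50 to 300 dimensions per word, and the team's actual feature pipeline is described in their open access article.

```python
# Hedged sketch of building an embedding matrix from GloVe-format text.
# The vectors and words below are invented; real GloVe files are large.
fake_glove = """\
smoky 0.1 0.2 0.3
sweet 0.4 0.5 0.6
oak 0.7 0.8 0.9
"""

def load_embeddings(text):
    """Parse 'word v1 v2 ...' lines into a word -> vector dict."""
    vectors = {}
    for line in text.strip().splitlines():
        word, *values = line.split()
        vectors[word] = [float(v) for v in values]
    return vectors

def build_matrix(vocab, vectors, dim):
    """Row i holds the vector for vocab[i]; unknown words get zeros."""
    return [vectors.get(word, [0.0] * dim) for word in vocab]

vectors = load_embeddings(fake_glove)
matrix = build_matrix(["smoky", "peaty", "oak"], vectors, dim=3)
print(matrix[1])  # "peaty" is out of vocabulary -> [0.0, 0.0, 0.0]
```

Such a matrix is what an embedding layer in a Keras model is typically initialized with, one row per vocabulary word.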

The description of the problem and DL solution in the blog post had clear steps, most of which had some code snippets to represent what was discussed. Certain steps, however, were only described in prose without accompanying code. How to implement these processes was left for the reader to figure out.

The partial code that became the core of this project is, as mentioned earlier, in a blog post on TowardsDataScience.com, which allows free access to some articles. Authors can also require a Medium.com membership to view their work, which is the case here as of the writing of this paper. Now that the case study researchers have successfully adapted these available resources, they have published an open access journal article describing the full technical details of the project (Miller, Hamilton, and Lahne 2021).


Figure 7: Descriptor identification pipeline overview along with the resources that were developed. The words in the last step highlighted in orange are those identified as descriptors.

The resulting data pipeline developed for this case study can be seen in Figure 7, where the words in the last step highlighted in orange are those identified as descriptors. The openly available resources were the DL model architecture and the format of the training data. From there, the researchers were able to train the model using the openly available GloVe word embeddings, then apply it to text to identify which words are descriptors.

The result was a trained model that can accurately identify descriptors within a corpus of whiskey review texts, with a train/test accuracy of 99% and precision, recall, and F1-scores of 0.99. This shows that even though the available resources were limited, the researchers were still able to use them to solve a challenging problem.

Since one measure of success was whether the model could distinguish whether a word is a descriptor or not, the researchers applied t-SNE to visualize any clustering of descriptors and non-descriptors. t-SNE (van der Maaten and Hinton 2008) is a dimensionality reduction algorithm used to visualize high-dimensional data in a two-dimensional space. Word embeddings are high-dimensional vector representations of words, one vector per word; they represent a conceptual space where words with similar meanings have vectors close to each other. The result can be seen in Figure 8, where brown “X”s represent words tagged as descriptors and blue dots represent words tagged as non-descriptors. The clusters show that the model was able to identify the descriptor space, i.e., where in the conceptual space the descriptors lie. There are some “speckles” of descriptors throughout the blue cluster, which shows that some words may be conceptually similar while their contextual meaning varies.


Figure 8: Application of the trained model to the corpus (minus the training and test sets). Brown “X”s represent words labeled as descriptors and blue dots represent words labeled as non-descriptors.
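The t-SNE step behind Figure 8 can be sketched as follows. This is a hedged illustration using scikit-learn: the word vectors and descriptor labels below are synthetic stand-ins, not the trained model's embeddings, and the perplexity value is an assumption.

```python
# Hedged sketch of projecting word vectors to 2-D with t-SNE; the
# vectors and labels are synthetic, not the case study's embeddings.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two synthetic clusters standing in for descriptor / non-descriptor words.
descriptors = rng.normal(loc=2.0, size=(20, 50))
non_descriptors = rng.normal(loc=-2.0, size=(20, 50))
vectors = np.vstack([descriptors, non_descriptors])
labels = np.array([1] * 20 + [0] * 20)  # 1 = descriptor, 0 = not

# Perplexity must be smaller than the number of samples.
points = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(vectors)
print(points.shape)  # one 2-D point per word, ready to scatter-plot by label
```

Plotting `points` colored by `labels` (e.g., with matplotlib) yields a figure of the same kind as Figure 8, where separated clusters indicate the two classes occupy distinct regions of the embedding space.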

This was the most challenging case study, as the use of partially open resources without a full demo required the most programming and engineering from the researchers and the DIC. The project was successful and resulted in a full description of the problem in an open access journal (Miller, Hamilton, and Lahne 2021), with access to the code and data available upon request to the researchers. What could have been improved was more available code to aid in the creation of the entire pipeline. The only demo available was an online example of identifying skills from a resumé, which is beneficial for showcasing the solution but does not help in engineering one.

A final challenge was providing the computational resources needed for training an NLP model from scratch. Once again, a specialized computer was provided to the DIC through the library to address this challenge and allow the researchers to focus on the research itself.

The high accuracy, precision, recall, and F1 scores show that this case study was a great success. No pre-trained model was available, and while this lengthened the project timeline, the researchers' model successfully learned some nuances of the sensory language of the domain, as demonstrated by the t-SNE visualization.

Given these challenges, this case study exemplifies how a partially open resource can inspire a project direction but also hinder it, e.g., through the absence of code for all parts of the project. This contrasts with the other two case studies, which were able to proceed faster with fully open DL resources. Despite these limitations, this case study demonstrates that research can still benefit from even partially open resources.

The implications for the library from this project are fairly unique. The ability to develop a pipeline for pre-processing text, identifying training data, developing a custom language model, and training the model has many potential connections to the library's text collections. Custom models can be developed to identify unique information within collections, especially vast archived collections. For example, these skills and techniques can be applied to historic collections to better understand their elements, extracting key items and insights. The developed skill sets provide desired resources for people both outside and within the library.

Lessons Learned

A key part of open science is providing enough information for a scientific investigation to be replicated: a description of how an experiment was run, or documentation for how a software tool works. However, not all documentation is created equal, and even with meticulous directions, success was not always possible. In the text summarization case study, the researchers ruled out one promising solution on GitHub because the README contained an excess of detail on using the software in only some specific cases, and the commands were confusing to adapt to the researchers' needs. A useful README contains step-by-step instructions on how to install the software, a list of dependencies, several base use cases, and ideally several pre-packaged demos. Too much detail can make more work for the end user, who has to sift through it, especially if the documentation is not well structured or is overly specific to the original use case. Functionality in an openly available library is only useful to the end user if the documentation explains when the functionality is useful and provides run commands demonstrating it.

Beyond documentation, another aid to open science is demonstration: a recorded or documented experiment, or a set of code notebooks that walk through a tool. The OCR case study demonstrates the value of such aids in bolstering open science: well-developed demos showcasing the tools' functionality made it possible to adapt them to the researchers' needs.

Access to pre-trained models was vital to the success of the first two case studies. These models allowed the researchers to progress through their projects and minimized the time needed for setup and application. Without pre-trained models, the researchers would have needed to perform the entire training pipeline themselves, which may not always be possible due to a lack of computing resources and/or expertise. In contrast, in the third case study the need to develop parts of the model and train it proved a significant barrier. Although it was possible to overcome this, the time required, including work presumably redundant with that done by the original authors of the model (Intuition Engineering 2018), was much greater than in the other two case studies.

Conclusions

Using open resources, the researchers of these case studies were able to successfully pursue research projects with positive outcomes. Each case study started with the researchers identifying available resources, with each requiring a different approach. The support from, and partnership with, the library provided key eScience elements that made each case study that much more successful.

Our observation is that more and more ML and DL researchers (and some companies) are making their resources (data, pre-trained models, code) openly available to the wider research community. This is one reason these case studies were able to find workable solutions without the researchers needing to be experts in the respective ML and DL topics and techniques.

Some takeaways are that documentation, clear code examples and demos, and pre-trained models were at the core of the success of these case studies. Also, eScience skills within the library can provide crucial support, making some projects possible and the technologies discussed in this paper more accessible to those within and coming to the library.

We hope this collection of case studies provides compelling evidence for why the spirit of open science should be embraced not only by ML and DL researchers and practitioners, but by other domains as well.

Ahmed, Muna, and Ali Abidi. 2019. “Review on Optical Character Recognition.”

Allahyari, Mehdi, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, and Krys Kochut. 2017. “Text Summarization Techniques: A Brief Survey.” arXiv:1707.02268. arXiv. https://doi.org/10.48550/ARXIV.1707.02268 .

Arora, Anuj, Chreston Miller, Jixiang Fan, Shuai Liu, and Yi Han. 2018. “Big Data Text Summarization for the NeverAgain Movement.” December. Virginia Tech. https://vtechworks.lib.vt.edu/handle/10919/86357 .

Baek, Jeonghun, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, and Hwalsuk Lee. 2019. “What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis.” In 2019 IEEE/CVF International Conference on Computer Vision (ICCV) , 4714–4722. Seoul, Korea (South): IEEE. https://doi.org/10.1109/iccv.2019.00481 .

Baek, Youngmin, Bado Lee, Dongyoon Han, Sangdoo Yun, and Hwalsuk Lee. 2019. “Character Region Awareness for Text Detection.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , 9357–9366. Long Beach, CA, USA: IEEE. https://doi.org/10.1109/cvpr.2019.00959 .


Miller, C., Hamilton, L., & Lahne, J. (2023) 'The Open Science of Deep Learning: Three Case Studies', Journal of eScience Librarianship 12(1): e626. doi: 10.7191/jeslib.626

12 Deep Learning Use Cases / Applications in Healthcare [2024]


The computing capability of deep learning models has enabled fast, accurate, and efficient operations in healthcare. Deep learning networks are transforming patient care and play a fundamental role for health systems in clinical practice. Computer vision, natural language processing, and reinforcement learning are the most commonly used deep learning techniques in healthcare.

IDC claims that:

  • Research in the pharma industry is one of the fastest growing use cases
  • Global spending on AI will be more than $110 billion in 2024

Patient Care

1. Medical imaging

Image recognition and object detection are used in magnetic resonance (MR) and computed tomography (CT) processes for image segmentation, disease detection, and prediction. Deep learning models can make effective interpretations by combining aspects of imaging data such as tissue size, volume, and shape, and can flag important areas in images. For example, deep learning algorithms are used for diabetic retinopathy detection, early detection of Alzheimer's disease, and ultrasound detection of breast nodules. Thanks to advances in deep learning, most pathology and radiology images may eventually be investigated automatically.
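As a toy illustration of the mechanism described here (hypothetical values, not a medical pipeline), a single hand-set convolution filter can flag a bright region in a tiny grayscale "scan"; CNNs learn many such filters automatically during training:

```python
import numpy as np

# Toy 6x6 grayscale image with a small bright "nodule" in the middle.
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0

# A classic blob/edge-detection kernel: responds strongly where a bright
# pixel stands out from its neighborhood.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# Valid 2D convolution (no padding): slide the 3x3 kernel over the image.
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out.shape, out.max())  # strongest responses sit on the bright region
```

The strongest filter responses coincide with the bright patch, which is the basic way a learned convolutional layer "flags" an area of interest.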

Deep learning algorithms simplify complex data analysis, so abnormalities are detected and prioritized more precisely. The insights that convolutional neural networks (CNNs) provide help medical professionals notice their patients' health issues earlier and more accurately. For example, according to a 2018 study, CNNs identified melanoma in dermatology images more than 10% more accurately than experts.


2. Healthcare data analytics

Deep learning models can analyze electronic health records (EHRs), which contain structured and unstructured data including clinical notes, laboratory test results, diagnoses, and medications, at exceptional speed and with high accuracy.

Smartphones and wearable devices also provide useful information about lifestyle. Mobile apps can use this data to monitor medical risk factors and feed it to deep learning models. In 2019, Current Health's AI wearable device became one of the first AI medical monitoring wearables approved by the Food and Drug Administration (FDA) for use at home. The device can measure a patient's pulse, respiration, oxygen saturation, temperature, and mobility.


3. Mental health chatbots

The use of AI-based mental health apps (including chatbots) such as Happify, Moodkit, Woebot, and Wysa is increasing. Some of these chatbots leverage deep learning models for more realistic conversations with patients. A Stanford University study shows that an intelligent conversational agent can significantly decrease depression and anxiety symptoms in students and is an efficient and engaging way to deliver mental health support.

4. Personalized medical treatments

Deep learning solutions allow healthcare organizations to deliver personalized patient care by analyzing patients' medical histories, symptoms, and test results. Natural language processing (NLP) extracts insights from free-text medical information to identify the most relevant treatments.

5. Prescription audit

Deep learning models can audit prescriptions against patient health records to identify and correct possible diagnostic or prescription errors.

6. Responding to patient queries

Deep learning-based chatbots help healthcare professionals, or patients themselves, identify patterns in patient symptoms.

Health Insurance

7. Underwriting

Deep learning models help insurance companies make offers to their customers through powerful predictive analytics.

8. Fraud detection

Deep learning algorithms can also identify fraudulent medical insurance claims by analyzing behavior patterns and health data from resources such as claims history, hospital-related information, and patient attributes.

Research & Development

9. Drug discovery

The contribution of deep learning models to drug discovery and drug-interaction prediction has been growing with new technological advances. Deep learning algorithms can identify viable drug combinations by rapidly processing genomic, clinical, and population data. Researchers in the pharmaceutical industry use deep learning toolkits to focus on patterns in these large data sets.

10. Genomics analysis

Deep learning models increase interpretability and provide a better understanding of biological data. Their capacity for analyzing complex data supports scientists studying the interpretation of genetic variation and genome-based therapeutic development. CNNs are commonly used here because they can extract features from fixed-size DNA sequence windows.

11. Mental health research

Researchers are trying to improve clinical practice in mental health by using deep learning models. For example, there are ongoing academic studies about understanding the effects of mental illness and other disorders on the brain by using deep neural networks. Researchers say that trained deep learning models can provide better results in some areas compared to standard machine learning models. For example, deep learning algorithms can learn to determine meaningful brain biomarkers.

Another study aims to build a cost-effective and digital data-driven and clinical decision support system in mental health with machine learning capabilities.

12. Covid-19

Usage of deep learning models has gained importance with the global COVID-19 outbreak. Researchers have started to study deep learning applications for

  • early detection of Covid-19
  • analyzing chest X-ray (CXR) and chest CT images
  • predicting intensive care unit admission
  • identifying patients at high risk for Covid-19
  • estimating the need for mechanical ventilation


This article was drafted by former AIMultiple industry analyst Ayşegül Takımoğlu.


Deep Learning for Recommender Systems: A Netflix Case Study

  • Harald Steck, Netflix
  • Linas Baltrunas, Netflix
  • Ehtsham Elahi, Netflix
  • Dawen Liang, Netflix
  • Yves Raimond, Netflix
  • Justin Basilico, Netflix

Deep learning has profoundly impacted many areas of machine learning. However, it took a while for its impact to be felt in the field of recommender systems. In this article, we outline some of the challenges encountered and lessons learned in using deep learning for recommender systems at Netflix. We first provide an overview of the various recommendation tasks on the Netflix service. We found that different model architectures excel at different tasks. Even though many deep-learning models can be understood as extensions of existing (simple) recommendation algorithms, we initially did not observe significant improvements in performance over well-tuned non-deep-learning approaches. Only when we added numerous features of heterogeneous types to the input data did deep-learning models start to shine in our setting. We also observed that deep-learning methods can exacerbate the problem of offline–online metric (mis-)alignment. After addressing these challenges, deep learning has ultimately resulted in large improvements to our recommendations as measured by both offline and online metrics. On the practical side, integrating deep-learning toolboxes in our system has made it faster and easier to implement and experiment with both deep-learning and non-deep-learning approaches for various recommendation tasks. We conclude this article by summarizing our take-aways that may generalize to other applications beyond Netflix.

Recommender Systems, by James Gary

Copyright © 2021, Association for the Advancement of Artificial Intelligence.

  • Open access
  • Published: 08 June 2020

Deep learning in finance and banking: A literature review and classification

  • Jian Huang 1 ,
  • Junyi Chai   ORCID: orcid.org/0000-0003-1560-845X 2 &
  • Stella Cho 2  

Frontiers of Business Research in China volume  14 , Article number:  13 ( 2020 ) Cite this article

63k Accesses

88 Citations

70 Altmetric

Metrics details

Deep learning has been widely applied in computer vision, natural language processing, and audio-visual recognition. The overwhelming success of deep learning as a data processing technique has sparked the interest of the research community. Given the proliferation of Fintech in recent years, the use of deep learning in finance and banking services has become prevalent. However, a detailed survey of the applications of deep learning in finance and banking is lacking in the existing literature. This study surveys and analyzes the literature on the application of deep learning models in the key finance and banking domains to provide a systematic evaluation of the model preprocessing, input data, and model evaluation. Finally, we discuss three aspects that could affect the outcomes of financial deep learning models. This study provides academics and practitioners with insight and direction on the state-of-the-art of the application of deep learning models in finance and banking.

Introduction

Deep learning (DL) is an advanced technique of machine learning (ML) based on artificial neural network (NN) algorithms. As a promising branch of artificial intelligence, DL has attracted great attention in recent years. Compared with conventional ML techniques such as support vector machines (SVM) and k-nearest neighbors (kNN), DL offers unsupervised feature learning, strong generalization, and robust training on big data. Currently, DL is applied extensively in classification and prediction tasks, computer vision, image processing, and audio-visual recognition (Chai and Li 2019 ). Although DL was developed in the field of computer science, its applications have penetrated diversified fields such as medicine, neuroscience, physics and astronomy, finance and banking (F&B), and operations management (Chai et al. 2013 ; Chai and Ngai 2020 ). The existing literature lacks a good overview of DL applications in F&B fields. This study attempts to bridge this gap.

While DL is the mainstream focus of computer vision (e.g., Elad and Aharon 2006 ; Guo et al. 2016 ) and natural language processing (e.g., Collobert et al. 2011 ), DL applications in F&B are also developing rapidly. Shravan and Vadlamani (2016) investigated text mining tools for F&B domains, examining representative ML algorithms including SVM, kNN, genetic algorithms (GA), and AdaBoost. Butaru et al. ( 2016 ) compared the performance of algorithms including random forests, decision trees, and regularized logistic regression, and found that random forests achieved the highest classification accuracy in predicting delinquency status.

Cavalcante et al. ( 2016 ) summarized the literature published from 2009 to 2015. They analyzed models including the multi-layer perceptron (MLP), the Chebyshev functional link artificial NN, and the adaptive weighting NN. Although the study constructed a prediction framework for financial trading, some notable DL techniques such as long short-term memory (LSTM) and reinforcement learning (RL) models are neglected. Thus, the framework cannot ascertain the optimal model for a specific condition.

The reviews in the existing literature are either incomplete or outdated. Our study, by contrast, provides a comprehensive, state-of-the-art review that captures the relationships between typical DL models and various F&B domains. We identified critical conditions to limit our collection of articles. We searched the academic databases Science Direct, SpringerLink, IEEE Xplore, Emerald, JSTOR, ProQuest, EBSCOhost, Academic Search Premier, World Scientific Net, and Google Scholar. We used two groups of keywords. One group relates to DL, including “deep learning,” “neural network,” “convolutional neural network” (CNN), “recurrent neural network” (RNN), “LSTM,” and “RL.” The other relates to finance, including “finance,” “market risk,” “stock risk,” “credit risk,” “stock market,” and “banking.” It is important to conduct cross searches between computer-science-related and finance-related literature. Our survey focuses exclusively on financial applications of DL models rather than other ML models such as SVM, kNN, or random forests. The time range of our review was set between 2014 and 2018. At this stage, we collected more than 150 articles after cross-searching. We carefully reviewed each article and considered whether it was worthy of entering our pool for review. We removed articles that were not from reputable journals or top professional conferences, and discarded articles in which the details of the financial DL models were not clarified. Thus, 40 articles were eventually selected for this review.

This study contributes to the literature in the following ways. First, we systematically review the state-of-the-art applications of DL in F&B fields. Second, we summarize multiple DL models for specific F&B domains and identify the optimal DL model for various application scenarios. Our analyses rely on the data processing methods of DL models, including preprocessing, input data, and evaluation rules. Third, our review attempts to bridge the technological level of DL and the application level of F&B. We recognize the features of various DL models and highlight their feasibility for different F&B domains. The penetration of DL into F&B is an emerging trend. Researchers and financial analysts should know the feasibility of particular DL models for a specified financial domain, but they often face difficulties due to the lack of connections between core financial domains and the numerous DL models. This study fills this literature gap and guides financial analysts.

The rest of this paper is organized as follows. Section 2 provides a background of DL techniques. Section 3 introduces our research framework and methodology. Section 4 analyzes the established DL models. Section 5 analyzes key methods of data processing, including data preprocessing and data inputs. Section 6 captures appeared criteria for evaluating the performance of DL models. Section 7 provides a general comparison of DL models against identified F&B domains. Section 8 discusses the influencing factors in the performance of financial DL models. Section 9 concludes and outlines the scope for promising future studies.

Background of deep learning

Regarding DL, the term “deep” refers to the multiple layers in the network. The history of DL can be traced back to stochastic gradient descent, introduced in 1951 for optimization problems. The bottleneck of DL at that time was the limit of computer hardware, as it was very time-consuming for computers to process the data. Today, DL is booming with the development of graphics processing units (GPUs), dataset storage and processing, distributed systems, and software such as TensorFlow. This section briefly reviews the basic concepts of DL, including NN and the deep neural network (DNN). All of these models have greatly contributed to applications in F&B.

The basic structure of an NN can be written as Y  =  F ( X T w  +  c ), with independent (input) variables X , weight terms w , and constant terms c . Y is the dependent variable, and X is an n  ×  m matrix for n training samples and m input variables. To apply this structure in finance, Y can be the next-period price, the credit risk level of a client, or the return rate of a portfolio. F is an activation function, which distinguishes NNs from regression models. F is usually a sigmoid or tanh function, but other functions can also be used, including ReLU, identity, binary step, ArcTan, ArcSinh, ISRU, ISRLU, and SQNL functions. If we combine several perceptrons in each layer and add a hidden layer Z 1 to Z 4 in the middle, we obtain a single-hidden-layer neural network, where the input layer is the X s and the output layer is the Y s. In finance, Y can be, for instance, a stock price. Multiple Y s are also possible; for instance, fund managers often care about both future prices and fluctuations. Figure  1 illustrates the basic structure.
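As a minimal sketch of this structure (toy dimensions and random weights, purely illustrative and not tied to any model in the survey), the forward pass with one hidden layer can be written in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dimensions: n = 4 training samples, m = 3 input variables,
# one hidden layer with 4 units (Z1..Z4) and a single output Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))        # n x m input matrix
W1 = rng.normal(size=(3, 4))       # input-to-hidden weights
c1 = np.zeros(4)                   # hidden-layer constant terms
W2 = rng.normal(size=(4, 1))       # hidden-to-output weights
c2 = np.zeros(1)

Z = np.tanh(X @ W1 + c1)           # hidden layer Z1..Z4 (tanh activation)
Y = sigmoid(Z @ W2 + c2)           # output, e.g. a predicted risk level

print(Y.shape)  # (4, 1): one prediction per training sample
```

In a financial setting, each row of X could hold the input variables for one client or one trading day, and the sigmoid output could be read as a probability (e.g., of default).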

figure 1

The structure of NN

Based on the basic structure of NN shown in Fig.  1 , traditional networks include DNN, backpropagation (BP) networks, MLP, and the feedforward neural network (FNN). These models, however, ignore the order of the data and the significance of time. As shown in Fig.  2 , RNN has a network structure that can address long-term dependence and the ordering of input variables. As financial data commonly form time series, uncovering hidden correlations over time is critical in the real world. RNN solves this problem better than the moving average (MA) methods frequently adopted before. A detailed structure of RNN for a sequence over time is shown in Part B of the Appendix (see Fig. 7 in Appendix ).
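The recurrence that lets an RNN respect time-series order can be sketched as follows (toy sizes and untrained random weights, for illustration only): the hidden state h is updated at each step from the current input and the previous state, with the same weights reused throughout:

```python
import numpy as np

# Minimal vanilla RNN cell. The hidden state h carries information forward
# from earlier time steps, which is what lets an RNN model the order of a
# financial time series, unlike a plain feedforward network.
rng = np.random.default_rng(1)
n_in, n_hidden = 1, 8              # one feature (e.g. a daily price), 8 hidden units
Wx = rng.normal(scale=0.1, size=(n_in, n_hidden))
Wh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

prices = np.array([[100.0], [101.5], [99.8], [102.3]])  # toy series, shape (T, 1)
h = np.zeros(n_hidden)
for x_t in prices:                 # same weights reused at every time step
    h = np.tanh(x_t @ Wx + h @ Wh + b)

print(h.shape)  # final hidden state summarizing the whole sequence
```

Feeding the final state h into an output layer would yield, for example, a next-period price forecast; an MA method, by contrast, has no learned state at all.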

Fig. 2 The abstract structure of RNN

Although RNN can resolve the issue of time-series order, the issue of long-term dependencies remains: it is difficult to find the optimal weights for long-term data. LSTM, a type of RNN, adds gated cells to overcome long-term dependencies by combining different activation functions (e.g., sigmoid and tanh). Given that LSTM is frequently used for forecasting in the finance literature, we treat LSTM separately from the RNN models and refer to other structures of the standard RNN as RNN(O).
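A single gated-cell step can be sketched as follows to show how sigmoid gates and a tanh candidate combine to carry long-term state; the weight shapes, random values, and toy sequence are assumptions for illustration.

```python
import numpy as np

# One LSTM cell step: sigmoid gates (forget/input/output) and a tanh
# candidate update a long-term cell state c and a hidden state h.
# Shapes and random weights are toy assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d, k = 3, 4                                      # input size, hidden size
W = rng.normal(scale=0.1, size=(4 * k, d + k))   # stacked gate weights
b = np.zeros(4 * k)

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    f = sigmoid(z[0:k])          # forget gate
    i = sigmoid(z[k:2 * k])      # input gate
    o = sigmoid(z[2 * k:3 * k])  # output gate
    g = np.tanh(z[3 * k:4 * k])  # candidate cell state
    c_new = f * c + i * g        # gated long-term memory
    h_new = o * np.tanh(c_new)
    return h_new, c_new

h, c = np.zeros(k), np.zeros(k)
for x in rng.normal(size=(6, d)):  # a toy sequence of six steps
    h, c = lstm_step(x, h, c)
```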

As we focus on applications rather than the theoretical aspects of DL, this section will not detail other popular DL algorithms, including CNN and RL, or latent variable models such as variational autoencoders and generative adversarial networks. Table 6 in Appendix provides a legend explaining the abbreviations used in this paper. We summarize the relationships between commonly used DL models in Fig. 3.

Fig. 3 Relationships of reviewed DL models for F&B domains

Research framework and methodology

Our research framework is illustrated in Fig. 4. We combine qualitative and quantitative analyses of the articles in this study. Based on our review, we identify seven core F&B domains, as shown in Fig. 5. To connect the DL side and the F&B side, we review the application of DL models in these seven F&B domains in Section 4. It is crucial to analyze the feasibility of a DL model for a particular domain; to do so, we provide summaries in three key aspects, namely data preprocessing, data inputs, and evaluation rules, according to our collection of articles. Finally, we determine the optimal DL models for the identified domains. We further discuss two common issues in using DL models for F&B: overfitting and sustainability.

Fig. 4 The research framework of this study

Fig. 5 The identified domains of F&B for DL applications

Figure 5 shows that the application domains can be divided into two major areas: (1) banking and credit risk and (2) financial market investment. The former contains two domains: credit risk prediction and macroeconomic prediction. The latter contains financial prediction, trading, and portfolio management. Prediction tasks are crucial, as emphasized by Cavalcante et al. (2016); we study them from three aspects, namely exchange rate, stock market, and oil price prediction.

Figure 6 shows statistics over the listed F&B domains, with the domains of financial applications on the X-axis and the number of articles on the Y-axis. Note that a reviewed article can cover more than one domain; thus, the sum of the counts (45) is larger than the size of our review pool (40 articles). As shown in Fig. 6, stock market prediction and stock trading dominate the listed domains, followed by exchange rate prediction. Moreover, we found two articles on banking credit risk and two on portfolio management. Price prediction and macroeconomic prediction are two potential topics that deserve more study.

Fig. 6 A count of articles over seven identified F&B domains

Application of DL models in F&B domains

Based on our review, six types of DL models are reported: FNN, CNN, RNN, RL, deep belief networks (DBN), and the restricted Boltzmann machine (RBM). Regarding FNN, several papers use the alternative terms backpropagation artificial neural network (ANN), FNN, MLP, and DNN; these have an identical structure. Regarding RNN, one of its well-known variants for time-series analysis is LSTM. Nearly half of the reviewed articles apply FNN as the primary DL technique. Nine articles apply LSTM, followed by eight articles for RL and six for RNN. Less common models applied in F&B include CNN, DBN, and RBM. We count the number of articles that use the various DL models in the seven F&B domains, as shown in Table 1. FNN is the principal model in exchange rate, price, and macroeconomic prediction, as well as in banking default risk and credit. LSTM and FNN are the two popular models for stock market prediction, whereas RL and FNN are frequently used for stock trading. FNN, RL, and simple RNN have been applied in portfolio management. FNN is the primary model in macroeconomic and banking risk prediction, while CNN, LSTM, and RL are emerging approaches in banking risk prediction. Detailed statistics with the specific articles can be found in Table 5 in Appendix.

Exchange rate prediction

Shen et al. (2015) construct an improved DBN model by including RBM and find that their model outperforms the random walk algorithm, the auto-regressive-moving-average (ARMA) model, and FNN, with fewer errors. Zheng et al. (2017) examine the performance of DBN and find that it estimates the exchange rate better than the FNN model does. They also find that a small number of layer nodes has a significant effect on DBN.

Several scholars believe that a hybrid model should have better performance. Ravi et al. (2017) contribute a hybrid model using MLP (FNN), chaos theory, and multi-objective evolutionary algorithms; their Chaos+MLP+NSGA-II model achieves a very low mean squared error (MSE) of 2.16E-08. Several articles point out that only a complicated neural network like CNN can attain higher accuracy. For example, Galeshchuk and Mukherjee (2017) conduct experiments and report that a single-hidden-layer NN or SVM performs worse than a simple model such as the moving average (MA). However, they find that CNN achieves higher classification accuracy in predicting the direction of exchange rate changes, owing to the successive layers of a DNN.

Stock market prediction

In stock market prediction, some studies suggest that market news may influence stock prices, and DL models can act as a filter to extract useful information for price prediction. Matsubara et al. (2018) extract information from the news and propose a deep neural generative model, combining DNN with a generative model, to predict the movement of the stock price. Their results suggest that this hybrid approach outperforms SVM and MLP.

Minh et al. (2017) develop a novel framework with two streams combining a gated recurrent unit network and Stock2vec. It employs a word embedding and sentiment training system on financial news and the Harvard IV-4 dataset. They use historical prices and news-based signals from the model to predict the directions of the S&P 500 and the VN-index, and show that the two-stream gated recurrent unit is better than either the gated recurrent unit or LSTM. Jiang et al. (2018) establish a recurrent NN that extracts the interaction between the inner-domain and cross-domain of financial information, and prove that their model outperforms the simple RNN and MLP in the currency and stock markets. Krausa and Feuerriegel (2017) propose transforming financial disclosures into trading decisions through a DL model. After training and testing, they point out that LSTM works better than the RNN and conventional ML methods such as ridge regression, Lasso, elastic net, random forest, SVR, AdaBoost, and gradient boosting. They further pre-train word embeddings with transfer learning (Krausa and Feuerriegel 2017) and conclude that the best performance comes from LSTM with word embeddings. For sentiment analysis, Sohangir et al. (2018) compare LSTM, doc2vec, and CNN in evaluating stock opinions on StockTwits. They conclude that CNN is the optimal model for predicting the sentiment of authors, a result that may be further applied to predicting stock market trends.

Data preprocessing is conducted before data are input into the NN. Researchers may apply numeric unsupervised methods of feature extraction, including principal component analysis, the autoencoder, RBM, and kNN. These methods can reduce computational complexity and prevent overfitting. After inputting high-frequency transaction data, Chen et al. (2018b) establish a DL model with an autoencoder and an RBM. They compare their model with backpropagation FNN, the extreme learning machine, and radial basis FNN, and claim that their model can better predict the Chinese stock market. Chong et al. (2017) apply principal component analysis (PCA) and RBM to high-frequency data of the South Korean market. They find that their model can explain the residual of the autoregressive model; the DL model can thus extract additional information and improve prediction performance. Moreover, Singh and Srivastava (2017) describe a model involving 2-directional, 2-dimensional (2D²) PCA and DNN, which outperforms 2D² PCA with radial basis FNN and RNN.
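As an illustration of the PCA-style reduction used in this line of work (e.g., Chong et al. 2017), a minimal sketch with made-up data and an assumed component count is:

```python
import numpy as np

# PCA via SVD: project raw features onto the top principal components
# before feeding them to a network. Data and sizes are illustrative.

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))   # 200 observations, 10 raw features
Xc = X - X.mean(axis=0)          # center each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3                            # keep 3 principal components
X_reduced = Xc @ Vt[:k].T        # reduced inputs for the NN

explained = (S[:k] ** 2).sum() / (S ** 2).sum()  # variance retained
```

Choosing k is a trade-off: fewer components lower computational cost and overfitting risk but discard some variance.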

For time-series data, it is sometimes difficult to judge the weights of long-term versus short-term data. The LSTM model is designed to resolve this problem in financial prediction, and the literature has attempted to prove that LSTM models are applicable and outperform conventional FNN models. Yan and Ouyang (2017) apply LSTM to challenge MLP, SVM, and kNN in predicting static and dynamic trends. After a wavelet decomposition and reconstruction of the financial time series, their model can predict a long-term dynamic trend. Baek and Kim (2018) apply LSTM not only to predicting the prices of the S&P 500 and KOSPI200 but also to preventing overfitting. Kim and Won (2018) apply LSTM to the prediction of stock price volatility, proposing a hybrid model that combines LSTM with three generalized autoregressive conditional heteroscedasticity (GARCH)-type models. Hernandez and Abad (2018) argue that RBM is inappropriate for dynamic data modeling in time-series analysis because it cannot retain memory. They apply a modified RBM model called p-RBM that retains the memory of p past states and use it to predict market directions of the NASDAQ-100 index. Comparing it with vector autoregression (VAR) and LSTM, they nonetheless find that LSTM is better, because it can uncover the hidden structure within the non-linear data while VAR and p-RBM cannot capture the non-linearity.
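Before an LSTM can be trained on a price series, the series is typically windowed into (past sequence, next value) pairs; a minimal sketch, with a toy series and an assumed window length, is:

```python
import numpy as np

# Turn a 1-D price series into supervised (window, next value) pairs
# for sequence models such as LSTM. Window length is an assumption.

def make_windows(series, window):
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])   # the past `window` prices
        y.append(series[t + window])     # the next price to predict
    return np.array(X), np.array(y)

prices = np.linspace(100.0, 109.9, 100)  # toy price series
X, y = make_windows(prices, window=20)
```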

CNN, with its more complicated structure, has also been applied to price prediction. Making the best use of historical prices, Dingli and Fournier (2017) develop a new CNN model to predict next month's price; however, their results do not surpass comparable models such as logistic regression (LR) and SVM. Tadaaki (2018) converts financial ratios into a “grayscale image” for a CNN model; the results reveal that CNN is more efficient than decision trees (DT), SVM, linear discriminant analysis, MLP, and AdaBoost. To predict the stock direction, Gunduz et al. (2017) establish a CNN model with a specially ordered feature set, whose classifier outperforms either plain CNN or LR.

Stock trading

Many studies adopt the conventional FNN model to set up a profitable trading system. Sezer et al. (2017) combine GA with MLP. Chen et al. (2017) adopt a double-layer NN and discover that its accuracy is better than ARMA-GARCH and a single-layer NN. Hsu et al. (2018) combine the Black-Scholes model with a three-layer fully connected feedforward network to estimate the bid-ask spread of option prices, arguing that this novel model beats the conventional Black-Scholes model with a lower RMSE. Krauss et al. (2017) apply DNN, gradient-boosted trees, and random forests to statistical arbitrage and argue that their returns outperform the S&P 500 market index.

Several studies report that RNN and its derivative models have potential. Deng et al. (2017) extend fuzzy learning into the RNN model; after comparing it to different DL models such as CNN, RNN, and LSTM, they claim that their model is optimal. Fischer and Krauss (2017) and Bao et al. (2017) argue that LSTM can create an optimal trading system. Fischer and Krauss (2017) claim that their model has a daily return of 0.46% and a Sharpe ratio of 5.8 prior to transaction costs; given transaction costs, however, LSTM's profitability fluctuated around zero after 2010. Bao et al. (2017) advance Fischer and Krauss's (2017) work and propose a novel DL model (i.e., the WSAEs-LSTM model). It uses wavelet transforms to eliminate noise, stacked autoencoders (SAEs) to generate deep features, and LSTM to predict the closing price. The results show that their model outperforms models such as WLSTM, LSTM, and RNN in predictive accuracy and profitability.

RL has become popular recently despite its complexity, and several reviewed studies apply this model. Chen et al. (2018a) propose an agent-based RL system to mimic 80% of professional trading strategies. Feuerriegel and Prendinger (2016) convert news sentiment into trading signals, although their daily returns and abnormal returns are nearly zero. Chakraborty (2019) casts general financial market fluctuation as a stochastic control problem and explores the power of two RL models, Q-learning and the state-action-reward-state-action (SARSA) algorithm. Both models enhance profitability (9.76% for Q-learning and 8.52% for SARSA) and outperform the buy-and-hold strategy. Zhang and Maringer (2015) propose a hybrid model combining GA with recurrent RL; GA is used to select an optimal combination of technical, fundamental, and volatility indicators, and out-of-sample trading performance improves with a significantly positive Sharpe ratio. Martinez-Miranda et al. (2016) open a new trading topic: rather than a trading system, they use RL to build a market-manipulation scanner that models spoofing-and-pinging trading, revealing that their model works only in bull markets. Jeong and Kim (2018) propose a deep Q-network constructed from RL, DNN, and transfer learning, where transfer learning solves the overfitting incurred by insufficient data. They argue that the profit yields of this system increase by four times in the S&P 500, five times in the KOSPI, six times in the EuroStoxx50, and 12 times in the HSI.
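The tabular Q-learning update used in such trading studies can be sketched on a toy two-state "trend" environment; the states, rewards, and hyperparameters below are assumptions for illustration, not any paper's actual market model.

```python
import numpy as np

# Tabular Q-learning on a toy market: states are a down/up trend,
# actions are hold/long. Rewards and dynamics are toy assumptions.

rng = np.random.default_rng(3)
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration

def step(state, action):
    # going long (action 1) pays +1 in an up-trend, -1 in a down-trend
    reward = (1 if state == 1 else -1) if action == 1 else 0
    next_state = rng.integers(n_states)   # trend flips at random
    return reward, next_state

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    reward, nxt = step(state, action)
    # Q-learning update: move toward reward + discounted best future value
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = nxt
```

After training, the greedy policy goes long only in the up-trend state, which is the intended behavior of the toy environment.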

Banking default risk and credit

Most articles in this domain focus on FNN applications. Rönnqvist and Sarlin (2017) propose a model for detecting relevant discussions in text and extracting natural language descriptions of events, converting news into a signal for bank-distress reports. In their back-test, their model reflects the distressing financial events of the 2007–2008 period.

Zhu et al. (2018) propose a hybrid CNN model with a feature selection algorithm that outperforms LR and random forest in consumer credit scoring. Wang et al. (2019) consider that online operation data can be used to predict consumer credit scores; they convert each kind of event into a word and apply the Event2vec model to transform the word into a vector in an LSTM network, yielding higher accuracy for the probability of default than other models. Jurgovsky et al. (2018) employ LSTM to detect credit card fraud and find that it enhances detection accuracy.

Han et al. (2018) report a method that adopts RL to assess credit risk. They claim that high-dimensional partial differential equations (PDEs) can be reformulated using backward stochastic differential equations, with an NN approximating the gradient of the unknown solution. This model can be applied to F&B risk evaluation after simultaneously considering all elements, such as participating agents, assets, and resources.

Portfolio management

Song et al. (2017) establish a model combining ListNet and RankNet to build a portfolio, taking a weekly long position in the top 25% of stocks and a short position in the bottom 25%. Their ListNet long-short model is optimal, achieving a return of 9.56%. Almahdi and Yang (2017) build a better portfolio with a combination of RNN and RL; the results show that the proposed trading system responds to transaction cost effects efficiently and outperforms hedge fund benchmarks consistently.

Macroeconomic prediction

Sevim et al. (2014) develop a model with a backpropagation learning algorithm to predict financial crises up to a year before they happen. This model contains a three-layer perceptron (i.e., MLP) and achieves an accuracy rate of approximately 95%, which is superior to DT and LR. Chatzis et al. (2018) examine multiple models, such as classification trees, SVM, random forests, DNN, and extreme gradient boosting, to predict market crises. The results show that crises exhibit persistence; furthermore, using DNN increases classification accuracy, making global warning systems more efficient.

Price prediction

For price prediction, Sehgal and Pandey (2015) review ANN, SVM, wavelet, GA, and hybrid systems. They separate the time-series models into stochastic models, AI-based models, and regression models for oil price prediction and reveal that researchers prevalently use MLP for price prediction.

Data preprocessing and data input

Data preprocessing

Data preprocessing is conducted to denoise the data before DL training. This section summarizes the methods of data preprocessing. Preprocessing techniques discussed in Section 4 include principal component analysis (Chong et al. 2017), SVM (Gunduz et al. 2017), the autoencoder, and RBM (Chen et al. 2018b). Several additional feature selection techniques are as follows.

Relief: The relief algorithm (Zhu et al. 2018) is a simple approach to weighing the importance of features. Based on nearest-neighbor search, relief repeats the weight update n times and divides each final weight vector by n. The resulting weight vectors are relevance vectors, and features are selected if their relevance exceeds the threshold τ.
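A compact sketch of this nearest-neighbor weighting, on synthetic data with one informative feature (the data, distance metric, and threshold are assumptions):

```python
import numpy as np

# Relief-style feature weighting: reward features that differ on the
# nearest miss (other class) and agree on the nearest hit (same class).
# Synthetic data; feature 0 is constructed to be informative.

rng = np.random.default_rng(4)
n = 100
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3))
X[:, 0] += 3 * y                      # feature 0 separates the classes

w = np.zeros(X.shape[1])
for i in range(n):
    d = np.abs(X - X[i]).sum(axis=1)  # L1 distances to all samples
    d[i] = np.inf                     # exclude the sample itself
    hit = np.where(y == y[i], d, np.inf).argmin()   # nearest same-class
    miss = np.where(y != y[i], d, np.inf).argmin()  # nearest other-class
    w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
w /= n                                # divide the final weights by n

selected = w > 0.1                    # keep features above threshold tau
```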

Wavelet transforms: Wavelet transforms are used to fix the noise feature of the financial time series before feeding into a DL network. It is a widely used technique for filtering and mining single-dimensional signals (Bao et al. 2017 ).

Chi-square: Chi-square selection is commonly used in ML to measure the dependence between a feature and a class label. The representative usage is by Gunduz et al. ( 2017 ).

Random forest: Random forest algorithm is a two-stage process that contains random feature selection and bagging. The representative usage is by Fischer and Krauss ( 2017 ).
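For illustration, the chi-square score mentioned above can be computed for a binary feature against a binary class label directly from the 2×2 contingency table; the synthetic data below are assumptions.

```python
import numpy as np

# Chi-square dependence score between a binary feature and a binary
# label: compare observed counts with counts expected under
# independence. Synthetic data for illustration.

def chi2_score(feature, label):
    obs = np.array([[np.sum((feature == a) & (label == b))
                     for b in (0, 1)] for a in (0, 1)], dtype=float)
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    exp = row * col / obs.sum()       # expected counts under independence
    return float(((obs - exp) ** 2 / exp).sum())

rng = np.random.default_rng(5)
label = rng.integers(0, 2, 300)
dependent = (label ^ (rng.random(300) < 0.1)).astype(int)  # mostly tracks label
noise = rng.integers(0, 2, 300)                            # unrelated feature

s_dep = chi2_score(dependent, label)
s_noise = chi2_score(noise, label)
```

A feature is retained when its score is large, i.e., when the observed counts deviate strongly from independence.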

Data inputs

The type of data input is an important criterion for judging whether a DL model is feasible for a particular F&B domain. This section summarizes the data input methods adopted in the literature. Based on our review, five types of input data in the F&B domain can be presented. Table 2 provides a detailed summary of the input variables in F&B domains.

History price: Historical prices include daily exchange rates and the high, low, open, and close prices of stocks. Related articles include Bao et al. (2017), Chen et al. (2017), Singh and Srivastava (2017), and Yan and Ouyang (2017).

Technical index: Technical indexes include MA, exponential MA, MA convergence divergence, and relative strength index. Related articles include Bao et al. ( 2017 ), Chen et al. ( 2017 ), Gunduz et al. ( 2017 ), Sezer et al. ( 2017 ), Singh and Srivastava ( 2017 ), and Yan and Ouyang ( 2017 ).

Financial news: Financial news covers financial message, sentiment shock score, and sentiment trend score. Related articles include Feuerriegel and Prendinger ( 2016 ), Krausa and Feuerriegel ( 2017 ), Minh et al. ( 2017 ), and Song et al. ( 2017 ).

Financial report data: Financial report data cover items from the financial balance sheet or financial statements (e.g., return on equity, return on assets, price-to-earnings ratio, and debt-to-equity ratio). Zhang and Maringer (2015) is a representative study on the subject.

Macroeconomic data: This kind of data includes macroeconomic variables. It may affect elements of the financial market, such as exchange rate, interest rate, overnight interest rate, and gross foreign exchange reserves of the central bank. Representative articles include Bao et al. ( 2017 ), Kim and Won ( 2018 ), and Sevim et al. ( 2014 ).

Stochastic data: Chakraborty ( 2019 ) provides a representative implementation.

Evaluation rules

It is critical to judge whether an adopted DL model works well in a particular financial domain, so we need systems of evaluation criteria for gauging the performance of a DL model. This section summarizes the evaluation rules of F&B-oriented DL models. Based on our review, three evaluation rules dominate: the error term, the accuracy index, and the financial index. Table 3 provides a detailed summary. The evaluation rules can be grouped into the following categories.

Error term: Suppose \( {Y}_{t+i} \) and \( {F}_{t+i} \) are the real data and the prediction data, respectively, and m is the total number of predictions. The following is a summary of the functional formulas commonly employed for evaluating DL models.

Mean Absolute Error (MAE): \( {\sum}_{i=1}^m\frac{\left|{Y}_{t+i}-{F}_{t+i}\right|}{m} \) ;

Mean Absolute Percent Error (MAPE): \( \frac{100}{m}{\sum}_{i=1}^m\frac{\left|{Y}_{t+i}-{F}_{t+i}\right|}{Y_{t+i}} \) ;

Mean Squared Error (MSE): \( {\sum}_{i=1}^m\frac{{\left({Y}_{t+i}-{F}_{t+i}\right)}^2}{m} \) ;

Root Mean Squared Error (RMSE): \( \sqrt{\sum_{i=1}^m\frac{{\left({Y}_{t+i}-{F}_{t+i}\right)}^2}{m}} \) ;

Normalized Mean Square Error (NMSE): \( \frac{1}{m}\frac{\sum {\left({Y}_{t+i}-{F}_{t+i}\right)}^2}{\mathit{\operatorname{var}}\left({Y}_{t+i}\right)} \) .
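These error terms translate directly into code; the toy actual (Y) and predicted (F) vectors below are illustrative.

```python
import numpy as np

# Direct implementations of the error terms listed above.

def mae(Y, F):  return np.mean(np.abs(Y - F))
def mape(Y, F): return 100.0 / len(Y) * np.sum(np.abs(Y - F) / Y)
def mse(Y, F):  return np.mean((Y - F) ** 2)
def rmse(Y, F): return np.sqrt(mse(Y, F))
def nmse(Y, F): return np.mean((Y - F) ** 2) / np.var(Y)

# Toy actual and predicted values for illustration
Y = np.array([100.0, 102.0, 101.0, 105.0])
F = np.array([101.0, 101.0, 103.0, 104.0])
```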

Accuracy index: According to Matsubara et al. ( 2018 ), we use TP, TN, FP, and FN to represent the number of true positives, true negatives, false positives, and false negatives, respectively, in a confusion matrix for classification evaluation. Based on our review, we summarize the accuracy indexes as follows.

Directional Predictive Accuracy (DPA): \( \frac{1}{N}{\sum}_{t=1}^N{D}_t \) , if ( Y t  + 1  −  Y t ) × ( F t  + 1  −  Y t ) ≥ 0, D t  = 1, otherwise, D t  = 0;

Accuracy (ACC): \( \frac{TP+ TN}{TP+ FP+ FN+ TN} \) ;

Matthews Correlation Coefficient (MCC): \( \frac{TP\times TN- FP\times FN}{\sqrt{\left( TP+ FP\right)\left( TP+ FN\right)\left( TN+ FP\right)\left( TN+ FN\right)}} \) .
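The ACC and MCC formulas above can be computed directly from confusion-matrix counts; the TP/TN/FP/FN values below are made up for illustration.

```python
import numpy as np

# Accuracy and Matthews correlation coefficient from a confusion
# matrix. Counts are toy values for illustration.

def acc(TP, TN, FP, FN):
    return (TP + TN) / (TP + FP + FN + TN)

def mcc(TP, TN, FP, FN):
    num = TP * TN - FP * FN
    den = np.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    return num / den

a = acc(40, 45, 5, 10)   # 85 correct classifications out of 100
m = mcc(40, 45, 5, 10)
```

Unlike plain accuracy, MCC stays informative on imbalanced classes because it uses all four cells of the confusion matrix.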

Financial index: Financial indexes include total return, Sharpe ratio, abnormal return, annualized return, annualized number of transactions, percentage of success, average profit percentage per transaction, average transaction length, maximum profit percentage per transaction, maximum loss percentage per transaction, maximum capital, and minimum capital.

For predictions that regress numeric dependent variables (e.g., exchange rate or stock market prediction), the evaluation rules are mostly error terms. For predictions by classification on categorical data (e.g., direction prediction of the oil price), accuracy indexes are widely used. For stock trading and portfolio management, financial indexes are the final evaluation rules.

General comparisons of DL models

This study identifies the most efficient DL model in each identified F&B domain. Table  4 illustrates our comparisons of the error terms in the pool of reviewed articles. Note that “A > B” means that the performance of model A is better than that of model B. “A + B” indicates the hybridization of multiple DL models.

At this point, we have summarized three methods of data processing in DL models against seven specified F&B domains, including data preprocessing, data inputs, and evaluation rules. Apart from the technical level of DL, we find the following:

NN has advantages in handling cross-sectional data;

RNN and LSTM are more feasible in handling time series data;

CNN has advantages in handling the data with multicollinearity.

Regarding application domains, we can draw the following conclusions. Cross-sectional data usually appear in exchange rate, price, and macroeconomic prediction, for which NN could be the most feasible model. Time-series data usually appear in stock market prediction, for which LSTM and RNN are the best options. Stock trading requires the capabilities of decision-making and self-learning, for which RL can be the best. Moreover, CNN is more suitable for the multivariable environments of any F&B domain. As shown in the statistics in the Appendix, the frequency of use of the corresponding DL models is consistent with the analysis above. Selecting a proper DL model for the particular needs of financial analysis is usually challenging and crucial; this study provides several recommendations.

We have summarized emerging DL models in F&B domains. Nevertheless, can these models refute the efficient market hypothesis (EMH)? According to the EMH, the financial market has its own discipline, and no long-term technical tool can outperform an efficient market. If so, using DL models may be impractical for long-term trading, which requires further experimental tests. However, why do most of the reviewed articles argue that their DL trading models outperform market returns? This argument challenges the EMH. A possible explanation is that many DL algorithms are still difficult to apply in the real-world market. DL models may create trading opportunities and abnormal returns in the short term; in the long run, however, many algorithms may lose their superiority, and the EMH still works as more traders recognize the arbitrage gap offered by these DL models.

This section discusses several aspects that could affect the outcomes of DL models in finance.

Training and validation of data processing

The size of the training set

A primary way to improve model performance is to enlarge the training data. Bootstrap can be used for data resampling, and a generative adversarial network (GAN) can extend the data features, although both mainly capture the numerical aspects of the features. Sometimes the sample set is not diverse enough and thus loses its representativeness; expanding the data size in that case could make the model more unstable. The current literature reports diversified sizes of training sets, and the required data size in the training stage can vary across F&B tasks.

The number of input factors

Input variables are the independent variables. Based on our review, multi-factor models normally perform better than single-factor models, provided that the additional input factors are effective. In time-series models, long-term data yield fewer prediction errors than short-term data. The number of input factors depends on the DL structure employed and the specific environment of the F&B task.

The quality of data

Several methods can be used to improve data quality, including data cleaning (e.g., dealing with missing data), data normalization (e.g., taking the logarithm, calculating the changes of variables, and calculating the t-value of variables), feature selection (e.g., the chi-square test), and dimensionality reduction (e.g., PCA). Financial DL models require the input variables to be interpretable in economic terms. When inputting data, researchers should distinguish the effective variables from noise. Additional financial features, such as technical indexes, can also be engineered and added to the model.
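Two of the normalization steps mentioned above, taking logarithmic changes and standardizing, can be sketched as follows; the price series is illustrative.

```python
import numpy as np

# Normalize a raw price series: log returns (logarithmic changes of
# the variable), then z-scoring. The prices are toy values.

prices = np.array([100.0, 101.0, 99.5, 102.0, 103.0])
log_ret = np.diff(np.log(prices))                 # log returns
z = (log_ret - log_ret.mean()) / log_ret.std()    # standardized inputs
```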

Selection on structures of DL models

DL model selection should depend on the problem domain and the case in finance. NN is suitable for processing cross-sectional data. LSTM and other RNNs are optimal choices for time-series prediction tasks. CNN can settle the multicollinearity issue through data compression. Latent variable models such as GAN can be better for dimension reduction and clustering. RL is applicable in cases that involve sequential decisions, such as portfolio management and trading. The return levels and outcomes of RL can be affected significantly by the environment (observation) definitions, the state-transition probability matrix, and the actions.

The setting of objective functions and the convexity of evaluation rules

Objective function selection affects training processes and expected outcomes. For stock price prediction, a low MAE merely reflects the effectiveness of the applied model in training; it may still fail to predict future directions. Therefore, additional evaluation rules are vital for F&B. Moreover, objective functions are easier to optimize when they are convex.

The influence of overfitting (underfitting)

Overfitting (underfitting) commonly happens when using DL models and is clearly unfavorable: a model may perform perfectly in one case but fail to replicate that performance with the same structure and identical coefficients elsewhere. To address this problem, we have to trade off bias against variance. Researchers prefer to keep the bias small to illustrate the superiority of their models. Generally, a deeper (i.e., more layered) NN with more neurons can reduce training errors; however, it is more time-consuming and could reduce the feasibility of the applied DL model.

One solution is to establish validation sets and testing sets for deciding the numbers of layers and neurons. After setting the optimal coefficients on the validation set (Chong et al. 2017; Sevim et al. 2014), the result on the testing set reveals the level of error and mitigates the effect of overfitting. One can input more samples of financial data to check the stability of the model's performance. A related method is early stopping, which halts further training once the validation result stops improving.
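Early stopping can be sketched as a simple loop over epochs; the fake validation-error curve and patience value below are assumptions.

```python
# Early stopping: halt training once validation error stops improving
# for `patience` consecutive epochs. The error curve is made up.

val_errors = [0.50, 0.40, 0.35, 0.33, 0.34, 0.36, 0.39]  # per epoch
patience = 2
best, best_epoch, waited = float("inf"), -1, 0
for epoch, err in enumerate(val_errors):
    if err < best:
        best, best_epoch, waited = err, epoch, 0  # new best: keep these weights
    else:
        waited += 1
        if waited >= patience:                    # no improvement: stop
            break
stopped_at = epoch
```

In a real training loop, the weights saved at `best_epoch` would be restored as the final model.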

Moreover, regularization is another approach to mitigating overfitting. Chong et al. (2017) introduce a penalty term into the objective function, which eventually reduces the variance of the result. Dropout is also a simple method to address overfitting: it randomly drops units during training, which effectively thins the network (Minh et al. 2017; Wang et al. 2019). Finally, the data cleaning process (Baek and Kim 2018; Bao et al. 2017) could, to an extent, mitigate the impact of overfitting.

Financial models

The sustainability of the model

According to our review, the literature focuses on evaluating performance on historical data. However, crucial problems remain. Given that prediction is always complicated, how to justify the robustness of a DL model on future data remains open. Moreover, whether a DL model could survive in dynamic environments must be considered.

The following solutions could be considered. First, one can divide the data into two groups according to the time range; performance can subsequently be checked (e.g., using the data for the first 3 years to predict the performance of the fourth year). Second, the feature selection can be used in the data preprocessing, which could improve the sustainability of models in the long run. Third, stochastic data can be generated for each input variable by fixing them with a confidence interval, after which a simulation to examine the robustness of all possible future situations is conducted.
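The first check, splitting by time range, can be sketched as follows; the 252-day year length is a toy assumption.

```python
import numpy as np

# Walk-forward split: train on the first three "years" of daily data
# and test on the fourth, preserving time order (no shuffling).

days_per_year = 252                       # assumed trading-day year
data = np.arange(4 * days_per_year)       # four years of observations
train = data[:3 * days_per_year]          # years 1-3 for fitting
test = data[3 * days_per_year:]           # year 4 for out-of-time testing
```

Unlike a random split, this keeps the test period strictly after the training period, which is what the robustness check requires.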

The popularity of the model

Whether a DL model is effective for trading also depends on the popularity of the model in the financial market. If traders in the same market conduct an identical model with limited information, they may obtain identical results and adopt the same trading strategy accordingly. They may then lose money because that strategy could sell at a lower price after buying at a higher one.

Conclusion and future works

Concluding remarks

This paper provides a comprehensive survey of the literature on the application of DL in F&B. We carefully review 40 articles refined from a collection of 150 articles published between 2014 and 2018; the review and refinement are based on a scientific selection of academic databases. This paper first recognizes seven core F&B domains and establishes the relationships between these domains and their frequently used DL models, then reviews the details of each article under our framework. Importantly, we analyze the optimal models for particular domains and make recommendations according to the feasibility of various DL models, summarizing three important aspects: data preprocessing, data inputs, and evaluation rules. We further analyze the unfavorable impacts of overfitting and sustainability when applying DL models and provide several possible solutions. This study contributes to the literature by presenting a valuable accumulation of knowledge on related studies and providing useful recommendations for financial analysts and researchers.

Future works

Future studies can be conducted from the DL technical and F&B application perspectives. From the perspective of DL techniques, training a DL model for F&B is usually time-consuming, yet effective training can greatly enhance accuracy by reducing errors; most functions can be approximated by suitably weighted, complicated networks. First, future work should focus on data preprocessing, such as data cleaning, to reduce the negative effect of data noise in the subsequent training stage. Second, further studies on how to construct the network layers of a DL model are required, particularly with a view to reducing the unfavorable effects of overfitting and underfitting. Third, more testing of F&B-oriented DL models would be beneficial; according to our review, the comparisons among the discussed DL models do not rely on an identical source of input data, which renders such comparisons unreliable.

In addition to the penetration of DL techniques into F&B fields, more structures of DL models should be explored. From the perspective of F&B applications, the following problems need further research to identify desirable solutions. In the case of financial planning, can a DL algorithm tailor asset recommendations to clients according to their risk preferences? In the case of corporate finance, how can a DL algorithm benefit capital structure management and, thus, maximize the value of corporations? How can managers utilize DL tools to gauge the investment environment and financial data? How can they use such tools to optimize cash balances and cash inflows and outflows? To date, DL models such as RL and generative adversarial networks have rarely been used. More investigation into constructing DL structures for these F&B settings would be beneficial. Finally, the development of professional F&B software and system platforms that implement DL techniques is highly desirable.

Availability of data and materials

Not applicable.

In the model, NSGA stands for non-dominated sorting genetic algorithm.

A combination of Wavelet transforms (WT) and long-short term memory (LSTM) is called WLSTM in Bao et al. ( 2017 ).

Q-learning is a model-free reinforcement learning algorithm.

Buy-and-hold is a passive investment strategy in which an investor buys stocks (or ETFs) and holds them for a long period regardless of fluctuations in the market.

EMH was developed from a Ph.D. dissertation by economist Eugene Fama in the 1960s. It holds that at any given time, stock prices reflect all available information and trade at exactly their fair value. It is impossible to consistently choose stocks that will beat the returns of the overall stock market. Therefore, this hypothesis implies that the pursuit of market-beating performance is more about chance than about researching and selecting the right stocks.

Almahdi, S., & Yang, S. Y. (2017). An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 87 , 267–279.


Baek, Y., & Kim, H. Y. (2018). ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Systems with Applications, 113 , 457–480.

Bao, W., Yue, J., & Rao, Y. (2017). A deep learning framework for financial time series using stacked autoencoders and long-short-term memory. PLoS One, 12 (7), e0180944.

Butaru, F., Chen, Q., Clark, B., Das, S., Lo, A. W., & Siddique, A. (2016). Risk and risk management in the credit card industry. Journal of Banking & Finance, 72 , 218–239.

Cavalcante, R. C., Brasileiro, R. C., Souza, V. L. F., Nobrega, J. P., & Oliveira, A. L. I. (2016). Computational intelligence and financial markets: A survey and future directions. Expert System with Application, 55 , 194–211.

Chai, J. Y., & Li, A. M. (2019). Deep learning in natural language processing: A state-of-the-art survey. In The proceeding of the 2019 international conference on machine learning and cybernetics (pp. 535–540). Kobe, Japan.


Chai, J. Y., Liu, J. N. K., & Ngai, E. W. T. (2013). Application of decision-making techniques in supplier selection: A systematic review of literature. Expert Systems with Applications, 40 (10), 3872–3885.

Chai, J. Y., & Ngai, E. W. T. (2020). Decision-making techniques in supplier selection: Recent accomplishments and what lies ahead. Expert Systems with Applications, 140 , 112903. https://doi.org/10.1016/j.eswa.2019.112903 .

Chakraborty, S. (2019). Deep reinforcement learning in financial markets. Retrieved from https://arxiv.org/pdf/1907.04373.pdf. Accessed 04 Apr 2020.

Chatzis, S. P., Siakoulis, V., Petropoulos, A., Stavroulakis, E., & Vlachogiannakis, E. (2018). Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Systems with Applications, 112 , 353–371.

Chen, C. T., Chen, A. P., & Huang, S. H. (2018a). Cloning strategies from trading records using agent-based reinforcement learning algorithm. In The proceeding of IEEE international conference on agents (pp. 34–37).

Chen, H., Xiao, K., Sun, J., & Wu, S. (2017). A double-layer neural network framework for high-frequency forecasting. ACM Transactions on Management Information Systems, 7 (4), 11.

Chen, L., Qiao, Z., Wang, M., Wang, C., Du, R., & Stanley, H. E. (2018b). Which artificial intelligence algorithm better predicts the Chinese stock market? IEEE Access, 6 , 48625–48633.

Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83 , 187–205.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12 , 2493–2537.

Deng, Y., Bao, F., Kong, Y., Ren, Z., & Dai, Q. (2017). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28 (3), 653–664.

Dingli, A., & Fournier, K. S. (2017). Financial time series forecasting—A machine learning approach. International Journal of Machine Learning and Computing, 4 , 11–27.

Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15 (12), 3736–3745.

Feuerriegel, S., & Prendinger, H. (2016). News-based trading strategies. Decision Support Systems, 90 , 65–74.

Fischer, T., & Krauss, C. (2017). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270 (2), 654–669.

Galeshchuk, S., & Mukherjee, S. (2017). Deep networks for predicting the direction of change in foreign exchange rates. Intelligent Systems in Accounting, Finance and Management, 24 (4), 100–110.

Gunduz, H., Yaslan, Y., & Cataltepe, Z. (2017). Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations. Knowledge-Based Systems, 137 , 138–148.

Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for visual understanding: A review. Neurocomputing, 187 , 27–48.

Han, J., Jentzen, A., & Weinan, E. (2018). Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 115 (34), 8505–8510.

Hernandez, J., & Abad, A. G. (2018). Learning from multivariate discrete sequential data using a restricted Boltzmann machine model. In The proceeding of IEEE 1st Colombian conference on applications in computational intelligence (ColCACI) (pp. 1–6).

Hsu, P. Y., Chou, C., Huang, S. H., & Chen, A. P. (2018). A market making quotation strategy based on dual deep learning agents for option pricing and bid-ask spread estimation. In The proceeding of IEEE international conference on agents (pp. 99–104).

Jeong, G., & Kim, H. Y. (2018). Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies and transfer learning. Expert Systems with Applications, 117 , 125–138.

Jiang, X., Pan, S., Jiang, J., & Long, G. (2018). Cross-domain deep learning approach for multiple financial market predictions. The proceeding of international joint conference on neural networks (pp. 1–8).

Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., Guelton, L. H., & Caelen, O. (2018). Sequence classification for credit-card fraud detection. Expert Systems with Applications, 100 , 234–245.

Kim, H. Y., & Won, C. H. (2018). Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models. Expert Systems with Applications, 103 , 25–37.

Krausa, M., & Feuerriegel, S. (2017). Decision support from financial disclosures with deep neural networks and transfer learning. Retrieved from https://arxiv.org/pdf/1710.03954.pdf. Accessed 04 Apr 2020.


Krauss, C., Do, X. A., & Huck, N. (2017). Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P500. European Journal of Operational Research, 259 (2), 689–702.

Martinez-Miranda, E., McBurney, P., & Howard, M. J. W. (2016). Learning unfair trading: A market manipulation analysis from the reinforcement learning perspective. In The proceeding of 2016 IEEE conference on evolving and adaptive intelligent systems (EAIS) (pp. 103–109).


Matsubara, T., Akita, R., & Uehara, K. (2018). Stock price prediction by deep neural generative model of news articles. IEICE Transactions on Information and Systems, 4 , 901–908.

Minh, D. L., Sadeghi-Niaraki, A., Huy, H. D., Min, K., & Moon, H. (2017). Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network. IEEE Access, 6 , 55392–55404.

Ravi, V., Pradeepkumar, D., & Deb, K. (2017). Financial time series prediction using hybrids of chaos theory, multi-layer perceptron and multi-objective evolutionary algorithms. Swarm and Evolutionary Computation, 36 , 136–149.

Rönnqvist, S., & Sarlin, P. (2017). Bank distress in the news: Describing events through deep learning. Neurocomputing, 264 (15), 57–70.

Sehgal, N., & Pandey, K. K. (2015). Artificial intelligence methods for oil price forecasting: A review and evaluation. Energy System, 6 , 479–506.

Sevim, C., Oztekin, A., Bali, O., Gumus, S., & Guresen, E. (2014). Developing an early warning system to predict currency crises. European Journal of Operational Research, 237 (3), 1095–1104.

Sezer, O. B., Ozbayoglu, M., & Gogdu, E. (2017). A deep neural-network-based stock trading system based on evolutionary optimized technical analysis parameters. Procedia Computer Science, 114 , 473–480.

Shen, F., Chao, J., & Zhao, J. (2015). Forecasting exchange rate using deep belief networks and conjugate gradient method. Neurocomputing, 167 , 243–253.

Singh, R., & Srivastava, S. (2017). Stock prediction using deep learning. Multimedia Tools Application, 76 , 18569–18584.

Sohangir, S., Wang, D., Pomeranets, A., & Khoshgoftaar, T. M. (2018). Big data: Deep learning for financial sentiment analysis. Journal of Big Data, 5 (3), 1–25.

Song, Q., Liu, A., & Yang, S. Y. (2017). Stock portfolio selection using learning-to-rank algorithms with news sentiment. Neurocomputing, 264 , 20–28.

Tadaaki, H. (2018). Bankruptcy prediction using imaged financial ratios and convolutional neural networks. Expert Systems with Applications, 117 , 287–299.

Wang, C., Han, D., Liu, Q., & Luo, S. (2019). A deep learning approach for credit scoring of peer-to-peer lending using attention mechanism LSTM. IEEE Access, 7 , 2161–2167.

Yan, H., & Ouyang, H. (2017). Financial time series prediction based on deep learning. Wireless Personal Communications, 102 , 683–700.

Zhang, J., & Maringer, D. (2015). Using a genetic algorithm to improve recurrent reinforcement learning for equity trading. Computational Economics, 47 , 551–567.

Zheng, J., Fu, X., & Zhang, G. (2017). Research on exchange rate forecasting based on a deep belief network. Neural Computing and Application, 31 , 573–582.

Zhu, B., Yang, W., Wang, H., & Yuan, Y. (2018). A hybrid deep learning model for consumer credit scoring. In The proceeding of international conference on artificial intelligence and big data (pp. 205–208).


Acknowledgments

The constructive comments of the editor and three anonymous reviewers on an earlier version of this paper are greatly appreciated. The authors are indebted to seminar participants at the 2019 China Accounting and Financial Innovation Forum at Zhuhai for insightful discussions. The corresponding author thanks the financial support from BNU-HKBU United International College Research Grant under Grant R202026.


Author information

Authors and affiliations

Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China

Division of Business and Management, BNU-HKBU United International College, Zhuhai, China

Junyi Chai & Stella Cho


Contributions

JH carried out the collections and analyses of the literature, participated in the design of this study and preliminarily drafted the manuscript. JC initiated the idea and research project, identified the research gap and motivations, carried out the collections and analyses of the literature, participated in the design of this study, helped to draft the manuscript and proofread the manuscript. SC participated in the design of the study and the analysis of the literature, helped to draft the manuscript and proofread the manuscript. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Junyi Chai.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Part A. Summary of publications in DL and F&B domains

Part B. Detailed structure of standard RNN

The abstract structure of the RNN for a sequence across time can be extended as shown in Fig. 7 in the Appendix, which presents the inputs as X , the outputs as Y , the weights as w , and the Tanh activation functions.

figure 7

The detailed structure of RNN
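The recurrence sketched in Fig. 7 amounts to updating a hidden state as a Tanh of the weighted input plus the weighted previous state. A minimal scalar sketch in Python; the weight values here are arbitrary illustrations, not parameters from any surveyed model:

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a standard RNN: new hidden state from the current
    input x_t and the previous hidden state h_prev via a Tanh."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def rnn_forward(xs, w_x=0.5, w_h=0.8, b=0.0):
    """Unroll the RNN over a sequence, starting from a zero state,
    and collect the hidden state at every time step."""
    h = 0.0
    hs = []
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)
        hs.append(h)
    return hs
```

A vector-valued RNN replaces the scalar multiplications with matrix products, but the weight sharing across time steps shown here is the defining feature.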

Part C. List of abbreviations

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Huang, J., Chai, J. & Cho, S. Deep learning in finance and banking: A literature review and classification. Front. Bus. Res. China 14 , 13 (2020). https://doi.org/10.1186/s11782-020-00082-6


Received : 02 September 2019

Accepted : 30 April 2020

Published : 08 June 2020

DOI : https://doi.org/10.1186/s11782-020-00082-6


  • Literature review
  • Deep learning



  • Open access
  • Published: 16 December 2021

Deep learning-based landslide susceptibility mapping

  • Mohammad Azarafza 1 ,
  • Mehdi Azarafza 2 ,
  • Haluk Akgün 3 ,
  • Peter M. Atkinson 4 &
  • Reza Derakhshani 5 , 6  

Scientific Reports volume  11 , Article number:  24112 ( 2021 ) Cite this article

13k Accesses

157 Citations

2 Altmetric


  • Natural hazards
  • Solid Earth sciences

Landslides are considered as one of the most devastating natural hazards in Iran, causing extensive damage and loss of life. Landslide susceptibility maps for landslide prone areas can be used to plan for and mitigate the consequences of catastrophic landsliding events. Here, we developed a deep convolutional neural network (CNN–DNN) for mapping landslide susceptibility, and evaluated it on the Isfahan province, Iran, which has not previously been assessed on such a scale. The proposed model was trained and validated using training (80%) and testing (20%) datasets, each containing relevant data on historical landslides, field records and remote sensing images, and a range of geomorphological, geological, environmental and human activity factors as covariates. The CNN–DNN model prediction accuracy was tested using a wide range of statistics from the confusion matrix and error indices from the receiver operating characteristic (ROC) curve. The CNN–DNN model was evaluated comprehensively by comparing it to several state-of-the-art benchmark machine learning techniques including the support vector machine (SVM), logistic regression (LR), Gaussian naïve Bayes (GNB), multilayer perceptron (MLP), Bernoulli Naïve Bayes (BNB) and decision tree (DT) classifiers. The CNN–DNN model for landslide susceptibility mapping was found to predict more accurately than the benchmark algorithms, with an AUC = 90.9%, IRs = 84.8%, MSE = 0.17, RMSE = 0.40, and MAPE = 0.42. The map provided by the CNN–DNN clearly revealed a high-susceptibility area in the west and southwest, related to the main Zagros trend in the province. These findings can be of great utility for landslide risk management and land use planning in the Isfahan province.


Introduction

Landslides, one of the most common and potentially catastrophic geo-hazards, are complicated geological phenomena that occur in many geospatial environments and geomaterials 1 , 2 , 3 , 4 , 5 . Landslides are considered the second largest geo-hazard globally, causing extensive financial losses annually, according to the United Nations Development Program 6 , 7 , 8 . Current opinion is that the best way to minimise landslide risk is to monitor, assess and pinpoint landslide-prone areas reliably 9 . Thus, mapping landslide-susceptible areas can be essential to manage and restrict the potential impacts of landslides in vulnerable regions 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 . Landslide susceptibility assessment is not straightforward and generally requires detailed investigation of a range of factors underpinning susceptibility to produce zonation maps which delineate susceptible regions in a spatially explicit manner. Such spatial information on susceptibility can be especially valuable in policy-making and management decision-making to mitigate and reduce the risks related to landsliding 18 , 19 , 20 , 21 .

Landslide susceptibility mapping has been undertaken based on quantitative, semi-quantitative and qualitative methods (which can be further categorised as deterministic, statistic-probabilistic, heuristic, inventory-based, geostatistical and knowledge-based) 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 . Residual uncertainty within landslide susceptibility assessments has led to the development of more complex approaches to attain acceptable levels of accuracy. The largest sources of uncertainty in susceptibility modelling are related to the inventory database. Geological complexity, geomorphological deformations and land-use and landscape changes are the main causes of the uncertainties 33 . In this regard, development of more accurate models is important. Recently, knowledge-based approaches, namely, machine learning techniques such as logistic regression, support vector machines, random forests, artificial neural networks, and deep neural networks, have been applied for landslide susceptibility mapping to increase mapping accuracy 22 , 34 , 35 , 36 . These methods have improved capabilities concerning process adaptability and precision 37 , 38 .

Shallow learning (e.g., the multilayer perceptron, MLP) is machine learning where the learning is from data described by pre-defined (i.e., manually extracted) features. In deep learning, the feature extraction is computed automatically without manual human intervention. Deep learning methods have gained popularity because they often outperform conventional shallow learning methods by extracting informative features automatically from raw data with little or no pre-processing due to their complex architecture 39 , 40 . Deep learning networks (DNNs) have become extremely popular, including convolutional neural networks (CNNs). The CNN is a regularised version of a supervised learning framework that employs a sequence of mathematical operations arranged in network layers, including the convolution, pooling, batch normalisation, dense, dropout and fully-connected layers 41 . The general CNN architecture is illustrated in Fig.  1 . CNNs are used commonly for object detection, feature extraction, pattern recognition, and land-cover exploration. These deep learning techniques are applied mostly to analyse remote sensing images with special emphasis on detecting and recognising 'events' and pattern recognition problems 42 . In terms of landslide susceptibility, CNNs are suitable for detecting historical landslide locations and landslide hazard analysis 8 . CNNs have also been applied for landslide recognition based on remote sensing images 32 , 37 , 42 , 43 . In relation to landslide susceptibility assessment, research has shown that deep learning is more effective than shallow learning 44 , 45 , 46 , 47 . While the application of CNNs has provided increased accuracy for landslide susceptibility mapping, it has not yet delivered consistently desirable accuracy. Thus, hybrid models are considered in this research.

figure 1

The main CNN–DNN architecture 48 .

This research aimed to assess the suitability of a coupled CNN–DNN neural network for landslide susceptibility analysis. The assessment was undertaken in the Isfahan province, Iran. The following objectives were set: ( i ) What are the main triggering factors for landslide occurrence probability? ( ii ) Can the CNN–DNN predictive model provide more accurate results than regular models? ( iii ) Can the CNN–DNN model provide the highest accuracy for susceptibility mapping? The CNN–DNN model was evaluated against a series of benchmark machine learning techniques, including the support vector machine (SVM), logistic regression (LR), Gaussian naïve Bayes (GNB), multilayer perceptron (MLP), Bernoulli naïve Bayes (BNB) and decision tree (DT) classifiers. After preparing the landslide 'covariates' (or factors) relevant to landslide occurrence in the study area, the various algorithms were used to predict landslide susceptibility spatially, and areas of high susceptibility were investigated further. The prediction results were tested using confusion matrices (i.e., overall accuracy, precision, recall and F1-score) and receiver operating characteristic (ROC) curves.
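The confusion-matrix statistics named above (overall accuracy, precision, recall and F1-score) follow directly from the four cell counts of a binary confusion matrix. A minimal sketch, with purely illustrative counts rather than the study's results:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Overall accuracy, precision, recall and F1-score from a binary
    confusion matrix (tp/fp/fn/tn = true/false positives/negatives)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts only (not the paper's numbers).
acc, prec, rec, f1 = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
```

The ROC curve is obtained by sweeping the classification threshold and plotting the true-positive rate `tp / (tp + fn)` against the false-positive rate `fp / (fp + tn)` at each setting; the AUC reported in the abstract is the area under that curve.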

Analysis method

In deep learning and data mining, the extraction of features plays an important role. These extracted features can be used for classification or prediction with high accuracy. Since spatial prediction (i.e., mapping) is crucial for a range of applications including crisis management, urban planning and geo-hazard assessment (including landslide susceptibility assessment), the coupled CNN and DNN classifier has found wide applicability 8 , 37 , 42 , 49 . In the CNN–DNN classifier, the input data are evaluated by convolution, pooling, batch normalisation, dense, dropout and fully connected layers to predict the outputs (Fig.  1 ). The number of layers can be increased, thus increasing the learning depth. The input data provide the first layer of evaluation as a data matrix in which each element has a specific feature value. Hence, the input layer is the primary feature map modified and organised by each convolutional layer and unit. These units extract different features from the input data. The first convolutional layer extracts low-level features (e.g., lines, edges, corners), while further convolutional layers iteratively learn more intricate representations or features. Pooling is a critical manipulation in a CNN, and max-pooling is the most common among the different pooling approaches. Max-pooling divides the feature maps into several rectangular zones and returns the maximum value of each zone 42 . Batch normalisation (or batch-norm) aims to increase the speed, performance and stability of the network; it normalises the input layer by re-centring and re-scaling. The dense, or regular densely-connected, layer is commonly used as a linear/non-linear layer applied to the input and returned to the output. Fully connected layers connect every neuron in a preceding layer to every neuron in a subsequent layer; this is, in principle, the same as the traditional MLP network 41 . Combining these layers in sequence can extract the desired features and, thereby, classify the input data into the desired classes.
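As an illustration of the pooling operation described above, a minimal pure-Python sketch of 2 × 2 max-pooling over a small feature map; the values are arbitrary, and the map is assumed to have even dimensions:

```python
def max_pool_2x2(fmap):
    """2x2 max-pooling: divide the feature map into non-overlapping
    rectangular zones and keep the maximum value of each zone."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 3]]
pooled = max_pool_2x2(fmap)  # -> [[4, 2], [2, 7]]
```

Each output element summarises a 2 × 2 zone of the input, which is how pooling halves the spatial resolution while keeping the strongest responses.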

Knowledge-based approaches have received significant attention in landslide susceptibility analyses, where machine learning methods such as the CNN and DNN have provided highly accurate results. These methods, now considered common procedures, are applied to analyse visual imagery for image recognition and classification. CNNs and DNNs are regularised versions of the MLP, consisting of an input and an output layer and multiple hidden layers. The hidden layers of the CNN typically conclude with a series of convolutional layers with multiplication or another dot product (the activation function is mostly ReLU). The DNN, on the other hand, finds the correct mathematical manipulation to transform the input into the output (based on linear or non-linear relationships). Each mathematical manipulation is considered a layer, and complex DNNs have many layers 40 , 41 . Since 2019, the application of CNNs and DNNs in landslide susceptibility analyses has established the potential of deep learning for landslide susceptibility mapping 32 , 42 , 46 . More widely, implementation of the coupled CNN–DNN has led to increased accuracy compared to implementing the two methods separately. We, thus, develop a coupled CNN–DNN methodology to assess landslide susceptibility.

Study site and data

Study location

The study area is located in the Isfahan province of central Iran and covers an area of approximately 106,786 km 2 (Fig.  2 ). Markazi, Qom and Semnan provinces are located to the north, and the Fars and Kohgiluyeh-Boyerahmad provinces are located to the south of the Isfahan province. The city of Isfahan, which is the capital of the Isfahan province, is considered to be the historical, cultural and touristic capital of Iran. The Isfahan province experiences a moderate and dry climate, with temperatures ranging from 10.6 to 40.6 °C annually (the average annual temperature has been recorded as 16.7 °C). The annual rainfall of Isfahan has been recorded to range from 16.5 to 217.3 mm, with an average annual rainfall of 116.9 mm 50 . Figure  2 provides the location of 222 historical landslides that were identified during a comprehensive field survey and from aerial imagery. Geologically, the study area is located on a plain with rocky outcrops and mountains towards the north-western and south-western parts. The Zagros suture zone is the result of a collision between the Arabian and Central Iran tectonic plates 51 . The main tectonic trend in the region follows the Zagros Mountains. The trend is aligned NW–SE and has affected the geo-structures in the region, including fault orientations, folding and shear zone formation 52 . Although most of the study area is covered by Quaternary sediments, the geological formations in the region include late Triassic rocks 53 . The geo-structures in the region can lead to different sliding and land-movement activities. The most important reasons for landslides in this province relate to tectonic structures rather than geological unit characteristics. Naturally, sedimentary rocks, especially marl formations, are more affected by landslides than igneous formations in the region, with the most important driving factors for landslides being tectonics and seismotectonics 54 .

figure 2

Location map of the study region using ArcGIS 10.4.1 software package 55 .

Landslide covariates

The selection of a set of influencing factors is considered a key step in landslide susceptibility analysis 56 . Both full-length field surveys and remote sensing observations were acquired to provide a detailed landslide assessment of the study area. During the field surveys, 222 historic landslides in the study area were identified to determine landslide-prone areas. Several triggering factors, as used in numerous studies on machine learning-based landslide susceptibility modelling, categorized into several groups 57 , 58 , were used as landslide conditioning factors. The selection of the triggering factors required several considerations related to the dependency of triggering elements, measurability, non-redundancy and relevance of geological characteristics. The main factors influencing landslide occurrence were identified by preparing a spatial landslide inventory database that included the spatiotemporal distribution of historical landslides and a set of potential influencing factors. As a result, four main groups of factors were identified as the most effective elements that triggered landslide movements, including geomorphologic (i.e., altitude variation, slope aspect, slope curvature, profile curvature), geological (i.e., geo-units, distance to faults, land-use, soil type, hydrologic variation, slope-dip), environmental (i.e., climate, watershed, drainage pattern, vegetation) and human activity-related (i.e., distance to roads, distance to cities) covariates. These covariates were identified based on expert knowledge from fieldwork and remote sensing imagery. Table 1 provides information about the selected covariates used in this study. Before these data can be used in susceptibility modelling, it could be subject to multicollinearity and correlated variables 21 . The multicollinearity is a phenomenon in which one predictor variable in a regression model can be predicted linearly from others. 
To test for multicollinearity, variance inflation factors (VIF) are commonly used 21 , 59 . A VIF > 5 indicates potential multicollinearity. In this article, all selected triggering factors produced VIF values less than 2.1 (Table 1 ).
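The VIF screening described above can be sketched in a few lines; the `vif` helper below is an illustrative implementation (not the authors' code), regressing each factor on the remaining ones and reporting 1/(1 − R²):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of design matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (with an intercept).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Nearly independent covariates give VIF close to 1;
# the screening threshold used in the study flags VIF > 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
print(vif(X))
```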

Data preparation

In this research, four groups of covariates were considered for landslide susceptibility analysis. The inventory-based dataset was prepared using a digital elevation model (DEM) and Landsat TM (5–8), and ETM + satellite sensor imagery provided by the Geotechnology Unit, Department of Geological Engineering, Middle East Technical University. The dataset included 222 recorded historical landslide locations that were retrieved from technical documents, fieldwork and aerial images taken from landsliding sites, checked using GPS coordinates and site surveys. The predictive models were fitted based on both landslide and non-landslide cells (i.e., where landslides did not occur). The flat plain area in Isfahan province was considered as contributing non-landslide cells (112 points in the dataset), mostly located in the east of the province. According to Huang et al. 33 , three methods exist for attaining non-landslide grid cells: the seed-cell procedure, random selection and flat locations (slope lower than 2°). This study used random selection, while including flat locations as well (due to the geomorphological condition of the province). After providing the main database, this database was divided into training and testing sets (80% and 20% of the information from the ground survey, respectively). The training set comprises 60% landslide and 40% non-landslide cells, while the testing set comprises 55% landslide and 45% non-landslide cells. Figures  3 , 4 , 5 and 6 present maps of the landslide covariates to support visual assessment of the performance of the various methods tested. The ArcGIS v10.4 software was used to produce the landslide susceptibility maps. All evaluated spatial data were converted to spatially defined layers to produce the landslide susceptibility maps. The proposed algorithm was implemented in the Python high-level programming language. The results of the CNN–DNN evaluation were extracted as shapefiles and used as information layers in a GIS environment.
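The random 80/20 split of the inventory can be sketched as follows; the helper and the use of plain NumPy are assumptions, and only the counts (222 landslide and 112 non-landslide cells, 20% held out) come from the text:

```python
import numpy as np

def train_test_split(ids, labels, test_frac=0.2, seed=42):
    """Random split of inventory cell ids into train/test subsets,
    mirroring the 80/20 division described in the study. A sketch,
    not the authors' code; real splits would also respect the
    spatial distribution of the cells."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ids))
    n_test = int(round(test_frac * len(ids)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (ids[train_idx], labels[train_idx]), (ids[test_idx], labels[test_idx])

# 222 landslide (1) and 112 non-landslide (0) cells, as in the study.
ids = np.arange(222 + 112)
labels = np.concatenate([np.ones(222, int), np.zeros(112, int)])
(train_x, train_y), (test_x, test_y) = train_test_split(ids, labels)
print(len(train_x), len(test_x))  # 267 train / 67 test cells
```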

figure 3

The geomorphologic factors used in the analysis: ( a ) altitude variation, ( b ) slope aspect, ( c ) slope curvature, ( d ) profile curvature using ArcGIS 10.4.1 software package 55 .

figure 4

The geologic factors used in the analysis: ( a ) geo-units, ( b ) distance to faults, ( c ) land-use, ( d ) soil type, ( e ) hydrologic variation, ( f ) slope dip using ArcGIS 10.4.1 software package 55 .

figure 5

The environmental factors used in the analysis: ( a ) climate, ( b ) watershed, ( c ) drainage pattern, ( d ) vegetation using ArcGIS 10.4.1 software package 55 .

figure 6

The human-activity related factors used in the analysis: ( a ) distance to roads, ( b ) distance to cities using ArcGIS 10.4.1 software package 55 .

Methodology

The study was conducted in several stages. First, a ground survey was performed to estimate and record historical landslides in the study area. Second, by considering both the feature extraction of the CNN and the classification capabilities of the DNN, it was possible to identify highly susceptible (high risk) areas, potentially with high accuracy. In the next stage, the model was tested using performance criteria, error models and the ROC curve.

This research evaluated the suitability of the proposed CNN–DNN method to produce detailed landslide susceptibility maps for the Isfahan province in Iran. The performance of the CNN–DNN was evaluated against several high-quality benchmark approaches through a range of appropriate statistical measures. A total of 15 landslide covariates, falling into four main groups, were fed into the CNN–DNN. All covariate layers were normalised and then entered into the model to standardise and prepare the information for landslide susceptibility analysis. The CNN was used for feature extraction, and the DNN was used to classify pixels into high-susceptibility and low-susceptibility groups. Table 2 provides the hyperparameters used in the study. Hyperparameters are commonly tuned to optimise the fitting process, which can increase the prediction accuracy of a machine learning model 21 . The objective of selecting the hyperparameters is to optimise the evaluation values 38 , 61 . Different optimisers were used for the hyperparameters, noting that some optimisers provide more accurate results than others 61 . The present study used the grid search technique for these assessments. The hyperparameters that provided the highest accuracy were chosen for the final training and testing of the respective machine learning models 21 .
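The grid search selection described above can be illustrated with a minimal sketch; `evaluate` stands in for a full train-and-validate run, and the parameter names and values are hypothetical, not the study's actual grid:

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Exhaustive grid search: try every hyperparameter combination
    and keep the one with the highest validation score, mirroring
    the selection strategy described in the text. `evaluate` is a
    user-supplied callable that trains the model with the given
    settings and returns a validation accuracy."""
    best_score, best_params = float("-inf"), None
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy surrogate for "train + validate" that peaks at lr=0.01, batch=32.
def fake_eval(p):
    return -abs(p["lr"] - 0.01) - abs(p["batch_size"] - 32) / 1000

grid = {"lr": [0.1, 0.01, 0.001],
        "batch_size": [16, 32, 64],
        "optimizer": ["sgd", "rmsprop", "adagrad"]}
best, score = grid_search(grid, fake_eval)
print(best)  # the combination scoring highest on the surrogate
```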

Figure  7 presents a flowchart describing the process applied for susceptibility assessment. As seen in the figure, the landslide dataset includes 222 historical cases and field survey recordings, divided randomly into training (80%) and testing (20%) datasets. The database consists of the landslide inventory datasets (training and validation) and the landslide triggering factors. These factors were subsequently evaluated by calculating their weights from the relationship between the landslide occurrences and landslide triggering factors, and these results were then checked 62 . There is no standard for the selection of triggering factors in susceptibility mapping, but the chosen factors have to be measurable depending on a particular area’s characteristics 63 . As mentioned, the test and train datasets represented 20% and 80%, respectively, of the primary database, taking their spatial distributions into account. The test/train ratio matters for the model learning rate, that is, the response to the estimated error each time the model weights are updated. In fact, the learning rate controls how quickly the model adapts to the problem. Smaller learning rates require more training epochs, as smaller changes are made to the weights at each update, whereas larger learning rates result in rapid changes and require fewer training epochs. Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks; it takes a small positive value, often between 0.0 and 1.0. The learning rate used in this study, 0.01 with no momentum, was selected by the optimisers and scheduled via callbacks in Keras. Pearson's Phi coefficient was used to assess the susceptibility classification, and each of the landslide influencing factors was used in this process. The coefficient takes into account true and false positives and negatives. It is generally a balanced measure that can be used even if the class proportions are of very different sizes.
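Pearson's Phi for a binary classification can be computed directly from the four confusion-matrix cells; this is a generic sketch, not the authors' implementation:

```python
from math import sqrt

def phi_coefficient(tp, fp, fn, tn):
    """Pearson's Phi (equivalently the Matthews correlation
    coefficient for a 2x2 table): a balanced measure using all four
    confusion-matrix cells, usable even with very unequal class
    sizes. Ranges from -1 (total disagreement) to +1 (perfect)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(phi_coefficient(tp=50, fp=5, fn=10, tn=35))  # moderately strong agreement
```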

figure 7

The processing flowchart of the proposed model.

Figure  8 presents the Pearson's coefficients for each layer. These information layers constituted the landslide dataset and were input to the CNN to extract more informative features for susceptibility assessment. These feature representations were then used in the DNN model to produce the susceptibility map. Some of the covariates, such as land cover, can be input to the proposed method directly, but others, such as land use, must be modified before input to the CNN–DNN. In this regard, we used the class_weight argument in the Keras package to assign larger weights to the under-represented classes in such factors (to balance their contribution).
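A common way to derive such weights (an assumption here, since the paper does not report its exact values) is the inverse-frequency heuristic, returned in the dict form that Keras' `class_weight` argument to `model.fit` expects:

```python
import numpy as np

def balanced_class_weights(labels):
    """Per-class weights inversely proportional to class frequency:
    weight_c = n_samples / (n_classes * n_c). Returned as a
    {class_index: weight} dict, the shape Keras' `class_weight`
    argument accepts. A sketch of the usual 'balanced' heuristic,
    not the study's exact weighting."""
    classes, counts = np.unique(labels, return_counts=True)
    n, k = len(labels), len(classes)
    return {int(c): n / (k * cnt) for c, cnt in zip(classes, counts)}

# Imbalanced categorical labels: class 0 dominates 9:1.
y = np.array([0] * 90 + [1] * 10)
print(balanced_class_weights(y))  # minority class gets the larger weight
```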

figure 8

Pearson's coefficient for each information layer.

This classification is based on a set of influencing factors (which cover extrinsic and intrinsic elements) trained on historical landslide occurrences (a total of 222 landslides) characterising very high and high susceptibility zones. The historical landslide data were prepared and extracted from shapefiles implemented in a GIS environment and evaluated for each input factor.

To assess the proposed methodology rigorously, its accuracy was evaluated using overall accuracy (OA) and receiver operating characteristic (ROC) statistics and compared with the accuracies of common machine learning methods, including the SVM, LR, GNB, MLP, BNB and DT classifiers. From the confusion matrix, the mean squared error (MSE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were used to measure the model accuracy. In this regard, the algorithm was run for 5000 iterations (epochs) using the training and validation datasets. The stochastic gradient descent (SGD), RMSprop and Adagrad high-dimensional optimisers were used to minimise objective functions with suitable smoothness properties and provide accurate results. This helped to reduce the computational burden by balancing the number of iterations against the convergence rate.
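The three error indices can be computed as below; the helper is a generic sketch (not the authors' code), and applying MAPE assumes no zero values in the reference data:

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """MSE, RMSE and MAPE as used to compare the classifiers.
    MAPE divides by the reference values, so y_true is assumed
    to contain no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)          # mean squared error
    rmse = np.sqrt(mse)              # root mean square error
    mape = np.mean(np.abs(err / y_true))  # mean absolute percentage error
    return mse, rmse, mape

mse, rmse, mape = error_metrics([1, 2, 4], [1.5, 2, 3])
print(round(mse, 3), round(rmse, 3), round(mape, 3))  # 0.417 0.645 0.25
```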

The performance of the proposed methodology was estimated based on both the confusion matrix and the algorithm performance matrix. The performance matrix is a specific table that visualises the performance of a prediction algorithm based on its predicted values, and it contains the sensitivity, specificity and 1-specificity parameters. For classification tasks, true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are used to compare the results of the classifier in question with trusted external judgments (Hearty 2016). Precision, also called the positive predictive value, is the fraction of relevant instances (TP) amongst the retrieved instances.

Recall (sensitivity) is the fraction of relevant instances that were actually retrieved.

Therefore, both precision and recall are based on measures of relevance 41 . The false-positive rate can be calculated as ‘1 − specificity’, where specificity is defined as TN/(TN + FP).

Accuracy can be a misleading metric for imbalanced datasets. For example, for a prediction set with 95 negative and 5 positive values, classifying all values as negative gives a 0.95 accuracy score while detecting no positives. The F1-score, by contrast, is the harmonic mean of precision and recall; it approximates the average of the two values when they are close and penalises large imbalances between them.

The overall accuracy (OA) represents the probability that a test will correctly classify an individual; that is, the sum of TP plus TN divided by the total number of individuals tested: OA = (TP + TN)/(TP + TN + FP + FN).

OA is, thus, also the weighted average of ‘sensitivity’ and ‘specificity’ (Aggarwal, 2018). The application of the performance matrix helps to characterise the trustworthiness of the classifier in question.
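The metrics defined above can be collected into one generic helper (an illustrative sketch, not the authors' code):

```python
def classification_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics as defined in the text:
    precision = TP/(TP+FP), recall (sensitivity) = TP/(TP+FN),
    specificity = TN/(TN+FP), false-positive rate = 1 - specificity,
    F1 = harmonic mean of precision and recall,
    OA = (TP+TN)/(TP+TN+FP+FN). Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    oa = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "fpr": 1 - specificity,
            "f1": f1, "oa": oa}

# An all-negative classifier over 95 negatives and 5 positives
# scores OA = 0.95 despite recall = 0, illustrating why OA alone
# can mislead on imbalanced data.
m = classification_metrics(tp=0, fp=0, fn=5, tn=95)
print(m["oa"], m["recall"])  # 0.95 0.0
```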

The proposed models

Landslide hazard susceptibility assessment was conducted by applying the proposed CNN–DNN methodology to evaluate landslide susceptibility in the study area (Fig.  9 ). The OA and ROC were used to verify the results of the proposed model. The ROC curve is a graphical description that shows the diagnostic ability of a binary classifier system as its discrimination threshold is varied. As a result, the OA and the AUC from the ROC curve represent the accuracy of the classifiers. Figure  10 presents the OA and loss curves for the CNN–DNN. These figures show that the estimated OA is 0.909, and the loss is reduced to 0.20 in 5000 epochs. According to the evaluated hazard map presented in Fig.  9 , susceptible and hazardous areas in the west and southern parts of the Isfahan province manifest spatially as visual stripes. DNN optimisers such as SGD, RMSprop and Adagrad estimated the modified IRs for the CNN–DNN (Figs.  11 , 12 , 13 ). IRs can be used, like the AUC value of the OA, to control the performance of the algorithm. These stripes follow the northwest-southeast trace, which represents the main Zagros trend in the region. Therefore, it can be suggested that geological factors have had the most significant impact on landslide occurrence in the Isfahan province.

figure 9

Landslide susceptibility map for the proposed model using ArcGIS 10.4.1 software package 55 .

figure 10

The OA and loss function values were obtained for the applied model.

figure 11

The SGD optimiser results for the proposed model.

figure 12

The RMSprop optimiser results for the proposed model.

figure 13

The Adagrad optimiser results for the proposed model.

Benchmark comparison

To evaluate the performance of the proposed CNN–DNN model rigorously, a large range of state-of-the-art and widely applied machine learning techniques were tested using the same accuracy statistics as applied to the proposed method. These benchmark methods include the SVM, LR, GNB, MLP, BNB and DT classifiers. Thus, the OA, IRs and ROC were also obtained for these benchmark methods (Fig.  14 and Table 3 ). By comparing the accuracy evaluation for the CNN–DNN with those for the benchmark approaches, it can be observed that the CNN–DNN model was able to predict landslide susceptibility with higher accuracy than the other classifiers. The IRs and AUC estimated for the CNN–DNN and benchmark methods indicate that the CNN–DNN method has significantly greater accuracy than the benchmarks. This suggests that the extracted features can more accurately characterise landslide susceptibility than the benchmark methods as measured with the AUC and IR indices. More specifically, the CNN–DNN (AUC = 90.9%; IRs = 84.8%) achieved greater prediction accuracy than the corresponding single classifiers such as SVM (AUC = 81.5%; IRs = 80.1%), LR (AUC = 78.3%; IRs = 72.2%), GNB (AUC = 80.1%; IRs = 68.7%), BNB (AUC = 50.0%; IRs = 61.0%), MLP (AUC = 50.9%; IRs = 61.8%) and DT (AUC = 85.5%; IRs = 80.0%) as revealed through the measured indices. The MSE, RMSE and MAPE values were also obtained for the various classifiers (Table 4 ). According to this table, the CNN–DNN model outperformed the benchmark methods.
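The AUC values compared above can be computed from predicted susceptibility scores without plotting the full ROC curve; the rank-based sketch below is generic, not the authors' implementation:

```python
def roc_auc(y_true, scores):
    """AUC via the rank (Mann-Whitney) formulation: the probability
    that a randomly chosen positive cell receives a higher predicted
    susceptibility score than a randomly chosen negative cell, with
    ties counting one half. Equivalent to the area under the ROC
    curve traced by sweeping the decision threshold."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfectly separated scores give AUC = 1.0; chance level gives 0.5.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0
```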

figure 14

ROC results for the CNN–DNN model and the benchmark methods.

As Table 3 shows, the proposed model performed more accurately than all six benchmarks in all classification metrics. The proposed model produced the highest ROC accuracy, with a value of 90%. After the proposed model, the decision tree classifier achieved the next best performance, with accuracy approximately 5% lower than the proposed model. The lowest estimated accuracy of 50% was achieved by the MLP and BNB, 40% less than the proposed model. Regarding the accuracy criterion, the proposed model produced an accuracy of 84.8%, and the closest algorithm (SVM) was approximately 4.7% lower. The BNB produced the lowest accuracy of 61%. The average precision for the two susceptibility classes of the proposed model is 84%. The lowest average precision, 34%, was produced by the MLP and BNB (a difference from the proposed model of more than 50%). The average recall rate for the proposed model is 88%, and the minimum recall rate is 55% for the BNB. For the F1-score, the average is 85.5% for the proposed model, and, as for the other three criteria examined, this is the largest value amongst all models. The DT algorithm produced the next largest F1-score, with an average value of 83.5%, almost 2% less than the proposed model.

We investigated the potential of a coupled deep neural network (CNN–DNN) to predict landslide susceptibility spatially. The algorithm was evaluated using data with a spatial resolution of 30 m representing the Isfahan province, Iran. Indices associated with historical landslide occurrences (a total of 222 landslides) were used as the landslide inventory dataset, and this was divided randomly into training (80%) and testing (20%) sets for the analysis. Four main groups of covariates, including geomorphologic, geologic, environmental and human activity-related covariates, were identified based on field and remote sensing investigations. The CNN–DNN model was able to produce a susceptibility map for the study area with appropriate accuracy. The results show a significant increase in landslide susceptibility prediction accuracy compared to the benchmark models. Notwithstanding the high accuracy achieved by the proposed CNN–DNN predictive model for landslide susceptibility mapping, this study has some limitations that could be considered in future research. These limitations can be summarised as follows: ( i ) the primary database was provided based on fieldwork, historical landslide records and remote-sensing information. The limited number of reference landslides in the recorded data (as is commonly the case) made modelling challenging; ( ii ) the data on the triggering factors were highly dependent on the spatial resolution of the satellite sensor imagery and the DEM data quality, which directly affected the quality of the input database; ( iii ) the predictive model required strong processors to manage the inputs during landslide susceptibility assessments. Thus, for future scientific research involving, for example, even finer spatial resolution images, the adequacy of the available processors needs to be considered for landslide susceptibility analysis.

Referring to Fig.  9 , which presents the landslide susceptibility assessment results in the study area, it is clear that the main risk area lies in the west and southwest part of Isfahan. Geo-structural studies suggest that the high-susceptible areas are located in the Zagros folded zone and follow the main Zagros trend in the province. Thus, it can be stated that the geological-based triggering factors play important roles in determining landslide occurrence in Isfahan. Fieldwork suggested the effect of geo-structures as triggers of landslide movements. It is interesting then that the CNN–DNN model was able to provide detailed mapping to corroborate this.

The benchmark classifiers SVM, LR, GNB, MLP, BNB and DT were used to validate the predictive performance of the CNN–DNN model. Comparison of the proposed model with the benchmark methods demonstrated the superiority of the proposed CNN–DNN approach. A review of recent studies on landslide susceptibility assessment demonstrated that applications of deep neural networks in susceptibility analysis are expanding 37 , 42 , 44 , 45 . Wang et al. 42 , Sameen et al. 37 , Fang et al. 32 and Pham et al. 8 used a CNN as the principal method for the assessment of landslide susceptibility at different locations, with evaluation accuracies of 0.813, 0.835, 0.798 and 0.889, respectively 8 , 32 , 42 , 44 . This indicates that the CNN can be used as a basic predictive model. However, the coupled CNN–DNN model in this paper was able to increase the accuracy further, to 0.909.

The CNN–DNN method uses a first-stage CNN component that attempts to extract meaningful semantic information from low-level input covariates that may be related to the target for prediction, in this case, landslide susceptibility classes. The results suggest that the first-stage CNN is efficient in extracting suitable environmental features related to landslide susceptibility. This is important because it is unclear whether landslides should be considered as spatially continuous phenomena or spatial objects 18 , 19 , 20 , 26 , 27 , 28 , 29 , 64 . On the one hand, landslides are complex geomorphological processes manifested as changes in states in space and time, including variation within the landslide (rupture zone and impacted area). Thus, at a fine spatial scale, one might consider a continuous statistical model appropriate for landslide susceptibility mapping. On the other hand, landslides create discrete rupture zones and impacted areas that appear against a landscape background. In this sense, and at a scale where variation between landslides becomes more important, landslides can be considered discrete objects.

The problem with the above duality between the continua- and object-based views of the world becomes obvious when considering the characterisation of existing landslides and prediction of yet-to-occur landslides. Landslides do not occur at a pixel, but rather occupy some positive area. As such, conventional methods, which are commonly pixel-based, insufficiently characterise the landslide as a spatially extensive phenomenon. They also run into difficulties in predicting yet-to-occur landslides because predictions of susceptibility are constrained to a pixel. The CNN–DNN model deals directly with these two problems by analysing spatial patches of data rather than pixels.
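Patch-based input of the kind described can be sketched as follows; the window size and helper are hypothetical, chosen only to illustrate feeding spatial neighbourhoods rather than single pixels to the CNN stage:

```python
import numpy as np

def extract_patches(raster, size=5):
    """Cut a covariate raster into size x size patches centred on
    each interior cell, the patch-based (rather than pixel-based)
    input a CNN stage consumes. Edge cells without a full
    neighbourhood are skipped for brevity."""
    r = size // 2
    h, w = raster.shape
    patches, centres = [], []
    for i in range(r, h - r):
        for j in range(r, w - r):
            patches.append(raster[i - r:i + r + 1, j - r:j + r + 1])
            centres.append((i, j))
    return np.stack(patches), centres

# An 8x8 toy raster yields 16 interior 5x5 patches.
grid = np.arange(64, dtype=float).reshape(8, 8)
p, c = extract_patches(grid, size=5)
print(p.shape)  # (16, 5, 5)
```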

We leave the second problem as an open question for future research. Specifically, the CNN–DNN can transform the spatial information in the input covariates into meaningful higher-order feature representations of landslide susceptibility. This makes sense for landslides when one considers the conditions that may lead to failure. These conditions are often spatial, requiring not the conditions at a point to be satisfied, but the conjunction of several conditions over an area. For example, it may not be enough for the slope at a single point to be high. Landsliding may be more likely if that same high slope falls in the context of surrounding land which, for example, concentrates water to that point (e.g., via overland flow or throughflow). This requirement for context is true of many of the in situ factors that underpin the susceptibility of a location to fail. Thus, the CNN–DNN approach proposed in this research is excellently matched to the specific characteristics of the landsliding phenomenon and problem under study.

Conclusions

Landslide susceptibility mapping is one of the most challenging tasks in geo-hazard assessment. In this context, application of modern deep learning techniques can be advantageous for analysis. Here, we applied a novel CNN–DNN predictive model for assessment of landslide susceptibility in Isfahan province, Iran. The model was fitted between historical landslides data (which accounted for different types of landsliding) and various triggering factors. The proposed CNN–DNN model produced a very high accuracy, outperforming a wide range of benchmark approaches, specifically the SVM, LR, GNB, MLP, BNB and DT methods. More specifically, the CNN–DNN (AUC = 90.9%; IRs = 84.8%) achieved greater prediction accuracy than the corresponding single classifiers such as SVM (AUC = 81.5%; IRs = 80.1%), LR (AUC = 78.3%; IRs = 72.2%), GNB (AUC = 80.1%; IRs = 68.7%), BNB (AUC = 50.0%; IRs = 61.0%), MLP (AUC = 50.9%; IRs = 61.8%) and DT (AUC = 85.5%; IRs = 80.0%) as revealed through the measured indices. Also, the CNN–DNN (MSE = 0.17, RMSE = 0.40, MAPE = 0.42) produced smaller error indices than the benchmark models: SVM (MSE = 0.28, RMSE = 0.42, MAPE = 0.44), LR (MSE = 0.25, RMSE = 0.50, MAPE = 0.54), GNB (MSE = 0.29, RMSE = 0.54, MAPE = 0.63), BNB (MSE = 0.38, RMSE = 0.62, MAPE = 0.65), MLP (MSE = 0.38, RMSE = 0.62, MAPE = 0.68), and DT (MSE = 0.28, RMSE = 0.42, MAPE = 0.44). We, thus, recommend the CNN–DNN approach for landslide susceptibility mapping. Importantly, the CNN component of the approach has great advantages for landslide susceptibility mapping precisely because it matches well, and takes advantage of, the spatially extensive nature of the landslide phenomenon itself.

The CNN–DNN model predicted a high-susceptibility zone in the west and south-western parts of the study area, appearing as a stripe aligned with the northwest-southeast main Zagros trend in the region.

Colesanti, C. & Wasowski, J. Investigating landslides with space-borne Synthetic Aperture Radar (SAR) interferometry. Eng. Geol. 88 , 173–199. https://doi.org/10.1016/j.enggeo.2006.09.013 (2006).

Highland, L. & Bobrowsky, P. T. The Landslide Handbook: A Guide to Understanding Landslides (US Geological Survey Reston, 2008).

Chen, Z. et al. Landslide research in China. Q. J. Eng. Geol. Hydrogeol. 49 , 279–285. https://doi.org/10.1144/qjegh2016-100 (2016).

Tang, H., Wasowski, J. & Juang, C. H. Geohazards in the three Gorges Reservoir Area, China-Lessons learned from decades of research. Eng. Geol. 261 , 105267. https://doi.org/10.1016/j.enggeo.2019.105267 (2019).

Wasowski, J. et al. Recurrent rock avalanches progressively dismantle a mountain ridge in Beichuan County, Sichuan, most recently in the 2008 Wenchuan earthquake. Geomorphology 374 , 107492. https://doi.org/10.1016/j.geomorph.2020.107492 (2021).

Azarafza, M., Ghazifard, A., Akgün, H. & Asghari-Kaljahi, E. Landslide susceptibility assessment of South Pars Special Zone, southwest Iran. Environ. Earth Sci. 77 , 805. https://doi.org/10.1007/s12665-018-7978-1 (2018).

Cascini, L. Applicability of landslide susceptibility and hazard zoning at different scales. Eng. Geol. 102 , 164–177. https://doi.org/10.1016/j.enggeo.2008.03.016 (2008).

Pham, V. D., Nguyen, Q.-H., Nguyen, H.-D., Pham, V.-M. & Bui, Q.-T. Convolutional neural network: Optimised moth flame algorithm for shallow landslide susceptible analysis. IEEE Access 8 , 32727–32736. https://doi.org/10.1109/ACCESS.2020.2973415 (2020).

Abella, E. A. C. & Van Westen, C. J. Qualitative landslide susceptibility assessment by multicriteria analysis: a case study from San Antonio del Sur, Guantánamo, Cuba. Geomorphology 94 , 453–466. https://doi.org/10.1016/j.geomorph.2006.10.038 (2008).

Lee, S. & Choi, J. Landslide susceptibility mapping using GIS and the weight-of-evidence model. Int. J. Geogr. Inf. Sci. 18 , 789–814. https://doi.org/10.1080/13658810410001702003 (2004).

Manzo, G., Tofani, V., Segoni, S., Battistini, A. & Catani, F. GIS techniques for regional-scale landslide susceptibility assessment: The Sicily (Italy) case study. Int. J. Geogr. Inf. Sci. 27 , 1433–1452. https://doi.org/10.1080/13658816.2012.693614 (2013).

Feizizadeh, B. & Blaschke, T. An uncertainty and sensitivity analysis approach for GIS-based multicriteria landslide susceptibility mapping. Int. J. Geogr. Inf. Sci. 28 , 610–638. https://doi.org/10.1080/13658816.2013.869821 (2014).

Firomsa, M. & Abay, A. Landslide assessment and susceptibility zonation in Ebantu district of Oromia region, western Ethiopia. Bull. Eng. Geol. Environ. 78 , 4229–4239. https://doi.org/10.1007/s10064-018-1398-z (2019).

Milevski, I. & Dragićević, S. Landslides susceptibility zonation of the territory of north macedonia using analytical hierarchy process approach. Contrib. Sect. Nat. Math. Biotechn. Sci. 40 , 115–126. https://doi.org/10.20903/csnmbs.masa.2019.40.1.136 (2019).

Peethambaran, B., Anbalagan, R., Kanungo, D., Goswami, A. & Shihabudheen, K. A comparative evaluation of supervised machine learning algorithms for township level landslide susceptibility zonation in parts of Indian Himalayas. CATENA 195 , 104751. https://doi.org/10.1016/j.catena.2020.104751 (2020).

Fang, Z., Wang, Y., Peng, L. & Hong, H. A comparative study of heterogeneous ensemble-learning techniques for landslide susceptibility mapping. Int. J. Geogr. Inf. Sci. 35 , 321–347. https://doi.org/10.1080/13658816.2020.1808897 (2021).

Yan, Y. et al. Volunteered geographic information research in the first decade: A narrative review of selected journal articles in GIScience. Int. J. Geogr. Inf. Sci. 34 , 1765–1791. https://doi.org/10.1080/13658816.2020.1730848 (2020).

Rahman, M. et al. Development of flood hazard map and emergency relief operation system using hydrodynamic modeling and machine learning algorithm. J. Clean. Prod. 133 , 127594. https://doi.org/10.1016/j.jclepro.2021.127594 (2021).

Rahman, M. et al. Flood susceptibility assessment in Bangladesh using machine learning and multi-criteria decision analysis. Earth Syst. Environ. 3 , 585–601. https://doi.org/10.1007/s41748-019-00123-y (2019).

Dewan, A. M. Hazards, risk, and vulnerability. In Floods in a Megacity 35–74. https://doi.org/10.1007/978-94-007-5875-9_2 (2013).

Adnan, M. S. G. et al. Improving spatial agreement in machine learning-based landslide susceptibility mapping. Remote Sens. 12 , 3347. https://doi.org/10.3390/rs12203347 (2020).

Zêzere, J., Pereira, S., Melo, R., Oliveira, S. & Garcia, R. A. Mapping landslide susceptibility using data-driven methods. Sci. Total Environ. 589 , 250–267. https://doi.org/10.1016/j.scitotenv.2017.02.188 (2017).

Huabin, W., Gangjun, L., Weiya, X. & Gonghui, W. GIS-based landslide hazard assessment: an overview. Prog. Phys. Geogr. 29 , 548–567. https://doi.org/10.1191/0309133305pp462ra (2005).

Ruff, M. & Czurda, K. Landslide susceptibility analysis with a heuristic approach in the Eastern Alps (Vorarlberg, Austria). Geomorphology 94 , 314–324. https://doi.org/10.1016/j.geomorph.2006.10.032 (2008).

Nefeslioglu, H., Sezer, E., Gokceoglu, C., Bozkir, A. & Duman, T. Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey. Math. Probl. Eng. 2010 , 901095. https://doi.org/10.1155/2010/901095 (2010).

Atkinson, P. M. & Massari, R. Autologistic modelling of susceptibility to landsliding in the Central Apennines, Italy. Geomorphology 130 , 55–64. https://doi.org/10.1016/j.geomorph.2011.02.001 (2011).

Eker, A. M., Dikmen, M., Cambazoğlu, S., Düzgün, ŞH. & Akgün, H. Evaluation and comparison of landslide susceptibility mapping methods: A case study for the Ulus district, Bartın, northern Turkey. Int. J. Geogr. Inf. Sci. 29 , 132–158. https://doi.org/10.1080/13658816.2014.953164 (2015).

Okalp, K. & Akgün, H. National level landslide susceptibility assessment of Turkey utilising public domain dataset. Environ. Earth Sci. 75 , 847. https://doi.org/10.1007/s12665-016-5640-3 (2016).

Maes, J. et al. Landslide risk reduction measures: A review of practices and challenges for the tropics. Prog. Phys. Geogr. 41 , 191–221. https://doi.org/10.1177/0309133316689344 (2017).

Hong, H. et al. Landslide susceptibility assessment at the Wuning area, China: A comparison between multi-criteria decision making, bivariate statistical and machine learning methods. Nat. Hazards 96 , 173–212. https://doi.org/10.1007/s11069-018-3536-0 (2019).

Pham, B. T. & Prakash, I. A novel hybrid model of bagging-based naïve bayes trees for landslide susceptibility assessment. Bull. Eng. Geol. Env. 78 , 1911–1925. https://doi.org/10.1007/s10064-017-1202-5 (2019).

Fang, Z., Wang, Y., Peng, L. & Hong, H. Integration of convolutional neural network and conventional machine learning classifiers for landslide susceptibility mapping. Comput. Geosci. 139 , 104470. https://doi.org/10.1016/j.cageo.2020.104470 (2020).

Zêzere, J.-L. et al. Effects of landslide inventories uncertainty on landslide susceptibility modelling. In Landslide Processes: From Geomorphologic Mapping to Dynamic Modelling 81–86 (Strasbourg, 2009).

Chen, W., Pourghasemi, H. R. & Zhao, Z. A GIS-based comparative study of Dempster-Shafer, logistic regression and artificial neural network models for landslide susceptibility mapping. Geocarto Int. 32 , 367–385. https://doi.org/10.1080/10106049.2016.1140824 (2017).

Aditian, A., Kubota, T. & Shinohara, Y. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia. Geomorphology 318 , 101–111. https://doi.org/10.1016/j.geomorph.2018.06.006 (2018).

Sevgen, E., Kocaman, S., Nefeslioglu, H. A. & Gokceoglu, C. A novel performance assessment approach using photogrammetric techniques for landslide susceptibility mapping with logistic regression, ANN and random forest. Sensors 19 , 3940. https://doi.org/10.3390/s19183940 (2019).

Sameen, M. I., Pradhan, B. & Lee, S. Application of convolutional neural networks featuring Bayesian optimisation for landslide susceptibility assessment. CATENA 186 , 104249. https://doi.org/10.1016/j.catena.2019.104249 (2020).

Sun, D., Wen, H., Wang, D. & Xu, J. A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm. Geomorphology 362 , 107201 (2020).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444. https://doi.org/10.1038/nature14539 (2015).

Chauhan, S. et al. A comparison of shallow and deep learning methods for predicting cognitive performance of stroke patients from MRI lesion images. Front. Neuroinform. 13 , 53. https://doi.org/10.3389/fninf.2019.00053 (2019).

Aggarwal, C. C. Neural Networks and Deep Learning Vol. 497 (Springer, 2018).

Book   Google Scholar  

Wang, Y., Fang, Z. & Hong, H. Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci. Total Environ. 666 , 975–993. https://doi.org/10.1016/j.scitotenv.2019.02.263 (2019).

Ding, A., Zhang, Q., Zhou, X. & Dai, B. in 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). 444–448 (IEEE, 2016).

Xiao, L., Zhang, Y. & Peng, G. Landslide susceptibility assessment using integrated deep learning algorithm along the China-Nepal highway. Sensors 18 , 4436. https://doi.org/10.3390/s18124436 (2018).

Van Dao, D. et al. A spatially explicit deep learning neural network model for the prediction of landslide susceptibility. CATENA 188 , 104451. https://doi.org/10.1016/j.catena.2019.104451 (2020).

Huang, F. et al. A deep learning algorithm using a fully connected sparse autoencoder neural network for landslide susceptibility prediction. Landslides 17 , 217–229. https://doi.org/10.1007/s10346-019-01274-9 (2020).

Bui, D. T., Tsangaratos, P., Nguyen, V.-T., Van Liem, N. & Trinh, P. T. Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. CATENA 188 , 104426. https://doi.org/10.1016/j.catena.2019.104426 (2020).

Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A. & Oliva, A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep. 6 , 27755. https://doi.org/10.1038/srep27755 (2016).

Article   ADS   PubMed   PubMed Central   CAS   Google Scholar  

Prakash, N., Manconi, A. & Loew, S. Mapping landslides on EO data: Performance of deep learning models vs traditional machine learning models. Remote Sens. 12 , 346. https://doi.org/10.3390/rs12030346 (2020).

Iran Meteorological Organization. http://www.irimo.ir (2021).

Ghanbarian, M. A., Yassaghi, A. & Derakhshani, R. Detecting a sinistral transpressional deformation belt in the Zagros. Geosciences 11 , 226. https://doi.org/10.3390/geosciences11060226 (2021).

Article   ADS   CAS   Google Scholar  

Ghanbarian, M. A. & Derakhshani, R. Systematic Variations in the Deformation Intensity in the Zagros Hinterland Fold-and-Thrust Belt (Zeitschrift der Deutschen Gesellschaft für Geowissenschaften, 2021).

Aghanabati, A. Geology of Iran (Geological Survey of Iran, 2004).

Ghorbani, M. A summary of geology of Iran. In: The Economic Geology of Iran , 45–64 (Springer, 2013). https://doi.org/10.1007/978-94-007-5625-0_2 .

ArcGIS. (2021) https://desktop.arcgis.com/en/arcmap/10.4/get-started/setup/arcgis-desktop-quick-start-guide.htm .

Reichenbach, P., Rossi, M., Malamud, B. D., Mihir, M. & Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth Sci. Rev. 180 , 60–91. https://doi.org/10.1016/j.earscirev.2018.03.001 (2018).

Yao, X., Tham, L. & Dai, F. Landslide susceptibility mapping based on support vector machine: A case study on natural slopes of Hong Kong, China. Geomorphology 101 , 572–582 (2008).

Rossi, M., Guzzetti, F., Reichenbach, P., Mondini, A. C. & Peruccacci, S. Optimal landslide susceptibility zonation based on multiple forecasts. Geomorphology 114 , 129–142 (2010).

Fox, J. et al . Package ‘Car’ (R Foundation for Statistical Computing, 2018).

Iran Water Resources Management Company. https://www.wrm.ir/ (2021).

Rahman, M. et al. Application of stacking hybrid machine learning algorithms in delineating multi-type flooding in Bangladesh. J. Environ. Manage. 295 , 113086. https://doi.org/10.1016/j.jenvman.2021.113086 (2021).

Article   PubMed   Google Scholar  

Mersha, T. & Meten, M. GIS-based landslide susceptibility mapping and assessment using bivariate statistical methods in Simada area, northwestern Ethiopia. Geoenviron. Disasters 7 , 20 (2020).

Ayalew, L. & Yamagishi, H. The application of GIS based logistic regression for landslide susceptibility mapping in the KakudaYahiko Mountains Central Japan. Geomorphology 65 (1), 15–31 (2005).

Ahmad, H. et al. Geohazards susceptibility assessment along the upper indus basin using four machine learning and statistical models. ISPRS Int. J. Geo Inf. 10 (5), 315. https://doi.org/10.3390/ijgi10050315 (2021).

Download references


Azarafza, M., Azarafza, M., Akgün, H. et al. Deep learning-based landslide susceptibility mapping. Sci Rep 11 , 24112 (2021). https://doi.org/10.1038/s41598-021-03585-1




Title: A Neuro-Symbolic Explainer for Rare Events: A Case Study on Predictive Maintenance

Abstract: Predictive maintenance applications are increasingly complex, with interactions between many components. Black box models based on deep learning techniques are popular approaches due to their predictive accuracy. This paper proposes a neuro-symbolic architecture that uses an online rule-learning algorithm to explain when the black box model predicts failures. The proposed system solves two problems in parallel: anomaly detection and explanation of the anomaly. For the first problem, we use an unsupervised state-of-the-art autoencoder. For the second problem, we train a rule learning system that learns a mapping from the input features to the autoencoder reconstruction error. Both systems run online and in parallel. The autoencoder signals an alarm for the examples with a reconstruction error that exceeds a threshold. The causes of the alarm are hard for humans to understand because they result from a nonlinear combination of sensor data. The rule triggered by that example describes the relationship between the input features and the autoencoder reconstruction error. The rule explains the failure signal by indicating which sensors contribute to the alarm, allowing the identification of the component involved in the failure. The system can present global explanations for the black box model and local explanations for why the black box model predicts a failure. We evaluate the proposed system in a real-world case study of Metro do Porto and provide explanations that illustrate its benefits.
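The two-track pipeline the abstract describes — an autoencoder that raises an alarm when reconstruction error exceeds a threshold, and a learned mapping from input features to that error — can be sketched minimally in numpy. In this sketch the autoencoder is replaced by a trivial mean-reconstruction model and the learned rule by a single error-attribution step; all names and data are illustrative, not the paper's implementation:

```python
import numpy as np

def reconstruction_error(x, mean):
    # toy "autoencoder": reconstruct every example as the training mean;
    # a real system would use a trained neural autoencoder here
    return np.sum((x - mean) ** 2, axis=1)

def explain(example, mean):
    # attribute the alarm to the sensor with the largest error contribution
    contrib = (example - mean) ** 2
    return int(np.argmax(contrib))

train = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]])   # normal operation
mean = train.mean(axis=0)
errs = reconstruction_error(train, mean)
threshold = errs.mean() + 3 * errs.std()                  # alarm threshold

anomaly = np.array([[1.0, 9.0]])                          # sensor 1 drifts
err = reconstruction_error(anomaly, mean)[0]
if err > threshold:
    print("alarm: sensor", explain(anomaly[0], mean))
```

The printed sensor index plays the role of the local explanation: it names which input feature drove the reconstruction error past the threshold.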


Particle Swarm Optimized Deep Learning Models for Rainfall Prediction: A Case Study in Aizawl, Mizoram



A Case Study of Deep Reinforcement Learning for Engineering Design: Application to Microfluidic Devices for Flow Sculpting

Contributed by the Design Automation Committee of ASME for publication in the Journal of Mechanical Design.


Lee, X. Y., Balu, A., Stoecklein, D., Ganapathysubramanian, B., and Sarkar, S. (September 16, 2019). "A Case Study of Deep Reinforcement Learning for Engineering Design: Application to Microfluidic Devices for Flow Sculpting." ASME. J. Mech. Des . November 2019; 141(11): 111401. https://doi.org/10.1115/1.4044397


Efficient exploration of design spaces is highly sought after in engineering applications. A spectrum of tools has been proposed to deal with the computational difficulties associated with such problems. In the context of our case study, these tools can be broadly classified into optimization and supervised learning approaches. Optimization approaches, while successful, are inherently data inefficient, with evolutionary optimization-based methods being a good example. This inefficiency stems from data not being reused from previous design explorations. Alternately, supervised learning-based design paradigms are data efficient. However, the quality of ensuing solutions depends heavily on the quality of data available. Furthermore, it is difficult to incorporate physics models and domain knowledge aspects of design exploration into pure-learning-based methods. In this work, we formulate a reinforcement learning (RL)-based design framework that mitigates disadvantages of both approaches. Our framework simultaneously finds solutions that are more efficient compared with supervised learning approaches while using data more efficiently compared with genetic algorithm (GA)-based optimization approaches. We illustrate our framework on a problem of microfluidic device design for flow sculpting, and our results show that a single generic RL agent is capable of exploring the solution space to achieve multiple design objectives. Additionally, we demonstrate that the RL agent can be used to solve more complex problems using a targeted refinement step. Thus, we address the data efficiency limitation of optimization-based methods and the limited data problem of supervised learning-based methods. The versatility of our framework is illustrated by utilizing it to gain domain insights and to incorporate domain knowledge. We envision such RL frameworks to have an impact on design science.

Expensive design explorations often arise in several engineering problems. Solving the inverse problems for the design of micro-structures [ 1 ], manufacturing processes [ 2 ], the design of nanophotonic devices [ 3 ], topology optimization [ 4 ], and the discovery of new materials [ 5 ] are representative examples. A forward (physics) problem involves computation of a system response or a characteristic for a specific design configuration. In contrast, inverse problems involve identifying the configuration that results in a specific desired system response. While forward models exist to simulate the outcome of an engineering process for a design configuration, inverse models are generally difficult to obtain due to the complex (and noninjective) nature of the physical phenomena.

Design problems have been posed and successfully solved as optimization problems. A popular class of optimization methods utilizes gradient information of the objective function to solve inverse design problems. While gradient calculations provide rich information about the search neighborhood, computing high-dimensional gradients can be computationally very intensive. In contrast, another class of methods is based on a gradient-free approach that utilizes pattern search or evolutionary strategies to explore the search space. Evolutionary methods have been extensively utilized in a variety of design problems [ 6 – 8 ], typically evaluating a forward model of the problem to explore the design space and finding an optimal result. Such methods are easily deployed on consumer hardware (making democratized design more accessible) and can yield important design insights from sparse samples of a large design space. However, such methods can be computationally inefficient, as they do not typically utilize insights from previous experiments, instead exploring similar regions of the design space with each search. This inefficiency especially stands out in design tools intended to enable nonexpert users to make use of abstracted models on consumer-grade hardware [ 7 , 9 ] or in meta-optimization routines that repeatedly solve the optimization problem within a larger design process.

Alternatively, learning-based approaches such as supervised learning have also been recently applied to solve multiple inverse problems due to the resurgence in the popularity of machine learning [ 10 – 12 ]. Although these methods can be more data efficient because they attempt to recognize patterns present in data and leverage that knowledge to propose solutions, their performance relies heavily on the quality of the dataset [ 13 ]. Additionally, the solutions found by both evolutionary and learning approaches are typically constrained by the designer’s input, hence limiting the range of solutions that can be discovered. Thus, a method to solve the inverse design problem that is data-driven, explores the solution space creatively, and has the flexibility to find multiple solutions without requiring user-defined constraints will be a valuable tool to the designer.

In this context, we propose a deep reinforcement learning (deep RL)-based design framework with the characteristics mentioned above to solve the inverse design problem. In particular, we show that within a spectrum of design paradigms, which ranges from optimization to learning-based methods (as visualized in Fig. 1 ), an RL-based design strategy that lies in the middle of the spectrum can aid the design process in multiple ways. While our proposed framework is applicable to general objective functions, we hypothesize that it is especially useful for a smaller class of problems such as the inverse flow design problem presented as a case study here. From a high-level perspective, RL works by having an agent interact with an environment and receive rewards or penalties for the good or poor decisions that it makes. Initially, the agent's actions are highly random and exploratory. Over time, the agent learns from the reward signals which actions are more favorable with respect to attaining its goal and exploits that knowledge to reach the goal.

Spectrum of design exploration strategies to solve the inverse problem with varying components of optimization and learning. We propose a deep RL (DRL) framework that lies toward the learning end of the spectrum to solve more generic design goals and another RL framework (DRL + targeted refinement) to solve more specific design goals that includes a higher component of optimization.


A few attempts have been made to use RL in the design process [ 14 , 15 ]. Nevertheless, these methods only incorporate RL in one narrow aspect of design or employ simple reward functions that might not scale for design problems where a high degree of exploration is needed. RL algorithms that are more sophisticated and customized have also been studied to tackle problems with high-dimensional actions space [ 16 , 17 ]. However, these works mainly focus on the machine learning aspects of improving algorithmic efficiency for the RL agent. In this work, we consider engineering design applications of advanced RL methodologies. Specifically, we show that with appropriate engineering of the environment, it is possible to apply off-the-shelf deep RL algorithms to solve high-dimensional design problems that require a high degree of exploration in the design space. This is the main contribution of this paper. We present a proof-of-concept using a real-life case study, namely, microfluidic device design for flow sculpting. We demonstrate that a deep RL framework can be utilized to support the manual design process of designing a microfluidic device via interleaving the decisions of a human and the RL agent as well as generating a fully automated design. We compare the RL agent’s performance with a state-of-the-art genetic algorithm (GA)-based design approach. Additionally, we propose the use of principal component analysis (PCA) to visually differentiate the trajectories generated by the RL agent and GA in the projected design space to understand the differences in the solutions obtained by each method.
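The PCA step mentioned above amounts to projecting high-dimensional design trajectories onto their top two principal components so that RL and GA search paths can be compared visually. A minimal numpy sketch via SVD (the toy trajectory data is made up):

```python
import numpy as np

def pca_project(X, k=2):
    # project the rows of X onto the top-k principal components
    Xc = X - X.mean(axis=0)                 # center the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                    # coordinates in component space

# toy "design trajectory": four visited points in a 3-D design space
X = np.array([[1.0, 2.0, 0.1],
              [2.0, 4.1, 0.0],
              [3.0, 5.9, 0.2],
              [4.0, 8.0, 0.1]])
Y = pca_project(X)                          # (4, 2) projected trajectory
```

Plotting the rows of `Y` for each method's trajectory gives the kind of 2-D comparison described in the text.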

While our proposed framework is generic, we choose a specific engineering platform with a large and complex design space (and correspondingly difficult inverse problem) to demonstrate this proof-of-concept. This problem involves inertial flow sculpting, a recently developed method for manipulating fluid flow in microfluidic devices. Inertial flow sculpting is accomplished by placing bluff-body obstacles in confined channel flow, with each obstacle inducing localized deformations to the cross-sectional structure of the fluid (see Fig. 2 ). In inertial flows (1 < Re < 100 with Reynolds number Re = UD / ν for fluid average velocity U , kinematic viscosity ν , and channel hydraulic diameter D ), fore-aft asymmetry in the structure-induced vorticity creates an irreversible deformation to the fluid. If the fluid inertia is low enough to prevent eddies and time-dependent flow effects (Re < 100) [ 18 ], a sequence of obstacles (typically cylindrical pillars) can be spaced apart (≥6 d / w for pillar diameter d and channel width w ) to prevent cross-talk, enabling each pillar to independently operate on the cross-sectional fluid flow structure, contributing to a more complex net flow deformation at the end of the pillar sequence. This kind of control over the cross-sectional structure of flowing fluid has enabled novel manufacturing techniques for complex microfiber and microparticle materials [ 19 – 23 ], and new methods for controlling reagents [ 24 ] and particles [ 25 ] in microfluidic flows.

An overview of inertial flow sculpting, showing a three-pillar sequence (d/w = 0.5, y/w = −0.25) sculpting the cross-sectional flow shape (shown as a blue-colored flow stream). In the forward model (depicted below the top-down channel view), each pillar’s flow deformation is precomputed as a two-dimensional advection map, which is rapidly sampled to deform each fluid element in the microchannel flow cross section. (Color version online.)


Predicting how fluid will be deformed by a given pillar sequence—the forward problem in flow sculpting—has several experimentally validated models reported in the literature, using either a graphics processing unit [ 7 , 9 , 26 ] or a central processing unit (CPU) [ 6 ] to rapidly simulate flow shapes for arbitrary microfluidic device designs in well under one second on modest computing hardware. This speed is enabled by the independent nature of each flow sculpting operator (pillar) due to large interpillar spacing, which allows for a pillar’s net flow deformation to be precomputed as a two-dimensional advection map, which is used to deform a discretized set of fluid states representing a channel cross section (see Fig. 2 ). A flow sculpting device with some arbitrary sequence of pillars can then be simulated in real-time by sampling a library of precomputed flow deformations, with each pillar accepting an incoming two-dimensional flow pattern as an input, modifying the flow shape according to its precomputed deformation, and then sending this new flow pattern to the next pillar in the sequence (as shown in Fig. 2 ). Note that cross-streamline diffusion (which blurs sculpted shapes) and the pressure required to drive flow (which increases with additional pillars and longer channels) will practically limit sequences to perhaps 10–15 pillars in length, depending on the choice of fluid materials, size and construction of the microfluidic device, and the equipment used to drive flow.
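Because each pillar's net deformation is precomputed as an advection map and pillars act independently, the forward model reduces to a chain of array lookups over the discretized cross section. A schematic sketch, where the integer-index map format is a simplification of the actual 2-D advection maps:

```python
import numpy as np

def sculpt(inlet, maps):
    """Apply a pillar sequence to a flattened cross-sectional flow state.

    inlet : 1-D array of fluid-element values (e.g., dye concentration)
    maps  : list of integer index arrays; maps[k][i] names the source
            element that pillar k advects into position i
    """
    state = inlet
    for m in maps:
        state = state[m]   # one pillar = one gather through its advection map
    return state

# toy 4-element cross section and a cyclic shift as a stand-in deformation
inlet = np.array([1.0, 0.0, 0.0, 0.0])
shift = np.array([3, 0, 1, 2])
print(sculpt(inlet, [shift, shift]))   # → [0. 0. 1. 0.]
```

This chaining is what makes simulating an arbitrary pillar sequence nearly instantaneous: no flow physics is solved at design time, only precomputed maps are composed.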

Efficiently using flow sculpting for microscale manufacturing and lab-on-a-chip scenarios mentioned above, and other flow engineering applications, requires one to correctly arrange a pillar sequence to sculpt flow into a desired cross-sectional fluid flow shape—i.e., the inverse problem in flow sculpting. While the forward model is very fast and can be used to build intuition for inertial flow physics, manually designing a flow sculpting device using only the forward model can be extremely difficult owing to the staggering combinatorial complexity within this design space. Using the same inlet flow pattern, a variety of disparate flow shapes can be sculpted using less than ten pillars in a sequence [ 9 ], when sampling from four different pillar diameters ( d / w = {3/8, 4/8, 5/8, 6/8}) and eight equally spaced lateral positions (roughly 10^15 possible pillar sequences). At the same time, there is considerable degeneracy in the space, where many input pillar sequences can lead to similar flow shapes [ 6 , 9 ]. In addition, there is currently no method to perform a “reality check” on the feasibility of a desired fluid flow shape—it may be extremely rare in the design space or impossible to create. Hence, an automated method is needed to efficiently solve the inverse problem in flow sculpting.
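The quoted design-space size follows directly from the stated discretization: four diameters times eight lateral positions gives 32 choices per pillar, and summing over all sequence lengths up to ten lands on the order of 10^15 designs:

```python
choices = 4 * 8                                      # diameters × lateral positions
sequences = sum(choices ** n for n in range(1, 11))  # all lengths from 1 to 10
print(f"{sequences:.2e}")                            # → 1.16e+15
```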

Several computational methods have previously been proposed to solve the inverse design of the flow sculpting problem. While gradient-based methods are often used for shape-based optimization in fluid flows [ 27 , 28 ], such schemes are not readily applied to the flow sculpting problem in particular for reasons which were enumerated by Stoecklein et al. [ 6 ]. In contrast, evolutionary algorithms (e.g., GAs) can easily incorporate a forward model with user-specified (image-based) fitness functions to effectively explore the search space. However, like gradient-based methods, GAs do not use prior knowledge of the design space and can also be time consuming, as each fitness function evaluation will compute the forward model to simulate a candidate solution. Additional discussions on the inability to apply gradient-based methods to flow sculpting and the need for faster design iterations are provided in the Supplemental Material on the ASME Digital Collection.

On the supervised learning side of the spectrum in Fig. 1 , Lore et al. [ 29 ] created a deep learning-based method combining a convolutional neural network with simultaneous multiclass classification (CNN-SMC) to predict a fixed-length pillar sequence for a target. The main drawback of this method is that the length of the predicted pillar sequences is fixed, and as Stoecklein et al. [ 30 ] later showed, the performance of this method depends heavily on the sampling of the training data from the design space, even if the space is uniformly covered. Another approach that Lore et al. [ 31 ] proposed consists of an action prediction network (APN) and an intermediate transformation network (ITN). In essence, the APN-ITN breaks down the inverse problem into smaller subproblems, where the ITN predicts possible intermediate shapes of a flow’s deformation and the APN predicts the pillars in between such transformations. The authors showed that they were able to obtain flow shapes with an average pixel-match-rate (PMR) of 0.819 using APN-ITN, compared with CNN-SMC, which achieved an average PMR of 0.603. Nonetheless, the responsibility of choosing the number of subdivisions falls onto the user, and obtaining good results may require exceedingly long pillar sequences, which have practical limitations as outlined in the earlier description of flow sculpting. Overall, these methods often do not produce an efficient solution to the inverse problem.
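The pixel-match-rate used to score these methods is, as its name suggests, the fraction of pixels on which the sculpted and target binary flow shapes agree. A minimal sketch (the cited papers' exact definition may differ in detail):

```python
import numpy as np

def pixel_match_rate(pred, target):
    # fraction of pixels where the two binary flow-shape images agree
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return float(np.mean(pred == target))

a = np.array([[1, 0], [1, 1]])   # sculpted shape
b = np.array([[1, 0], [0, 1]])   # target shape
print(pixel_match_rate(a, b))    # → 0.75
```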

More recently, Lee et al. [ 32 ] proposed a deep RL framework to solve the flow sculpting problem. The framework trains an RL agent to learn a shorter path for a single design objective. They also suggested the application of transfer learning to improve the learning rate of an agent learning a different objective. These methods are denoted as single objective learner and single objective + transfer learning in Fig. 1 . Though this framework utilizes a learning paradigm, it does not effectively exploit learning as the RL agent is ultimately optimized for a single objective and fails when presented with drastically different target designs. Hence, it falls on the optimization side of the design spectrum and these challenges motivated the developments detailed next.

We begin our discussion with a brief background on deep RL in the context of design exploration and optimization. We leverage the flow sculpting problem to illustrate how deep RL can facilitate and improve a design process.

4.1 Reinforcement Learning Environment

To incorporate the design process in a deep RL framework, we built a design environment modeled after OpenAI gym’s environment [ 33 ] within which the RL agent can interact and explore. In a generic design problem, the environment first generates an initial state of the design and a target. The RL agent then acts on the state of the environment and returns an action. A problem-specific model in the environment uses this action to transition the current state to the next state, and an evaluator assesses if the updated design satisfies the target design’s requirements. The initial design forms the basis of the state of the environment, and depending on the application, the target design may or may not be included as part of the state of the environment. In a multi-objective setting such as the flow sculpting problem, conditioning the agent with the target design can be helpful in its learning process. In contrast, RL agents optimizing for a single goal may not require this conditioning as the agent is capable of learning the goal from the reward signal.

In the case of flow sculpting, the environment first generates a target flow shape and an initial flow shape. Together, these flow shapes form the state of the environment which is observed by the RL agent. Based on the state, the RL agent decides which pillars best deform the current flow shape into the target flow shape. The action of the RL agent is passed back to the environment, where a forward physics model evaluates the deformation of the current flow shape given the RL agent’s decision. Next, the environment updates the current flow shape with the deformed flow shape and generates a reward signal based on the similarity between the current and target flow shapes. The RL agent repeats the process until the flow is deformed to a shape similar to the desired target flow shape. A key point to note here is that in this problem formulation, the state transitions are Markovian, i.e., the deformation of the flow only depends on the current flow shape and the pillar placed, since the pillars are spaced out far enough for each flow deformation to saturate, as discussed by Stoecklein et al. [ 6 ]. The general pipeline of this process is shown in Fig. 3 .

Pipeline of the RL framework as a design strategy to solve the inverse problem, applied on the flow sculpting problem to illustrate how this framework can be used to effectively explore the design space to propose more efficient solutions

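The interaction loop described above can be sketched as a gym-style environment. Everything below is a minimal stand-in: the image size, pillar count, deformation rule, and reward are illustrative assumptions, with a simple roll-based deformation substituting for the forward physics model.

```python
import numpy as np

class FlowSculptEnv:
    """Gym-style sketch of the flow sculpting loop.

    `_deform` is a placeholder for the forward physics model; a real
    implementation would apply the selected pillar's advection map.
    """

    def __init__(self, n_pillars=32, shape=(12, 100), max_steps=7, threshold=0.9):
        self.n_pillars = n_pillars      # 32 discrete pillar configurations
        self.shape = shape              # binary flow-shape image (assumed size)
        self.max_steps = max_steps      # episode limit of seven pillars
        self.threshold = threshold      # 90% pixel match counts as success
        self.rng = np.random.default_rng(0)

    def _deform(self, flow, action):
        # Placeholder deformation: shift the image by an action-dependent offset.
        return np.roll(flow, shift=int(action) - self.n_pillars // 2, axis=1)

    def _similarity(self, a, b):
        return float(np.mean(a == b))   # pixel match rate

    def reset(self):
        self.current = np.zeros(self.shape, dtype=np.uint8)
        self.current[:, 40:60] = 1      # initial (undeformed) inlet flow
        self.target = self._deform(self.current, self.rng.integers(self.n_pillars))
        self.steps = 0
        return self.current.copy(), self.target.copy()

    def step(self, action):
        self.current = self._deform(self.current, action)
        self.steps += 1
        reward = self._similarity(self.current, self.target)
        done = reward >= self.threshold or self.steps >= self.max_steps
        return self.current.copy(), reward, done
```

A usage round-trip mirrors the text: `state, target = env.reset()` gives the agent its observation, `env.step(a)` applies a pillar and returns the updated flow, the similarity-based reward, and a termination flag.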

4.2 Reinforcement Learning Agent.

For the RL agent, we implemented a value-iteration-based learning algorithm called the double deep Q-network (DoubleDQN) [ 34 ]. We chose a value-iteration method over policy gradient methods [ 35 – 37 ] and model-based methods [ 38 , 39 ] because it strikes a balance between the two: it is more sample efficient than policy gradient methods, while avoiding the need to model complex physical environments as model-based methods require.

In a DQN [ 40 ], the RL agent is parameterized by a deep neural network that learns to approximate the Q-value of a state. The true Q-value is the immediate reward plus the discounted sum of future rewards for taking an action in a given state of the environment, as defined by the recursive Bellman equation. Learning a Q-function that satisfies the Bellman equation is known as Q-learning [ 41 ]. Hence, the RL agent approximates the Q-values of given state-action pairs and selects the action that returns the maximum cumulative reward. While interacting with the environment, the RL agent simultaneously updates its network parameters using the reward signal provided by the environment until the error between its approximations and the true Q-values is minimized.
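The Bellman update underlying Q-learning can be written compactly in tabular form; the state and action counts, reward, and learning rate below are toy values for illustration, not the paper's settings.

```python
import numpy as np

# Tabular Q-learning sketch of the Bellman update described above.
gamma, alpha = 0.99, 0.1            # discount factor and learning rate (assumed)
Q = np.zeros((5, 32))               # 5 toy states x 32 pillar actions

def q_update(Q, s, a, r, s_next):
    # Bellman target: immediate reward plus discounted best future value.
    td_target = r + gamma * Q[s_next].max()
    # Move the current estimate toward the target by the TD error.
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = q_update(Q, s=0, a=3, r=0.5, s_next=1)
print(Q[0, 3])   # 0.05 = alpha * (0.5 - 0.0) on an all-zero table
```

A deep Q-network replaces the table with a neural network and minimizes the squared TD error over minibatches, but the target term is the same.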

The DoubleDQN is an extension of the DQN in which action selection and Q-value estimation are decoupled across two separate neural networks, addressing the DQN's inherent tendency to overestimate Q-values. In essence, the DoubleDQN improves the stability and convergence rate of the RL agent, as presented by van Hasselt et al. [ 34 ] and Zuo et al. [ 42 ].
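The decoupling can be made concrete with a small numeric sketch, where the two arrays below stand in for the outputs of the online and target networks on a next state:

```python
import numpy as np

gamma = 0.99
# Mock Q-value outputs for a next state s'; in practice these come from
# the online network and the target network, respectively.
q_online_next = np.array([0.2, 0.9, 0.4])   # used to SELECT the action
q_target_next = np.array([0.3, 0.5, 0.8])   # used to EVALUATE that action

def dqn_target(r, q_next):
    # Vanilla DQN: one network both selects and evaluates -> overestimation bias.
    return r + gamma * q_next.max()

def double_dqn_target(r, q_online_next, q_target_next):
    # DoubleDQN: select with the online net, evaluate with the target net.
    a_star = int(np.argmax(q_online_next))
    return r + gamma * q_target_next[a_star]

r = 0.1
print(dqn_target(r, q_target_next))                        # 0.1 + 0.99*0.8 = 0.892
print(double_dqn_target(r, q_online_next, q_target_next))  # 0.1 + 0.99*0.5 = 0.595
```

Note how the decoupled target is lower: the action favored by the online network is judged by the independent target network, damping the max-operator's optimism.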

4.3 Hindsight Experience Replay.

Besides using the DoubleDQN to improve the stability of training, we implemented a technique called hindsight experience replay (HER), developed by Andrychowicz et al. [ 43 ]. Good design decisions that yield strong reward signals are sparse early in training, which hampers the learning of the RL agent. HER addresses this by augmenting the Markov transitions: poor decisions are converted into good decisions with rich reward signals by replacing the target design with the design the agent actually achieved. HER is also beneficial when design simulations are expensive or data are limited, since it reuses failed transitions to create additional training data without running extra simulations. Variants of HER have therefore been applied in tandem with RL algorithms to solve various problems, such as in [ 44 – 46 ]. The pseudocode of a DoubleDQN with HER is shown in Algorithm 1, and an illustration of this technique applied to flow sculpting is shown in Fig. 4 .

Illustration of the HER to improve the sample efficiency of the RL agent. Using this, poor transitions are repurposed into good transitions which the RL agent can learn for future goals. The top row represents a poor transition and the bottom row represents an example of how HER uses this transition for better learning.

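The relabeling idea can be sketched as follows; the transition tuple layout and the PMR-based reward recomputation are assumptions of this sketch rather than the paper's exact implementation.

```python
import numpy as np

def pmr(a, b):
    # Pixel match rate between two flow-shape images.
    return float(np.mean(a == b))

def her_relabel(transitions, achieved_final):
    """Hindsight relabeling sketch: pretend the shape actually reached was
    the goal all along, so a failed episode still yields a rich reward.

    `transitions` are (state, action, reward, next_state, goal) tuples.
    """
    relabeled = []
    for s, a, r, s_next, goal in transitions:
        new_r = pmr(s_next, achieved_final)       # recompute vs. achieved goal
        relabeled.append((s, a, new_r, s_next, achieved_final))
    return relabeled

# Toy episode that missed its original goal almost entirely.
shape = (4, 4)
goal = np.ones(shape)
s0, s1 = np.zeros(shape), np.eye(4)
episode = [(s0, 2, pmr(s1, goal), s1, goal)]      # poor reward (0.25) vs. goal
augmented = her_relabel(episode, achieved_final=s1)
print(augmented[0][2])                            # 1.0: now a "successful" move
```

Both the original and the relabeled transitions would be stored in the replay buffer, so the agent learns from failures at no extra simulation cost.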

DoubleDQN with HER

 1  Initialize: Q-network parameters θ, replay buffer B, max episode length j, exploration ratio ϵ
 2  for M episodes do
 3      Sample a target flow shape
 4      while i less than j steps and s_i is not terminal do
 5          Take a random or argmax-Q action a_i, following probability ϵ
 6          Transition to next state s′_i
 7          Evaluate state s′_i and compute reward r_i
 8          Store transition (s_i, a_i, r_i, s′_i) in B
 9          Augment the HER transition and store it in B
10      if time to update then
11          Sample a minibatch of transitions (s, a, r, s′) from B
12          Perform optimization and update Q-network parameters θ

The problem formulation rests on the following assumptions:

  • A forward model is available to simulate data.
  • The transitions of the states are Markovian.
  • There is a limit of seven pillars in a microfluidic device.
  • Fewer pillars reflect a more efficient design.
  • A flow shape with 90% matching pixels is a success.
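The success criterion above can be expressed directly as a pixel match rate (PMR) check; the image size used here is arbitrary.

```python
import numpy as np

def pixel_match_rate(flow, target):
    """Fraction of pixels where the binary flow shape matches the target."""
    return float(np.mean(flow == target))

def is_success(flow, target, threshold=0.9):
    # A design counts as a success when at least 90% of pixels match.
    return pixel_match_rate(flow, target) >= threshold

target = np.zeros((10, 10), dtype=np.uint8)
candidate = target.copy()
candidate[0, :5] = 1                        # 5 of 100 pixels differ -> PMR 0.95
print(pixel_match_rate(candidate, target))  # 0.95
print(is_success(candidate, target))        # True
```

The same PMR quantity doubles as the similarity term in the reward signal described earlier.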

5.1 Episodic Length.

In our application, we randomly sampled 500,000 seven-pillar sequences from a set of 32 pillar configurations and evaluated the deformations of the initial flow using the forward physics model. For each new episode, we randomly sample a deformed flow shape and initialize that as the episode’s target flow shape. Since the targets were generated using seven pillars, we limit the length of an episode to seven steps so that the RL agent is required to reach the target using seven pillars or fewer. If the RL agent fails to reach the target in seven steps, the episode terminates and a new target flow shape is sampled for a new episode. We would also like to point out that the maximum pillar sequence length is an arbitrary but reasonable choice made when formulating the problem; it does not affect the framework in any significant way.

5.2 Evaluation Metric.

5.3 Reward Function.

5.4 Network Architecture.

Since the state of the environment consists of images of flow shapes, we parameterized the RL agent with a CNN. The CNN consists of three convolutional layers followed by three dense layers, with a final layer representing an array of 32 pillars and their corresponding Q-values. Additionally, we use a mixture of tanh and ReLU activation functions between the layers so that the predicted Q-values are kept within the scale of the true reward. Details of the CNN are shown in Fig. 5 .

Illustration of RL agent’s network architecture. Final dense layer has 32 output channels to represent the 32 possible actions. Final tanh activation layer is used to scale Q-values that reflects the range of the true reward.

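Since the text does not specify the layer hyperparameters, the following sketch only does the output-shape bookkeeping for a network of the described form (three convolutional layers, dense layers, 32 outputs); the input size, kernel sizes, strides, and channel count are assumptions, not the values from Fig. 5.

```python
# Shape bookkeeping for a CNN of the described form: three conv layers,
# then dense layers ending in 32 outputs (one Q-value per pillar).
def conv_out(size, kernel, stride=1, padding=0):
    # Standard convolution output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

h, w = 64, 64                                      # assumed flow-shape image size
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:    # assumed conv configurations
    h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)

flat = h * w * 64                 # assumed 64 channels in the last conv layer
n_actions = 32                    # final dense layer: one Q-value per pillar
print((h, w), flat, n_actions)    # (4, 4) 1024 32
```

The final tanh activation then bounds each of the 32 Q-value outputs to [-1, 1], matching the scale of the PMR-based reward.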

5.5 Targeted Refinement.

In addition to evaluating the effectiveness of the RL agent with target flow shapes generated using different random seeds, we tested the capability of the RL framework by employing transfer learning techniques and evaluated the RL agent on targets sampled from outside the training distribution that were sculpted using more than seven pillars. To distinguish the RL agent that was trained with 500,000 different targets with the RL agent that was further refined, we denote the first agent simply as the deep reinforcement learner (DRL) and the latter as the deep reinforcement learner with targeted refinement (DRL + TR). To implement the targeted refinement step, we initialize a new RL agent using the trained weights of the DRL and train it on a single target flow shape with a minimal amount of exploration to encourage faster convergence.

In this section, we present the results of training the RL agent to solve the flow sculpting problem. Figure 6 shows the percentage of target flow shapes the RL agent achieves per interval during training and evaluation. Each training interval consists of 1000 randomly sampled target flow shapes, and each evaluation interval consists of 20 target flow shapes. The training and evaluation target flow shapes were sampled using different random seeds. From Fig. 6 , we observe that training of the RL agent converges after approximately 100 intervals, or 100,000 target flow shapes. After 100 intervals, the RL agent consistently reaches 90% of the target flow shapes in a given interval. The evaluation results display a similar trend, where the agent undergoes one evaluation interval after every training interval. Given that the agent also achieves 90% of the target flow shapes in every evaluation interval, we validate that the RL agent is truly learning to generalize to most flow shapes rather than memorizing sequences for particular flow shapes. Additional results comparing the DRL agent in other variants of the flow sculpting environment are provided in the Supplemental Material on the ASME Digital Collection.

Training and evaluation performance of the DRL agent. In both cases, the RL agent consistently finds a solution to 90% of the randomly generated target flow shapes approximately after 100 training intervals


6.1 Deep Reinforcement Learner.

Next, we present the performance of the trained DRL in multiple scenarios. To illustrate the capability of the RL agent as a design tool, we show a few examples of target flow shapes and the corresponding flow shapes that the RL agent created during evaluation in Fig. 7 . As tabulated in Table 1 , the RL agent is able to find a shorter sequence of pillars that sculpts the initial flow shape into shapes similar to the target flow shapes. Figure 8 shows the total distribution of pillar sequence lengths for flow shapes that the RL agent successfully sculpted during evaluation. As can be seen, the distribution of pillar sequence length is heavily skewed to the left, with most of the target flow shapes being sculpted with three or fewer pillars. This implies that the RL agent is able to bypass trivial intermediate shapes or find a more efficient trajectory to reach the target.

Examples of target flow shapes sampled during evaluation and corresponding flow shapes designed by the DRL agent


Normalized distribution of pillar sequence length for solutions successfully designed by the DRL agent during evaluation


Comparison of pillar sequences crafted using the forward model versus pillar sequences designed by the DRL agent with the corresponding PMR of the flow shapes

Additionally, we analyzed the entire design space for two different target flow shapes (targets 1 and 2 in Fig. 7 ) using a brute force approach. 2 The PMR values for every possible pillar sequence of length 1–7 (≈34 billion flow shapes) were computed with respect to the target flow shapes and compared with the designs found by the DRL agent. These analyses are shown in Figs. 9 and 10 , where we observed that the designs discovered by the DRL agent are in the 99th percentile of the overall design space. This demonstrates that while better solutions (in terms of PMR) do exist, the DRL agent is capable of using an extremely sparse subset of the design space (500,000 samples, ≈1.4 × 10⁻⁵%) to reach multiple different targets. Additional details of these analyses are also shown in the Supplemental Material on the ASME Digital Collection.

Histogram of all possible solutions in the design space and their corresponding PMR for target 1. The data points in the gray shaded area denote alternative pillar sequences that result in flow shapes with a higher PMR than the sequence designed by the DRL agent.


Histogram of all possible solutions in the design space and their corresponding PMR for target 2. The data points in the gray shaded area denote alternative pillar sequences that result in flow shapes with a higher PMR than the sequence designed by the DRL agent.


6.2 Targeted Refinement for Out of Sample Designs.

In this section, we provide additional results on the performance of the DRL evaluated on a few unique flow shapes that are fundamental in microfluidic applications as discussed in [ 6 ] and the applicability of transfer learning techniques in these situations. We note that most of these flow shapes were sampled using seven or more pillars from the forward model; therefore, these target flow shapes are considered out-of-distribution samples. Using just the weights of the DRL, the RL agent is able to reach three out of the six given targets even without any need of targeted refinement as shown in Fig. 11 and Table 2 . This demonstrates that the RL agent is able to extend its knowledge to reach targets that are not far away from the distribution of shapes that it has learned. Hence, it can be leveraged as a preliminary design strategy in the design process before applying any targeted refinement steps.

Example of fundamental shape transformations that the DRL designed; the target flow shapes were generated using sequences longer than the training sequences


Comparison of pillar sequences designed using forward model versus pillar sequences designed by the DRL for flow shapes as shown in Fig. 11  

For the other three shapes that the DRL did not reach, we further optimize the RL agent by initializing it with the weights of the DRL and training it conditioned on the single target. The results show that the RL agent is able to consistently reach the target flow shapes with only a limited number of update steps and a constant exploration ratio of 0.01. Figure 12 and Table 3 show the results of the RL agent before and after applying targeted refinement. In particular, we show that even for shapes that the DRL can already reach, such as Shift (shown in Fig. 2 ), applying the targeted refinement step can further enhance the quality of the solution, as shown by the increase in PMR in Table 3 . This exemplifies the effectiveness of transfer learning techniques, where the knowledge of a previously trained agent can be leveraged to reach particularly hard-to-achieve (out-of-sample) designs.

Performance comparison of the DRL vs DRL + TR for a few fundamental flow shapes. Targeted refinement step can be utilized to improve the performance of the DRL for more complex target flow shapes. In the case of the Shift transformation, applying the targeted refinement step on the DRL also improves the PMR.


Comparison of pillar sequences proposed by the DRL versus DRL + TR for corresponding flow shapes shown in Fig. 12  

6.3 Reinforcement Learning Framework for Domain Insight.

Next, we show how the RL framework can be utilized to gain insight into the RL agent’s exploration of the design space. We first extract all the states s that the RL agent explored during the training process. Since any particular state may be visited multiple times, we remove the repeated states to obtain the unique states s_unique. Using these, we generate the corresponding flow shapes I_unique and perform principal component analysis (PCA) [ 47 ] on them to reduce their dimensionality to two dimensions, as in [ 30 ].
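The dimensionality-reduction step can be sketched with plain numpy (centering followed by SVD); the random binary images below merely stand in for I_unique.

```python
import numpy as np

def pca_2d(images):
    """Project flattened flow-shape images onto their first two
    principal components (plain-numpy sketch of the PCA step)."""
    X = images.reshape(len(images), -1).astype(float)
    X -= X.mean(axis=0)                       # center before decomposition
    # SVD of the centered data; rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                       # 2-D coordinates per image

rng = np.random.default_rng(1)
unique_flows = rng.integers(0, 2, size=(200, 12, 100))  # stand-in for I_unique
coords = pca_2d(unique_flows)
print(coords.shape)                           # (200, 2)
```

Each intermediate flow shape along a pillar sequence can then be projected with the same `Vt` to trace a trajectory through the reduced design space, as in Fig. 13.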

In Fig. 13 , the flow sculpting design space is visualized by mapping flow shape images onto the reduced two-dimensional PCA-space (PCA was performed on the entire dataset of I_unique, but we show only 50,000 of the 300,000 points for clearer visualization). The flow shape Encapsulate (referenced in Table 2 ) is used as an anecdotal example to compare the pillar sequences found by the RL agent, the GA, and the ground truth (i.e., the pillar sequence used to create the target flow shape). Each point in a trajectory represents the flow shape at that step in a pillar sequence, projected onto the PCA-space. The comparison of the three trajectories in the PCA-space is shown in Fig. 13(a) . Here, the GA chromosome is constrained to seven pillars to allow a flexible design space in a similar manner to the CNN-SMC from [ 29 ]. We observe that the RL agent reaches the final shape by taking a different path, with a similarity of PMR = 0.963 to the final target image. Both the RL agent and the GA reach the target with fewer pillars than the ground truth.

Principal component analysis of the unique states visited by the RL agent and comparison of the paths taken by GA, RL agent, and ground truth target


Additionally, we show three other paths from the DRL’s evaluation (targets 1, 2, and 3 in Fig. 7 and Table 1 ) as anecdotal examples. In flow shape 1 and flow shape 3 (corresponding to targets 1 and 3), we see that although the RL agent takes fewer steps, it only reaches a proximal shape with a lower similarity (PMR = 0.92). This correspondence between PMR and the proximity of points in the reduced PCA dimensions confirms that the PCA projection preserves shape similarity (helping us understand the RL agent better). In flow shape 2 (corresponding to target 2) and flow shape 3, we notice that the RL agent takes fewer steps, but these steps are larger in the PCA-space (corresponding to larger shape changes per step). The particular sequence of pillars corresponding to the fittest individual from the GA also takes distant steps to reach the target with high accuracy; however, arriving at that sequence using the GA comes at a high computational cost (often thousands of forward model simulations, regardless of the chromosome size [ 7 ]), whereas the DRL agent is able to leverage sequential information to efficiently traverse the design space. Additional results contrasting the paradigms of DRL and GA can also be found in the Supplemental Material on the ASME Digital Collection.

6.4 Reinforcement Learning Framework With Human in the Loop.

We demonstrate how the trained RL agent can be combined with user input to generate feasible designs with a human-in-the-loop framework. By exploiting the nature of the RL agent always taking the next best step to reach a goal, a user can participate in the design process by making manual decisions intermittently and allowing the RL agent to optimize the design from that point onwards. This can be an extremely powerful tool as it allows the user to both impart domain knowledge in the design process while experimenting and also gain useful insights into the solution space.

In the context of flow sculpting, instead of performing automatic pillar sequence design using the RL agent, the user can impose certain constraints on the design, for example, starting with a specific pillar or using the same pillar for the first two pillars. Thus, the RL framework enables the user to inject such decisions and lets the RL agent find a sequence that reaches the target design. Figure 14 shows an example of utilizing the RL agent to discover multiple trajectories with human input. Trajectory (d) illustrates the original sequence of steps that was sculpted using seven pillars. Using a fully automated RL design process, the RL agent is able to reach a similar shape using just two or three steps as shown in trajectories (c) and (e). Meanwhile, trajectories (a), (b), and (f) show design processes with user input. Red segments of the trajectories highlight a user’s decision, while green segments represent the decisions of the RL agent. Additionally, the position of the intermediate states in the reward bands represents the divergence of the flow shape from the target flow shape. Trajectory (b) shows one possible configuration of pillars where the RL agent is able to reach the target even with three manual (possibly poor) pillar placements. Conversely, trajectories (a) and (f) show sample trajectories where a user’s constraints might be too tight for the RL agent to recover from the diverging states to reach the target design.

Possible trajectories optimized by the RL agent with a human-in-the-loop framework. Blue trajectory represents the sequence of shape transformations using the forward model to sculpt the target shape. Green segments of the trajectory show design decisions made by the RL agent and red segments represent manual design decisions. (Color version online.)

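The alternation between manual and agent decisions described above can be sketched as follows; the Q-function, deformation model, and pillar choices are all hypothetical stand-ins for the trained network and forward model.

```python
import numpy as np

def design_with_user(q_values_fn, deform, state, user_moves, total_steps=7):
    """Human-in-the-loop sketch: the user fixes the first few pillar
    choices; the agent greedily completes the sequence from there."""
    sequence = []
    for t in range(total_steps):
        if t < len(user_moves):
            action = user_moves[t]                       # manual (red) decision
        else:
            action = int(np.argmax(q_values_fn(state)))  # agent (green) decision
        sequence.append(action)
        state = deform(state, action)
    return sequence

# Toy stand-ins: a fixed preference table and a no-op deformation model.
q_fn = lambda s: np.arange(32)      # this mock agent always prefers pillar 31
deform = lambda s, a: s
seq = design_with_user(q_fn, deform, state=np.zeros((12, 100)), user_moves=[4, 4])
print(seq)                          # [4, 4, 31, 31, 31, 31, 31]
```

Because the agent always optimizes from the current state, it tries to recover from whatever constraints the user imposes, which is precisely what makes the feasibility feedback in Fig. 14 possible.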

Hence, this framework can be used as a potential avenue for the user to glean useful insights on the feasibility of certain imposed constraints by observing the performance of the agent. Therefore, using the RL framework as a design tool can facilitate the manual design process in addition to automatic designs. More importantly, this framework may potentially be generalized and applied to other design problems where a designer can have a varying degree of involvement.

The realization of design frameworks for inverse problems in engineering has significant impact. As the amount of engineering and simulation data proliferates, it is imperative to have design strategies that exploit both the vast amount of data (learning) and domain knowledge (optimization). We propose a deep reinforcement learning framework that lies in the middle of the design exploration spectrum, thus utilizing the advantages of both learning and optimization. Using the flow sculpting problem as an illustrative design problem, we show that the RL agent learns multiple feasible solutions for a wide variety of target flow shapes using techniques that effectively utilize simulation data. Additionally, we demonstrate that the RL agent can be further refined through a targeted refinement step to solve more complicated goals without searching through the entire design space again. This leads to reduced computation and, consequently, shorter design life cycles. Using the states explored by the RL agent, we conducted PCA on the unique states and visualized the paths the RL agent takes to reach the goal in the PCA coordinates, in comparison to the GA and the forward model, which demonstrates how the exploration of the RL agent can be applied to gain insight into the design space. Finally, we show how the RL framework can be combined with user input to aid designers in discovering new solutions while gaining valuable insight into the design space.

While we provide a successful deep RL-based design case study here, there are still many avenues of work that have yet to be explored. One future extension of this work is to extend the proposed framework to continuous design spaces and compare its performance with existing gradient-based methods such as [ 27 , 48 ]. This will allow us to better understand the benefits and drawbacks of our approach.
Finally, we believe that this framework can be extended to other design problems with similar formulations using some of the best practices provided in this paper to tune this framework to specific design applications.

The computation was performed on 128 CPU cores for five days to generate all the designs in the design space.

This work was supported, in part, by U.S. AFOSR under the YIP grant FA9550-17-1-0220, DARPA-PA-18-02-02 (AIRA), and NSF DMREF 1435587. The computations were performed using the Nova Cluster at Iowa State University (Funder ID: 10.13039/100009227), which is funded by NSF MRI Grant No. 1726447. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsoring agencies.

Supplementary data

Supplementary File & Supplementary Figures 1-4


A generalisable tool path planning strategy for free-form sheet metal stamping through deep reinforcement and supervised learning

  • Open access
  • Published: 22 April 2024


  • Shiming Liu 1 ,
  • Zhusheng Shi   ORCID: orcid.org/0000-0002-3640-3958 1 ,
  • Jianguo Lin 1 &


Due to the high cost of specially customised presses and dies and the advance of machine learning technology, there is emerging research attempting free-form sheet metal stamping processes, which use several common tools to produce products of various shapes. However, tool path planning strategies for the free forming process, such as those based on reinforcement learning and derived from previous path planning experience, do not generalise to an arbitrary new sheet metal workpiece. Thus, in this paper, a generalisable tool path planning strategy is proposed for the first time to realise tool path prediction for an arbitrary sheet metal part in 2-D space with no prior metal forming knowledge, through deep reinforcement learning (implemented with two heuristics) and supervised learning technologies. Conferred by deep learning, the tool path planning process is shown to have self-learning characteristics. This method has been instantiated and verified by a successful application to a case study, in which the workpiece shape deformed by the predicted tool path has been compared with its target shape. The proposed method significantly improves the generalisation of tool path planning for the free-form sheet metal stamping process, compared to strategies using pure reinforcement learning technologies. The successful instantiation of this method also implies the potential for the development of intelligent free-form sheet metal stamping processes.


Introduction

Sheet metal components are nowadays ubiquitous in various industrial products, such as automobiles, aircraft and high-speed trains. Benefiting from the short forming cycle of contemporary advanced sheet metal stamping techniques, which makes mass production of lightweight sheet metal parts feasible, manufacturing budgets are constantly reduced and a burgeoning era of industrialisation has arisen. However, the products formed by sheet metal stamping technology are subject to the unalterable shapes of the punch and dies, whose limited forming flexibility impedes the applicability of off-the-shelf stamping equipment to new sheet metal components. In addition, the extraordinarily high capital cost of specialised punches and dies, especially for large-scale stamping, leads to expensive prototyping and arduous research and development of novel sheet metal designs. Thus, to extricate sheet metal manufacture from these constraints and to fulfil the requirement of high-volume personalised production in the sheet metal forming industry (Bowen et al., 2022 ), flexible forming processes, which can change workpiece geometry without requiring different tool sets, were developed (Allwood & Utsunomiya, 2006 ). An emerging free-form sheet metal stamping technique was brought up (Liu et al., 2022 ), which consecutively deforms a sheet metal to its target shape from blank using several small-scale punches and dies of different shapes. In this regard, of particular concern is the generation and optimisation of the forming tool path which could yield a forming result comparable to the forming target.

Due to the forming characteristics of the traditional sheet metal stamping process, the sheet metal part is usually formed within a few or just one forming step, for which no research on tool path for stamping can be found. In sheet metal forming industry, most studies involving tool path generation and optimisation were performed for incremental sheet metal forming (ISF) process, which deforms sheet metal to its target shape with a sequence of incremental deformations. Attanasio et al. ( 2006 ) manually designed several tool paths for a two point ISF to manufacture an automotive part, by varying the step depth and scallop height. They found that setting low values of both these parameters can improve the final dimensional accuracy and surface quality. Similarly, Tanaka et al. ( 2005 ) manually generated tool paths for an incremental sheet punching (ISP) process based on the target workpiece CAD, tool shape, crossfeed, depth and tool path mode, of which the deformed workpiece had a maximum length of 76 mm. Azaouzi and Lebaal ( 2012 ) proposed a tool path optimisation strategy for single point ISF using the response surface method and sequential quadratic programming algorithm, which was tested for a spiral tool path and realised through finite element analysis (FEA). This method was reported to reduce the manufacturing time and improve the homogeneity of thickness distribution of asymmetric parts. Malhotra et al. ( 2011 ) proposed a tool path generation strategy to alleviate the unintentionally formed stepped features on the component base occurring in a multi-pass single point ISF process, by combining in-to-out and out-to-in tool paths for each intermediate shape. It was found that this strategy effectively reduced the occurrence of stepped features compared to pure out-to-in tool paths.

Over the past decade, machine learning technology has seen unprecedented development in image recognition and natural language processing, thanks to remarkably increased computational power. Motivated by its extraordinary learning capability, researchers have started to harness machine learning and deep learning technologies in the sheet metal forming industry, such as in ISF (Nagargoje et al., 2021 ). Most of these efforts focused on process monitoring (Kubik et al., 2022 ), surrogate models for forming result prediction (Low et al., 2022 ) and process parameter prediction (Liu et al., 2021 ). Machine learning techniques are commonly grouped into three categories (Monostori et al., 1996 ): supervised learning (SL), unsupervised learning, and reinforcement learning (RL). With regard to forming tool path planning, most applications exploited supervised and reinforcement learning techniques. Opritescu and Volk ( 2015 ) and Hartmann et al. ( 2019 ) utilised supervised learning neural networks for optimal tool path prediction for 2-D and 3-D automated driving processes (Kraftforming), respectively. The curvature distribution on the target workpiece surface was computed as input, and they reported that careful workpiece digitisation was of great importance to achieving good learning efficiency. The tool path for an automated wheeling process was predicted by Rossi and Nicholas ( 2018 ) using a fully convolutional network (FCN), with 75% prediction accuracy. Störkle et al. ( 2019 ) used a linear regressor, decision tree, random decision forest, support vector machine and Gaussian process regressor to predict the optimal local support force and support angle distribution along a tool path in an ISF process. Liu et al. ( 2022 ) developed a recursive tool path prediction framework for a rubber-tool forming process, which embedded a deep supervised learning model for tool path planning.
They compared the performance of three series of state-of-the-art models, including single CNNs, cascaded networks and convolutional long short-term memory (LSTM) models, in tool path learning, from which the convolutional LSTM was reported to be superior. Compared to supervised learning, reinforcement learning applications to tool path planning of sheet metal forming processes have received far less attention. This could be due to the expensive acquisition of computational or experimental data for RL algorithm training. Störkle et al. ( 2016 ) proposed an RL-based approach for the tool path planning and adjustment of an ISF process, which increased the geometric accuracy of the formed part. Liu et al. ( 2020 ) used a reinforcement learning algorithm, namely deep Q-learning, for the tool path learning of a simple free-form sheet metal stamping process. The FE computation was interfaced to the Q-learning algorithm as the RL environment, which provided real-time forming data for algorithm training.

Although there have been numerous studies of tool path planning for various sheet metal forming processes, they share a common issue in generalising the methods to completely different target workpiece shapes, which hinders the widespread application of machine learning based tool path planning strategies. In other words, new data have to be acquired to retrain the machine learning models or algorithms to achieve good prediction accuracy for a different target, especially for approaches exploiting reinforcement learning. The generalisation gap is a common issue in RL applications (Kirk et al., 2021 ) and remains a challenge under constant research. An evident reason for this inferior generalisation is that the data collected during RL training mostly lie on the path towards a certain optimisation target. Given a completely different target, the model would fail to generalise since it was trained without useful data towards the new target.

Table  1 briefly compares the methods introduced above for tool path planning and summarises their deficiencies in terms of real-world application. “Curse of dimensionality” indicates that a method can become error-prone once the target workpiece shape becomes complex, since the available data become sparse and exponentially more training data are required to obtain a reliable prediction result.

The aim of this research is to explore the generalisation of deep learning technologies in forming tool path planning for a 2-D free-form sheet metal stamping process. A generalisable tool path planning strategy, combining deep reinforcement and deep supervised learning technologies at different stages, is proposed in this paper. In this strategy, RL was used to explore the optimal tool paths for target workpieces, from which the efficient tool path for a certain group of workpieces was learned using SL. With no prior metal forming knowledge, the path planning process was corroborated to possess self-learning characteristics, whereby the path planning results can be self-improved over time. The generalisation of this strategy was realised by factorising the entire target workpiece into several segments, which were classified into three groups. The optimal tool paths for several typical workpiece segments from each group were learned from scratch through deep reinforcement learning, and deep supervised learning models were used to generalise the intrinsic forming pattern of each group of segments. Six deep RL algorithms, from two different categories, were compared regarding their tool path learning performance for the free-form stamping process. The RL process was enhanced with the introduction of two forming heuristics. Three deep SL models were trained with two tool path datasets consisting of different amounts of data, and their performance was evaluated in terms of forming goal achievement and the dimensional error of the deformed workpiece; the forming results from a pure reinforcement learning method were also presented for comparison. Finally, a case study was performed to verify the generalisable tool path planning strategy with a completely new target workpiece.

The main contributions of this work are as follows: 1) developing a generalisable tool path planning strategy for arbitrary 2-D free-formed sheet metal components for the first time, which successfully integrates deep RL and SL algorithms to learn and generalise efficient forming paths, validated through a case study; 2) analysing a free-form rubber-tool forming process and discovering two close punch effects; 3) quantitatively analysing the performance of six deep RL algorithms and three deep SL models on tool path learning and generalisation, respectively. In addition, two heuristics were derived from real-world empirical experience and have been demonstrated to significantly facilitate the tool path learning process.

Methodology

In this section, the forming process to which the proposed tool path planning strategy is applied is first introduced in " Free-form stamping test and digitisation of forming process " section, followed by a detailed illustration of the generalisable tool path planning strategy in " Generalisable tool path planning strategy " section. " Forming goal and forming parameters design " section presents the forming goal that the strategy needs to achieve and the forming parameters to be selected. " Deep reinforcement learning algorithms and learning parameters " and " Deep supervised learning models and training methods " sections illustrate the design details of the RL and SL algorithms, respectively.

Free-form stamping test and digitisation of forming process

A rubber-tool forming process proposed in the Authors’ previous research (Liu et al., 2022 ) was adopted to consecutively deform a sheet metal while retaining a sound surface condition during the forming process. In the test setup and FE model shown in Fig.  1 , the workpiece was deformed by a rubber-wrapped punch on a workbench rubber. The specification of the setup is summarised in Table  2 . The deformation was accomplished by translating the punch towards the workpiece along the Y-axis and then lifting it, allowing for springback. The workpiece was consecutively deformed at different locations towards its target shape. At each step of the free forming process, the workpiece was repositioned through rotation and translation to relocate the punch location; the details can be found in (Liu et al., 2022 ). The deformation process was set up, for simplicity, in 2-D space and computationally performed with Abaqus 2019. The FE plane strain model was configured with AA6082 as the workpiece material and natural rubber for the punch rubber and workbench rubber, with details in (Liu et al., 2022 ). The workpiece mesh had an element size of 0.1 mm, with 17,164 elements in total.

figure 1

Test setup and FE model for the rubber-tool forming. The lengths of the deformation and trim zones are 30 and 10 mm, respectively

To realise the free forming process in FE simulations, the forming process was digitised and standardised for precise process control. As shown in Fig.  1 , the workpiece was divided into two zones, namely the deformation zone and the trim zone, with lengths of 30 and 10 mm, respectively. The punch could only work on the deformation zone, and the trim zone would be trimmed after the deformation had been completed. The trim zone was reserved without deformation because the significant shear force from the edge of the workpiece could easily penetrate the workbench rubber, which would cause non-convergence issues in the FE computation. The deformation zone was marked by 301 node locations, numbered from left to right, which are consistent with the mesh node locations.

To quantitatively observe and analyse the workpiece shape, a curvature distribution ( \(\varvec{K}\) ) graph was generated to represent the shape of the workpiece deformation zone, as shown in Fig.  2 . The local curvature \(K\) of a point on the workpiece was calculated as the Menger curvature, which is the reciprocal of the radius of the circle passing through this point and its two adjacent points. Thus, a total of 303 mesh nodes on the top surface of the workpiece were used to generate the \(\varvec{K}\) -graph, including the 301 nodes in the deformation zone and one additional node next to each end of the deformation zone. With an interval distance of 0.1 mm between each two contiguous node locations, the workpiece shape can be regenerated from its \(\varvec{K}\) -graph.
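As a concrete illustration of the Menger curvature described above, the following Python sketch computes \(K\) at each interior node of an ordered list of surface nodes (the function names are ours, not from the paper):

```python
import math

def menger_curvature(p1, p2, p3):
    """Curvature at p2: the reciprocal of the radius of the circle
    passing through p2 and its two adjacent points (Menger curvature)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Cross product gives twice the (signed) area of the triangle p1-p2-p3
    area2 = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p1, p3)
    if a * b * c == 0.0:
        return 0.0
    return 2.0 * abs(area2) / (a * b * c)  # K = 1/R

def k_graph(surface_nodes):
    """Curvature K at each interior node of an ordered list of (x, y) nodes."""
    return [menger_curvature(surface_nodes[i - 1], surface_nodes[i], surface_nodes[i + 1])
            for i in range(1, len(surface_nodes) - 1)]
```

For nodes sampled on a circular arc of radius 50 mm this returns K = 0.02 mm⁻¹, and for collinear nodes (flat sheet) it returns 0, consistent with the constant-value Group 3 segments discussed later.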

figure 2

Example of workpiece shape (left) and its curvature \(\varvec{K}\) distribution along node locations (right). The region highlighted in red in the drawing denotes the deformation zone, and the \(\varvec{K}\) -graph is generated from the workpiece top surface of this zone

Generalisable tool path planning strategy

The proposed generalisable tool path planning strategy works by segmenting the target workpiece, based on the shapes of three groups of segments classified in advance, into a few segments whose subpaths are generated through a deep learning approach. The entire tool path for the target workpiece is acquired by aggregating the subpaths of all workpiece segments. By classifying common groups of segments with the same shape features, any arbitrary workpiece can be regarded as assembled from segments of these groups. From the theoretical perspective, through dynamic programming, the tool path learning complexity for a complete workpiece is reduced to the simpler subproblems of path learning for each group of workpiece segments. As the segments in each group are highly correlated in shape, the tool path learning for each segment group is significantly more generalisable than that for arbitrary workpieces. From the empirical perspective, representative groups of workpiece segments are finite, while there is an infinite number of possible target workpiece shapes. After studying the efficient forming path for each segment group, the tool path for any arbitrary workpiece can be obtained by aggregating the tool paths of all its segments, which yields the superior generalisability of this strategy.

To quantitatively measure the shape difference between the target and current workpiece, a curvature difference distribution graph ( \(\Delta \varvec{K}\) -graph) was generated by subtracting the current \({\varvec{K}}_{C}\) -graph from the target \({\varvec{K}}_{T}\) -graph to represent the workpiece state, as shown in Fig.  3 . The current workpiece was considered to be close to its target shape if the value of \(\Delta \varvec{K}\) approaches zero at every point along the longitudinal length. In the example in Fig.  3 , the \(\Delta \varvec{K}\) -graph was split into 6 segments, A-F. Through the segmental analysis of the \(\Delta \varvec{K}\) -graphs of real-world components (e.g. aerofoils), three groups of segments were classified, from which any arbitrary \(\Delta \varvec{K}\) -graph can be composed. Groups 1 and 2 consist of half-wave shaped and quarter-wave shaped segments, respectively, and Group 3 includes constant-value segments representing circular arcs or flat sheet.
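The workpiece-state construction above amounts to a pointwise subtraction of curvature graphs, plus a closeness test. A minimal sketch (the function names and array handling are our assumptions):

```python
import numpy as np

def delta_k_graph(k_target, k_current):
    """Delta-K graph: target curvature minus current curvature at each
    node location along the deformation zone."""
    return np.asarray(k_target, dtype=float) - np.asarray(k_current, dtype=float)

def near_target(dk, tol=0.01):
    """The workpiece is close to its target when Delta-K approaches zero
    at every point; here 'close' means |Delta-K| <= tol (mm^-1) everywhere."""
    return bool(np.max(np.abs(dk)) <= tol)
```

The tolerance of 0.01 mm⁻¹ matches the forming-goal threshold stated later in this section.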

figure 3

Schematic digitisation procedure for workpiece state representation \(\Delta \varvec{K}\) -graph and the classification of three groups of segments. The drawings for target and current workpieces depict their top surfaces. The dashed lines in Group 2 signify other segments having the same shape features as the solid line, which are also counted in this group. L-length denotes longitudinal length

figure 4

The generalisable tool path planning strategy through deep reinforcement and supervised learning

There are two phases in the generalisable tool path planning strategy, a learning phase and an inference phase, as shown in Fig.  4 . In the learning phase, for each group of segments, m variants of \(\Delta \varvec{K}\) -graphs, \(\Delta {\varvec{K}}_{i,j}\) , were created as shown in Fig.  4 a, where i is the group number and j is the variant number. The tool path, \({\varvec{P}}_{i,j}\) , for each variant of segment in each group was then learned and planned through deep reinforcement learning, without any prior path planning experience. After the tool paths for all segments were obtained, a deep supervised learning model was trained with the tool path data, \({\varvec{P}}_{i,j}\) , of each group to generalise the efficient tool path patterns for segments of that group.

In the inference phase, as shown in Fig.  4 b, a new workpiece was first digitised to its \(\Delta \varvec{K}\) -graph and segmented in accordance with the three groups. Five segments, A-E, were obtained in this example, and their tool paths were predicted using the deep supervised learning models trained for their respective groups in the learning phase. Finally, the entire tool path for the workpiece was obtained by aggregating the tool paths of all segments. To sum up, the RL and SL algorithms were utilised for different purposes in this strategy. The RL model explored the optimal tool path for each single target workpiece, which was used as the training data for the SL models to learn the efficient forming pattern of a group of workpieces with common features. In application, only SL models were used to infer the tool path of a new workpiece.

In the segmental analysis of the \(\Delta \varvec{K}\) -graphs, taking the workpiece in Fig.  4 b as an example, one can easily find that most segments are from Group 1. Segments from Group 2 can only be seen at the two ends of components, and segments from Group 3 only exist in workpieces with circular arcs. Thus, Group 1 was used for the instantiation of the generalisable tool path planning strategy, and a total of 25 variants of segments in this group were arbitrarily created through the method in Appendix A.

Forming goal and forming parameters design

In the context of the free-form sheet metal stamping test setup presented in Fig.  1 , at each step of the forming process, the stamping outcome is determined by the punch location and punch stroke. However, the large number of punch location options, 301 in total, would incur a considerably vast search space for the tool path planning problem. Thus, to simplify the problem, a forming heuristic (Heuristic 1), in conformity with practical forming scenarios, was applied to this forming process: at each forming step, the node location with the most salient shape difference from the target workpiece is selected. In other words, the node location where the value of \(\Delta \varvec{K}\) is highest in the \(\Delta \varvec{K}\) -graph was selected at each step.
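Heuristic 1 reduces the location choice to a simple arg-max over the \(\Delta \varvec{K}\) -graph. A sketch of the selection (0-based indexing and the function names are our assumptions):

```python
import numpy as np

def select_punch_location(dk):
    """Heuristic 1: choose the node location where Delta-K is highest,
    i.e. where the shape difference from the target is most salient."""
    return int(np.argmax(dk))

def location_one_hot(dk):
    """One-hot vector marking the selected punch location, in the style
    of the 1 x 301 vector used later in the state representation."""
    onehot = np.zeros(len(dk))
    onehot[select_punch_location(dk)] = 1.0
    return onehot
```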

As the workpiece shape is close to its target when \(\Delta \varvec{K}\) approaches zero at every point along the longitudinal length, the goal of the free forming in this research was considered to be achieved if \(\text{max}\left(\left|\Delta \varvec{K}\right|\right)\le 0.01\) mm −1 . Thus, in order to determine an appropriate range of punch stroke values to select during deformation, by which the forming goal can be achieved within a relatively small search space, a preliminary study was performed to investigate the free forming characteristics. Two phenomena, namely close punch effects 1 and 2 (CPE1 and CPE2), were discovered in this study, as shown in Figs.  5 and 6 .

Figure  5 shows the \(\Delta \varvec{K}\) -graphs of three workpieces before and after the same punch with a stroke of 3.0 mm at location 151. The three workpieces had been consecutively deformed by 2, 3 and 4 punches in the vicinity of this node location, as shown respectively in Fig.  5 a, b and c. It can be seen that the more prior deformation had occurred near the node location of interest, the less deformation resulted, i.e., a larger punch stroke was required to accomplish a certain change of shape at this location. This phenomenon was named CPE1, which barely escalated beyond 4 prior punches.

figure 5

Close punch effect 1 on the punch with a stroke of 3.0 mm at location 151. a , b and c present the \(\Delta \varvec{K}\) -graphs before and after this punch on workpieces that had been consecutively deformed beforehand by 2, 3 and 4 punches, respectively

Figure  6 shows the \(\Delta \varvec{K}\) -graphs of two workpieces before and after 1 punch and 50 punches, respectively. From Fig.  6 a, it can be seen that the \(\Delta \varvec{K}\) values around node location 118 were decreased by about 0.002 mm −1 after deformation was applied at location 132, although the punch at the latter location had less effect on the \(\Delta \varvec{K}\) value than a punch at the former location would. From Fig.  6 b, it can be seen that the workpiece had been deformed at location 118 since the 2nd step, and the \(\Delta \varvec{K}\) value at this location was affected by nearby punches in the following 50 steps, decreasing by about 0.008 mm −1 . This phenomenon was named CPE2, whose area of influence covers approximately 5 mm (about 50 node locations) around the node location.

figure 6

Close punch effect 2 from the punches near location of 118. a and b present the \(\Delta \varvec{K}\) -graphs before and after 1 punch and 50 punches, respectively. The area highlighted by dashed circle is where CPE2 was found

From the above analysis, it was found that a stroke of 2.1 mm can reach the forming goal at the 1st punch (with no CPE), while a stroke of at least 3.6 mm was needed to overcome CPE2 and reach the forming goal. Thus, 19 punch stroke options, ranging from 2.1 to 3.9 mm in 0.1 mm increments, were determined.
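The resulting discrete action space can be enumerated directly (the variable name is ours):

```python
# 19 punch-stroke options: 2.1, 2.2, ..., 3.9 mm in 0.1 mm increments
STROKE_OPTIONS = [round(2.1 + 0.1 * i, 1) for i in range(19)]
```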

Deep reinforcement learning algorithms and learning parameters

Reinforcement learning is a technology which learns the optimal control strategy through active trial-and-error interaction with the problem environment. A reward is delivered by the environment as feedback for each interaction, and the goal of reinforcement learning is to maximise the total rewards. Almost all RL problems can be framed as a Markov Decision Process (MDP), which is defined as (Sutton & Barto, 2017 ):

$$\mathcal{M}=\left(S,\mathcal{A},R,P,\gamma \right)$$ (1)

where \(S\) is a set of possible states, \(\mathcal{A}\) is a set of possible actions, \(R\) is the reward function, \(P\) is the transition probability function and \(\gamma\) is the discounting ratio ( \(\gamma \in \left[\text{0,1}\right]\) ). In this research, \(S\) includes the workpiece state representation \(\Delta \varvec{K}\) and \(\mathcal{A}\) includes the options of punch stroke. \(P\) is unknown in this research problem, for which model-free RL algorithms are to be applied. The bold capital characters here are used to distinguish them from the scalar values in the subsequent equations, such as the state or action at a single step.

With the terms introduced above, the RL process can be briefly illustrated as a loop: from the state \({s}_{t}\) at time t , an action \({a}_{t}\) is selected based on the current policy, which leads to the next state \({s}_{t+1}\) and a reward \({r}_{t}\) for this step. To measure the goodness of a state or a state-action pair, the state-value and action-value (also called Q-value) are commonly used, which are respectively defined as follows (Sutton & Barto, 2017 ):

$${V}^{\pi }\left({s}_{t}\right)=\mathbb{E}\left[\sum _{t=0}^{\infty }{\gamma }^{t}{r}_{t}|{s}_{t},\pi \right]$$ (2)

$${Q}^{\pi }\left({s}_{t},{a}_{t}\right)=\mathbb{E}\left[\sum _{t=0}^{\infty }{\gamma }^{t}{r}_{t}|{s}_{t},{a}_{t},\pi \right]$$ (3)

where \(\mathbb{E}\) denotes expectation and \(\pi\) denotes the policy. The term \(\sum _{t=0}^{\infty }{\gamma }^{t}{r}_{t}|{s}_{t},\pi\) is the cumulative future reward under policy \(\pi\) from t , known as the return, of which the superscript and subscript denote exponent and time step, respectively. Thus, the optimal policy \({\pi }^{*}\) is achieved when the value functions produce the maximum return, \({V}^{*}\left({s}_{t}\right)\) and \({Q}^{*}\left({s}_{t},{a}_{t}\right)\) .

Using Bellman’s Equation (Sutton & Barto, 2017 ), which decomposes the value functions into the immediate reward plus the discounted future rewards, the optimal value functions can be iteratively computed for every state to obtain the optimal policy:

$${V}^{*}\left({s}_{t}\right)=\underset{{a}_{t}}{\text{max}}\mathbb{E}\left[{r}_{t}+\gamma {V}^{*}\left({s}_{t+1}\right)\right]$$ (4)

$${Q}^{*}\left({s}_{t},{a}_{t}\right)=\mathbb{E}\left[{r}_{t}+\gamma \underset{{a}_{t+1}}{\text{max}}{Q}^{*}\left({s}_{t+1},{a}_{t+1}\right)\right]$$ (5)

Two categories of RL algorithms were investigated in this research, namely value-based and policy-based approaches. When value functions, Eqs. ( 2 ) and ( 3 ), are approximated with neural network, traditional RL becomes deep reinforcement learning (DRL). For the value-based approaches, three Q-learning algorithms, namely deep Q-learning, double deep Q-learning and dueling deep Q-learning, were implemented. For the policy-based approaches, three policy gradient algorithms, namely Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimisation (PPO), were implemented.

As shown in Eq. ( 5 ), the optimal policy is obtained by iteratively updating the Q-value function for each state-action pair. However, it is computationally infeasible to compute them all when the state and action space becomes enormous. Thus, deep Q-learning (Mnih et al., 2015 ) was proposed to estimate the Q-value function using a function approximator. Three function approximators were investigated in this study: Deep Q-Network (DQN), Double Deep Q-Network (Double-DQN) and Dueling Deep Q-Network (Dueling-DQN), whose objective functions can be found in existing works (Mnih et al., 2015 ; van Hasselt et al., 2016 ; Wang et al., 2015 ). It is noted that Double-DQN alleviates the Q-value overestimation problem of DQN by decomposing the max operation in the target Q-value into two operations of action selection and action evaluation. Dueling-DQN specifically models the advantage value, which measures the goodness of an action at a certain state and is arithmetically related to the state-value and action-value by \(Q\left(s,a\right)=V\left(s\right)+A\left(s,a\right)\) .
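In practice, dueling architectures combine the two streams with the mean advantage subtracted so that V and A remain identifiable, the aggregation used by Wang et al. ( 2015 ); a numpy sketch of this step (not the paper's own code):

```python
import numpy as np

def dueling_q_values(state_value, advantages):
    """Aggregate the dueling streams: Q(s, a) = V(s) + (A(s, a) - mean A).
    Subtracting the mean advantage keeps the V and A streams identifiable."""
    adv = np.asarray(advantages, dtype=float)
    return state_value + (adv - adv.mean())
```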

Unlike Q-learning, which achieves the optimal policy by learning the optimal value functions, policy gradient algorithms parameterise the policy with a model and learn the policy directly. The objective function of policy gradient algorithms is configured to be the expected total return as given by Eqs. ( 2 ) and ( 3 ), and the goal of the optimisation is to maximise this objective function. Through gradient ascent , the policy model which produces the highest return yields the optimal policy. Most policy gradient algorithms share the same theoretical foundation, the Policy Gradient Theorem , which is defined in (Sutton & Barto, 2017 ).

The three policy gradient algorithms investigated in this research, A2C, DDPG and PPO, all use an Actor-Critic method (Sutton & Barto, 2017 ) for policy updates, in which the critic model evaluates the value functions to assist the policy update and the actor model represents the policy, updated in the direction suggested by the critic. Their objective functions can be found in existing works (Mnih et al., 2016 ; Lillicrap et al., 2015 ; Schulman et al., 2017 ), of which DDPG was specially developed for problems with continuous action spaces. The A2C algorithm uses an advantage term to assist the policy update, while DDPG uses the gradient of the Q-value with respect to the action and PPO uses the Generalised Advantage Estimate (GAE) (Schulman, Moritz, et al., 2015 ). For A2C, the temporal difference was selected for the advantage estimate through a preliminary study comparing it with the Monte Carlo (MC) method. The PPO algorithm is a simplified version of Trust Region Policy Optimisation (TRPO) (Schulman et al., 2015a , 2015b ) that uses a clipped objective function to prevent extremely large online policy updates and learning instability. The hyperparameters of the PPO algorithm, the future advantage discounting ratio and the clip ratio, were set to 0.95 and 0.2 in this study, following the original work (Schulman et al., 2017 ).
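The clipped surrogate that distinguishes PPO from TRPO can be written in a few lines; a per-sample sketch (the function and argument names are ours, with the clip ratio of 0.2 used in this study):

```python
import numpy as np

def ppo_clipped_surrogate(ratio, advantage, clip_ratio=0.2):
    """PPO objective for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r is the probability ratio of the new to the old policy. The
    clip stops excessively large online policy updates."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantage
    return float(np.minimum(unclipped, clipped))
```

For example, with a positive advantage the objective saturates once the ratio exceeds 1.2, so the update gains nothing from moving the policy further.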

Learning setup and hyperparameters

For the RL environment, Abaqus 2019 was interfaced with the RL algorithms to supply computation results during learning. The transient \(({s}_{t},{a}_{t},{r}_{t},{s}_{t+1})\) was formulated as follows:

The state/next state was in the form of workpiece state representation \(\Delta \varvec{K}\) -graph and a one-hot vector of size 1 × 301 indicating the punch location. Thus, it represents the shape difference between the current workpiece shape and its target shape, which can be easily used to construct the reward function. The one-hot vector was generated following Heuristic 1.

The action was the punch stroke, ranging from 2.1 to 3.9 mm (19 in total for discrete action space).

The reward was defined to measure the goodness of the selected action at a given state, and its evaluation is shown in Fig.  7 . After each action at a given state, the reward was determined by the punch effectiveness ratio, defined as the ratio of the punch effect on \(\Delta \varvec{K}\) at the given location at time step t ( \({p}_{t}\) ) to the expected effect at this location ( \({p}_{o}\) ), with the function \(r_{t} = 2\left( {p_{t} /p_{o} } \right)^{2} - 3\) (except for PPO: \({r}_{t}=2{\left({p}_{t}/{p}_{o}\right)}^{2}\) ). A power function was used to discourage non-effective punches, since the reward hardly changes at a low punch effectiveness ratio. If the workpiece was overpunched, i.e. the \(\Delta \varvec{K}\) at the punch location fell below the lower threshold of \(-0.01 {mm}^{-1}\) , \({r}_{t}=-100\) (PPO: \({r}_{t}=-1\) and DDPG: \({r}_{t}=-\) 3); if the forming goal was achieved, i.e. \(\text{max}\left(\left|\Delta \varvec{K}\right|\right)\le 0.01 {mm}^{-1}\) , \({r}_{t}=0\) (PPO: \({r}_{t}=500-2.5\,\times\)  episode step). Negative rewards were used for each step to penalise unnecessary steps, except for PPO, where unnecessary steps were penalised by rewarding early termination. A reward of -3 was assigned for overpunch in DDPG learning rather than -100, since it was found that sparse rewards can cause failures in DDPG training (Matheron et al., 2019 ).
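The reward rules above can be collected into one function; the following sketch (argument names are ours) mirrors the stated branches, including the PPO and DDPG exceptions:

```python
def punch_reward(p_t, p_o, dk_at_location, goal_met, episode_step, algo="dqn"):
    """Reward for one punch, based on the punch effectiveness ratio p_t/p_o,
    with special values for overpunch and for achieving the forming goal."""
    if dk_at_location < -0.01:                 # overpunched below threshold
        return {"ppo": -1.0, "ddpg": -3.0}.get(algo, -100.0)
    if goal_met:                               # max |Delta-K| <= 0.01 mm^-1
        return 500.0 - 2.5 * episode_step if algo == "ppo" else 0.0
    base = 2.0 * (p_t / p_o) ** 2              # effectiveness term
    return base if algo == "ppo" else base - 3.0
```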

figure 7

Reward function and its evaluation for each action. a shows the reward evaluation method for punch stroke of 2.2 mm at location of 162, and the two lines denote the initial and current \(\Delta \varvec{K}\) -graph; b shows the reward function for evaluation

figure 8

The reinforcement learning process for the tool path learning of the rubber-tool forming process, of which the FE simulation (FE sim) was used as the RL environment to provide real-time deformation results. The vertical line in the \(\Delta \varvec{K}\) -graph denotes the punch location

Figure  8 presents the reinforcement learning process configured for tool path learning, using FE simulations as the RL environment. The RL process collected data in a loop, starting by digitising the workpiece geometry into the state \({s}_{t}\) and feeding it to the learning agent. The learning agent predicted the stroke \({a}_{t}\) based on the current policy and exploration scheme. The FE simulation was configured by repositioning the current workpiece about the punch location and setting up the selected punch stroke, and the deformed workpiece geometry was extracted and stored. The deformed geometry was also digitised to obtain the next state \({s}_{t+1}\) , with which the reward \({r}_{t}\) was evaluated through the reward function. The collected transient \(({s}_{t},{a}_{t},{r}_{t},{s}_{t+1})\) at this time step was then used to optimise the objective function J and update the agent policy. The loop then continued, with the next state re-input to the agent as the state of the next iteration.
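The collect loop described above can be sketched generically; here a toy stand-in replaces the FE simulation (the stub environment and all names are illustrative assumptions of ours, not the paper's Abaqus interface):

```python
class StubFormingEnv:
    """Toy stand-in for the FE environment: the 'workpiece' is a single
    Delta-K value that each punch stroke drives towards zero."""
    def __init__(self, dk0=0.05):
        self.dk = dk0

    def step(self, stroke):
        self.dk = max(0.0, self.dk - 0.004 * stroke)  # toy deformation model
        done = self.dk <= 0.01                        # forming goal reached
        return self.dk, (0.0 if done else -1.0), done

def run_episode(env, policy, max_steps=100):
    """Digitise the state, act, step the environment, store the transient
    (s_t, a_t, r_t, s_{t+1}), and repeat until the goal or the step cap."""
    transients, state = [], env.dk
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        transients.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transients
```

In the real setup, `env.step` would configure and run an FE job and digitise the deformed geometry, and the stored transients would feed the objective-function update.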

The learning methods for the six RL algorithms, all of which are model-free, are shown in Table  3 . The off-policy algorithms were trained with experience replay, in which all learning histories were stored and uniformly sampled in minibatches for training, while the on-policy algorithms were trained with the immediate experience. A target network was used for action evaluation, which was updated from the online network periodically for stable learning progress. For exploration and exploitation, the Q-learning algorithms adopted the \(\varepsilon\) -greedy policy, while A2C used an additional entropy term from (Williams & Peng, 1991 ) in the loss function and DDPG used Gaussian distributed action noise. In addition to the above, a forming heuristic (Heuristic 2) was developed to facilitate the learning process, defined as follows: the stroke chosen at the current node location, if applicable, cannot be less than previous choices at the same location in one run; otherwise, a larger stroke value was randomly selected for this location. This heuristic was only applied in addition to the \(\varepsilon\) -greedy policy, as they share the same exploration mode, so it would not disturb the training data structure.
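Heuristic 2 can be enforced with a small wrapper around the exploration step; a sketch in which the history bookkeeping and the names are our assumptions:

```python
import random

def heuristic2_stroke(location, proposed, history, stroke_options):
    """Heuristic 2: within one run, the stroke at a node location must not
    be smaller than any earlier choice there; otherwise a larger stroke is
    randomly drawn for this location."""
    previous = history.get(location, [])
    prev_max = max(previous) if previous else None
    if prev_max is not None and proposed < prev_max:
        larger = [s for s in stroke_options if s > prev_max]
        proposed = random.choice(larger) if larger else prev_max
    history.setdefault(location, []).append(proposed)
    return proposed
```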

The learning hyperparameters for RL are summarised in Table  4 . The maximum steps per episode signifies the maximum number of forming steps allowed for each run of the free forming trial. An episode would end if any of the following conditions was met: 1) forming goal achieved, 2) overpunch, or 3) maximum steps per episode (step/ep) attained. It is noted that the target network for Q-learning was updated every 20 learning steps, while that for DDPG was softly updated with \(\tau =0.01\) and that for PPO was updated every rollout (512 steps in this research) of the online policy. For the \(\varepsilon\) -greedy policy, the value of \(\varepsilon\) decayed from 1.0 to 0.1.

With regard to the models used for the value function and policy function approximations in all six algorithms, the learning performance of a shallow multilayer perceptron (MLP) and a convolutional neural network (CNN) were compared, as in (Lillicrap et al., 2015 ). The rectified linear unit ( ReLU ) was used for all hidden layers. There was no activation function for the output layer of the value network, while softmax and tanh were used for that of the policy network, respectively. The MLP had 2 hidden layers with 400 and 200 units, respectively (164,819 parameters). The network had 2 inputs, the \(\Delta \varvec{K}\) -graph and the one-hot vector for the punch location, each connected to one half (200 units) of the 1st hidden layer; the two halves were added together and fed into the 2nd layer. The CNN had the same architecture as the one used in (Mnih et al., 2015 ), with an additional hidden layer of 512 units for the 2nd input, parallel to the last of the convolutional layers (1,299,891 parameters).
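The stated 164,819-parameter count is reproduced exactly if the two 200-unit input halves are summed element-wise before the 2nd layer; a quick check (the dimension names and the 19-way output, one per stroke option, are our assumptions):

```python
def mlp_parameter_count(in_dim=301, half_units=200, hidden2=200, out_dim=19):
    """Parameters of the two-input MLP: each input (Delta-K graph and
    punch-location one-hot, both length 301) feeds its own 200-unit half
    of the 1st layer; the halves are summed element-wise, then passed
    through the 200-unit 2nd layer and a 19-way output."""
    branch = in_dim * half_units + half_units   # weights + biases, per input
    layer2 = half_units * hidden2 + hidden2     # acts on the summed halves
    output = hidden2 * out_dim + out_dim
    return 2 * branch + layer2 + output
```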

Virtual environment for RL algorithms comparison

Given the FE computational speed, it is prohibitively time-consuming to test the feasibility of tool path learning for the free forming process using RL. Thus, a virtual environment was developed to imitate the rubber-tool forming behaviour by producing punch effects on the \(\Delta \varvec{K}\)-graph similar to those computed by FE simulations, with which the performances of the six RL algorithms in tool path learning were compared. The virtual environment was composed to also manifest CPE1 and CPE2, as presented in " Forming goal and forming parameters design " section, and the effect of the stroke value on the \(\Delta \varvec{K}\)-graph was imitated by the virtual environment through a parametric study. The detailed setting of the virtual environment is presented in Appendix B.

Deep supervised learning models and training methods

After the optimal tool paths for the 25 variants of workpiece segments in Group 1 were acquired from deep reinforcement learning, they were used to train deep supervised learning models to learn the efficient tool path patterns for this group.

Deep neural networks

Three deep neural networks (DNNs), namely a single CNN, cascaded networks and a CNN LSTM, had been compared for predicting the tool path through a recursive prediction framework in the Authors’ previous research (Liu et al., 2022 ). Since the results revealed that the CNN LSTM outperformed the other two models, it was adopted in this research, with VGG16 (Simonyan & Zisserman, 2015 ), ResNet34 and ResNet50 (He et al., 2016 ) as the feature extractor, respectively. The model architectures were the same as those used in (Liu et al., 2022 ), with a simple substitution of the feature extractor with ResNet34 and ResNet50. The input to the LSTM was the partial forming sequence made up of the concatenation of the \(\Delta \varvec{K}\)-graph and the punch location vector for each time step, and the output from the model was the punch stroke prediction for the coming step. As the target workpiece information is already contained in the \(\Delta \varvec{K}\)-graph, it was not fed into the model as a 2nd input, unlike in (Liu et al., 2022 ).
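The recursive prediction framework above can be sketched abstractly: at each step, the partial forming sequence is fed to the model, the predicted stroke is appended to the tool path, and the sequence is extended with the resulting workpiece state. The `model` and `apply_punch` callables here are stand-ins, not the paper's actual interfaces:

```python
def predict_tool_path(model, apply_punch, initial_state, max_steps):
    """Recursive tool path prediction.  `model` maps the partial forming
    sequence (each element standing in for a dK-graph + punch-location
    pair) to the stroke for the coming step; `apply_punch` stands in for
    the forming simulation that yields the next workpiece state."""
    sequence = [initial_state]
    strokes = []
    for _ in range(max_steps):
        stroke = model(sequence)                       # stroke for the coming step
        strokes.append(stroke)
        sequence.append(apply_punch(sequence[-1], stroke))  # extend the sequence
    return strokes
```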

Training method and hyperparameters

The DNNs were implemented in Python and trained using Keras with TensorFlow v2.2.0 as the backend, and the computing facility had an NVIDIA Quadro RTX 6000 GPU with 24 GB of memory. The training data for the DNNs were all the tool paths learned from the RL algorithm, which were pre-processed to conform to the LSTM models, and the labels (output features) were standardised to comparable scales. The tool path prediction with DNNs was formulated as a regression problem, for which the Mean Square Error (MSE) (Goodfellow et al., 2016 ) was the objective function for DNN training. The Adam algorithm (Kingma & Ba, 2015 ), with the default values of its hyperparameters ( \({\beta }_{1},{\beta }_{2},\varepsilon\) ) in Keras, was used for optimisation. In addition, the learning rate \(\eta\) decayed exponentially from the initial learning rate \({\eta }_{0}\) along the training process, with the same decay rate and decay steps as in (Liu et al., 2022 ). The key training parameters are shown in Table  5 , in which two amounts of training data are presented.
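The exponential learning-rate decay follows the standard staircase-free form \(\eta = {\eta }_{0}\cdot {r}^{step/steps}\). The values of \({\eta }_{0}\), the decay rate and the decay steps below are placeholders; the paper reuses the values from Liu et al. (2022), which are not restated here:

```python
def learning_rate(step, eta0=1e-3, decay_rate=0.96, decay_steps=1000):
    """Exponentially decaying learning rate:
    eta = eta0 * decay_rate ** (step / decay_steps).
    eta0, decay_rate and decay_steps are illustrative placeholders."""
    return eta0 * decay_rate ** (step / decay_steps)
```

This matches the behaviour of Keras' `ExponentialDecay` schedule with `staircase=False`.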

Learning results and discussions

Selection of reinforcement learning algorithm

Two categories of reinforcement learning algorithm, namely Q-learning and policy gradient algorithms, were compared in terms of their performance in tool path learning for the rubber-tool forming process. Owing to the prohibitively expensive FE computation, a virtual environment was developed to imitate the rubber-tool forming behaviour, as introduced in " Virtual environment for RL algorithms comparison " section. A total of six RL algorithms were investigated with the data generated by the virtual environment, half of which belong to Q-learning and the other half to the policy gradient method. The best-performing one determined in this study was then implemented with the FE environment to learn the optimal tool path using FE computational data. The same target workpiece, as shown in Fig.  2 , was used for tool path learning in this study.

The learning setups for all algorithms, including the transition \(({s}_{t},{a}_{t},{r}_{t},{s}_{t+1})\) , learning method and hyperparameters, are summarised in Section 2.4.2. An additional exploration rule, namely Heuristic 2, was implemented along with the \(\varepsilon\) -greedy policy for the Q-learning algorithms. Figure  9 shows the performances of DQN, Double-DQN and Dueling-DQN trained under the exploration scheme with and without Heuristic 2, in which the termination step signifies the total punch steps spent to achieve the forming goal. The same learning rate (1 × 10 −2 ) and value function approximator (CNN) were applied to each training. It can be seen that the average termination step was reduced by approximately 40%, from 62 to 37, after introducing Heuristic 2 for exploration in the training of each algorithm. In addition, with Heuristic 2, the first applicable tool path was found more quickly in each case than without it, by 9K, 2K and 3K training steps, respectively. Thus, because of the consistent improvement in learning efficiency from Heuristic 2 in each case, it was used in the training of all the Q-learning algorithms for the following results.

Fig. 9

Comparison of the performances of three Q-learning algorithms trained at 10 −2 learning rate, with and without heuristic, for tool path learning in terms of termination step. The upper and lower dashed lines denote the average termination step (total punch steps spent to achieve the forming goal), estimated from Gaussian process regression, through the training process of the algorithms implemented with and without Heuristic 2, respectively. The shaded regions denote 95% confidence interval. The unit for training steps, K, denotes 10 3

To comprehensively evaluate and compare the performance of the six RL algorithms in tool path learning, four performance factors were defined, namely the first termination step (1st Term. step), convergence speed (Cvg. speed), average converged termination step (Avg. Cvg. Term. step) and average termination frequency (Avg. Term. Freq.). The first factor was quantified by the punch steps spent the first time the forming goal was achieved, and was used to evaluate the learning efficiency of each algorithm under the circumstance that no prior complete tool path planning experience was available and the agent learned the tool path from scratch. The 2nd and 3rd factors evaluated the learning progress and the learning results, and were quantified by the first converged training step and the average termination step after convergence, respectively. The last factor described the learning steadiness in finding the tool path, which was computed as follows:

$$\text{Avg. Term. Freq.}=\frac{{T}_{total}}{{S}_{final}-{S}_{first}}$$

where \({T}_{total}\) denotes the total number of terminations during training, and \({S}_{first}\) and \({S}_{final}\) denote the training steps at which the first and final terminations occur, respectively. For this research, the first termination needs to be reached as soon as possible due to the high computational expense. Thus, the importance of the 1st Term. step, Cvg. speed and Avg. Cvg. Term. step is regarded as equal and greater than that of the Avg. Term. Freq.
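The last factor is a simple ratio and can be computed directly. The rescaling to "terminations per thousand training steps" (the unit used when quoting results such as 0.3 terminations per thousand steps) is an assumption added here for readability:

```python
def avg_termination_frequency(t_total, s_first, s_final, per=1000):
    """Average termination frequency: total terminations divided by the
    training-step span between the first and final terminations,
    rescaled to terminations per `per` training steps (assumed unit)."""
    return t_total / (s_final - s_first) * per
```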

The six algorithms were trained at four different learning rates using two action/policy function approximators, respectively, as described in Section 2.4.2 on the RL learning setup. The learning performance of each algorithm, quantified by the four performance factors, is summarised in Tables  6 and 7 . Learning results where no termination was found were omitted from the tables, except for Dueling-DQN, which was designed as a CNN with shared convolutional layers and separate fully connected layers. For example, DDPG only managed to learn the tool path at the learning rate of 10 −2 with the MLP function approximator. The best-tuned results for the learning rate and function approximator for each algorithm are highlighted in bold.

For the Q-learning algorithms, the CNN function approximator was found to outperform the MLP. Although they had close values for the first three performance factors, the average termination frequencies from the training of DQN and Double-DQN with the CNN approximator were, in general, notably higher than those from trainings with the MLP. With the MLP, both DQN and Double-DQN could not attain even 0.3 terminations per thousand steps at the 10 −4 learning rate, and the latter terminated only once through the whole learning process at the learning rate of 10 −1 . With the CNN approximator, however, these two algorithms could both terminate steadily, over once per thousand steps, at all learning rates. In regard to the learning rate, 10 −2 , 10 −1 and 10 −3 were respectively selected for the three Q-learning algorithms because of their evidently better results for the first three performance factors than the other choices of learning rate.

For the policy gradient algorithms, the MLP was selected as the function approximator for A2C and DDPG while the CNN was selected for PPO, and the best learning results were found at the learning rate of 10 −2 for all three of them. Compared to the Q-learning algorithms, the policy gradient algorithms tended to have a remarkably higher 1st termination step, with those from A2C approaching the maximum steps per episode (100). Although they converged to an average termination step comparable to that of the Q-learning algorithms, they spent considerably more time in convergence, especially PPO, which used over 200 thousand training steps (over 20 times longer than the Q-learning algorithms). In addition, the learning steadiness of the three policy gradient algorithms was poor, especially that of DDPG and PPO, although their best average termination frequency was over 9 times the best from the Q-learning algorithms. Trained with two different approximators and at four learning rates, DDPG only managed to learn the tool path once, which could be because DDPG was developed for continuous action space problems.

Figure  10 shows the training processes of the six RL algorithms, trained with the best hyperparameters from above. It can be seen that, unlike the Q-learning algorithms, which converged almost instantly after a few terminations, the policy gradient ones had a more discernible convergence process. Although the tool paths learned by the policy gradient algorithms were about 1–3 times longer than those from Q-learning at the start of training, they eventually converged to a comparable length. DDPG converged to the minimum average termination step of 29; however, its learning steadiness was the worst of all in Table  7 . The Q-learning algorithms generally outperformed the policy gradient ones in terms of the first termination step and convergence speed. This could be because A2C and PPO are on-policy algorithms, which are less data-efficient than off-policy ones, and because DDPG was created for learning problems with a continuous action space, which needs careful tuning for problems with a discrete space. Among the Q-learning algorithms, Double-DQN surpassed DQN and Dueling-DQN with its lower average converged termination step and marginally faster first termination. The reason could be that Double-DQN alleviates the Q-value over-estimation problem in DQN learning, whereas Dueling-DQN is only particularly useful when the relevance of actions to the goal can be differentiated by separately learning the state value and the advantage value. However, each action in free-form deformation is highly relevant to the goal, so the structure of Dueling-DQN, in turn, increases learning complexity and slows down learning. Thus, for the following results, Double-DQN was used to learn the optimal tool path.
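The over-estimation fix that distinguishes Double-DQN from DQN lies in the bootstrap target: the online network selects the next action, and the target network evaluates it. A minimal batched sketch (array shapes and the `done` masking convention are standard, not taken from the paper's code):

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, done, gamma=0.99):
    """Double-DQN bootstrap target
        y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) * (1 - done)
    Shapes: rewards/done -> (batch,), Q matrices -> (batch, n_actions)."""
    best = np.argmax(next_q_online, axis=1)             # action selection (online)
    q_eval = next_q_target[np.arange(len(best)), best]  # action evaluation (target)
    return rewards + gamma * q_eval * (1.0 - done)
```

Vanilla DQN would instead use `np.max(next_q_target, axis=1)`, coupling selection and evaluation in one network and thereby biasing the target upward.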

Fig. 10

Comparison of the performances of 6 reinforcement learning algorithms studied in this research in terms of termination step. The solid and dashed lines denote the average termination step, estimated from Gaussian process regression, through the training process of the algorithms. The shaded regions denote 95% confidence interval

To assess the credibility of the algorithm selection study, the tool path learning processes and results of Double-DQN, implemented with the virtual environment (VE) and with FE simulations for the same target workpiece, were compared in Fig.  11 . From Fig.  11 a and b, the histories of forming steps and total rewards per episode from learning with the VE closely resembled those from learning with FE simulations. The average episode step over the training process with the VE was marginally higher than that with FE simulations, by about 6 steps, while the average episode total reward from the former was lower than the latter by about 15. This indicates that, with FE, the agent is more predisposed to overpunch (ending the episode) with a less efficient tool path in each episode than with the VE. In addition, the first termination was about 1200 episodes later and its total reward about 30 lower than with the VE, as the practical rubber-tool forming behaviour is more complex and nonlinear than the VE imitates. From Fig.  11 c, both tool paths had a similar forming pattern of alternately selecting small and large stroke values, which, on average, slightly increased throughout the tool path. Thus, in general, the virtual environment managed to imitate most of the forming behaviours in the FE simulations, and the results of the pre-study on algorithm selection performed with the virtual environment are convincing regarding learning efficiency in tool path learning.

Fig. 11

Comparison between RL using virtual environment (VE) and FE simulation environment (FE) in terms of a the history of steps in each episode, b the history of total rewards in each episode and c the tool path predictions

Tool path learning results for 25 workpieces using double-DQN

From Section 3.1, Double-DQN was selected to learn the optimal tool paths for the 25 variants of workpiece segments in Group 1, whose \(\varvec{K}\) -graphs and real-scale shapes are shown in Fig.  12 . The workpieces were deformed through the rubber-tool forming process, which was simulated through FE computations. The \(\varvec{K}\) -graphs were arbitrarily created with the method shown in Appendix A, and the real-scale shapes in Fig.  12 b were reconstructed from the \(\varvec{K}\) -graphs using a constant initial interval of 0.1 mm between two contiguous node locations.
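The real-scale reconstruction from a curvature graph sampled at a constant arc-length interval can be sketched by integrating the tangent angle and then the position. The forward (cumulative-sum) integration scheme and the zero initial heading are assumptions for illustration:

```python
import numpy as np

def reconstruct_shape(kappa, ds=0.1):
    """Reconstruct a 2-D profile from a K-graph: curvature samples kappa
    (mm^-1) at constant arc-length interval ds = 0.1 mm.  The tangent
    angle is theta(s) = integral of kappa ds, and the position follows
    from integrating (cos theta, sin theta).  Simple cumulative-sum
    integration with theta(0) = 0 is an assumed discretisation."""
    theta = np.concatenate([[0.0], np.cumsum(kappa) * ds])  # tangent angle per node
    x = np.cumsum(np.cos(theta)) * ds
    y = np.cumsum(np.sin(theta)) * ds
    return x, y
```

A blank (zero-curvature) sheet reconstructs to a straight segment, while a constant positive curvature reconstructs to a circular arc.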

Fig. 12

The \(\varvec{K}\) -graphs and workpiece shapes for all the generated target workpieces

An exemplary Double-DQN learning process is shown in Fig.  11 a and b, where termination occurs at around episode 1500. It can be seen that the first 150 episodes ended with remarkably fewer forming steps than those thereafter, owing to the effect of the \(\varepsilon\) -greedy policy. Under this policy, before the \(\varepsilon\) value decayed to 0.5, the agent was more likely to randomly explore the search space than to follow the online policy learned from existing forming experiences, which led to quicker overpunch and thus fewer steps per episode. To analyse the learning process in light of effective forming progress, an exemplary learning history of the maximum \(\Delta \varvec{K}\) value at the end of each episode is shown in Fig.  13 . As the forming goal is to achieve a workpiece state where \(\text{max}\left(\left|\Delta \varvec{K}\right|\right)\le 0.01\) mm −1 , the learning history of the episode-end maximum \(\Delta \varvec{K}\) reflects the learning progress of an effective tool path. From Fig.  13 , there is a clear trend that the maximum value of \(\Delta \varvec{K}\) at the end of each episode gradually decreased from 0.05 mm −1 at the start of learning to below 0.01 mm −1 at about episode 1350, where termination occurred. This learning curve demonstrates the effectiveness of both the Double-DQN algorithm and the reward function in searching for the tool path. As learning progressed, the deformed sheet metal approached its target shape ever more closely, as demonstrated by the troughs of the max( \(\Delta \varvec{K}\) ) graph along the arrow marker.

Fig. 13

The history of the maximum \(\Delta \varvec{K}\) value at the end of each episode (Ep) throughout the learning process. The arrow shows the learning progress of effective tool path

In addition to the learning progress captured from the maximum \(\Delta \varvec{K}\) curves, two further self-learning characteristics of the tool path learning, observed from a more granular perspective, were identified. Figure  14 shows two examples in which the self-learning characteristics of tool path efficiency improvement and overpunch circumvention were captured, respectively. The three \(\Delta \varvec{K}\)-graphs were collected from the workpiece deformed by the same number of punches at different episodes during a learning process. In Fig.  14 a, the shaded hatch denotes the total advantage of the workpiece state at episode 527 over that at episode 245 in terms of the shape difference from the target shape, measured by the area of the hatch. In turn, the unshaded hatch represents the opposite. Thus, it is clear that the tool path planned at the later episode was more efficient than the earlier one by about 1.21, with reference to the net hatch area (shaded area minus unshaded area), accounting for 11.8% of the initial \(\Delta \varvec{K}\)-graph area. In Fig.  14 b, the shaded regions indicate two overpunch-prone locations at episode 527, where the \(\Delta \varvec{K}\) values were only about 0.002 mm −1 away from the lower threshold (− 0.01 mm −1 ). Due to CPE2, the workpiece could easily be overpunched by deformation near these two locations. It was found that the agent selected smaller punch strokes at these two locations at episode 687, which circumvented the overpunch that had occurred in previous episodes.

Fig. 14

Examples showing self-learning characteristics of a improving tool path efficiency and b circumvention of overpunch. The results were from step 31 at three different episodes of the tool path learning process for a workpiece

Fig. 15

The two-dimensional embedding, generated through t-SNE, of the representations in the last hidden layer of the Double-DQN to workpiece states ( \(\Delta \varvec{K}\) -graphs) experienced during tool path learning. The points are coloured according to stroke values selected by the agent. The graph at the top left corner shows the initial \(\Delta \varvec{K}\) -graph, and the axis labels of the other \(\Delta \varvec{K}\) -graphs (numbered from ① to ⑪) are omitted for brevity. The vertical lines in the \(\Delta \varvec{K}\) -graphs denote the punch locations

To evaluate the performance of the Double-DQN algorithm in extracting and learning abstract information during tool path learning, the representations in the last hidden layer of the Double-DQN model for the workpiece states which the agent experienced throughout the learning process were retrieved and reduced to two-dimensional embeddings through the t-SNE technique (Maaten & Hinton, 2008 ). The visualisation of these embeddings is shown in Fig.  15 , in which the embeddings are coloured according to the stroke values selected by the agent. It can be seen that CPE1, namely that more prior deformation near the node location of interest results in a larger punch stroke being required to accomplish a given change of shape at this location, was learned by the agent.

From Fig.  15 , the \(\Delta \varvec{K}\)-graphs ③, ④ and ⑤ were assigned relatively low stroke values, as there was no prior deformation on at least one side of the punch locations. In addition, a larger stroke was assigned if the punch location was closer to the initial punch location 66, due to the higher amount of local shape difference. As CPE1 escalated, higher stroke values were selected by the agent, as shown by the rest of the \(\Delta \varvec{K}\)-graphs except ① and ②. It was also captured, from ⑥ and ⑦, that CPE1 became more severe with larger nearby prior punches. Apart from CPE1, CPE2 was also captured by the \(\Delta \varvec{K}\)-graphs ① and ②, where the effect was significantly more obvious than that shown in Fig.  6 a. As indicated by the circled regions in ① and ②, the \(\Delta \varvec{K}\) values in these regions were very close to the lower forming threshold, for which small strokes were assigned to prevent overpunch. The punch effect in this region was reproduced as shown in Appendix C, where a mere 0.1 mm increase of stroke deteriorated the \(\Delta \varvec{K}\) values to the left of the punch location by about 0.007 mm −1 and caused overpunch in this context. Overall, similar workpiece states were clustered together and assigned reasonable stroke values. The agent had acquired a good understanding of tool path planning through learning the abstract representations.
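The embedding step above can be sketched with scikit-learn's `TSNE`, assuming scikit-learn is available; the perplexity, initialisation and random seed below are illustrative choices, not the paper's settings:

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_states(hidden_reprs, seed=0):
    """Reduce last-hidden-layer representations, shape (n_states,
    n_units), to 2-D embeddings with t-SNE for visualisation, as in
    Fig. 15.  Perplexity and init are assumed hyperparameters."""
    tsne = TSNE(n_components=2, perplexity=5.0, init="random",
                random_state=seed)
    return tsne.fit_transform(np.asarray(hidden_reprs, dtype=float))
```

Each 2-D point can then be coloured by the stroke value the agent selected in the corresponding state to reveal clusters of similar workpiece states.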

Figure  16 shows an example of the tool path learned by the Double-DQN. In Fig.  16 a, the initial \(\Delta \varvec{K}\)-graph between the blank sheet and the target workpiece was transformed into the final one (enclosed by a rectangle), in which the \(\Delta \varvec{K}\) values at all node locations were within the forming thresholds, by 47 forming steps. Due to Heuristic 1, that the location with the highest \(\Delta \varvec{K}\) value is selected as the punch location, the punch started from the location where the initial \(\Delta \varvec{K}\) value was the highest (around location 65) and diverged towards both ends of the workpiece, as shown by the top view in Fig.  16 b. Lower stroke values were assigned to diverging punches than to those inside the divergence area due to CPE1, which led to the repeatedly alternating selection of small and large strokes along the forming progress in Fig.  16 a. As the forming progressed, CPE1 escalated, and thus larger stroke values were selected at the later steps (after step 15) of the tool path than at the start. It is also worth noting from the side view in Fig.  16 b that large strokes were concentrated at punch locations with high initial \(\Delta \varvec{K}\) values and descended towards those with low ones.

Fig. 16

An example of a tool path learned by RL and b its top view and side view. The heights of the bars in ( a ) were proportionally decreased for better visualisation

Figure  17 presents the deformation process of a workpiece from blank sheet to its target shape, following the tool path learned by the Double-DQN, and the dimension error (the geometry difference in the Y-direction) between the target workpiece and the one after all punches. The target shape in the real-scale graph was reconstructed from the target \(\varvec{K}\) -graph, using the same interval of 0.1 mm between two contiguous node locations along the deformed workpiece. It can be seen that the final shape of this deformed workpiece was in good agreement with its target shape, with a maximum dimension error of just above 0.2 mm. The dimension error was at its minimum in the middle of the workpiece, from which the error increased towards both ends due to the accumulation of shape difference. The average maximum dimension error for the 25 variants, using the tool paths learned by the Double-DQN algorithm, was 0.26 mm.

Fig. 17

An example of workpiece deformed by the tool path predicted by the Double-DQN. Top: the dimension error between the target workpiece and the final workpiece deformed along the predicted tool path. Bottom: the workpiece shape at each forming step (dotted line) compared to its target shape (solid line). Forming step of 0 denotes blank sheet

Tool path learning generalisation using supervised learning

Through the Double-DQN algorithm, the tool paths for the 25 variants of workpiece segments (shown in Fig.  12 ) were learned; the length (total punch steps) for each variant is shown in Fig.  18 . The tool path lengths varied from 44 to 63, with most lying around 52.

Fig. 18

The length of the tool path learned through Double-DQN for each variant of workpiece segment

To learn the intrinsic efficient forming pattern for these workpiece variants (Group 1), supervised learning models were trained with the tool path data for the 25 variants. As introduced in " Deep neural networks " section, three LSTMs, which respectively used VGG16, ResNet34 and ResNet50 as the feature extractor, were investigated. The training data were the 25 tool paths, pre-processed into a data format consistent with the inputs and output of the CNN LSTMs, with a total of 1315 data points. These data were split into 90% for training and 10% for testing, and the other key training parameters are presented in Table  5 . The training processes of the three models are shown in Fig.  19 as generalisation loss (test loss) histories; training would end early if the generalisation loss tended to increase (Goodfellow et al., 2016 ). In addition to training the models with the full set of 25 tool paths, the VGG16 LSTM was also trained with only 20 tool paths to study the effect of the amount of training data on learning performance. The 20 tool paths were evenly sampled from the original 25 paths to avoid massive data loss, and the maximum tool path length among the 20 paths was 55. The generalisation loss has been de-standardised to the stroke unit (mm); the losses from the three models all converged to a comparable level of 0.25 mm, except for the VGG16 LSTM trained with 20 tool paths, whose loss converged to about 0.33 mm. Thus, more training data was seen to improve generalisation, which could be because more exhaustive data helps to generalise the forming pattern during training. It is also noted that the losses from the LSTMs with both ResNets decreased sharply just before convergence, which could be due to the decaying learning rate during training: before the learning rate decreased to a certain level, the parameter update at each learning step could be so large that the parameter values oscillated around a suboptimum; once the learning rate became smaller than this level, the model parameters could move closer to their optimal values and the loss would undergo a sharp drop.
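The early-stopping rule (end training once the generalisation loss tends to increase) can be sketched as a patience-based check on the test-loss history. The patience value is an assumed hyperparameter, not stated in the paper:

```python
def should_stop(test_losses, patience=5):
    """Early stopping on the generalisation (test) loss: return True
    once the best loss seen before the last `patience` evaluations has
    not been improved upon within those `patience` evaluations."""
    if len(test_losses) <= patience:
        return False                       # not enough history yet
    best = min(test_losses[:-patience])    # best loss before the window
    return min(test_losses[-patience:]) >= best
```

This mirrors the behaviour of Keras' `EarlyStopping` callback with `monitor="val_loss"` and the same patience.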

Fig. 19

The generalisation loss curves along the training processes of the LSTM models with VGG16, ResNet34 and ResNet50 as the feature extractor, respectively. The numbers 25 and 20 in the parentheses denote the number of tool paths used for training

Figure  20 shows the prediction results for the same test workpiece from the three models trained with 25 tool paths and from the VGG16 LSTM trained with 20 tool paths. The total time steps of the LSTMs were set to the maximum forming steps in the training data, namely 63 and 55 steps for the models trained with 25 and 20 tool paths, respectively. It can be seen that the tool path predictions for the workpiece from the three supervised learning models trained with different amounts of data all agreed well with the tool path learned through reinforcement learning. The punch started with small stroke values and alternately selected small and large strokes along the forming progress, and the large stroke values gradually increased as CPE1 escalated in the forming process. However, it is worth noting that the ResNet50 LSTM tended to predict successively lower stroke values near the end of forming (from step 46 to 57) than those predicted by the other models.

Fig. 20

The learning performance of LSTMs trained with 25 tool paths data (solid squares) and 20 tool paths data (dashed squares) on a test workpiece. For the former, the prediction results from LSTMs with a VGG16, b ResNet34 and c ResNet50 are presented, while for the latter, those from d the VGG16 LSTM are presented. The prediction results comprise three parts as follows. Top: the tool path prediction from the LSTMs (SL) and its comparison to the tool path from reinforcement learning (RL); Mid: the final \(\Delta \varvec{K}\) -graph after deformation; Bottom: the dimension error and the comparison between the deformed workpiece shape and its target

With regard to the final \(\Delta \varvec{K}\)-graph of the test workpiece deformed through the tool paths predicted by the LSTMs, the VGG16 LSTM trained with 25 tool paths performed best among all models, with a level of forming goal achievement ( \(G=1-{\Delta \varvec{K}}_{final}^{out\, THLD}/{\Delta \varvec{K}}_{initial}^{out\, THLD}\) , THLD denotes threshold) of up to 99.9%. However, the level of goal achievement of the other two models trained with 25 tool paths only reached 97%, and the LSTM trained with 20 tool paths only achieved 95%. Among the final \(\Delta \varvec{K}\)-graphs from the models trained with 25 tool paths in Fig.  20 , the one from the VGG16 LSTM showed only two negligible overpunches, at locations 66 and 161; location 66 was the first punch location in the tool path, and the overpunch there was due to the accumulation of CPE2 near this location throughout the rest of the tool path. In contrast, multiple evident overpunches and short (insufficient) punches were found in the \(\Delta \varvec{K}\)-graph from the ResNet34 LSTM, and short punches in that from the ResNet50 LSTM. This is due to the over- and under-estimation of stroke values in the tool path by these models at the locations where the overpunches and short punches occurred, and the short punches from the ResNet50 LSTM could be caused by the many punch steps of low stroke value near the end of forming. As for the \(\Delta \varvec{K}\)-graph from the VGG16 LSTM trained with 20 tool paths, the result was even worse than those trained with 25 tool paths. It showed the worst overpunch among the four cases, at node location 213, and there was a continuous short-punch region from location 72 to 122, indicating a consistent underestimation of stroke values for punches in this region. The consistent underestimation could be caused by the smaller amount of training data, which led to a lack of useful tool path data for this region.
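The goal-achievement metric \(G\) above can be computed directly from the initial and final \(\Delta \varvec{K}\)-graphs. The precise out-of-threshold area measure is not spelled out in the text; summing the magnitudes by which \(\left|\Delta \varvec{K}\right|\) exceeds the ±0.01 mm −1 band is an assumed discretisation:

```python
import numpy as np

def goal_achievement(dk_initial, dk_final, thld=0.01):
    """Level of forming goal achievement
        G = 1 - dK_final_outTHLD / dK_initial_outTHLD
    where the out-of-threshold quantity is approximated here as the
    summed excess of |dK| over the +/- thld band (assumed measure)."""
    def out_of_band(dk):
        return np.sum(np.maximum(np.abs(dk) - thld, 0.0))
    return 1.0 - out_of_band(dk_final) / out_of_band(dk_initial)
```

A fully formed workpiece, with every node inside the threshold band, gives G = 1 (100% goal achievement).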

In terms of the final geometry difference between the deformed workpiece and its target shape, the tool path predicted by the VGG16 LSTM outperformed those from the other models, with a maximum dimension error of about 0.37 mm. The dimension errors resulting from the other models were much higher, especially for the ResNet50 LSTM and the model trained with less data, whose predictions led to over 0.6 mm and 0.8 mm of dimension error, respectively. In addition, the final workpiece shapes from these two models deviated much more visibly from their targets than those from the VGG16 and ResNet34 LSTMs trained with 25 tool paths.

It is worth noting that, although the \(\Delta \varvec{K}\)-graph from the ResNet34 LSTM was not as good as that from the VGG16 LSTM according to the level of forming goal achievement G , the tool paths from both models resulted in comparable final dimension errors. On the other hand, good goal achievement does not entail good dimensional accuracy. For example, the right half of the workpiece shape from the ResNet50 LSTM differed remarkably from its target although its corresponding \(\Delta \varvec{K}\)-graph had good goal achievement, owing to the excessively larger area of the \(\Delta \varvec{K}\)-graph above the X-axis than below it. The four models, the VGG16, ResNet34 and ResNet50 LSTMs trained with 25 tool paths and the VGG16 LSTM trained with 20 tool paths, were re-evaluated on 10 arbitrary variants, and the average levels of forming goal achievement \(\stackrel{-}{G}\) from them were 99.54%, 96.86%, 97.15% and 97.19%, respectively. With regard to the maximum dimension error, the average values from the four models were 0.45, 0.40, 0.63 and 0.58 mm, respectively. Although the VGG16 LSTM had remarkably better goal achievement than the ResNet34 LSTM, it yielded a slightly larger dimension error. This indicates that overpunches and short punches, in terms of the \(\Delta \varvec{K}\) thresholds, can to some extent contribute to the final forming results, and suggests that a moderate compromise of workpiece curvature smoothness could bring more effective tool path planning behaviour in terms of dimensional accuracy. Multi-objective optimisation could be considered in the future for learning the optimal trade-off between the final curvature smoothness and the dimensional accuracy. Thus, attaining the best level of goal achievement and a high dimensional accuracy, the VGG16 LSTM trained with the larger amount of data had the best performance in tool path learning generalisation.

To compare the tool path planning performance of the proposed generalisable strategy with a method using pure reinforcement learning, a well-trained Double-DQN model was used to predict the stroke of the first punch for workpiece variants with different initial \(\Delta \varvec{K}\) -graphs (i.e., different target shapes), including the target shape it was trained on.

figure 21

Evaluation of the pure RL strategy by assessing the tool path prediction results from the trained Double-DQN model for new variants of workpiece. The \(\Delta \varvec{K}\) -graphs were acquired after the first punch, predicted by the RL model, for the variant used for tool path learning through RL (solid line) and new variants that were never seen in the learning process (dashed line)

Figure  21 shows the \(\Delta \varvec{K}\) -graphs of these workpieces after the first punch with the stroke predicted by the Double-DQN; the node locations of the troughs indicate the punch locations. Most of the stroke predictions for new variants were uncharacteristically large, causing significant overpunch at the very first forming step. This indicates that a Double-DQN trained for the tool path of one target shape cannot predict the tool path for different target workpieces, and the reinforcement learning process has to be repeated for every new application.
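This failure mode belongs to the trained policy, not to the Double-DQN update itself. For reference, the update that distinguishes Double-DQN from vanilla DQN, in which the online network selects the greedy next action while the target network evaluates it, can be sketched with toy Q-tables (all sizes and values here are illustrative, not the trained model's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Q-functions: online and target networks represented as tables
# over (state, stroke-action); the values are arbitrary.
n_states, n_actions = 4, 6
q_online = rng.normal(size=(n_states, n_actions))
q_target = rng.normal(size=(n_states, n_actions))

def double_dqn_target(reward, next_state, gamma=0.99):
    """Double-DQN bootstrap target: the ONLINE network selects the
    greedy next action, the TARGET network evaluates it. Decoupling
    selection from evaluation reduces DQN's overestimation bias."""
    a_star = int(np.argmax(q_online[next_state]))
    return reward + gamma * q_target[next_state, a_star]

def dqn_target(reward, next_state, gamma=0.99):
    """Vanilla DQN target: the target network both selects and evaluates."""
    return reward + gamma * np.max(q_target[next_state])

y_double = double_dqn_target(reward=1.0, next_state=2)
y_single = dqn_target(reward=1.0, next_state=2)
```

Since the max over the target table is never smaller than its value at any particular action, the Double-DQN target is never larger than the vanilla DQN target for the same transition.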

Case study verifying the generalisable tool path planning strategy

To evaluate the generalisable tool path planning strategy presented in Fig.  4 , a new target workpiece of length 90.2 mm was arbitrarily generated, as shown in Fig.  22 . The target workpiece was first digitised to its initial \(\Delta \varvec{K}\) -graph with a 0.1 mm interval between contiguous node locations, giving 903 node locations in total. The \(\Delta \varvec{K}\) -graph can be segmented into three Group 1 segments, A, B and C, none of which was seen during training of the proposed strategy. With the trained supervised learning model (VGG16 LSTM), the forming tool path for each segment was predicted, and the subpaths were aggregated into the entire tool path for the target workpiece. The deformation took place segment by segment, and the final \(\Delta \varvec{K}\) -graph resided well within the threshold region, with a level of forming goal achievement of 99.87%. The case study thus verifies the generalisation of the proposed strategy: an arbitrarily selected workpiece can be formed by solving its tool path in a dynamic programming manner. By factorising the forming process of an entire workpiece into that of typical types of segments, the entire workpiece can be formed by forming each segment consecutively.

figure 22

The entire tool path for a new workpiece, which is aggregated by the subpaths predicted by the VGG16 LSTM for each segment of the workpiece. The three segments were never seen in the LSTM training
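The segment-wise procedure above — digitise the target to a \(\Delta \varvec{K}\) -graph, split it into segments, predict a subpath per segment with the trained SL model, then shift and concatenate the subpaths — can be sketched as follows. The equal-length split and the stand-in predictor are simplifying assumptions; the paper segments by Group 1 pattern and uses the VGG16 LSTM:

```python
import numpy as np

def segment_graph(delta_k, n_segments):
    """Split the digitised delta-K graph into segments. The paper
    segments by recognised pattern; an equal split stands in here."""
    return np.array_split(np.asarray(delta_k, dtype=float), n_segments)

def plan_tool_path(delta_k, predict_subpath, n_segments=3):
    """Predict a subpath per segment and aggregate them, offsetting
    punch locations by each segment's start node so the combined path
    refers to node locations of the full workpiece."""
    path, offset = [], 0
    for seg in segment_graph(delta_k, n_segments):
        for location, stroke in predict_subpath(seg):
            path.append((location + offset, stroke))
        offset += len(seg)
    return path

# Stand-in for the trained SL model: punch the segment's deepest node once.
def dummy_predict(seg):
    return [(int(np.argmin(seg)), 3.0)]

# Toy 903-node graph made of three identical segments.
delta_k = np.concatenate([-np.abs(np.linspace(-1, 1, 301))] * 3)
tool_path = plan_tool_path(delta_k, dummy_predict)
```

The offset bookkeeping is the "aggregation" step: each subpath is planned in segment-local coordinates and only shifted into workpiece coordinates when concatenated.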

Figure  23 presents the target shape and the workpiece shape after deformation, computed by FE, through the generalisable tool path planning strategy. As shown in Fig.  23 a, due to the accumulation of \(\Delta \varvec{K}\) -graph area above the X-axis near the junctions between segments (locations 301 and 602), there was a visible deviation between the deformed workpiece shape and its target, with a maximum dimension error of about 1.8 mm. With two supplementary punches at the two junction locations, the deformed workpiece shape clearly approached the target, with a maximum dimension error of about 1 mm. Since each punch location lies at the end of a segment, where CPE1 was not escalated, small stroke values were selected for the two supplementary punches, which did not cause overpunch. The generalisable tool path planning strategy thus yielded a final workpiece shape within a dimension error of 2%; since the remaining error accumulates in the junction areas, the deformed shape could be further improved with a few more supplementary punches there.

figure 23

The comparison between the workpiece shape deformed through the generalisable tool path planning strategy and its target shape. a deformed workpiece shape yielded by the strategy and b workpiece shape after two supplementary punches near the junctions of the three workpiece segments

Conclusions

In this research, a generalisable tool path planning strategy for free-form sheet metal stamping was proposed using deep reinforcement and supervised learning. By factorising the forming process of an entire workpiece into that of typical types of segments, the tool path planning problem was solved in a dynamic programming manner, yielding, for the first time, a generalisable tool path planning strategy for a curved component. RL algorithms and SL models were exploited for tool path learning and generalisation, with six deep RL algorithms and three deep SL models compared. The proposed strategy was verified through a case study in which the forming tool path was predicted for a target workpiece completely different from the training data. From this study, it can be concluded that:

Q-learning algorithms are superior to policy gradient algorithms for tool path planning of the free-form sheet metal stamping process, with Double-DQN outperforming DQN and Dueling-DQN. The forming heuristic is also corroborated to further improve Q-learning performance.

Conferred by deep reinforcement learning, the generalisable tool path planning strategy exhibits self-learning characteristics: over the learning process, the tool path plan becomes more efficient and learns to avoid overpunch-prone behaviours. With Double-DQN, the tool path for a free-form sheet metal stamping process can be successfully acquired, with the dimension error of the deformed workpiece below 0.26 mm (0.87%).

The efficient forming pattern for a group of workpiece segments has been successfully generalised using deep supervised learning models. The VGG16 LSTM outperforms the ResNet34- and ResNet50 LSTMs in tool path learning generalisation, although the three have comparable average generalisation losses. The VGG16 LSTM predicts the tool paths for 10 test variants with an average level of forming goal achievement of 99.54% and a dimension error of the deformed workpiece below 0.45 mm (1.5%). In contrast, the pure reinforcement learning method cannot generalise plausible tool paths for completely new workpieces.

The generalisable tool path planning strategy successfully predicts the tool path for a completely new workpiece never seen in its previous learning experience. The level of goal achievement reached 99.87% and the dimension error of the deformed workpiece was 2%, reducible to about 1.1% with two small supplementary punches near the junctions of the workpiece segments.

Through the proposed method, tool path planning for an arbitrary sheet metal component is attempted with a generalisable strategy for the first time, addressing the poor generalisation of pure reinforcement learning approaches to tool path planning. However, the efficiency of this strategy depends on the design of the forming pattern and the reward function. In future work, a multi-objective forming goal could be used to trade off final curvature smoothness against dimensional accuracy; with a moderate compromise in curvature smoothness, tool paths that are more efficient in terms of dimensional accuracy might be obtained. CPE1 and CPE2 could also be embedded into the reward function design to facilitate the tool path learning process.

Appendix A: arbitrary generation of workpiece segments

The \(\Delta \varvec{K}\) -graph of each variant in Group 1, shown in Fig.  24 , is composed of two parabolas \({\Delta \varvec{K}}_{a-b}\) and \({\Delta \varvec{K}}_{b-c}\) and is determined by five variables ( \({h}_{a}\) , \({h}_{c}\) , \({l}_{b}\) , \({w}_{ab}\) and \({w}_{bc}\) ) and two constants ( \({h}_{b}\) and \({l}_{c}\) ). The workpiece segments were generated arbitrarily by randomly sampling the values of these variables.

figure 24

The variables and functions for creating the variants of segments in Group 1. \({\varvec{w}}_{\varvec{a}\varvec{b}}\) and \({\varvec{w}}_{\varvec{b}\varvec{c}}\) can be derived once \({\varvec{h}}_{\varvec{a}}\) , \({\varvec{h}}_{\varvec{b}}\) , \({\varvec{h}}_{\varvec{c}}\) and \({\varvec{l}}_{\varvec{b}}\) are generated
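Under the construction in Fig. 24, a variant can be generated by sampling the free variables and deriving the parabola coefficients from them. A sketch under stated assumptions — the sampling ranges, the constants \({h}_{b}\) and \({l}_{c}\) , and the exact parabola form are placeholders, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_group1_variant(h_b=-0.08, l_c=300, n=301):
    """Sample one Group 1 delta-K graph: two parabolas meeting at the
    trough b. The end heights h_a, h_c and the trough location l_b are
    drawn randomly; the parabola coefficients w_ab, w_bc then follow
    from the sampled heights, as in Fig. 24. All numeric ranges here
    are placeholders."""
    h_a = rng.uniform(-0.02, 0.0)
    h_c = rng.uniform(-0.02, 0.0)
    l_b = rng.integers(100, 200)
    x = np.arange(n) * l_c / (n - 1)
    # Derived parabola coefficients ("widths" in Fig. 24's terms).
    w_ab = (h_a - h_b) / l_b**2
    w_bc = (h_c - h_b) / (l_c - l_b) ** 2
    # Left parabola a-b (vertex at b), right parabola b-c.
    left = h_b + w_ab * (l_b - x) ** 2
    right = h_b + w_bc * (x - l_b) ** 2
    return np.where(x <= l_b, left, right)

variant = sample_group1_variant()
```

By construction the graph passes through \(({0,h}_{a})\) , \(({l}_{b},{h}_{b})\) and \(({l}_{c},{h}_{c})\) , so varying the sampled values sweeps out the family of segment variants.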

Appendix B: virtual environment configuration

The virtual environment (VE) was developed to imitate the rubber-tool forming behaviour of the FE simulations, so that the DRL algorithms could be trained in it at reduced computational expense. Since the forming process is extremely nonlinear, the VE is only tuned to qualitatively resemble the FE simulation results, which is nevertheless sufficient for comparing RL algorithms. The VE is configured according to the rules below, whose formulation and parameter selection were based on FE simulation results.

A single punch operation only affects the \(\Delta \varvec{K}\) values at 50 node locations (5 mm) around the punch location and the punch location itself in the \(\Delta \varvec{K}\) -graph.

If a node location has been punched with a stroke and this location is selected again for punching, the \(\Delta \varvec{K}\) -graph will only change if the new stroke is greater than the previous one.

The change of \(\varvec{K}\) value ( \({c}_{K}\) ) at the punch location by stroke ( \({d}_{s}\) ) without CPE1 and CPE2 is defined as \({c}_{K0}=\left({d}_{s}-2.1\right)\times 0.05+0.045\) . With CPE1, it is modified according to how the neighbourhood of the punch location has been pre-deformed:

only one side of the punch location is pre-deformed: \({c}_{K1}={c}_{K0}/2\) ;

both sides of the punch location are pre-deformed by 1 punch: \({c}_{K2}=\left({c}_{K0}+0.035\right)/2\) ;

one side of the punch location is pre-deformed by 1 punch and the other side is pre-deformed by 2 punches: \({c}_{K3}=\left({c}_{K0}+0.005\right)/2\) ;

the two sides of the punch location are pre-deformed by 4 or more punches in total: \({c}_{K4}=\left({c}_{K0}-0.01\right)/2\) .

The change of \(\varvec{K}\) value gradually decreases from \({c}_{Ki}\) ( \(i\in \left[0,4\right], i\in \mathbb{Z}\) ) at the punch location to 0 at the two ends of the 51 node locations in rule 1.

With CPE2: if the \(\Delta \varvec{K}\) -graph is changed by the punch, the \(\Delta \varvec{K}\) values at the 51 node locations in rule 1 are reduced by 0.0005 mm\(^{-1}\) .
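The rules above translate almost directly into a toy simulator. The sketch below implements rules 1, 2, 3 (the no-CPE1 case only), the linear decay and the CPE2 reduction; the sign convention and the omission of CPE1 bookkeeping are our simplifications:

```python
import numpy as np

HALF = 25  # rule 1: a 51-node window, 25 nodes each side of the punch

class VirtualEnv:
    """Sketch of the Appendix B rules. CPE1 bookkeeping of neighbouring
    punches is omitted, so c_K0 is used throughout."""

    def __init__(self, n_nodes=903):
        self.delta_k = np.zeros(n_nodes)
        self.strokes = np.zeros(n_nodes)  # largest stroke applied per node

    def punch(self, loc, stroke):
        # Rule 2: re-punching changes nothing unless the stroke grows.
        if stroke <= self.strokes[loc]:
            return False
        self.strokes[loc] = stroke
        # Rule 3: change of K at the punch location (no CPE1/CPE2 case).
        c_k0 = (stroke - 2.1) * 0.05 + 0.045
        lo = max(0, loc - HALF)
        hi = min(len(self.delta_k) - 1, loc + HALF)
        idx = np.arange(lo, hi + 1)
        # Linear decay from c_k0 at the punch to 0 at the window ends.
        change = c_k0 * (1.0 - np.abs(idx - loc) / HALF)
        self.delta_k[idx] += change
        # CPE2 rule: a successful punch lowers delta-K over the window.
        self.delta_k[idx] -= 0.0005
        return True

env = VirtualEnv()
applied = env.punch(100, 3.0)   # first punch changes the graph
repeat = env.punch(100, 2.5)    # smaller re-punch is ignored (rule 2)
```

For a 3.0 mm stroke, \({c}_{K0}=(3.0-2.1)\times 0.05+0.045=0.09\) , so the node at the punch location changes by 0.09 minus the 0.0005 CPE2 reduction under this sketch.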

Appendix C: extraordinarily large CPE2

A phenomenon of extraordinarily large CPE2 is shown in Fig.  25 . After a punch with a stroke of 3.6 mm was applied at location 76, there was an evident curvature change at about location 70; when the applied stroke was increased by 0.1 mm, the CPE2 at location 70 increased by about 0.007 mm\(^{-1}\) .

figure 25

Extraordinarily large CPE2 occurring near punch location of 80. The vertical line denotes the punch location at the current workpiece state (original \(\Delta \varvec{K}\) -graph). The dotted line and dashed line denote the \(\Delta \varvec{K}\) -graphs after punches with stroke of 3.6 mm and 3.7 mm, respectively

Data availability

The data that support the findings of this study are available from the corresponding author upon request.


Acknowledgments

S. Liu is grateful for the support from China Scholarship Council (CSC) (Grant no. 201908060236).

Funding

S. Liu received subsistence allowance from China Scholarship Council (CSC) under Grant no. 201908060236.

Author information

Authors and affiliations

Department of Mechanical Engineering, Imperial College London, London, SW7 2AZ, UK

Shiming Liu, Zhusheng Shi & Jianguo Lin

School of Creative Technologies, University of Portsmouth, Portsmouth, PO1 2DJ, UK


Corresponding author

Correspondence to Zhusheng Shi .

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Liu, S., Shi, Z., Lin, J. et al. A generalisable tool path planning strategy for free-form sheet metal stamping through deep reinforcement and supervised learning. J Intell Manuf (2024). https://doi.org/10.1007/s10845-024-02371-w

Received : 08 November 2022

Accepted : 13 March 2024

Published : 22 April 2024

DOI : https://doi.org/10.1007/s10845-024-02371-w


Keywords

  • Deep learning
  • Deep reinforcement learning
  • Deep supervised learning
  • Sheet metal forming
  • Intelligent manufacturing
  • Tool path planning
