Table of Contents:

  • Generating some data
  • Initializing the parameters
  • Computing the class scores
  • Computing the loss
  • Computing the analytic gradient with backpropagation
  • Performing a parameter update
  • Putting it all together: training a Softmax classifier
  • Training a Neural Network

In this section we’ll walk through a complete implementation of a toy Neural Network in 2 dimensions. We’ll first implement a simple linear classifier and then extend the code to a 2-layer Neural Network. As we’ll see, this extension is surprisingly simple and very few changes are necessary.

Let's generate a classification dataset that is not easily linearly separable. Our favorite example is the spiral dataset, which can be generated as follows:
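The code listing itself is not preserved above; a minimal NumPy sketch consistent with the description (and with the 300 two-dimensional points mentioned below), where N, D and K denote points per class, dimensionality and number of classes, would be:

```python
import numpy as np

N = 100  # number of points per class
D = 2    # dimensionality
K = 3    # number of classes
X = np.zeros((N * K, D))            # data matrix (each row = a single example)
y = np.zeros(N * K, dtype='uint8')  # class labels
for j in range(K):
    ix = range(N * j, N * (j + 1))
    r = np.linspace(0.0, 1, N)  # radius
    t = np.linspace(j * 4, (j + 1) * 4, N) + np.random.randn(N) * 0.2  # theta
    X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
    y[ix] = j
```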


Normally we would want to preprocess the dataset so that each feature has zero mean and unit standard deviation, but in this case the features are already in a nice range from -1 to 1, so we skip this step.

Training a Softmax Linear Classifier

Let's first train a Softmax classifier on this classification dataset. As we saw in the previous sections, the Softmax classifier has a linear score function and uses the cross-entropy loss. The parameters of the linear classifier consist of a weight matrix W and a bias vector b for each class. Let's first initialize these parameters to be random numbers:
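The initialization code is not shown above; a sketch consistent with the text (small random weights, zero biases, reusing D and K from the data-generation sketch) is:

```python
# initialize parameters randomly
W = 0.01 * np.random.randn(D, K)
b = np.zeros((1, K))
```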

Recall that D = 2 is the dimensionality and K = 3 is the number of classes.

Since this is a linear classifier, we can compute all class scores very simply in parallel with a single matrix multiplication:
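A one-line sketch of this step, using the X, W and b defined above:

```python
# evaluate class scores for all examples at once, giving a [300 x 3] array
scores = np.dot(X, W) + b
```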

In this example we have 300 2-D points, so after this multiplication the array scores will have size [300 x 3], where each row gives the class scores corresponding to the 3 classes (blue, red, yellow).

The second key ingredient we need is a loss function, which is a differentiable objective that quantifies our unhappiness with the computed class scores. Intuitively, we want the correct class to have a higher score than the other classes. When this is the case, the loss should be low; otherwise the loss should be high. There are many ways to quantify this intuition, but in this example let's use the cross-entropy loss that is associated with the Softmax classifier. Recall that if \(f\) is the array of class scores for a single example (e.g. an array of 3 numbers here), then the Softmax classifier computes the loss for that example as:
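The formula itself did not survive extraction; the cross-entropy loss used by the Softmax classifier for a single example is:

\[ L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) \]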

We can see that the Softmax classifier interprets every element of \(f\) as holding the (unnormalized) log probabilities of the three classes. We exponentiate these to get (unnormalized) probabilities, and then normalize them to get probabilities. Therefore, the expression inside the log is the normalized probability of the correct class. Note how this expression works: this quantity is always between 0 and 1. When the probability of the correct class is very small (near 0), the loss will go towards (positive) infinity. Conversely, when the correct class probability goes towards 1, the loss will go towards zero because \(\log(1) = 0\). Hence, the expression for \(L_i\) is low when the correct class probability is high, and it's very high when it is low.

Recall also that the full Softmax classifier loss is then defined as the average cross-entropy loss over the training examples and the regularization:
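Written out (a standard form consistent with the 0.5 regularization factor mentioned below):

\[ L = \underbrace{\frac{1}{N}\sum_i L_i}_{\text{data loss}} + \underbrace{\frac{1}{2}\lambda \sum_k \sum_l W_{k,l}^2}_{\text{regularization loss}} \]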

Given the array of scores we've computed above, we can compute the loss. First, the way to obtain the probabilities is straightforward:
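A sketch of this step:

```python
num_examples = X.shape[0]
# unnormalized probabilities
exp_scores = np.exp(scores)
# normalize them for each example
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
```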

We now have an array probs of size [300 x 3], where each row now contains the class probabilities. In particular, since we’ve normalized them every row now sums to one. We can now query for the log probabilities assigned to the correct classes in each example:
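For example:

```python
# negative log probability of the correct class for each example
correct_logprobs = -np.log(probs[range(num_examples), y])
```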

The array correct_logprobs is a 1D array of the negative log probabilities assigned to the correct classes for each example. The full loss is then the average of these log probabilities plus the regularization loss:
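A sketch, with reg holding the regularization strength \(\lambda\) (the value shown here is illustrative):

```python
reg = 1e-3  # regularization strength (illustrative value)
data_loss = np.sum(correct_logprobs) / num_examples
reg_loss = 0.5 * reg * np.sum(W * W)
loss = data_loss + reg_loss
```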

In this code, the regularization strength \(\lambda\) is stored in the variable reg. The convenience factor of 0.5 multiplying the regularization will become clear in a second. Evaluating this in the beginning (with random parameters) might give us loss = 1.1, which is -np.log(1.0/3), since with small initial random weights all probabilities assigned to all classes are about one third. We now want to make the loss as low as possible, with loss = 0 as the absolute lower bound. The lower the loss, the higher the probabilities assigned to the correct classes for all examples.

A case study of using artificial neural networks for classifying cause of death from verbal autopsy

Andrew Boulle, Daniel Chandramohan, Peter Weller, A case study of using artificial neural networks for classifying cause of death from verbal autopsy, International Journal of Epidemiology, Volume 30, Issue 3, June 2001, Pages 515–520, https://doi.org/10.1093/ije/30.3.515

Background Artificial neural networks (ANN) are gaining prominence as a method of classification in a wide range of disciplines. In this study ANN is applied to data from a verbal autopsy study as a means of classifying cause of death.

Methods A simulated ANN was trained on a subset of verbal autopsy data, and its performance was tested on the remaining data. The performance of the ANN models was compared with two other classification methods (physician review and logistic regression) which have been tested on the same verbal autopsy data.

Results Artificial neural network models were as accurate as or better than the other techniques in estimating the cause-specific mortality fraction (CSMF). They estimated the CSMF within 10% of the true value in 8 out of 16 causes of death. Their sensitivity and specificity compared favourably with those of data-derived algorithms based on logistic regression models.

Conclusions Cross-validation is crucial in preventing the over-fitting of the ANN models to the training data. Artificial neural network models are a potentially useful technique for classifying causes of death from verbal autopsies. Large training data sets are needed to improve the performance of data-derived algorithms, in particular ANN models.

KEY MESSAGES

Artificial neural networks have potential for classifying causes of death from verbal autopsies.

Large datasets are needed to train neural networks and for validating their performance.

Generalizability of neural network models to various settings needs further evaluation.

In many countries routine vital statistics are of poor quality, and often incomplete or unavailable. In countries where vital registration and routine health information systems are weak, the application of verbal autopsy (VA) in demographic surveillance systems or cross-sectional surveys has been suggested for assessing the cause-specific burden of mortality. The technique involves taking an interviewer-led account of the symptoms and signs that were present preceding the death of individuals from their caretakers. Traditionally the information obtained from caretakers is analysed by physicians and a cause(s) of death is reached if a majority of physicians on a panel agree on a cause(s). The accuracy of physician reviews has been tested in several settings using causes of death assigned from hospital records as the ‘gold standard’. Although physician reviews of VA gave robust estimates of cause-specific mortality fractions (CSMF) of several causes of death, the sensitivity, specificity and predictive values varied between causes of death and between populations 1 , 2 and had poor repeatability of results. 3

Arguments to introduce opinion-based and/or data-derived algorithm methods of assigning cause of death from VA data are based on both the quest for accuracy and consistency, as well as the logistical difficulties in getting together a panel of physicians to review what are often large numbers of records. However, physician review performed better than set diagnostic criteria (opinion-based or data-derived) given in an algorithm to assign a cause of adult death. 4 One promising approach to diagnose disease status has been artificial neural networks (ANN) which apply non-linear statistics to pattern recognition. For example, ANN predicted outcomes in cancer patients better than a logistic regression model. 5 Duh et al. speculate that ANN will prove useful in epidemiological problems that require pattern recognition and complex classification techniques. 6 In this report, we compare the performance of ANN and logistic regression models and physician review for reaching causes of adult death from VA.

An overview of neural networks

Although often referred to as black boxes, neural networks can in fact easily be understood by those versed in regression analysis techniques. In essence, they are complex non-linear modelling equations. The inputs, outputs and weights in a neural network are analogous to the input variables, outcome variables and coefficients in a regression analysis. The added complexity is largely the result of a layering of ‘nodes’ which provides a far more detailed map of the decision space. A single node neural network will produce a comparable output to logistic regression, where a function will combine the weights of the inputs to produce the output (Figure 1 ).
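To make the analogy concrete (this worked form is added here for illustration, not taken from the article): a single node with a sigmoid activation computes

\[ \hat{y} = \sigma\!\left(b + \sum_i w_i x_i\right) = \frac{1}{1 + e^{-\left(b + \sum_i w_i x_i\right)}}, \]

which is exactly the functional form of a logistic regression model with coefficients \(w_i\) and intercept \(b\).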

Combining these nodes into multiple layers adds to the complexity of the model and hence the discriminatory power. In so doing, a number of elements, each receiving all of the inputs and producing an output, have these outputs sent as inputs to a further element(s). The architecture is called a multi-layer perceptron (Figure 2 ).

The study population and field procedures of the VA data used in this analysis are described elsewhere. 1 In brief, data were collected at three sites (a regional hospital in Ethiopia, and two rural hospitals in Tanzania and Ghana). Adults dying at these hospitals who lived within a 60-km radius of the institution were included in the study. A VA questionnaire was administered by interviewers with at least 12 years of formal education. The reference diagnoses (gold standard) were obtained from a combination of hospital records and death certificates by one of the authors (DC) together with a local physician in each site. A panel of three physicians reviewed the VA data and reached a cause of death if any two agreed on a cause (physician review).

The method used to derive algorithms from the data using logistic regression models has been described elsewhere. 4 Each subject was randomly assigned to the train dataset (n = 410) or test dataset (n = 386), such that the number of deaths due to each cause (gold standard) was the same in both datasets. If a cause of death had an odd number of cases, the extra subject was included in the train dataset. Symptoms (including signs) with an odds ratio (OR) ≥2 or ≤0.5 in univariate analyses were included in a logistic model, and symptoms that were not statistically significant (P > 0.1) were then dropped from the model in a backward stepwise manner. The coefficients of the symptoms remaining in the model were summed to obtain a score for each subject, i.e. \(\text{Score} = b_1 x_1 + b_2 x_2 + \ldots\), where \(b_i\) is the log OR of symptom \(x_i\) in the model. A cut-off score was identified for each cause of death (16 primary causes of adult death were included) that gave the estimated number of deaths closest to the true number of cause-specific deaths, such that the sensitivity was at least 50%.
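As an illustration only (the coefficient values, symptom indicators and cut-off below are hypothetical, not taken from the study), the scoring rule can be written as:

```python
import numpy as np

# log odds ratios of the symptoms retained in the model (hypothetical values)
b = np.array([1.2, 0.8, -0.9])
# binary symptom indicators for one subject (hypothetical values)
x = np.array([1, 0, 1])

score = np.dot(b, x)    # Score = b1*x1 + b2*x2 + ...
is_case = score >= 0.6  # compared against a hypothetical cause-specific cut-off
```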

We used the same train and test datasets used by Quigley et al. for training and testing an ANN. The data were ported to Microsoft Excel™ and analysed using NeuroSolutions 3.0™ (Lefebvre WC. NeuroSolution Version 3.020, Neurodimension Inc. 1994. [ www.nd.com ]). All models were multi-layer perceptrons with a single hidden layer, trained with static backpropagation. The number of nodes in the hidden layer was varied according to the number of inputs and network performance. A learning rate of 0.7 was used throughout with the momentum learning rule. A sigmoid activation function was used for all processing elements.

Model inputs were based on those used in the logistic regression study, with further variables added to improve discrimination in instances when they improved the model performance. Sensitivity analysis provided the basis for evaluating the role of the inputs in the models.

For each diagnosis, the first 100 records of the training subset were used in the first training run of each model as a cross-validation set to determine the optimal number of hidden nodes and the training point at which the cross-validation mean squared error reached a trough. Thereafter the full training set was used to train the network to this point.

The output weights were then adjusted by a variable factor until the CSMF was as close as possible to 100% of the expected value in testing runs on the training set. At this point the network was tested on the unseen data in the test subset.

Weighted (by number of deaths) averages for sensitivity and specificity were calculated for each method. A summary measure for CSMF was calculated for each method by summing the absolute difference in observed and estimated number of cases for each cause of death, dividing by the total number of deaths, and converting to a percentage.
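Written as a formula (a rendering of the calculation described above, not reproduced from the article):

\[ \text{CSMF summary error} = \frac{\sum_c \lvert O_c - E_c \rvert}{N_{\text{deaths}}} \times 100\%, \]

where \(O_c\) and \(E_c\) are the observed and estimated numbers of deaths for cause \(c\).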

Table 1 shows the comparison of validity of the logistic regression models versus the ANN models for estimating CSMF by comparing estimated with observed number of cases as well as sensitivity and specificity.

The CSMF was estimated to within 10% of the true value in 8 out of 16 classes (causes of death) by the ANN. In a further six classes it was estimated to within 25% of the true value. In the remaining two classes the CSMF was extremely low (tetanus and rabies). The summary measure for CSMF favours those methods that are more accurate on the more frequently occurring classes and may mask poor performance on rare causes of death. In this measure however, calculated from the absolute number of over- or under-diagnosed cases, the neural network method performed better than logistic regression models (average error 11.27% versus 31.27%), and compared well with physician review (average error of 12.84%). In the assessment of chance agreement between ANN and gold-standard diagnoses, the kappa value was ≥0.5 for the following classes: rabies (0.86), injuries (0.76), tetanus (0.66), tuberculosis and AIDS (0.55), direct maternal causes (0.52), meningitis (0.50), and diarrhoea (0.50).

There was a trade-off between specificity and sensitivity, and in some instances the neural network performed better than other techniques in one at the expense of the other. Compared to logistic regression, the networks performed better in both parameters for tuberculosis and AIDS, meningitis, cardiovascular disorders, diarrhoea, and tetanus. They produced a lower sensitivity for malaria (compared with logistic regression), but a higher specificity. The overall and disease-specific sensitivities and specificities compared favourably with logistic regression, but did not match the performance of physician review.

Accuracy of CSMF estimates

One of the most significant findings of this analysis is the relative accuracy in assessing the fraction of deaths that are due to specific causes, especially for the more frequently occurring classes. The accuracy in this estimate does not always correlate with the reliability estimated by the kappa statistic. Care was taken to find a weighting for the output that would lead to a correct CSMF in the training set. The choice of this weight is analogous to selecting the minimum total score at which a case is defined in the logistic regression models. This then led to surprisingly good estimates in the testing set. It is a feature of the train and test subsets, however, that the number of members in each class is similar. Manipulating either subset so that the CSMF differed, by randomly removing or adding records of the class in question, did not alter the accuracy of the CSMF estimates if the number of training examples for the class was not decreased in the training subset. With less frequently occurring classes such as pneumonia, decreasing the number of training examples in the training set reduced the accuracy of the CSMF estimate. This is essentially an issue of generalisation, and it is to be expected that networks that are trained with fewer examples are less likely to be generalizable. It is suggested that it is for this reason that the CSMF estimates for the five most frequently occurring classes are all within 10% of the expected values. It would be expected, furthermore, that if the datasets were larger, the generalizability of the CSMF estimates for the less frequently occurring classes would improve.

At the stage of data analysis the question can be asked as to whether or not there is an output level above which class membership is reasonably certain, and below which misclassification is more likely to occur. Looking at the tuberculosis-AIDS model (n = 71), as well as the meningitis model (n = 32), and ranking the top 20 test outputs in descending order by value (reflecting the certainty of the classification), 13/20 of these outputs correctly predict the class membership in both instances. The sensitivities for the models overall were 66% and 56% respectively. The implication is that without a gold standard result for comparison, it would be difficult to delineate the true positives from the false positives even in the least equivocal outputs. This is in keeping with observations that different data-derived methods arrive at their estimates differently. One study to predict an acute abdomen diagnosis from surgical admission records demonstrated that data-derived methods with similar overall performance correlated poorly as to which of the records they were correctly predicting. 7

Mechanisms of improved performance

A single layer neural network (i.e. a network with only inputs, and one processing element) is isomorphic with logistic regression. A network with no hidden nodes produced almost identical results when comparing the input weights to the log(OR) for the four inputs used in the regression model to predict malaria as the cause of death. In those instances where the performance of logistic regression and neural network models differ, it is of interest to know the mechanisms by which improvements are made. The results from this study indicate that the differences in performance of the neural networks are achieved both by improved fitting of those variables already known to be significantly predictive of class membership, through the modelling of interaction between them, and by additional discriminating power conferred by variables that are not significantly predictive on their own.

The first mechanism was borne out in one of the meningitis models in which the exact same inputs used in the logistic regression model were used in the neural network model with an improvement in performance. Exploring the sensitivity analysis for cardiovascular deaths (Table 2 ), the network outputs are surprisingly sensitive to the absence of a tuberculosis history, which was not strongly predictive by itself. Age above 45 years old was the seventh most predictive input in the regression model, whereas it was the input to which the neural network model was second most sensitive. In the case of meningitis, presence of continuous fever was more important in the regression model, whilst presence or absence of recent surgery and abdominal distension were more significant in the ANN model (Table 3 ). The network has mapped relationships between the inputs that were not predicted by the regression model.

Effect of size of dataset

Both data-derived methods stand to benefit from more training examples. In the regression models, some inputs not currently utilized may yield significant associations with outputs when larger datasets are used. With enough nodes and training time, it was possible in the course of this analysis to train a neural network to completely map the training set with 100% sensitivity and specificity. However, this level of sensitivity and specificity was not reproduced when these models were tested in the test dataset. What it did demonstrate is the ability of the method to map complex functions. The key point is one of generalizability. In the models presented above, training was stopped and the nodes limited to ensure that the generalizability was not compromised. With more training examples, it is likely that the networks would develop a better understanding of the relationships between inputs and outputs before over-training occurs. Arguably, the neural network models would stand to improve performance more than the regression models should larger training sets be available. However, further training may not achieve algorithms of sufficiently high sensitivity and specificity to obviate the need for algorithms with particular operating characteristics suitable for use in specific environments.

Physician review

Only 78% of the reference diagnoses were confirmed by laboratory tests. Since 22% of the reference diagnoses were based on hospital physicians' clinical judgement, it is not surprising that physician review of VA performed better than the other methods. Nevertheless, physician review remains the optimal method of analysis, as far as overall performance is concerned, for gathering cause-specific mortality data as good as the data produced by routine health information systems. 1 The technique by which physicians in this study came to their classification differed considerably, as they made extensive use of the open section of the questionnaire from which information was not coded for analysis by the other techniques. Interestingly though, other methods are able to come close if the CSMF is used as the outcome of choice, as indeed it often is. Thus algorithms based on ANN or logistic regression models have the potential to substitute for physician review of VA.

Limitations of the technique

At various points we have alluded to some of the difficulties and limitations of using neural networks for the analysis. These are summarized in Table 4 .

Even with sensitivity analysis, we had no way of working out which were going to be the most important inputs prior to creating a model and conducting a sensitivity analysis on it. There is some correlation with linearly predictive inputs that helps in the initial stages.

Determining the weighting for the output for providing the optimum estimate of the CSMF was time-consuming. The software provides an option for prioritizing sensitivity over specificity, but no way of balancing the number of false positives and false negatives that would give an accurate CSMF estimate.

Designing the optimal network topology requires building numerous networks in search of the one with the lowest mean squared error. The number of hidden nodes, inputs and training time all affect the performance of the network. Whilst training is relatively quick compared to the many hours it took to train ANN in the early days of their development, it is still time-consuming to build and train multiple networks for each model.

Cross-validation to prevent over-training required compromising the number of training examples to allow for a cross-validation dataset.

Sensitivity and specificity of the ANN algorithms were not high enough to be generalizable to a variety of settings. Furthermore, the accuracy of individual and summary estimates of CSMF obtained in this study could be due to the similarity in the CSMF between the training and test datasets. Thus large datasets from a variety of settings are needed to identify optimal algorithms for each site with different distributions of causes of death.

Classification software based on neural network simulations is an accessible tool which can be applied to VA data, potentially outperforming the other data-derived techniques already studied for this purpose. As with other data-derived techniques, over-fitting to the training data, leading to a compromise in the generalizability of the models, is a potential limitation of ANN. Increasing the number of training examples is likely to improve performance of neural networks for VA. However, ANN algorithms with particular operating characteristics would be site-specific. Thus optimal algorithms need to be identified for use in a variety of settings.

Table 1. Comparison of performance of physician review, logistic regression and neural network models

Table 2. Comparison of the most important inputs for two data-derived models for assigning cardiovascular deaths

Table 3. Comparison of the most important inputs for two data-derived models for assigning death due to meningitis

Table 4. Limitations of the artificial neural network technique

Figure 1. Schematic representation of a single node in a neural network

Figure 2. Schematic representation of multi-layer perceptron

London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E, UK.

References

1. Chandramohan D, Maude H, Rodrigues L, Hayes R. Verbal autopsies for adult deaths: their development and validation in a multicentre study. Trop Med Int Health 1998;3:436–46.

2. Snow RW, Armstrong ARM, Forster D et al. Childhood deaths in Africa: uses and limitations of verbal autopsies. Lancet 1992;340:351–55.

3. Todd JE, De Francisco A, O'Dempsey TJD, Greenwood BM. The limitations of verbal autopsy in a malaria-endemic region. Ann Trop Paediatr 1994;14:31–36.

4. Quigley M, Chandramohan D, Rodrigues L. Diagnostic accuracy of physician review, expert algorithms and data-derived algorithms in adult verbal autopsies. Int J Epidemiol 1999;28:1081–87.

5. Jefferson MF, Pendleton N, Lucas SB, Horan MA. Comparison of a genetic algorithm neural network with logistic regression for predicting outcome after surgery for patients with nonsmall cell lung carcinoma. Cancer 1997;79:1338–42.

6. Duh MS, Walker AM, Pagano M, Kronlund K. Epidemiological interpretation of artificial neural networks. Am J Epidemiol 1998;147:1112–22.

7. Schwartz S, et al. Connectionist, rule-based and Bayesian diagnostic decision aids: an empirical comparison. In: Hand DJ (ed.). Artificial Intelligence Frontiers in Statistics. London: Chapman and Hall, 1993, pp.264–77.


Study urges caution when comparing neural networks to the brain


Neural networks, a type of computing system loosely modeled on the organization of the human brain, form the basis of many artificial intelligence systems for applications such as speech recognition, computer vision, and medical image analysis.

In the field of neuroscience, researchers often use neural networks to try to model the same kind of tasks that the brain performs, in hopes that the models could suggest new hypotheses regarding how the brain itself performs those tasks. However, a group of researchers at MIT is urging that more caution should be taken when interpreting these models.

In an analysis of more than 11,000 neural networks that were trained to simulate the function of grid cells — key components of the brain’s navigation system — the researchers found that neural networks only produced grid-cell-like activity when they were given very specific constraints that are not found in biological systems.

“What this suggests is that in order to obtain a result with grid cells, the researchers training the models needed to bake in those results with specific, biologically implausible implementation choices,” says Rylan Schaeffer, a former senior research associate at MIT.

Without those constraints, the MIT team found that very few neural networks generated grid-cell-like activity, suggesting that these models do not necessarily generate useful predictions of how the brain works.

Schaeffer, who is now a graduate student in computer science at Stanford University, is the lead author of the new study , which will be presented at the 2022 Conference on Neural Information Processing Systems this month. Ila Fiete, a professor of brain and cognitive sciences and a member of MIT’s McGovern Institute for Brain Research, is the senior author of the paper. Mikail Khona, an MIT graduate student in physics, is also an author.

Modeling grid cells

Neural networks, which researchers have been using for decades to perform a variety of computational tasks, consist of thousands or millions of processing units connected to each other. Each node has connections of varying strengths to other nodes in the network. As the network analyzes huge amounts of data, the strengths of those connections change as the network learns to perform the desired task.

In this study, the researchers focused on neural networks that have been developed to mimic the function of the brain’s grid cells, which are found in the entorhinal cortex of the mammalian brain. Together with place cells, found in the hippocampus, grid cells form a brain circuit that helps animals know where they are and how to navigate to a different location.

Place cells have been shown to fire whenever an animal is in a specific location, and each place cell may respond to more than one location. Grid cells, on the other hand, work very differently. As an animal moves through a space such as a room, grid cells fire only when the animal is at one of the vertices of a triangular lattice. Different groups of grid cells create lattices of slightly different dimensions, which overlap each other. This allows grid cells to encode a large number of unique positions using a relatively small number of cells.

This type of location encoding also makes it possible to predict an animal’s next location based on a given starting point and a velocity. In several recent studies, researchers have trained neural networks to perform this same task, which is known as path integration.

To train neural networks to perform this task, researchers feed the network a starting point and a velocity that varies over time. The model essentially mimics the activity of an animal roaming through a space, and calculates updated positions as it moves. As the model performs the task, the activity patterns of different units within the network can be measured. Each unit’s activity can be represented as a firing pattern, similar to the firing patterns of neurons in the brain.

In several previous studies, researchers have reported that their models produced units with activity patterns that closely mimic the firing patterns of grid cells. These studies concluded that grid-cell-like representations would naturally emerge in any neural network trained to perform the path integration task.

However, the MIT researchers found very different results. In an analysis of more than 11,000 neural networks that they trained on path integration, they found that while nearly 90 percent of them learned the task successfully, only about 10 percent of those networks generated activity patterns that could be classified as grid-cell-like. That count includes networks in which even a single unit achieved a high grid score.

According to the MIT team, the earlier studies were more likely to generate grid-cell-like activity only because of the constraints that the researchers built into those models.

“Earlier studies have presented this story that if you train networks to path integrate, you're going to get grid cells. What we found is that instead, you have to make this long sequence of choices of parameters, which we know are inconsistent with the biology, and then in a small sliver of those parameters, you will get the desired result,” Schaeffer says.

More biological models

One of the constraints found in earlier studies is that the researchers required the model to convert velocity into a unique position, reported by one network unit that corresponds to a place cell. For this to happen, the researchers also required that each place cell correspond to only one location, which is not how biological place cells work: Studies have shown that place cells in the hippocampus can respond to up to 20 different locations, not just one.

When the MIT team adjusted the models so that place cells were more like biological place cells, the models were still able to perform the path integration task, but they no longer produced grid-cell-like activity. Grid-cell-like activity also disappeared when the researchers instructed the models to generate different types of location output, such as location on a grid with X and Y axes, or location as a distance and angle relative to a home point.

“If the only thing that you ask this network to do is path integrate, and you impose a set of very specific, not physiological requirements on the readout unit, then it's possible to obtain grid cells,” Fiete says. “But if you relax any of these aspects of this readout unit, that strongly degrades the ability of the network to produce grid cells. In fact, usually they don't, even though they still solve the path integration task.”

Therefore, if the researchers hadn’t already known of the existence of grid cells and hadn’t guided the model to produce them, it would be very unlikely for grid-cell-like activity to appear as a natural consequence of the model training.

The researchers say that their findings suggest that more caution is warranted when interpreting neural network models of the brain.

“When you use deep learning models, they can be a powerful tool, but one has to be very circumspect in interpreting them and in determining whether they are truly making de novo predictions, or even shedding light on what it is that the brain is optimizing,” Fiete says.

Kenneth Harris, a professor of quantitative neuroscience at University College London, says he hopes the new study will encourage neuroscientists to be more careful when stating what can be shown by analogies between neural networks and the brain.

“Neural networks can be a useful source of predictions. If you want to learn how the brain solves a computation, you can train a network to perform it, then test the hypothesis that the brain works the same way. Whether the hypothesis is confirmed or not, you will learn something,” says Harris, who was not involved in the study. “This paper shows that ‘postdiction’ is less powerful: Neural networks have many parameters, so getting them to replicate an existing result is not as surprising.”

When using these models to make predictions about how the brain works, it’s important to take into account realistic, known biological constraints when building the models, the MIT researchers say. They are now working on models of grid cells that they hope will generate more accurate predictions of how grid cells in the brain work.

“Deep learning models will give us insight about the brain, but only after you inject a lot of biological knowledge into the model,” Khona says. “If you use the correct constraints, then the models can give you a brain-like solution.”

The research was funded by the Office of Naval Research, the National Science Foundation, the Simons Foundation through the Simons Collaboration on the Global Brain, and the Howard Hughes Medical Institute through the Faculty Scholars Program. Mikail Khona was supported by the MathWorks Science Fellowship.


Explaining the Neural Network: A Case Study to Model the Incidence of Cervical Cancer

Paulo J. G. Lisboa

Department of Applied Mathematics, Liverpool John Moores University, Liverpool, L3 3AF UK

Sandra Ortega-Martorell

Neural networks are frequently applied to medical data. We describe how complex and imbalanced data can be modelled with simple but accurate neural networks that are transparent to the user. In the case of a data set on cervical cancer with 753 observations (excluding missing values) and 32 covariates, with a prevalence of 73 cases (9.69%), we explain how model selection can be applied to the Multi-Layer Perceptron (MLP) by deriving a representation using a General Additive Neural Network.

The model achieves an AUROC of 0.621 CI [0.519,0.721] for predicting positive diagnosis with Schiller’s test. This is comparable with the performance obtained by a deep learning network with an AUROC of 0.667 [ 1 ]. Instead of using all covariates, the Partial Response Network (PRN) involves just 2 variables, namely the number of years on Hormonal Contraceptives and the number of years using IUD, in a fully explained model. This is consistent with an additive non-linear statistical approach, the Sparse Additive Model [ 2 ] which estimates non-linear components in a logistic regression classifier using the backfitting algorithm applied to an ANOVA functional expansion.

This paper shows how the PRN, applied to a challenging classification task, can provide insights into the influential variables, in this case correlated with incidence of cervical cancer, so reducing the number of unnecessary variables to be collected for screening. It does so by exploiting the efficiency of sparse statistical models to select features from an ANOVA decomposition of the MLP, in the process deriving a fully interpretable model.

Introduction

This paper is about explainable neural networks, illustrated by an application to a challenging data set on cervical cancer screening that is available in the UCI repository [ 3 ]. The purpose of the paper is to describe a case study of the interpretation of a neural network by exploiting the same ANOVA decomposition that has been used in statistics to infer sparse non-linear functions for probabilistic classifiers [ 2 ].

We will show how a shallow network, the Multi-Layer Perceptron (MLP), can be fully explained by formulating it as a General Additive Neural Network (GANN). This methodology has a long history [ 4 ]. However, to our knowledge there is no method to derive the GANN from data; rather, a model structure needs to be assumed or hypothesized from experimental data analysis. In this paper we use a mechanistic model to construct the GANN and show that, for tabular data, i.e. high-level features that are typical of applications to medical decision support, a transparent and parsimonious model can be obtained, whose predictive performance is comparable, i.e. well within the confidence interval for the AUROC, with that obtained by an alternative, opaque, deep learning neural network applied to the same data set [ 1 ].

Fairness, Accountability, Transparency and Ethics (FATE) in AI [ 5 ] is emerging as a priority research area, reflecting the importance of human-centered approaches as a key enabler for practical application in risk-related domains such as clinical practice. Blind spots and bias in models, e.g. due to artifacts and spurious correlations hidden in observational data, can undermine the generality of data-driven models when they are used to predict for real-world data, and this may have legal implications [ 6 ].

There are different approaches that may be taken to interpret neural networks in particular. These include the derivation of rules to unravel the inner structure of deep learning neural networks [ 7 ] and saliency methods [ 8 ] to determine the image elements to which the network prediction is most sensitive.

An additional aspect of data modelling that is currently very much understudied is the assessment of the quality of the data. Generative Adversarial Networks have been used to quantify sample quality [ 9 ].

Arguably the most generic approach to machine explanation is the attribution of feature influence with additive models. A unified framework for this class of models has been articulated [ 10 ]. This includes as a special case the approach of Local Interpretable Model Agnostic Explanations (LIME) [ 11 ].

However, it is acknowledged in [ 10 ] that General Additive Models (GAMs) are the most interpretable because the model is itself the interpretation, and this applies to data at a global level, not just locally.

Recently there has been a resurgence of interest in GAMs [ 11 , 12 ] in particular through implementations as GANNs. These models sit firmly at the interface between computational intelligence and traditional statistics, since they permit rigorous computation of relevant statistical measures such as odds ratios for the influence of specific effects [ 12 ].

A previously proposed framework for the construction of GANNs from MLPs will be applied to carry out model selection and so derive the form of the GANN from a trained MLP. This takes the form of a Partial Response Network (PRN) whose classification performance on multiple benchmarking data sets matches that of deep learning but with much sparser and directly interpretable features [ 13 ].

This paper reports a specific case study of the application of PRN to demonstrate how it can interpret the MLP as a GAM, providing complete transparency about the use of the data by the model, without compromising model accuracy as represented by the confidence interval of the AUROC. Our results are compared with those from a state-of-the-art feature selection method for non-linear classification [ 2 ].

Moreover, the model selection process itself will generate insights about the structure of the data, illustrating the value of this approach for knowledge discovery in databases (KDD).

Data Description

Data Collection

Cervical cancer is a significant cause of mortality among women both in developed and developing countries world-wide [ 1 ]. It is unusual among cancers for being closely associated with contracting the Human Papillomavirus (HPV) [ 14 ] which is strongly influenced by sexual activity. This makes cervical cancer one of the most avoidable cancers, through lifestyle factors and by vaccination.

Screening for possible incidence of the cancer is a public health priority, with potential for low-cost screening to be effective. The data set used in this study was acquired for this purpose.

The data were collected from women who attended the Hospital Universitario de Caracas in Caracas [ 3 ]. Most of the patients belong to the lowest socioeconomic status, which comprises the population at highest risk. They are all sexually active. Clinical screening includes cytology, a colposcopic assessment with acetic acid and the Schiller test (Lugol’s iodine solution). This is the most prevalent diagnostic index and is the choice for the present study.

Data Pre-processing

The data comprise records from a random sample of patients presenting between 2012 and 2013 (n = 858) [ 1 , 3 ]. There is a wide age range and a broad set of indicators of sexual activity, several of which overlap in what they measure. Four target variables are reported, including the binary outcome of Schiller’s test.

This data set is challenging, first because of severe class imbalance, which is typical in many medical diagnostic applications. The number of positive outcomes in the initial data sample is just 74 cases for Schiller’s test, 44 for a standard cytology test and 35 for Hinselmann’s test.

Secondly, the data include a range of self-reported behavioural characteristics, where noise levels may be significant. Third, some of the variables were problematic for data analysis. The report of STD: cervical condylomatosis comprises all zero values. STD: vaginal condylomatosis, pelvic inflammatory disease, genital herpes, molluscum contagiosum, AIDS, HIV, Hepatitis B, syphilis and HPV are all populated in <2.5% of all cases. For this reason, these variables were removed from the study as they are unlikely to provide statistical significance in predictive modelling and their low prevalence can cause numerical instabilities for model optimisation.

The number of pregnancies was deemed to be less informative about sexual behaviour than the number of sexual partners, so this was also excluded.

In total 105 rows of data had 20 or more of the 32 covariate values missing. While these values can be imputed, such a large proportion of covariates for individual observations can bias the study, since missingness can be informative. For this reason, these rows were removed from the data.

Among the selected variables, several pairs of covariates measure the same indicator in binary form and as an ordinal count. This applies to variables Smokes, Hormonal Contraceptives, IUD and STDs. Consequently, the initial pool of covariates in this study comprises 9 variables. They are:

  • Number of sexual partners;
  • Age of first sexual intercourse;
  • Years since first sexual intercourse, derived by subtracting the previous covariate from Age;
  • Number of years smoking;
  • Number of years taking Hormonal Contraceptives
  • Number of years using IUDs;
  • STD: condylomatosis;
  • Number of STDs;
  • Number of diagnosed STDs.

The dataset used in this study is a reduced cohort (n = 753) with marginal values summarized in Table  1 . The prevalence of missing data in the study sample is now much reduced, especially as the number of pregnancies is not used. The maximum proportion of missing is 4.1% for IUD (years).

Table 1.

Summary statistics of the sample population for Cervical Cancer screening. {} indicates a binary variable. [] shows the range of the variable.

Missing values were imputed with the sample median. The reason for this is that the standardisation used in the following section maps the median value of every covariate to zero, which has the effect of discarding that instance from the gradient descent weight updates, so minimising the impact of unknown information in the training of the MLP.

Partial Response Network Methodology

In binary classification, GAMs model the statistical link function appropriate for a Bernoulli error distribution. This is the logit, hence the inverse of the familiar sigmoid function. An appropriate objective function is the equally familiar log-likelihood cost.
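For reference, these are the standard definitions (not restated in the original text): the logit link and the negative Bernoulli log-likelihood,

\[ \operatorname{logit}(p) = \log\frac{p}{1-p} = \sigma^{-1}(p), \qquad \mathcal{L} = -\sum_n \left[ y_n \log \hat{p}_n + (1 - y_n)\log(1 - \hat{p}_n) \right]. \]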

In order to control for overfitting of the original MLP, we apply regularisation using Automatic Relevance Determination [ 15 ]. This model evaluates the strength of weight decay using a Bayesian estimator, which enables a different weight decay parameter to be used for the fan-out weights linked to each input node. This results in soft model selection, that is to say a modulation of the weight values that compresses towards zero the weights linked to the less informative input variables.

Input variables are divided by the standard deviation and shifted by the median value, so that the median is represented by zero. This is important because, in a Taylor expansion of the logit function about the median values, setting an individual variable to the median causes all of the terms involving that variable in the Taylor expansion to vanish. It is then possible to capture many of the most significant terms by systematically setting all but one covariate to zero, then all but each pair of covariates to zero, and so on.
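A minimal sketch of this standardisation step (assuming X is the covariate matrix after median imputation; the variable names are illustrative):

```python
import numpy as np

# shift by the median and scale by the standard deviation,
# so the median of every covariate maps to zero
med = np.median(X, axis=0)
std = np.std(X, axis=0)
X_standardised = (X - med) / std
```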

The MLP response when all but a few variables are zero is called the Partial Response and the GANN obtained by mapping the partial responses onto its weights, forms the Partial Response Network (PRN) [ 13 ].

The functional form of the PRN is given by the well-known statistical decomposition of multivariate effects into components with fewer variables, represented by the ANOVA functional model [ 2 ] shown in Eq. ( 1 ):

\[ \operatorname{logit}\!\big(P(C \mid \mathbf{x})\big) = \varphi_0 + \sum_k \varphi_k(x_k) + \sum_{k<l} \varphi_{kl}(x_k, x_l) + \ldots \qquad (1) \]

where the partial responses \(\varphi_k(\cdot)\) are evaluated with all variables held fixed at zero except for one or two, indexed as follows:

\[ \varphi_k(x_k) = f(0,\ldots,0,\,x_k,\,0,\ldots,0), \qquad \varphi_{kl}(x_k,x_l) = f(0,\ldots,x_k,\ldots,x_l,\ldots,0), \]

where \(f(\cdot)\) denotes the logit output of the trained MLP.

The derivation of the PRN proceeds as follows:

  • Train an MLP for binary classification;
  • Obtain the univariate and bivariate partial responses in Eqs. ( 2 )–( 4 );
  • Apply the Lasso to the partial responses;
  • Re-train the resulting multi-layer network.

Figure 1. Representation of the Partial Response Network as a General Additive Neural Network (GANN). The weight values are derived from a trained MLP and re-calibrated by further training of the network as a GANN.
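A schematic sketch of this pipeline, using scikit-learn estimators as stand-ins for the MLP, Lasso and re-training steps actually used by the authors (the specific implementations are not given in the text, so all names and settings below are illustrative; X_standardised and a binary target y are assumed from the preceding sketch):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegressionCV

def logit(p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

# 1. Train an MLP for binary classification on the standardised covariates (median -> 0)
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000).fit(X_standardised, y)

def partial_response(X, keep):
    """Logit of the MLP output with all covariates set to zero except those in `keep`."""
    Z = np.zeros_like(X)
    Z[:, keep] = X[:, keep]
    return logit(mlp.predict_proba(Z)[:, 1])

# 2. Univariate partial responses (bivariate ones are built analogously from pairs)
n_features = X_standardised.shape[1]
phi = np.column_stack([partial_response(X_standardised, [k]) for k in range(n_features)])

# 3. Sparse selection over the partial responses (L1-penalised logistic regression
#    as a stand-in for the Lasso), after which the retained responses are re-calibrated
prn = LogisticRegressionCV(penalty='l1', solver='liblinear', Cs=10).fit(phi, y)
selected = np.flatnonzero(prn.coef_[0])
```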

The mapping of the partial responses onto the GANN is completed by matching the weights and bias terms of the two representations.

The main limitation of the model as currently used is that it is restricted to univariate effects and bivariate interactions. However, in many medical applications, this is likely to suffice. The method can be extended to higher order interactions but it will generate a combinatorially large number of partial responses.

Experimental Results

This section explains how model selection took place and describes the models obtained with the PRN applied to the Cervical Cancer screening data set described in Sect.  2 . The variables used in the model are the subset of Table  1 that is listed in 2.2 and the target variable is the outcome of Schiller’s diagnostic test for cervical cancer.

Given the low prevalence of positive outcomes, 73 out of the 753 cases retained (prevalence = 9.69%), the results presented are all for out-of-sample data using 2-fold cross-validation. This choice of the number of folds is motivated by the need to retain a meaningful number of events in each fold.

Model selection consisted of an iterative process of removing the least frequently occurring variable or set of variables at each stage in the process. Table  2 shows the frequency of occurrence of each covariate in the partial responses selected by the PRN. It also shows the average AUROC for 10 random starts.

Table 2. Frequency of occurrence of each covariate in the partial responses selected by the PRN, and average out-of-sample AUROC over 10 random starts.

In contrast, neural network models are not convex and so require multiple estimation. By interpreting the MLP in the form of a GAM with sparse features, the PRN model considerably reduces the variability in classification performance that is typical of the MLP, providing more consistent results.

However, correlations between variables can result in multiple models with very similar predictive power. This is the case for the present data set.

The SAM identified {#Years sexual intercourse; Smokes (years); STDs} for fold 1 and {Hormonal Contraceptives (years); IUD (years); STDs: condylomatosis; STDs} for fold 2 as univariate models; {STDs} for fold 1 and {IUD (years); STDs: condylomatosis; STDs; Number of sexual partners*IUD (years); #Years sexual intercourse*Hormonal Contraceptives (years); STDs: condylomatosis*STDs} when interaction terms were included.

The AUROCs for SAM in 2-fold cross validation are 0.599 and 0.565, respectively.

The variable subsets extracted with model selection using the PRN model are all consistent with the previously cited work on this data set, and indeed with cervical screening literature.

The iterative process for feature selection applied in the previous section made use of the variability of the MLP under random starts to explore the space of predictive features in the presence of correlated variables. This enabled the identification of stable features that could be applied in both folds to build a model with a consistent explanation. These two features are Hormonal Contraceptives (years) and IUD (years).

It cannot be claimed that these are the only predictive variables, or indeed the best. However, they are a representative subset that achieves a highly predictive model with parsimony, as can be seen from both the size of the derived feature set and the high AUROC compared with the SAM.

Equally of interest is the shape of the partial responses and their stability under 2-fold cross-validation, shown in Figs. 2, 3, 4 and 5.

Figure 2. Two univariate responses identified in the first fold. The abscissa measures the contribution of the individual covariate to the logit response. The histogram represents the empirical distribution of the covariate across the study population. The curves show the response derived from the initial MLP and after re-training with the PRN.

Fig. 3. Bivariate response found to be significant in the first fold of the data. The response is shown as a heat map and as a 3-d surface.

Fig. 4. Two univariate responses identified in the second fold, as in Fig. 2.

Fig. 5. Bivariate response found to be significant in the second fold of the data, as in Fig. 3.

The partial responses are remarkably consistent given the challenges posed by the low prevalence and high noise in the data. Differences are apparent in areas of low data density, which is to be expected. Further work will involve quantifying the uncertainty about these estimates.

The initial pool of 9 variables contains redundant information. This causes instability in neural network models, as several different models will capture information with similar predictive value. However, an iterative approach to feature selection can produce a stable sparse model.

It is perhaps remarkable how the same predictive information is contained in a small number of covariates compared with the size of the original pool. Bearing in mind that the typical standard deviation of the AUROC is 0.05, making the half-width of the 95% confidence interval approximately 0.10, the AUROC values for all models listed in Table 2 are comparable. Indeed, the average performance for ten random starts equals that of the best cross-validated model, 0.621, CI [0.519, 0.721]. The overall performance figure is also consistent with the deep learning models in [ 1 ] and with a statistical approach to non-linear classification with an ANOVA decomposition, the SAM [ 2 ].

The main conclusion of this paper is that it is possible to break the black box that is the standard MLP, using it to derive a more interpretable structure as a GANN. Using partial responses is a common way to interpret non-linear statistical models. Here, it is shown that the responses can themselves be used directly in modelling, with little or no compromise in predictive performance.
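As a sketch of the general idea (not the PRN implementation itself), a univariate partial response can be traced by sweeping one covariate over its observed range while the remaining covariates are held at a reference value, recording the fitted model's output on the logit scale; the function below assumes a scikit-learn-style classifier with a predict_proba method.

```python
# Illustrative partial-response trace for covariate j of a fitted binary
# classifier: sweep covariate j over its observed range, hold the other
# covariates at their medians, and record the centred logit response.
import numpy as np

def partial_response(model, X, j, n_points=50):
    grid = np.linspace(X[:, j].min(), X[:, j].max(), n_points)
    probe = np.tile(np.median(X, axis=0), (n_points, 1))
    probe[:, j] = grid
    p = model.predict_proba(probe)[:, 1]
    logit = np.log(p / (1.0 - p))          # response on the logit scale
    return grid, logit - logit.mean()      # centred, as in a GAM term
```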

The result is a small model that explains a large and complex data set in terms of variable dependencies that clinicians can understand and integrate into their reasoning models. Iterative modeling is necessary because of the inherent redundancy in the data set, but the sequence of models obtained is itself informative about the association with outcome for individual and pairs of covariates.

Ultimately, the PRN model shows that it is possible to be sure that the model is right for the right reasons. Moreover, the covariate dependencies provide the ability to diagnose flaws in the data, whether because of sampling bias or artifacts in observational cohorts.

It is concluded that the PRN approach can add significant insight and modelling value to the analysis of tabular data in general, and in particular medical data.



Title: Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study

Abstract: Deep neural networks (DNNs) have achieved unprecedented performance on a wide range of complex tasks, rapidly outpacing our understanding of the nature of their solutions. This has caused a recent surge of interest in methods for rendering modern neural systems more interpretable. In this work, we propose to address the interpretability problem in modern DNNs using the rich history of problem descriptions, theories and experimental methods developed by cognitive psychologists to study the human mind. To explore the potential value of these tools, we chose a well-established analysis from developmental psychology that explains how children learn word labels for objects, and applied that analysis to DNNs. Using datasets of stimuli inspired by the original cognitive psychology experiments, we find that state-of-the-art one shot learning models trained on ImageNet exhibit a similar bias to that observed in humans: they prefer to categorize objects according to shape rather than color. The magnitude of this shape bias varies greatly among architecturally identical, but differently seeded models, and even fluctuates within seeds throughout training, despite nearly equivalent classification performance. These results demonstrate the capability of tools from cognitive psychology for exposing hidden computational properties of DNNs, while concurrently providing us with a computational model for human word learning.



Case Studies in Neural Data Analysis

A Guide for the Practicing Neuroscientist

by Mark A. Kramer and Uri T. Eden

ISBN: 9780262529372 | Pub date: November 4, 2016 | Publisher: The MIT Press | 384 pp., 7 x 9 in, 137 color illus.


A practical guide to neural data analysis techniques that presents sample datasets and hands-on methods for analyzing the data.

As neural data becomes increasingly complex, neuroscientists now require skills in computer programming, statistics, and data analysis. This book teaches practical neural data analysis techniques by presenting example datasets and developing techniques and tools for analyzing them. Each chapter begins with a specific example of neural data, which motivates mathematical and statistical analysis methods that are then applied to the data. This practical, hands-on approach is unique among data analysis textbooks and guides, and equips the reader with the tools necessary for real-world neural data analysis.

The book begins with an introduction to MATLAB, the most common programming platform in neuroscience, which is used in the book. (Readers familiar with MATLAB can skip this chapter and might decide to focus on data type or method type.) The book goes on to cover neural field data and spike train data, spectral analysis, generalized linear models, coherence, and cross-frequency coupling. Each chapter offers a stand-alone case study that can be used separately as part of a targeted investigation. The book includes some mathematical discussion but does not focus on mathematical or statistical theory, emphasizing the practical instead. References are included for readers who want to explore the theory more deeply. The data and accompanying MATLAB code are freely available on the authors' website. The book can be used for upper-level undergraduate or graduate courses or as a professional reference.

A version of this textbook with all of the examples in Python is available at https://mark-kramer.github.io/Case-Studies-Python/

Mark A. Kramer is Associate Professor in the Department of Mathematics and Statistics at Boston University.

Uri T. Eden is Associate Professor in the Department of Mathematics and Statistics at Boston University.

The advancement of large-scale recording techniques has created a deluge of data that needs to be appropriately analyzed and interpreted. This volume comes at the right time and will certainly be a desk-companion for many neurophysiologists trying to make sense of complex data. György Buzsáki, Biggs Professor of Neuroscience, NYU; author of Rhythms of the Brain
No one knows what the future of neuroscience will bring, but it seems certain that ever increasing amounts of complex data will be one of its hallmarks. Kramer and Eden address an important need by offering an accessible, systematic hands-on approach to data analysis. This unique and invaluable book will be greatly appreciated by everyone who faces the tough analytic challenges of modern neuroscience. Olaf Sporns, Distinguished Professor, Indiana University; author of Networks of the Brain and Discovering the Human Connectome
Case Studies in Neural Data Analysis by Mark Kramer and Uri Eden is a significant contribution to the neuroscience and statistics literatures. By combining actual data analysis problems with the essential statistics and mathematics, Kramer and Eden take the experimental neuroscientist from having no MATLAB programming experience to being able to apply in a principled manner the most commonly used neuroscience data analysis methods. The book's clear pedagogical format makes it readily accessible to undergraduates, graduate students, postdoctoral fellows, and principal investigators. Case Studies in Neural Data Analysis is a must-read for experimental neuroscientists as well as for anyone outside of neuroscience (statisticians, physicists, computer scientists, and engineers) wishing to learn about neuroscience data analysis problems and methods. Emery N. Brown, Edward Hood Taplin Professor of Medical Engineering, Institute for Medical Engineering and Science, MIT; Professor of Computational Neuroscience, Picower Institute for Learning and Memory, MIT; Department of Brain and Cognitive Sciences, MIT


10 Use Cases of Neural Networks in Business

Use Cases of Neural Networks

One of the key parts of cutting edge AI technology, Artificial Neural Networks (ANNs) are becoming too important and commonplace to ignore.

However, Artificial Neural Networks and the role that they play can be a difficult concept to understand.

In this article, we'll explain exactly what Artificial Neural Networks are and how they work.

To illustrate their importance we’ll also show you some examples of how Artificial Neural Networks are already transforming businesses.


What are Neural Networks?

Neural networks are a set of algorithms, loosely modelled on the human brain, that are designed to recognize patterns. They interpret data through a form of machine perception by labeling or clustering raw input data.

Let’s take a moment to consider the human brain. Made up of a network of neurons, the brain is a very complex structure.

It’s capable of quickly assessing and understanding the context of numerous different situations. Computers struggle to react to situations in a similar way. Artificial Neural Networks are a way of overcoming this limitation.

First developed in the 1940s Artificial Neural Networks attempt to simulate the way the brain operates.

Sometimes called perceptrons, an Artificial Neural Network is a hardware or software system.

Some networks are a combination of the two.

Consisting of a network of layers, this system is patterned to replicate the way the neurons in the brain operate.

The network comprises an input layer, where data is entered, and an output layer.

The output layer is where processed information is presented.

Connecting the two is a hidden layer or layers.

The hidden layers consist of units that transform input data into useful information for the output layer to present.

In addition to replicating the human decision-making process, Artificial Neural Networks allow computers to learn.

Their structure also allows ANNs to reliably and quickly identify patterns that are too complex for humans to identify.

Artificial Neural Networks also allow us to classify and cluster large amounts of data quickly.

Applications in Deep Learning and Artificial Intelligence

Artificial neural networks are the foundation of deep learning .

They are also one of the main tools used in machine learning .

Consequently, ANNs play an increasingly important role in the development of artificial intelligence.

The rise in importance of Artificial Neural Networks is due to the development of "backpropagation".

This technique allows the system's hidden layers to become versatile, adapting to situations where the outcome doesn't match the one originally intended.

The development of deep learning neural networks has also helped in the development of Artificial Neural Networks.

Deep learning neural networks are networks made up of multiple layers.

This allows the system to become more versatile.

Different layers are able to analyse and extract different features.

This process allows the system to identify new data or images.

It also allows for unsupervised learning and more complex tasks to be undertaken.

MORE –  Computer Vision Applications in 10 Industries

How do Artificial Neural Networks Work?

As we have seen Artificial Neural Networks are made up of a number of different layers.

Each layer houses artificial neurons called units.

These artificial neurons allow the layers to process, categorize, and sort information.

Alongside the layers are processing nodes.

Each node has its own specific piece of knowledge.

This knowledge includes the rules that the system was originally programmed with.

It also includes any rules the system has learned for itself.

This makeup allows the network to learn and react to both structured and unstructured information and data sets.

Almost all artificial neural networks are fully connected throughout these layers.

Each connection is weighted.

The heavier the weight, or the higher the number, the greater the influence that the unit has on another unit.

The first layer is the input layer.

This takes on the information in various forms.

This information then progresses through the hidden layers where it is analysed and processed.

By processing data in this way, the network learns more and more about the information.

Eventually, the data reaches the end of the network, the output layer.

Here the network works out how to respond to the input data.

This response is based on the information it has learned throughout the process.

Here the processing nodes allow the information to be presented in a useful way.
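To make this flow concrete, here is a minimal numpy sketch of the structure just described: data enters the input layer, is transformed by the weighted units of a hidden layer, and is presented by the output layer. The sizes and weights are arbitrary placeholders rather than any real network.

```python
# Minimal sketch of an input layer, one hidden layer, and an output layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input layer: 4 features enter the network
W1 = rng.normal(size=(4, 5))      # weighted connections into 5 hidden units
W2 = rng.normal(size=(5, 3))      # weighted connections into 3 output units

hidden = np.maximum(0, x @ W1)    # hidden units transform the input
output = hidden @ W2              # output layer presents the processed result
print(output)
```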

Educating Artificial Neural Networks

For artificial neural networks to learn they require a mass of information.

This information is known as a training set.

If you wanted to teach your ANN to recognise a cat, your training set would consist of thousands of images of cats.

These images would all be tagged “cat”.

Once this information has been inputted and analysed the network is considered trained.

From now on it will try to classify any future data based on what it thinks it is seeing.

So if you present it with a new image of a cat, it will identify the creature.

As a check, during the training period, the system’s output is matched against the description of the data it’s analysing.

If the information is the same, the learning process is validated.

If the information is different backpropagation is used to adjust the learning process.

Backpropagation involves working back through the layers, adjusting the set mathematical equations and parameters.

These adjustments are made until the output data presents the desired result.

This process, deep learning, is what makes the network adaptive.

The network is able to learn and adapt as more information is processed.
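That compare-and-adjust cycle can be sketched as a toy training loop. The example below assumes a tiny network with one hidden layer, a sigmoid output and made-up data; real systems differ in scale, but the idea of working the error back through the layers and nudging the weights is the same.

```python
# Toy backpropagation loop: compare the output with the desired label,
# work the error back through the layers, and adjust the weights.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                    # made-up training set
y = (X[:, 0] + X[:, 1] > 0).astype(float)        # made-up labels

W1 = 0.1 * rng.normal(size=(4, 5))
W2 = 0.1 * rng.normal(size=(5, 1))
lr = 0.1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(500):
    h = np.tanh(X @ W1)                          # forward pass
    p = sigmoid(h @ W2).ravel()
    err = p - y                                  # mismatch with the target
    grad_W2 = h.T @ err[:, None] / len(y)        # backpropagate the error
    grad_h = err[:, None] @ W2.T * (1 - h ** 2)
    grad_W1 = X.T @ grad_h / len(y)
    W1 -= lr * grad_W1                           # adjust the weights
    W2 -= lr * grad_W2
```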

What are Artificial Neural Networks Used for?

Artificial Neural Networks can be used in a number of ways.

They can classify information, cluster data, or predict outcomes.

ANN’s can be used for a range of tasks.

These include analyzing data, transcribing speech into text, powering facial recognition software , or predicting the weather.

There are many types of Artificial Neural Network.

Each has its own specific use.

Depending on the task it is required to process, the ANN can be simple or very complex.

The most basic type of Artificial Neural Network is a feedforward neural network .

This is a basic system where information can travel in only one direction, from input to output.

Different Types of Neural Networks

The most commonly used type of Artificial Neural Network is the recurrent neural network .

In this system, data can flow in multiple directions.

As a result, these networks have greater learning ability.

Consequently, they are used to carry out complex tasks such as language recognition.

Other types of Artificial Neural Networks include convolutional neural networks , Hopfield networks, and Boltzmann machine networks .

Each network is capable of carrying out a specific task.

The data you want to enter, and the application you have in mind, affect which system you use.

Complex tasks such as voice recognition may require more than one type of ANN.

Now that we’ve established what Artificial Neural Networks are here are 10 examples of how they are currently being applied.

Artificial Neural Networks are Improving Marketing Strategies

By adopting Artificial Neural Networks, businesses are able to optimise their marketing strategy.

Systems powered by Artificial Neural Networks are capable of processing masses of information.

This includes customers' personal details and shopping patterns, as well as any other information relevant to your business.

Once processed this information can be sorted and presented in a useful and accessible way. This is generally known as market segmentation.

To put it another way segmentation of customers allows businesses to target their marketing strategies.

Businesses can identify and target the customers most likely to purchase a specific service or product.

This focusing of marketing campaigns means that time and expense aren't wasted advertising to customers who are unlikely to engage.

This application of Artificial Neural Networks can save businesses both time and money.

It can also help to increase profits.


The flexibility of Artificial Neural Networks means that their marketing applications can be implemented by most businesses.

Artificial Neural Networks can segment customers on multiple characteristics.

These characteristics can be as diverse as location, age, economic status, purchasing patterns and anything else relevant to your business.

One company making the most of this flexibility is cosmetics brand Sephora.

Its email marketing campaign is tailored to the interests of each customer on the mailing list.

This allows them to offer a seamless, targeted marketing campaign.

This approach means that, at a time when many companies are struggling, Sephora is flourishing.

Starbucks has used Artificial Neural Networks in its highly targeted marketing campaigns

Developing Targeted Marketing Campaigns

Through unsupervised learning, Artificial Neural Networks are able to identify customers with similar characteristics.

This allows businesses to group together customers with similarities, such as economic status or preferring vinyl records to downloaded music.

Supervised learning systems allow Artificial Neural Networks to set out a clear aim for your marketing strategy.

Like unsupervised systems, they can also segment customers into similar groupings.

However supervised learning systems are also able to match customer groupings to the products they are most likely to buy.

This application of technology can increase profits by driving sales.
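As a rough illustration of the unsupervised grouping described above, the sketch below clusters customers on a few invented features using scikit-learn's KMeans. The feature names and data are placeholders, not any particular company's pipeline.

```python
# Sketch of customer segmentation: group customers with similar characteristics.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
customers = rng.normal(size=(500, 3))     # invented [age, monthly_spend, visits]

features = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

print(np.bincount(segments))              # how many customers fall in each segment
```

Each customer then carries a segment label that a targeted campaign can use.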

Starbucks has used Artificial Neural Networks and targeted marketing to keep customers engaged with their app.

The company has integrated its rewards system with location and purchase history in its app.

This allows them to offer an incredibly personalised experience, helping to increase revenue by $2.56 billion.

Reducing Email Fatigue and Improving Conversion Rates

By only advertising relevant products to interested customers, you also reduce the chances of customers developing email fatigue.

In short, if your advertisements are relevant and interesting customers are more likely to interact.

This drives visits to your website, potentially increasing sales, and helps you to build a strong client-business relationship.

According to dragon360.com 61% of customers say that they are most likely to use companies that send them targeted content .

Applying Artificial Neural Networks in your marketing strategy can save your company both time and money.

By streamlining your marketing approach in this way you will only be targeting the customers most likely to purchase your product.

This streamlined approach of targeting the people most likely to engage can help to increase sales and profits.

Many companies who have adopted targeted or personalised marketing strategies have noticed clear, positive results.

For example, stationery retailers Paperstyle segmented their subscribers into two different groups.

Each group then received targeted emails.

Consequently, the business reported an open rate increase of 244%.

The traffic driven from emails to the website also increased by 161%.

These statistics show that personalised marketing campaigns can deliver real results, benefiting businesses.

Improving Search Engine Functionality

During the 2015 Google I/O keynote address in San Francisco, Google revealed it was working on improving its search engine.

These improvements are powered by a 30 layer deep Artificial Neural Network.

This depth of layers, Google believes, allows the search engine to process complicated searches such as shapes and colours.

Using an Artificial Neural Network allows the system to constantly learn and improve.

This allows Google to constantly improve its search engine.

Within a few months, Google was already noticing improvements in search results.

The company reported that its error rate had dropped from 23% down to just 8% .

Google’s application shows that neural networks can help to improve search engine functionality.

Similar Artificial Neural Networks can be applied to the search feature on many e-commerce websites.

This means that many companies can improve their website search engine functionality.

This allows customers with only a vague idea of what they want to easily find the perfect item.

Amazon has reported sales increases of 29% following improvements to its recommendation systems.

MORE –  Will Google AI Ever Rule the World?

IBM Watson - Neural Networks play a crucial role

Applications of neural networks in the pharmaceutical industry

Artificial Neural Networks are being used by the pharmaceutical industry in a number of ways.

The most obvious application is in the field of disease identification and diagnosis.

It was reported in 2015 that, in America, 800 possible cancer treatments were in trials.

With so much data being produced, Artificial Neural Networks are being used to help scientists efficiently analyse and interpret it.

IBM Watson Genomics is one example of smart solutions being used to process large amounts of data.

IBM Watson Genomics is improving precision medicine by integrating genomic tumour sequencing with cognitive computing.

With a similar aim in mind, Google has developed DeepMind Health.

Working alongside a number of medical specialists such as Moorfields Eye Hospital , the company is looking to develop a cure for macular degeneration.

IBM Watson Genomics Clinical Trial Overview

Developing Personalised Treatment Plans

A personalised treatment plan can be more effective than adopting a standardised approach.

Artificial Neural Networks and supervised learning tools are allowing healthcare professionals to predict how patients may react to treatments based on genetic information.

IBM Watson Oncology is leading this approach.

It is able to analyse the medical history of a patient as well as their current state of health.

This information is processed and compared to treatment options, allowing physicians to select the most effective.

MIT’s Clinical Machine Learning Group is advancing precision medicine research with the use of neural networks and algorithms.

The aim is to allow medical professionals to get a better understanding of how disease forms and operates.

This information can help to design an effective treatment.

The team at MIT are currently working on possible treatment plans for sufferers of Type 2 Diabetes.

Meanwhile, the Knight Cancer Institute and Microsoft's Project Hanover are using networks and machine learning tools to develop precision treatments.

In particular, they are focusing on treatments for Acute Myeloid Leukemia.

Vast amounts of information and data are required to progress precision medicine and personalised treatments.

Artificial Neural Networks and machine learning tools are able to quickly and accurately analyse and present data in a useful way.

This ability makes it the perfect tool for this form of research and development.

READ MORE Artificial Intelligence in Medicine -Top 10 Applications

READ MORE  How Machine Learning Is Shaping Precision Medicine

Neural Networks in the Retail Sector

As we have noted, Artificial Neural Networks are versatile systems, capable of dealing reliably with a number of different factors.

This ability to handle a number of variables makes Artificial Neural Networks an ideal choice for the retail sector.

For instance, Artificial Neural Networks are, when given the right information, able to make accurate forecasts.

These forecasts are often more accurate than those made in the traditional manner, by analysing statistics.

This can allow accurate sales forecasts to be generated.

In turn, this information allows your businesses to purchase the right amount of stock.

This reduces the chances of selling out of certain items.

It also reduces the risk of valuable warehouse space being taken up by products you are unable to sell.

Ocado's smart warehouses utilise a host of AI technologies, including neural networks

Online grocers Ocado are making the most of this technology.

Their smart warehouses rely on robots to do everything from stock management to fulfilling customer orders.

This information is used to power the trend of dynamic pricing.

Many companies, such as Amazon, use dynamic pricing to increase revenue.

This application has spread beyond retail; service providers such as Uber even use this information to adjust prices depending on the customer.

Many retail organisations, such as Walmart, use Artificial Neural Networks to predict future product demand .

The network models analyse location, historical sales data, weather forecasts, and other pieces of relevant information.

This is used to predict an increase in sales of umbrellas or snow clearing products.

By predicting a potential rise in demand the company is able to increase stock in store.

This means that customers won’t leave empty-handed and also allows Walmart to offer product-related offers and incentives.

Walmart using artificial neural networks to predict future product demand.
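For illustration only (this is not Walmart's system), a small regressor trained on invented weather and sales-history features shows the kind of demand forecast described above.

```python
# Illustrative demand forecast: predict umbrella sales from forecast rainfall
# and last week's sales. All numbers are invented.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 400
rainfall = rng.gamma(2.0, 2.0, size=n)                 # forecast rainfall (mm)
last_week = rng.poisson(20, size=n).astype(float)      # recent sales
demand = 5 + 3 * rainfall + 0.5 * last_week + rng.normal(0, 2, n)

X = np.column_stack([rainfall, last_week])
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                     random_state=0).fit(X, demand)

print(model.predict([[12.0, 25.0]]))    # expected demand for a rainy week
```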

LEARN MORE –  10 Powerful Applications of Artificial Intelligence in Retail

Applications to Encourage Repeat Custom

As well as monitoring and suggesting purchases, Artificial Neural Network systems also allow you to analyse the time between purchases.

This application is most useful when monitoring individual customer habits.

For example, a customer may buy new ink cartridges every 2 months.

Systems powered by Artificial Neural Networks can identify and monitor this repeat custom.

You can then contact your customer and remind them to buy when the time to purchase the product approaches.

This friendly reminder increases the chances of the customer returning to your store to make their purchase.

Retailers that offer loyalty schemes are already taking advantage of this.

Beauty brand Sephora’s Beauty Insider program records every purchase a customer makes.

It also records how frequently these purchases are made.

This information allows the company to predict when a customer’s products may be running low.

At this point the company sends a “restock your stash” email, prompting the customer to make a repeat purchase.

This information can also be used to develop a personalised marketing approach offering incentives or discounts.
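A rough sketch of this repeat-purchase monitoring, using an invented transaction table: estimate each customer's typical gap between orders and flag those who are due a reminder.

```python
# Flag customers whose usual repurchase interval has elapsed.
import pandas as pd

orders = pd.DataFrame({
    "customer": ["a", "a", "a", "b", "b"],
    "date": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-05-01",
                            "2024-02-15", "2024-03-15"]),
})
today = pd.Timestamp("2024-07-01")

gaps = (orders.sort_values("date")
              .groupby("customer")["date"]
              .agg(last="max", mean_gap=lambda d: d.diff().mean()))
due = gaps[today - gaps["last"] >= gaps["mean_gap"]]

print(due.index.tolist())    # customers to send a "restock" email
```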

FedEx can predict which customers are likely to leave with an accuracy of 60-90% by using neural networks

Keeping Customers Loyal to Your Company

Artificial Neural Networks can also identify customers likely to switch to a competitor.

By knowing which customers are most likely to defect you are able to target them with tailored marketing campaigns.

Offering incentives, or friendly reminders about your company, will encourage customers to stick around.

This predictive use of Artificial Neural Networks is already benefiting FedEx.

Forbes reports that FedEx can predict which customers are likely to leave with an accuracy of 60-90%.

By applying Artificial Neural Networks in this way, businesses can enhance and personalise the consumer's experience, encouraging repeat custom and helping to build a relationship between the business and its customers.
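As a hedged illustration of this kind of churn scoring (not FedEx's model), a small classifier can be trained on invented behavioural features and used to rank customers by their predicted probability of leaving.

```python
# Illustrative churn scoring: rank customers by predicted risk of defecting.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 1000
recency = rng.exponential(30, n)                    # days since last order
complaints = rng.poisson(0.5, n)                    # recent complaints
churned = (recency + 20 * complaints + rng.normal(0, 10, n) > 60).astype(int)

X = np.column_stack([recency, complaints])
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                      random_state=0).fit(X, churned)

risk = model.predict_proba(X)[:, 1]
at_risk = np.argsort(risk)[-10:]    # customers to target with incentives
```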

Artificial Neural Networks in Financial Services

When it comes to AI banking and finance , Artificial Neural Networks are well suited to forecasting.

This suitability largely comes from their ability to quickly and accurately analyse large amounts of data.

Artificial Neural Networks are capable of processing and interpreting both structured and unstructured data.

After processing this information Artificial Neural Networks are also able to make accurate predictions.

The more information we can give the system, the more accurate the prediction will be.

 Ray Dalio, Founder & CEO of Bridgewater Associates is at the forefront of technology with neural networks

Forecasting Market Movements

Companies such as MJ Futures and Bridgewater are working towards fully realising the potential of networks in stock market forecasting .

Over a 2-year period, MJ Futures reported 199.2% returns due to its use of neural network prediction methods.

LBS Capital Management has also reported positive results with a simplified neural network.

Their model uses 6 financial indicator inputs such as the average directional movement over the previous 18 days.

As networks become more advanced and are fed more detailed information, their predictions will only become more accurate.

READ MORE –  10 Applications of Machine Learning in Finance

Improving the way Banks Operate

The forecasting ability of Artificial Neural Networks is not just confined to the stock market and exchange rate situations.

This ability also has applications in other areas of the financial sector.

Mortgages, overdrafts and bank loans are all calculated after analysing an individual account holder's statistical information.

Traditionally the software that analysed this information was driven by statistics.

Increasingly banks and financial providers are switching to software powered by Artificial Neural Networks.

This allows for a wider analysis of the applicant and their behaviour to be made.

Consequently, this means that the information presented to the bank or financial provider is more accurate and useful.

This allows the bank to make a better-informed decision that is more appropriate to both themselves and the applicant.

Forbes revealed that many mortgage lenders expect that this application of systems powered by Artificial Neural Networks will boom in the next few years.

Neural networks systems to Identify Credit Risks - HSBC

Neural Network Systems to Identify Credit Risks

HSBC is just one bank using Artificial Neural Networks to transform how loan and mortgage applications are processed.

The company uses neural networks to analyse customers' previous behaviour patterns.

This information can highlight personality traits that mark an applicant out as a credit risk.

Meanwhile, Natwest is developing a digital human chatbot called Cora .

Cora, currently, is only able to deal with simple requests.

However, as the technology develops it’s hoped that Cora will be able to help process mortgage and loan applications.

By applying Artificial Neural Networks, companies are able to provide a better service.

As well as reducing expense it means companies make fewer risky decisions, such as lending to credit risks.

This reduces potential losses and prevents people from running up debts they can’t afford.

READ MORE –  AI Revolution Disrupts Investment Banking

READ MORE –  HSBC’s Amy & Other AI Chatbots Will Change The Way we Bank

The Effects on Insurance Provision

Artificial Neural Networks have a number of different applications in the insurance industry.

Firstly, as in marketing applications, Artificial Neural Networks allow for segmentation of policyholders.

This grouping allows companies to determine and offer appropriate pricing plans.

Consequently applying Artificial Neural Networks allows for the correct level of provision to be offered.

It also allows for special offers to be made to encourage customers to renew policies.

Recently, Allianz Travel Insurance adopted a system powered by Artificial Neural Networks.

Their systems analyse a number of factors, such as trip length, cost, the traveller's age, whether the trip is being paid for with air miles, and the reason for travel.

Allianz uses this information to identify the best product for the customer.

This not only ensures that the customer gets the most relevant coverage but it also reduces the time spent searching and researching.

This helpful application of Artificial Neural Networks takes away the worry and concern of planning a holiday.

Instead, it allows customers to focus on enjoying their trip.

READ MORE –  AI to Cut 90% of Office Work at Japanese Insurance Giant

READ MORE –  How Banks Can Start Using Machine Learning?

MasterCard is employing solutions powered by neural networks to reduce the chances of fraud

Fraud Detection Applications

As technology advances, and more importance is placed on online transactions, fraudsters are also becoming more sophisticated.

Luckily Artificial Neural Networks can help to keep us, and our finances, safe.

Deep learning and Artificial Neural Networks applications are powering systems capable of detecting all forms of financial fraud.

For example, this application can identify unusual activity, such as transactions occurring outside the established time frame.

Visa has used smart solutions to cut credit card fraud by two thirds .

Their sophisticated anti-fraud detection systems are working towards biometric solutions.

However the company also analyses information such as payment method, time, location, item purchased, and the amount spent.

Even a small deviation from the norm in any of these categories can highlight a potential fraud case.

Within seconds smart solutions allow Visa to look at over 500 data elements to determine if a transaction is suspicious.
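The underlying idea can be sketched very simply (this is not Visa's system): compare each new transaction against the customer's established pattern and flag large deviations.

```python
# Flag a transaction whose amount or hour deviates strongly from the
# customer's history of [amount, hour] pairs. Numbers are invented.
import numpy as np

def suspicious(history, new_txn, threshold=3.0):
    mu = history.mean(axis=0)
    sigma = history.std(axis=0) + 1e-9
    z = np.abs((np.asarray(new_txn, dtype=float) - mu) / sigma)
    return bool((z > threshold).any())    # any field far outside the norm

past = np.array([[25.0, 12], [30.0, 13], [22.0, 11], [28.0, 14]])
print(suspicious(past, [950.0, 3]))       # a large purchase at 3 a.m. -> True
```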

Similarly, it can be embarrassing when our card is declined by a retailer, especially if our account is in credit.

MasterCard is employing solutions powered by Artificial Neural Networks to reduce the chances of this happening.

Currently, MasterCard has halved the chances of these errors occurring.

READ MORE –  Mastercard Launches AI Express to Accelerate Adoption of AI from Businesses

READ MORE –  BBVA Teams up with MIT to Enhanced Machine Learning in Fraud Detection

Optimizing Store Layout

Artificial Neural Networks can also improve physical store layouts.

Their ability to quickly analyse and monitor stock levels allows companies to see which products are selling well and which aren’t.

Poorly performing products can then be placed on offer or moved to a more eye-catching position in the store.

These systems also allow companies to see which products are frequently purchased together.

Placing commonly purchased products close together encourages people buying one item to purchase the other.

You can then surround these products with other possible purchases.

Not only does this cut the waste of perishable products but it can also help to prevent a backlog building in the warehouse.
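A minimal sketch of spotting products that are frequently purchased together, using invented baskets: count how often each pair of items appears in the same transaction.

```python
# Count co-purchased item pairs as candidates to shelve side by side.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"umbrella", "raincoat"},
    {"bread", "jam"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))
```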

Fashion giants H&M are looking to these applications to transform their business model.

It’s been reported that the retailer is using Artificial Neural Networks to do everything from warehouse management to store layout.

The application in regards to store layout is particularly interesting.

H & M fashion retailer is using neural networks to do everything from warehouse management to store layout.

H&M are Abandoning the Generic Store Layout

Abandoning the traditional, one size fits all approach, H&M are using smart applications to tailor the product mix in their stores.

For example, the company’s store in the residential Östermalm area of Stockholm originally stocked basic products for men and children as well as women.

After analysing customer purchasing habits the company identified that the majority of the store’s clients were women.

Consequently, higher-priced items, as well as fashion items, sold far better than children’s or men’s products.

This information helped H&M to change the range of products on offer in the store.

As well as reducing the menswear range, they brought in crockery ranges, a flower stall and a coffee shop.

The Wall Street Journal reports that by going for a more high-end look the store has improved its appeal and sales.

While H&M say that this optimization has helped to increase profit margins they are yet to reveal any figures.

Facial Recognition Software

Technology companies have long been working toward developing reliable facial recognition software.

One company leading the way is Facebook.

For a number of years now they have been using facial recognition technology to auto-tag uploaded photographs.

They have also developed DeepFace.

DeepFace is a form of facial recognition software driven by Artificial Neural Networks.

It is capable of mapping 3D facial features.

Once the mapping is complete the software turns the information into a flat model.

The information is then filtered, highlighting distinctive facial elements.

To be able to do this DeepFace implements 120 million parameters.

This technology hasn’t just emerged overnight.

DeepFace has been trained with a pool of 4.4 million tagged faces.

These images were taken from 4,000 different Facebook accounts .

DeepFace powered by neural networks uses a 3-D model to rotate faces, virtually, so that they face the camera. Image (a) shows the original image, and (g) shows the final, corrected version. Image Source -MIT

During the training process, tests were carried out presenting the system with side-by-side images.

The system was then asked to identify if the images are of the same person.

In these tests, DeepFace returned an accuracy rating of 97.25% .

Human participants taking the same test scored, on average, 97.5%.
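The same-person test can be framed the way most modern systems do it: embed each face with a network and threshold the similarity of the two embeddings. In the sketch below, embed() is a hypothetical stand-in for a trained model; this is not Facebook's API.

```python
# Face verification by embedding similarity; embed() is a hypothetical
# function mapping an image to a feature vector.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(img_a, img_b, embed, threshold=0.7):
    return cosine(embed(img_a), embed(img_b)) >= threshold
```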

Facebook has also taken its software to computing and technology conferences .

This is done with the purpose of allowing academics and researchers to assess and inspect the technology.

With all this work, it's little wonder that DeepFace may be the most accurate facial recognition software yet developed.

READ MORE –   Sensetime – Inside the World’s Biggest Facial Recognition AI Startup

Paying With Your Face

Recently, Macau, a special administrative region of China, has introduced ATMs that are capable of reading the user's face.

This negates the need for cards and pin numbers.

If proved to be successful it could lead to the end of paying with plastic.

FaceFirst's neural network powered software capable of identifying shoplifters.

Meanwhile, companies such as Facefirst are developing software capable of identifying shoplifters.

When implemented this can cut loss to crime, saving money, and making stores safer.

The company is also looking to roll out its systems at airports and other public areas.

Microsoft and Nvidia are just two of the companies working with Facefirst technology.

Finally, at CES 2019, Procter & Gamble revealed their idea of the store of the future.

Here, cameras driven by Artificial Neural Networks recognise customers' faces.

The system then makes product suggestions based on the customer’s past history and information.

Artificial Neural Networks are Revolutionising Business Practices

Artificial Neural Networks may be a complex concept to fully understand.

However, using them in conjunction with deep learning tools allows computer-driven technology to make gigantic leaps forward.

From streamlining manufacturing to product suggestions and facial scanning, Artificial Neural Networks are transforming the way businesses operate.


Artificial Neural Networks, Made Easy – Retail Case Study Example (Part 8)

Welcome back to our retail case study example for campaign and marketing analytics. So far in this case study, we were working on a classification problem to identify customers with a higher likelihood to purchase products from the campaign catalogues. In the last article on model selection, we noticed that artificial neural networks performed better than logistic regression and decision tree algorithms for our classification problem. In this article, we will try to gain an intuitive and simplified understanding of artificial neural networks, which are inspired by our brain. In the next few segments we will learn about the properties of the brain that artificial neural networks try to mimic, such as:

Seeing with Your Tongue!

Artificial Neural Networks – by Roopam

Eric Weihenmayer climbed Mount Everest in 2001. By doing this he became the first, and to date the only, blind person to achieve this feat. He pursues his passion for extreme rock climbing through a device called BrainPort, which helps him to see using his tongue! This device has a camera at one end connected to several hundred tiny electrodes that Eric places on his tongue to experience obstacles on his path. This experience is made possible through the incredible learning adaptiveness of the human brain. Initially, when Eric started using this device, he felt just a tingly sensation on his tongue associated with each experience. Slowly his brain learned to correlate each experience with a distinct sensation and enabled him to experience seeing. This is a phenomenal story about our brain's capability to adapt, the property that has inspired the machine learning algorithm called artificial neural networks.

Neural Networks’ Feed-Forward & Feedback Loops

The brain connects with other parts of the body through an intricate network of neurons called biological neural networks. The brain works with a powerful mechanism involving both feed-forward and feedback loops within these intricate neural networks. For example, the feed-forward mechanism involves inputs from the sensory organs, like eyes and ears, being converted into outputs, i.e. information and understanding. The feedback mechanism, on the other hand, makes the brain communicate with the sensory organs and modify their inputs.

To learn about this better, let’s perform a couple of small experiments.  For the first experiment, close your eyes and say the following 3 words, at an interval of 10 seconds each, with the intention to visualize them.

  • Dragon killer

Most probably, you visualized an elaborate scene with a dragon attacking a village and getting killed by a dragon slayer. What you have just accomplished is a phenomenal capability of the brain: to extract information about these words in a split second and visualize a whole sequence of events without using your eyes. This is also the source of the elaborate imagination that human brains possess. In this case, one form of input (words) has generated another form of input (visualization) through a complicated process in our brain.

This second experiment will help us understand the feed-forward and feedback loop in our brain. It’s quite likely that you have come across the following sentence on some social media site. Anyway, read the sentence in the following text box.

Incredible, isn't it? Your brain has gone through several cycles of feed-forward and feedback to read these jumbled letters in a matter of seconds. The brain, in this case, has turned the incomplete and jumbled input (information from the eyes) into meaningful output (an understanding of the sentence). Artificial neural networks try to mimic our phenomenal brain for prediction purposes through both feed-forward and feedback loops between input and output variables.

Artificial Neural Networks – Retail Case Study Example

Artificial Neural Networks

Artificial neural networks are nowhere close to the intricacies that biological neural networks possess, but we must not forget the latter has gone through millions of years of evolution. On the other hand, artificial neural networks (from here on referred to as neural networks) have a history of close to half a century. In the 1990s, neural networks lost favour to other machine learning algorithms like support vector machines, etc. However, in the last decade or so, there is a renewed interest in neural networks because of the rise of deep learning. Let us try to understand the design of neural networks and their functionalities using our retail case study.

As displayed in the adjacent figure, neural networks can be broadly divided into three layers: input, hidden, and output. The hidden layer is the additional feature that separates neural networks from other predictive models. If we remove the hidden layer from this architecture, it becomes a simple regression (for estimation) or logistic regression (for classification) architecture. The input layer in this architecture is simply the set of input variables, which for our retail case study are the ones discussed in the previous articles.

The output layer, for our classification problem to identify customers who will respond to campaigns, is the binary variable that represents historic responders (0/1).

Mathematical Construct of Neural Networks

This section describes the mathematical construct of neural networks. If this seems a bit complicated to you, for now, I suggest you jump to the next section about the usage of neural networks.

Let's come back to the hidden layer: each hidden layer has several hidden nodes (orange circles in the figure above). Each hidden node takes a weighted sum of inputs from every input variable. The following expression represents the weighted sum of input variables that a hidden node takes as input. These input variables can be compared with the input signals our sensory organs send to our brain; for example, in the case of a fire around you, you see fire, you hear fire burning, you smell smoke, and your skin feels hot (a complete sensory experience through several input nodes).

\[
(\textup{Hidden Node})_{i}=W_{i(\textup{Input}\rightarrow \textup{Hidden})}^{T}\times (\textup{Input Variables})+W_{0}
\]

To begin with, the weights \(W_{i(\textup{Input}\rightarrow \textup{Hidden})}\) and \(W_{0}\) are chosen at random; they are then modified iteratively to match the desired outputs (in the output layer). Continuing with the above example of fire: if the sensory signals about the fire are too strong, the creature's tendency for self-preservation will take over. However, sensory signals from the fire of a cooking stove need to be accounted for as well, so that humans can cook. Hence, the weights need to adjust to balance fire usage against self-preservation.

In the hidden layer, the above linear weighted sum \((\textup{Hidden Node})_{i}\) is converted to a non-linear form through a non-linear function. This conversion is usually performed using the sigmoid activation function, which is the same logistic function used in logistic regression. The following expression represents this process:

\[
P(\textup{Hidden})_{i}=\frac{e^{(\textup{Hidden Node})_{i}}}{1+e^{(\textup{Hidden Node})_{i}}}
\]

Remember that \(0 \le P(\textup{Hidden})_{i} \le 1\); these outputs \(P(\textup{Hidden})_{i}\) for the different hidden nodes become the input variables for the final output node, as described below:

\[
(\textup{Output})=U_{i(\textup{Hidden}\rightarrow \textup{Output})}^{T}\times P(\textup{Hidden})_{i}+U_{0}
\]

This linear weighted output is again converted to a non-linear form through the sigmoid function. The following is the probability of conversion of a customer, \(P(\textup{Customer Response})\), based on his/her input variables:

\[
P(\textup{Customer Response})=\frac{e^{(\textup{Output})}}{1+e^{(\textup{Output})}}
\]

Neural network algorithms (like backpropagation) iteratively modify the weights on both sets of links (i.e. Input→Hidden→Output) to reduce the prediction error. Remember that the weights for our architecture are \(W_{i(\textup{Input}\rightarrow \textup{Hidden})}\), \(W_{0}\), \(U_{i(\textup{Hidden}\rightarrow \textup{Output})}\) and \(U_{0}\).
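To make this concrete, here is a small numpy sketch of the forward pass just described, using the W (input to hidden) and U (hidden to output) notation above. The weights here are random placeholders; in practice they are fitted by backpropagation.

```python
# Forward pass of a one-hidden-layer network with sigmoid activations,
# producing P(Customer Response) for a single customer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=6)            # input variables for one customer
W = rng.normal(size=(4, 6))       # weights: input -> 4 hidden nodes
W0 = rng.normal(size=4)           # hidden-node intercepts
U = rng.normal(size=4)            # weights: hidden -> output
U0 = rng.normal()                 # output intercept

hidden = sigmoid(W @ x + W0)      # P(Hidden)_i for each hidden node
p_response = sigmoid(U @ hidden + U0)
print(p_response)                 # probability the customer responds
```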

Pros and Cons of Using Neural Networks

Let's quickly sum up some of the important pros and cons of using neural networks for model development:

Data Science Tasks - by Roopam

  • Neural networks are fairly insensitive to noise in the input data (similar to our brain) because of the hidden layer that absorbs noisy information.
  • They are better equipped to handle the fuzzy / non-linear relationship between input variables, and the output variable.
  • Neural networks are often considered black boxes (again, similar to our brain) because they don't explicitly highlight the relationship between input and output variables. This is highly unlike decision trees, which offer highly intuitive solutions.
  • There is no set rule for choosing the number of hidden layers and hidden nodes while designing a neural network architecture. This requires a proficient data scientist to develop neural network models.
  • Neural networks are often susceptible to overfitting, hence analysts need to test the results carefully.

Sign-off note

These are early days for artificial neural networks, but they sure have a lot of promise. Nature has patiently and meticulously designed and modified our brains to develop phenomenal biological neural networks. I doubt humans have the same amount of patience as Mother Nature, a virtue we can all learn from her.

See you soon with the next part of this case study.



A novel type of neural networks for feature engineering of geological data: Case studies of coal and gas hydrate-bearing sediments

Lishuai Jiang, Yang Zhao, Naser Golsanami, Lianjun Chen, Weichao Yan



Mutual Impact of Computing Power and Control Theory, pp. 259–271

Neural Network Applications — Case Studies

Kevin Warwick


Neural networks have now been considered, in terms of their employment in control and systems, for a number of years. In this paper a number of case studies are discussed, one aim being to look at the different types of network possible, i.e. back-propagation, Kohonen, n-tuple, etc., another aim being to look at the wide diversity of application studies carried out. The paper refers to a number of actual industrial implementations, and includes a section on the use of Genetic Algorithms.







Warwick, K. (1993). Neural Network Applications — Case Studies. In: Kárný, M., Warwick, K. (eds.), Mutual Impact of Computing Power and Control Theory, pp. 259–271. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2968-2_19





COMMENTS

  1. Applied Deep Learning

    Part 1 was a hands-on introduction to Artificial Neural Networks, covering both the theory and application with a lot of code examples and visualization. Now comes the cool part, end-to-end application of deep learning to real-world datasets. ... We will cover the 3 most commonly encountered problems as case studies: binary classification ...

  2. Convolutional Neural Network—A Practical Case Study

    A convolutional neural network is a learning algorithm with connected nodes that function like the human brain's neurons [ 1 ]. These power hidden pattern recognition, correlation in raw data, as well as clustering and classification. CNNs constitute a continuous learning process.

  3. PDF Case study 16.2 Artificial neural networks

    Neural networks are particularly successful for resolving tasks associated with pattern recognition by learning to predict future events based on previously observed patterns. For example, ... [Figure: a feedforward network with an input layer (x1, x2, x3), a hidden layer, and an output layer (y1, y2)]

  4. CS231n Convolutional Neural Networks for Visual Recognition

    Training a Neural Network; Summary; In this section we'll walk through a complete implementation of a toy Neural Network in 2 dimensions. We'll first implement a simple linear classifier and then extend the code to a 2-layer Neural Network. As we'll see, this extension is surprisingly simple and very few changes are necessary. Generating ...

  5. Applications of Neural Networks: A Case Study

    The interest of researchers from different scientific disciplines in the research area of neural networks grows as the list of applications of neural networks becomes longer. The recent interest in neural networks is reinforced by the complexity and diversity of the problems that can be successfully treated using neural networks.

  6. Deep Learning Neural Networks: Design And Case Studies

    Deep Learning Neural Networks is the fastest growing field in machine learning. It serves as a powerful computational tool for solving prediction, decision, diagnosis, and detection problems based on a well-defined computational architecture. It has been successfully applied to a broad field of applications ranging from computer security, speech recognition, image and video ...

  7. case study of using artificial neural networks for classifying cause of

    Abstract. Background Artificial neural networks (ANN) are gaining prominence as a method of classification in a wide range of disciplines. In this study ANN is applied to data from a verbal autopsy study as a means of classifying cause of death. Methods A simulated ANN was trained on a subset of verbal autopsy data, and the performance was tested on the remaining data.

  8. A Review and Case Study of Neural Network Techniques for ...

    2.1 Case Study: Rooftop Segmentation Using Data from the Swedish Mapping, Cadastral and Land Registration Authority. In the following, we present a case study of applying neural networks to rooftop segmentation using Swedish cadastral data. 2.1.1 Data Sources. Supervised learning solutions require an input set and a target set for the training ...

  9. Site Selection via Learning Graph Convolutional Neural Networks: A Case

    The proposed case study corroborates the geospatial interactions and offers new insights for solving various geographic and transport problems using graph neural networks. Selection of store sites is a common but challenging task in business practices.

  10. Study urges caution when comparing neural networks to the brain

    As the network analyzes huge amounts of data, the strengths of those connections change as the network learns to perform the desired task. In this study, the researchers focused on neural networks that have been developed to mimic the function of the brain's grid cells, which are found in the entorhinal cortex of the mammalian brain.

  11. Explainable Defect Detection Using Convolutional Neural Networks: Case

    In case you are interested in case studies, check my tutorial —Gentle introduction to 2D Hand Pose Estimation: Approach Explained. References [1] Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba: Learning deep features for discriminative localization; in: Proceedings of the IEEE conference on computer vision and ...

  12. PDF Case Study: Neural Network Malware Detection Verification for Feature

    classifiers using the Neural Network Verification (NNV) and Neural Network Enumeration (nnenum) tools, and outline the challenges and future considerations necessary for the improvement and refinement of the verification of malware classification. By evaluating this novel domain as a case study, we hope to increase its visibility ...

  13. Explaining the Neural Network: A Case Study to Model the Incidence of

    Introduction. This paper is about explainable neural networks, illustrated by an application of a challenging data set on cervical cancer screening that is available in the UCI repository [].The purpose of the paper is to describe a case study of the interpretation of a neural network by exploiting the same ANOVA decomposition that has been used in statistics to infer sparse non-linear ...

  14. Hybrid encodings for neuroevolution of convolutional neural networks

    The neuroevolution of convolutional neural networks has yielded highly competitive results in the field of visual recognition in recent years. Some of the most recent advances in this field have been related to the design of neural encodings to represent these highly complex Deep Learning structures.

  15. Physics informed neural networks: A case study for gas transport

    One of the biggest strengths of physics-informed neural networks is the flexibility to adapt the method easily to different use cases or to exchange building blocks in the implementation, such as the optimization method or the quadrature rule. Here, case studies like this one are very important for identifying the most promising paths forward.

  16. Deep learning for recommender systems: A Netflix case study

    Deep learning has profoundly impacted many areas of machine learning. However, it took a while for its impact to be felt in the field of recommender systems. In this article, we outline some of the c...

  17. Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study

    Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study. Deep neural networks (DNNs) have achieved unprecedented performance on a wide range of complex tasks, rapidly outpacing our understanding of the nature of their solutions. This has caused a recent surge of interest in methods for rendering modern neural systems more ...

  18. Rainfall-Runoff Modeling Using Artificial Neural Network—A Case Study

    The present study examines the rainfall-runoff-based model development by using artificial neural networks (ANNs) models in the Yerli sub-catchment of the upper Tapi basin for a period of 36 years, i.e., from 1981 to 2016. The created ANN models were capable of establishing the correlation between input and output data sets. The rainfall and runoff models that were built have been calibrated ...

  19. PDF Chapter 8 Applications of Neural Networks: A Case Study

    8.1 Introduction. The interest of researchers from different scientific disciplines in the research area of neural networks grows as the list of applications of neural networks becomes longer. The recent interest in neural networks is reinforced by the complexity and diversity of the problems that can be successfully treated using neural ...

  20. Case Studies in Neural Data Analysis

    by Mark A. Kramer and Uri T. Eden. Paperback, $60.00. ISBN: 9780262529372. Pub date: November 4, 2016. Publisher: The MIT Press. 384 pp., 7 x 9 in, 137 color illus.

  21. 10 Use Cases of Neural Networks in Business

    Artificial Neural Networks can be used in a number of ways. They can classify information, cluster data, or predict outcomes. ANNs can be used for a range of tasks. These include analyzing data, transcribing speech into text, powering facial recognition software, or predicting the weather.

  22. Artificial Neural Networks: Made Simple

    In the 1990s, neural networks lost favour to other machine learning algorithms like support vector machines, etc. However, in the last decade or so, there is a renewed interest in neural networks because of the rise of deep learning. Let us try to understand the design of neural networks and their functionalities using our retail case study.

  23. Aesthetic Evaluation Method for Car Styling Based on BP Neural Network

    Aesthetic Evaluation Method for Car Styling Based on BP Neural Network and Perceptual Imagery Style: A Case Study with the Z Generation Customer Group. Authors: Yanlong Li, School of Automotive Studies, Tongji University, China ... Shen Z Y, Sun L. Optimization genetic algorithm based on BP neural network intelligent cockpit perceptual image ...

  24. Neural Network Robustness as a Verification Property: A Principled Case

    These case studies have demonstrated the importance of emancipating the study of desirable properties of neural networks from a concrete training method, and studying these properties in an abstract mathematical way. For example, we have discovered that some robustness properties can be ordered by logical strength and some are incomparable.

  25. Case Study Of Neural Network

    The first step towards neural networks took place in 1943, when Warren McCulloch, a neurophysiologist, and a young mathematician, Walter Pitts, wrote a paper on how neurons might work. They modeled a simple neural network with electrical circuits. In 1949, Donald Hebb reinforced the concept of neurons in his book, The Organization of Behavior.

  26. A novel type of neural networks for feature engineering of geological

    The nature of the measured data varies among different disciplines of geosciences. In rock engineering, features of data play a leading role in determining the feasible methods of its proper manipulation. The present study focuses on resolving one of the major deficiencies of conventional neural networks (NNs) in dealing with rock engineering data. Herein, since the samples are obtained from ...

  27. Water

    In this paper, based on the experimental situation, we consider optimizing the BP neural network with the particle swarm algorithm (PSO) to make up for the shortcomings of the neural network, so that the optimized neural network has a better prediction effect [32,33]. This study forecasts the infiltration depth of Xiashu loess slopes using a BP ... (a minimal, hypothetical PSO-BP sketch appears after this list)

  28. Neural Network Applications

    Abstract. Neural networks have now been considered, in terms of their employment in control and systems, for a number of years. In this paper a number of case studies are discussed, one aim being to look at the different types of network possible, i.e. Back propagation, Kohonen, n-tuple etc, another aim being to look at the wide diversity of ...
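Item 27 above mentions pairing a back-propagation (BP) style network with particle swarm optimization (PSO). The sketch below is a rough, hypothetical illustration rather than the cited study's implementation: a plain global-best PSO searches the flattened weights of a tiny one-hidden-layer regression network, and the synthetic data, network size, and PSO hyperparameters are all placeholder assumptions.

```python
import numpy as np

# Hypothetical sketch: PSO tuning the weights of a small one-hidden-layer network.
# All data, sizes, and hyperparameters are placeholders, not values from the paper.
rng = np.random.default_rng(0)

# Toy regression data (stand-in for the real features and target in the study)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2]

D, H = X.shape[1], 8           # input dimension, hidden units
n_params = D * H + H + H + 1   # W1, b1, W2, b2 flattened into one vector

def unpack(theta):
    """Split a flat parameter vector into the network's weights and biases."""
    i = 0
    W1 = theta[i:i + D * H].reshape(D, H); i += D * H
    b1 = theta[i:i + H]; i += H
    W2 = theta[i:i + H]; i += H
    b2 = theta[i]
    return W1, b1, W2, b2

def mse(theta):
    """Training mean squared error of the network defined by theta."""
    W1, b1, W2, b2 = unpack(theta)
    hidden = np.tanh(X @ W1 + b1)   # hidden layer activation
    pred = hidden @ W2 + b2         # linear output layer
    return np.mean((pred - y) ** 2)

# Plain global-best PSO over the flattened weight vector
n_particles, n_iters = 30, 200
w, c1, c2 = 0.7, 1.5, 1.5           # inertia and acceleration coefficients

pos = rng.uniform(-1, 1, size=(n_particles, n_params))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([mse(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, n_params))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([mse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best training MSE found by PSO:", pbest_val.min())
```

In hybrid schemes of this kind, PSO is often used only to find a good starting point, after which the weights are fine-tuned with ordinary gradient-based back propagation.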