  • NATURE INDEX
  • 17 November 2023

Hypotheses devised by AI could find ‘blind spots’ in research

  • Matthew Hutson

Matthew Hutson is a science writer based in New York City.


A 3D rendered artist's impression of artificial intelligence with an abstract human brain and question mark light bulbs.

Credit: Olemedia/Getty

In early October, as the Nobel Foundation announced the recipients of this year’s Nobel prizes, a group of researchers, including a previous laureate, met in Stockholm to discuss how artificial intelligence (AI) might have an increasingly creative role in the scientific process. The workshop, led in part by Hiroaki Kitano, a biologist and chief executive of Sony AI in Tokyo, considered creating prizes for AIs and AI–human collaborations that produce world-class science. Two years earlier, Kitano proposed the Nobel Turing Challenge [1]: the creation of highly autonomous systems (‘AI scientists’) with the potential to make Nobel-worthy discoveries by 2050.

It’s easy to imagine that AI could perform some of the necessary steps in scientific discovery. Researchers already use it to search the literature, automate data collection, run statistical analyses and even draft parts of papers. Generating hypotheses — a task that typically requires a creative spark to ask interesting and important questions — poses a more complex challenge. For Sendhil Mullainathan, an economist at the University of Chicago Booth School of Business in Illinois, “it’s probably been the single most exhilarating kind of research I’ve ever done in my life”.

Network effects

AI systems capable of generating hypotheses go back more than four decades. In the 1980s, Don Swanson, an information scientist at the University of Chicago, pioneered literature-based discovery, a text-mining exercise that aimed to sift ‘undiscovered public knowledge’ from the scientific literature. If some research papers say that A causes B, and others that B causes C, for example, one might hypothesize that A causes C. Swanson created software called Arrowsmith that searched collections of published papers for such indirect connections and proposed, for instance, that fish oil, which reduces blood viscosity, might treat Raynaud’s syndrome, in which blood vessels narrow in response to cold [2]. Subsequent experiments proved the hypothesis correct.
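The A-causes-B, B-causes-C pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reconstruction of Arrowsmith: the function name abc_hypotheses and the toy statements are assumptions invented for the example, and a real system would first have to extract such pairs from text.

```python
from collections import defaultdict

def abc_hypotheses(causal_pairs):
    """Propose A -> C wherever A -> B and B -> C are reported but A -> C is not."""
    known = set(causal_pairs)
    effects = defaultdict(set)
    for cause, effect in causal_pairs:
        effects[cause].add(effect)
    proposals = defaultdict(set)  # (A, C) -> set of bridging B terms
    for a, b in causal_pairs:
        for c in effects.get(b, ()):
            if c != a and (a, c) not in known:
                proposals[(a, c)].add(b)
    return dict(proposals)

# Toy statements invented for illustration; not data extracted from real papers.
pairs = [
    ("fish oil", "lower blood viscosity"),
    ("lower blood viscosity", "Raynaud's relief"),
    ("cold", "vessel narrowing"),
    ("vessel narrowing", "Raynaud's attack"),
    ("cold", "Raynaud's attack"),  # already reported directly, so not proposed
]

for (a, c), bridges in abc_hypotheses(pairs).items():
    print(f"{a} -> {c} (via {', '.join(sorted(bridges))})")
```

Only the fish-oil link is proposed here, because the direct link from cold to a Raynaud's attack is already in the input and is therefore filtered out as known.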

Literature-based discovery and other computational techniques can organize existing findings into ‘knowledge graphs’, networks of nodes representing, say, molecules and properties. AI can analyse these networks and propose undiscovered links between molecule nodes and property nodes. This process powers much of modern drug discovery, as well as the task of assigning functions to genes. A review article published in Nature [3] earlier this year explores other ways in which AI has generated hypotheses, such as proposing simple formulae that can organize noisy data points and predicting how proteins will fold up. Researchers have automated hypothesis generation in particle physics, materials science, biology, chemistry and other fields.


One approach is to use AI to help scientists brainstorm. This is a task that large language models — AI systems trained on large amounts of text to produce new text — are well suited for, says Yolanda Gil, a computer scientist at the University of Southern California in Los Angeles who has worked on AI scientists. Language models can produce inaccurate information and present it as real, but this ‘hallucination’ isn’t necessarily bad, Mullainathan says. It signifies, he says, “‘here’s a kind of thing that looks true’. That’s exactly what a hypothesis is.”

Blind spots are where AI might prove most useful. James Evans, a sociologist at the University of Chicago, has pushed AI to make ‘alien’ hypotheses: those that a human would be unlikely to make. In a paper published earlier this year in Nature Human Behaviour [4], he and his colleague Jamshid Sourati built knowledge graphs containing not just materials and properties, but also researchers. Evans and Sourati’s algorithm traversed these networks, looking for hidden shortcuts between materials and properties. The aim was to maximize the plausibility of AI-devised hypotheses being true while minimizing the chances that researchers would hit on them naturally. For instance, if scientists who are studying a particular drug are only distantly connected to those studying a disease that it might cure, then the drug’s potential would ordinarily take much longer to discover.
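The trade-off described above, high plausibility but low odds of human discovery, can be made concrete with a toy scoring rule. The sketch below is not Evans and Sourati's published algorithm: the function alien_score, the weight beta, the two proxies (shared intermediate concepts for plausibility, shared researchers for human reachability) and all of the data are assumptions chosen only to show the shape of the idea.

```python
def alien_score(candidate, content_links, authorship, beta=1.0):
    """Toy score: content-based plausibility minus beta times human reachability."""
    material, prop = candidate
    # Plausibility proxy: intermediate concepts linked to both endpoints.
    shared_concepts = content_links.get(material, set()) & content_links.get(prop, set())
    # Reachability proxy: researchers who have published on both endpoints.
    shared_people = authorship.get(material, set()) & authorship.get(prop, set())
    return len(shared_concepts) - beta * len(shared_people)

# Hypothetical graph fragments, invented for illustration.
content_links = {
    "drug X": {"pathway P", "receptor R"},
    "disease D": {"pathway P", "receptor R"},
    "disease E": {"pathway P"},
}
authorship = {
    "drug X": {"alice", "bob"},
    "disease D": {"carol"},
    "disease E": {"alice", "bob"},
}

candidates = [("drug X", "disease D"), ("drug X", "disease E")]
ranked = sorted(
    candidates,
    key=lambda c: alien_score(c, content_links, authorship),
    reverse=True,
)
print(ranked)
```

In this toy setting the drug X and disease D pair ranks first: it is well supported by shared concepts, yet no researcher works on both, which is the kind of gap the article calls a blind spot.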

When Evans and Sourati fed data published up to 2001 to their AI, they found that about 30% of its predictions about drug repurposing and the electrical properties of materials had been uncovered by researchers, roughly six to ten years later. The system can be tuned to make predictions that are more likely to be correct but also less of a leap, on the basis of concurrent findings and collaborations, Evans says. But “if we’re predicting what people are going to do next year, that just feels like a scoop machine”, he adds. He’s more interested in how the technology can take science in entirely new directions.

Keep it simple

Scientific hypotheses lie on a spectrum, from the concrete and specific (‘this protein will fold up in this way’) to the abstract and general (‘gravity accelerates all objects that have mass’). Until now, AI has produced more of the former. There’s another spectrum of hypotheses, partially aligned with the first, which ranges from the uninterpretable (these thousand factors lead to this result) to the clear (a simple formula or sentence). Evans argues that if a machine makes useful predictions about individual cases — “if you get all of these particular chemicals together, boom, you get this very strange effect” — but can’t explain why those cases work, that’s a technological feat rather than science. Mullainathan makes a similar point. In some fields, the underlying principles, such as the mechanics of protein folding, are understood and scientists just want AI to solve the practical problem of running complex computations that determine how bits of proteins will move around. But in fields in which the fundamentals remain hidden, such as medicine and social science, scientists want AI to identify rules that can be applied to fresh situations, Mullainathan says.

In a paper presented in September [5] at the Economics of Artificial Intelligence Conference in Toronto, Canada, Mullainathan and Jens Ludwig, an economist at the University of Chicago, described a method for AI and humans to collaboratively generate broad, clear hypotheses. In a proof of concept, they sought hypotheses related to characteristics of defendants’ faces that might influence a judge’s decision to free or detain them before trial. Given mugshots of past defendants, as well as the judges’ decisions, an algorithm found that numerous subtle facial features correlated with judges’ decisions. The AI generated new mugshots with those features cranked either up or down, and human participants were asked to describe the general differences between them. Defendants likely to be freed were found to be more “well-groomed” and “heavy-faced”. Mullainathan says the method could be applied to other complex data sets, such as electrocardiograms, to find markers of an impending heart attack that doctors might not otherwise know to look for. “I love that paper,” Evans says. “That’s an interesting class of hypothesis generation.”

In science, experimentation and hypothesis generation often form an iterative cycle: a researcher asks a question, collects data and adjusts the question or asks a fresh one. Ross King, a computer scientist at Chalmers University of Technology in Gothenburg, Sweden, aims to complete this loop by building robotic systems that can perform experiments using mechanized arms [6]. One system, called Adam, automated experiments on microbe growth. Another, called Eve, tackled drug discovery. In one experiment, Eve helped to reveal the mechanism by which a toothpaste ingredient called triclosan can be used to fight malaria.

Robot scientists

King is now developing Genesis, a robotic system that experiments with yeast. Genesis will formulate and test hypotheses related to the biology of yeast by growing actual yeast cells in 10,000 bioreactors at a time, adjusting factors such as environmental conditions or making genome edits, and measuring characteristics such as gene expression. Conceivably, the hypotheses could involve many subtle factors, but King says they tend to involve a single gene or protein whose effects mirror those in human cells, which would make the discoveries potentially applicable in drug development. King, who is on the organizing committee of the Nobel Turing Challenge, says that these “robot scientists” have the potential to be more consistent, unbiased, cheap, efficient and transparent than humans.

Researchers see several hurdles to and opportunities for progress. AI systems that generate hypotheses often rely on machine learning, which usually requires a lot of data. Making more papers and data sets openly available would help, but scientists also need to build AI that doesn’t just operate by matching patterns but can also reason about the physical world, says Rose Yu, a computer scientist at the University of California, San Diego. Gil agrees that AI systems should not be driven only by data — they should also be guided by known laws. “That’s a very powerful way to include scientific knowledge into AI systems,” she says.

As data gathering becomes more automated, Evans predicts that automating hypothesis generation will become increasingly important. Giant telescopes and robotic labs collect more measurements than humans can handle. “We naturally have to scale up intelligent, adaptive questions”, he says, “if we don’t want to waste that capacity.”

doi: https://doi.org/10.1038/d41586-023-03596-0

1. Kitano, H. npj Syst. Biol. Appl. 7, 29 (2021).

2. Swanson, D. R. Perspect. Biol. Med. 30, 7–18 (1986).

3. Wang, H. et al. Nature 620, 47–60 (2023).

4. Sourati, J. & Evans, J. A. Nature Hum. Behav. 7, 1682–1696 (2023).

5. Ludwig, J. & Mullainathan, S. Working Paper 31017 (National Bureau of Economic Research, 2023).

6. King, R., Peter, O. & Courtney, P. in Artificial Intelligence in Science 129–139 (OECD Publishing, 2023).


Hypothesis Maker

AI-powered research hypothesis generator.

  • Scientific Research: Generate a hypothesis for your experimental or observational study based on your research question.
  • Academic Studies: Formulate a hypothesis for your thesis, dissertation, or academic paper.
  • Market Research: Develop a hypothesis for your market research study to understand consumer behavior or market trends.
  • Social Science Research: Create a hypothesis for your social science research to explore societal or behavioral patterns.


Title: Automating Psychological Hypothesis Generation with AI: Large Language Models Meet Causal Graph

Abstract: Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using an LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on 'well-being', then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of an LLM and causal graphs mirrored the expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p = 0.007 and t(59) = 4.32, p < 0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining LLMs with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new enriched paradigm for data-driven hypothesis generation in psychological research.
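The abstract does not say which link prediction algorithms were used. As one simple possibility, the sketch below scores unlinked pairs of concepts by the Jaccard overlap of their neighbours. The function name jaccard_link_scores, the decision to ignore edge direction, and the toy edges are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations

def jaccard_link_scores(edges):
    """Score each unlinked node pair by neighbour overlap (Jaccard coefficient)."""
    neighbours = {}
    for u, v in edges:
        # Simplification: causal direction is ignored when collecting neighbours.
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already linked, so not a new hypothesis
        shared = neighbours[u] & neighbours[v]
        if shared:
            scores[(u, v)] = len(shared) / len(neighbours[u] | neighbours[v])
    return scores

# Invented causal pairs, standing in for relations extracted from articles.
edges = [
    ("exercise", "sleep quality"),
    ("exercise", "mood"),
    ("social support", "mood"),
    ("social support", "sleep quality"),
    ("mood", "well-being"),
]

scores = jaccard_link_scores(edges)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

Pairs with the highest scores would be read as candidate hypotheses to put in front of human experts, which is the comparison step the abstract describes.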


Data-driven hypothesis generation in clinical research: what we learned from a human subject study


Hypothesis generation is an early and critical step in any hypothesis-driven clinical research project. Because it is not yet a well-understood cognitive process, the need to improve the process goes unrecognized. Without an impactful hypothesis, the significance of any research project can be questionable, regardless of the rigor or diligence applied in other steps of the study, e.g., study design, data collection, and result analysis. In this perspective article, the authors first provide a literature review on the following topics: scientific thinking, reasoning, medical reasoning, literature-based discovery, and a field study to explore scientific thinking and discovery. Over the years, scientific thinking has shown excellent progress in cognitive science and its applied areas: education, medicine, and biomedical research. However, a review of the literature reveals the lack of original studies on hypothesis generation in clinical research. The authors then summarize their first human participant study exploring data-driven hypothesis generation by clinical researchers in a simulated setting. The results indicate that a secondary data analytical tool, VIADS (a visual interactive analytic tool for filtering, summarizing, and visualizing large health data sets coded with hierarchical terminologies), can shorten the time participants need, on average, to generate a hypothesis and also requires fewer cognitive events to generate each hypothesis. As a counterpoint, this exploration also indicates that hypotheses generated with VIADS receive significantly lower ratings for feasibility. Despite its small scale, the study confirmed the feasibility of conducting a human participant study directly to explore the hypothesis generation process in clinical research. This study provides supporting evidence to conduct a larger-scale study with a specifically designed tool to facilitate the hypothesis-generation process among inexperienced clinical researchers. A larger study could provide generalizable evidence, which in turn can potentially improve clinical research productivity and the overall clinical research enterprise.


Automated Hypothesis Generation


Automated hypothesis generation: when machine-learning systems produce ideas, not just test them.

Testing ideas at scale. Fast.

While algorithms are mostly used as tools to number-crunch and test-drive ideas, they have yet to be used to generate the ideas themselves, let alone at scale.

Rather than thinking up one idea at a time and testing it, what if a machine could generate millions of ideas automatically? What if this same machine would then proceed to autonomously test and rank the ideas, discovering which are better supported by the data? A machine that can even identify the type of data that could refute one’s theories and challenge existing practices.

This machine lies at the heart of SparkBeyond Discovery: its Hypothesis Engine. The engine automatically generates millions of ideas, many of them novel. It asks questions we would never think to ask.

This Hypothesis Engine integrates the world’s largest collection of algorithms, and bypasses human cognitive bias to produce millions of ideas, hypotheses and questions in minutes. These hypotheses ensure that any meaningful signals in the data are surfaced. Then, these signals are often immediately actionable, and can be used as predictive features in machine learning models.

Going beyond the bias

Human ideation is inherently limited by cognitive bottlenecks and biases, which restrict us in generating and testing ideas at scale and high throughput. We're also limited by the speed at which we can communicate. We don’t have the capacity to read and comprehend the thousands of scientific articles and patents published every day. 

What’s more, the questions we ask are biased by our experience and knowledge, or even our mood.

In data science and research workflows, there are key bottlenecks that limit what a person or team can accomplish while working on a problem within a finite amount of time. 

For example, when exploring for useful patterns in data, a data scientist only has time to conceive, engineer, and evaluate a limited number of distinct hypotheses, leaving many areas unexplored. 

One of these areas is the gaps within an organization’s own data. This internal data may only reveal part of the story, whereas augmented external data sources can provide valuable contextual information. Without it, hypotheses based only on internal data don’t take into account the influence of external factors, such as weather and local events, or macro-economic factors and market conditions. 

Instead, by mapping out the entire spectrum of dynamics that happen on earth, SparkBeyond Discovery connects the dots between every data set that exists and offers a comprehensive viewpoint.

Tap into humanity's collective intelligence

Just like search engines crawl the web for text, our machine started indexing the code, data and knowledge on the web, and amassed one of the world's largest libraries of open-source code functions. 

Using both automation and AI, the Hypothesis Engine employs these functions to generate four million hypotheses per minute—a capacity that allows the technology to work through hundreds of good and bad ideas every second.


Scientific intuition inspired by machine learning-generated hypotheses

Pascal Friederich 1,2,3,4 , Mario Krenn 1,2,5 , Isaac Tamblyn 5,6 and Alán Aspuru-Guzik 1,2,5,7

Published 14 April 2021 • © 2021 The Author(s). Published by IOP Publishing Ltd. Machine Learning: Science and Technology, Volume 2, Number 2. Citation: Pascal Friederich et al 2021 Mach. Learn.: Sci. Technol. 2 025027. DOI: 10.1088/2632-2153/abda08


Author affiliations

1 Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Canada

2 Department of Computer Science, University of Toronto, Toronto, Canada

3 Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany

4 Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany

5 Vector Institute for Artificial Intelligence, Toronto, Canada

6 National Research Council of Canada, Ottawa, Canada

7 Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, Canada

Pascal Friederich https://orcid.org/0000-0003-4465-1465

Mario Krenn https://orcid.org/0000-0003-1620-9207

Isaac Tamblyn https://orcid.org/0000-0002-8146-6667

Alán Aspuru-Guzik https://orcid.org/0000-0002-8277-4434

  • Received 7 November 2020
  • Accepted 8 January 2021
  • Published 14 April 2021

Peer review information

Method : Single-anonymous Revisions: 1 Screened for originality? Yes


Machine learning with application to questions in the physical sciences has become a widely used tool, successfully applied to classification, regression and optimization tasks in many areas. Research focus mostly lies in improving the accuracy of the machine learning models in numerical predictions, while scientific understanding is still almost exclusively generated by human researchers analysing numerical results and drawing conclusions. In this work, we shift the focus to the insights and the knowledge obtained by the machine learning models themselves. In particular, we study how they can be extracted and used to inspire human scientists to increase their intuitions and understanding of natural systems. We apply gradient boosting in decision trees to extract human-interpretable insights from big data sets from chemistry and physics. In chemistry, we not only rediscover widely known rules of thumb but also find new interesting motifs that tell us how to control solubility and energy levels of organic molecules. At the same time, in quantum physics, we gain new understanding of experiments for quantum entanglement. The ability to go beyond numerics and to enter the realm of scientific insight and hypothesis generation opens the door to use machine learning to accelerate the discovery of conceptual understanding in some of the most challenging domains of science.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license . Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Machine learning (ML) recently has become a widely used tool with many applications in the physical sciences [ 1 ], ranging from chemistry (for example, prediction of quantum chemistry properties [ 2 ], solving Schrödinger's equation [ 3 ], predicting reactions [ 4 ], materials discovery [ 5 ] or inverse materials design [ 6 , 7 ]) to physics (for example, identification of phases of matter [ 8 ], astronomical object recognition [ 9 ], or validation of quantum experiments [ 10 ]) and biology (for example, prediction of protein structures [ 11 ] or drug design [ 12 , 13 ]). Some open challenges regarding the application of machine learning models in natural sciences include the accessibility, homogeneity, amount and quality of available data, as well as a lack of machine learning models which inherently include physical laws, limiting the interpretability of the models' predictions. While ML models are successfully used and optimized to accelerate numerical predictions or to recognize or generate patterns in existing data, it is rarely inquired how the machine finds solutions, i.e. which patterns and correlations it detected and exploited. Thus, the scientific insight obtained by the model is not directly transferred to human scientists. First attempts to use artificial intelligence in physical sciences aimed to directly answer scientific questions, e.g. determine the location of protein encodings in the genome [ 14 ]. Further attempts to employ machine learning models to obtain insight and help scientists to develop theories were focused on rediscovering solutions to already solved problems, e.g. to rediscover the coordinate transformation in astrophysical [ 15 ] and non-linear dynamical systems [ 16 ], or to detect symmetries and conservation laws [ 17 ]. The methods used in these cases enforce information bottlenecks or interpretable transformations in the ML model that then can inspire scientific understanding [ 18 ]. 
However, to our knowledge such methods were mostly applied to solved problems and have not been used yet to obtain novel insight and answers to questions that are not well understood yet.

In this work, we propose to use machine learning and systematic data analysis to automate further the process of generation of interpretable scientific hypotheses. We demonstrate the applicability of the approach using two questions in the natural sciences — a rediscovery task of chemistry knowledge (hydrophobicity and molecular energy levels in simple as well as application relevant molecules) and the discovery of new intuitions in physics (quantum optics). We show that our approach 'rediscovers' but also extends known chemical rules of thumb for solubility and energy levels of organic molecules with application in organic photovoltaics and organic light-emitting diodes, and helps us to better understand the entanglement created in quantum optical experiments.

Our model represents its findings in a graph representation which is directly related to chemical or physical instances in the specific scientific domain. The results are statements regarding distinct subgraphs that can easily be comprehended and therefore, scientifically interpreted and understood by experts. This is in stark contrast to conventional machine learning models where the internal representations are only indirectly connected with the real physical entities and thus hard to impossible to interpret.

2.1. Computer generated hypotheses

We suggest an automated workflow for ML-based generation of human-interpretable scientific hypotheses as illustrated in figure 1 (a). The workflow is based on a reference database of calculated (potentially also measured) data points with graph-based structure and with corresponding target properties. A binary feature vector describing presence/absence of automatically generated subgraphs [ 20 ] is used to train a tree ensemble method, e.g. gradient boosting [ 19 ] or random forest regression/classification [ 21 , 22 ], that allows for the quantification of feature importances. Based on the features with the highest importance, a list of hypotheses is generated. Each hypothesis has the human understandable form

Feature i leads to an increase/decrease of the target property with strength s

Figure 1. Workflow for automated hypothesis generation. (a) General workflow, starting with a database of graphs and respective properties, followed by training of a machine learning model that allows for the extraction of feature importances, e.g. gradient boosting regression. Features with high importance are combined and analysed in a way that facilitates interpretation in order to stimulate scientific insight. (b) Schematic illustration of the gradient boosting regression method [ 19 ], where multiple simple decision tree models are trained sequentially. Each new decision tree is trained to correct the residual errors (red lines) of the previous models, so the final prediction F_0(x) can be written as a sum of the mean label c_0 and a weighted series of models h_i(x), where each h_i predicts the deviation of the previous i − 1 models from the ground truth. (c) Each decision tree is trained on samples that are represented using predefined input features (coloured squares) and uses their values to split the data set sequentially into smaller subsets which are used for the predictions. The subgraph based input representation used in this work allows a direct interpretation of the feature importances (d) that are computed based on a quantification of how meaningful features are for the accuracy of the machine learning model.

where i is the index of the corresponding feature (subgraph) in the input and strength s quantifies the degree of correlation between feature i and the target property. High feature importance does not necessarily correspond to a high direct correlation with the target property. In many cases, multiple features have to be combined in order to become predictive, even if the single features individually do not help in predicting the target property. Therefore, important features are combined using logical operations (and, xor, ...) to automatically generate combined features which, especially in the presence of higher-order correlations, can be directly interpreted by researchers.
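The combination step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data and the use of the Pearson correlation coefficient as the strength s are assumptions made for illustration. The toy target depends on the xor of two features, so each feature alone shows no correlation while the combined feature is fully predictive.

```python
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


# Logical operations used to build combined features (illustrative choice).
LOGICAL_OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def combined_hypotheses(features, target, important):
    """Rank single and pairwise-combined binary features by |correlation|.

    features:  dict mapping a feature name to a list of 0/1 values
    target:    list of target property values, one per sample
    important: names of the features pre-selected by feature importance
    """
    results = [(name, pearson(features[name], target)) for name in important]
    for f, g in combinations(important, 2):
        for op_name, op in LOGICAL_OPS.items():
            combo = [op(a, b) for a, b in zip(features[f], features[g])]
            results.append((f"({f} {op_name} {g})", pearson(combo, target)))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)


def as_hypothesis(name, strength):
    """Render a result in the 'feature leads to increase/decrease' form."""
    direction = "an increase" if strength > 0 else "a decrease"
    return f"Feature {name} leads to {direction} of the target property with strength {abs(strength):.2f}"


# Toy data (hypothetical): the target equals A xor B.
features = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]

ranked = combined_hypotheses(features, target, ["A", "B"])
print(as_hypothesis(*ranked[0]))
```

In this toy case the xor combination ranks first, while A and B individually have zero correlation with the target, which is the situation the paragraph above describes.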

2.2. Input representation and experiments

In this work, we test this workflow on two experiments in chemistry and physics. The first experiment targets the automated generation of intuitive rules that determine molecular properties, whereas the second aims at hypothesis generation for entanglement properties of quantum optical experiments. In both cases, we can describe the data points as graphs (molecules and quantum optical experiments), where nodes are chemical elements or optical instruments while edges are chemical bonds or photon paths travelling through the setup. This allows us to use fingerprinting techniques to generate input representations (bit-vectors), e.g. using the algorithm for circular extended-connectivity fingerprints [ 20 ]. This iterative algorithm generates a unique representation of each node, including its local environment. In each iteration, hashing functions are used to aggregate the information (predefined node and edge features) of the next nearest neighbors of each node, thus implicitly integrating information of one additional neighbor shell in each iteration. In the end, a hashing function is used to map all subgraphs found in the graphs to bit-vectors. Each entry in these bit-vectors encodes the presence or absence of a certain subgraph. A similar approach has been used in Lopez et al [ 23 ] to determine molecular substructures in molecules for organic solar cells that lead to high power conversion efficiencies. Other models that link the presence of subgraphs (or more generally features) in the input data to properties can potentially be employed in our workflow (see e.g. Duvenaud et al [ 24 ] where molecular fragments are identified that correlate with toxicity, the Grad-CAM method by Selvaraju et al [ 25 ] for convolutional neural networks or the GNNExplainer by Ying et al [ 26 ]). In contrast to this work, some of these approaches depend on the analysis of single samples and thus allow conclusions about an entire data set only indirectly. Furthermore, these approaches assign importance indicators to single nodes or edges of a graph; these indicators are not necessarily binary, which complicates direct interpretation. Because circular fingerprints are generally applicable to all graphs whose nodes and edges can be represented by one or multiple categorical features, we focused on automatically generated circular fingerprints in this work.
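The iterative hashing scheme described above can be sketched in a few lines of Python. This is a simplified illustration and not the extended-connectivity fingerprint algorithm of [ 20 ] itself: the function names, the SHA-256 based hash, the graph encoding and the default parameters are assumptions, and refinements such as duplicate-substructure removal are omitted.

```python
import hashlib


def stable_hash(text):
    """Deterministic integer hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def circular_fingerprint(node_labels, edges, radius=2, n_bits=64):
    """Simplified circular fingerprint of a labelled, undirected graph.

    node_labels: dict mapping a node to a categorical label (e.g. an element)
    edges:       dict mapping a (u, v) pair to a categorical edge label
    radius:      number of neighbor shells folded into each node identifier
    n_bits:      length of the resulting presence/absence bit-vector
    """
    neighbors = {n: [] for n in node_labels}
    for (u, v), label in edges.items():
        neighbors[u].append((label, v))
        neighbors[v].append((label, u))

    # Iteration 0: each node is identified by its own label only.
    ids = {n: stable_hash(lbl) for n, lbl in node_labels.items()}
    found = set(ids.values())

    # Each iteration hashes a node's identifier together with the sorted
    # identifiers of its neighbors, adding one more shell of environment.
    for _ in range(radius):
        new_ids = {}
        for n in node_labels:
            env = sorted((label, ids[m]) for label, m in neighbors[n])
            new_ids[n] = stable_hash(repr((ids[n], env)))
        ids = new_ids
        found.update(ids.values())

    # Fold all identifiers into a fixed-length bit-vector.
    bits = [0] * n_bits
    for ident in found:
        bits[ident % n_bits] = 1
    return bits


# Toy graph (hypothetical encoding): a C-C-O chain with single bonds.
labels_1 = {1: "C", 2: "C", 3: "O"}
edges_1 = {(1, 2): "single", (2, 3): "single"}
fp_1 = circular_fingerprint(labels_1, edges_1)

# The same graph with different node names and edge orientation.
labels_2 = {"x": "O", "y": "C", "z": "C"}
edges_2 = {("y", "x"): "single", ("z", "y"): "single"}
fp_2 = circular_fingerprint(labels_2, edges_2)
```

Because identifiers depend only on labels and connectivity, the two encodings of the same graph yield the same bit-vector. Folding into a short vector can map different subgraphs to the same bit, so a larger `n_bits` reduces such collisions.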

To test the automated hypothesis generation workflow, we performed experiments in two scientific domains, molecular chemistry (section 3.1 ) and quantum optical experiments (section 3.2 ). We computed physical properties of these graphs and used the generated data sets and the workflow described in figure 1 to automatically generate hypotheses that can be either compared to a collection of widely known chemical rules of thumb or that can help to better understand entanglement in quantum optical experiments for designing future experiments.
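As a rough illustration of the model-training step in figure 1 (b) and (d), the following Python sketch fits a sequence of one-split decision trees (stumps) to residuals on binary features. It is a minimal sketch under stated assumptions, not the gradient boosting implementation of [ 19 ]: the function names, the learning rate, the toy data and the importance measure (summed squared-error reduction per feature) are all illustrative choices.

```python
def fit_stump(X, residuals):
    """Find the binary feature whose split best explains the residuals.

    Returns (sse, feature, mean_if_0, mean_if_1), or None if no feature
    splits the samples into two non-empty groups.
    """
    best = None
    for j in range(len(X[0])):
        r0 = [r for x, r in zip(X, residuals) if x[j] == 0]
        r1 = [r for x, r in zip(X, residuals) if x[j] == 1]
        if not r0 or not r1:
            continue
        m0, m1 = sum(r0) / len(r0), sum(r1) / len(r1)
        sse = sum((r - m0) ** 2 for r in r0) + sum((r - m1) ** 2 for r in r1)
        if best is None or sse < best[0]:
            best = (sse, j, m0, m1)
    return best


def gradient_boost(X, y, n_rounds=20, learning_rate=0.5):
    """Sequentially fit stumps to the residuals of the current prediction."""
    c0 = sum(y) / len(y)              # mean label, the starting prediction
    pred = [c0] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])    # assumed measure: error reduction
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        sse_after, j, m0, m1 = best
        sse_before = sum(r * r for r in residuals)
        importance[j] += sse_before - sse_after
        stumps.append((j, m0, m1))
        pred = [p + learning_rate * (m1 if x[j] else m0) for x, p in zip(X, pred)]
    return c0, stumps, importance


def predict(c0, stumps, x, learning_rate=0.5):
    """Mean label plus the weighted sum of all stump corrections."""
    return c0 + learning_rate * sum(m1 if x[j] else m0 for j, m0, m1 in stumps)


# Toy data (hypothetical): the target depends only on feature 0.
X = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 1)]
y = [0.0, 0.0, 2.0, 2.0, 0.0, 2.0]

c0, stumps, importance = gradient_boost(X, y)
top_feature = importance.index(max(importance))
```

On this toy data every stump splits on feature 0, so that feature receives all of the importance, and the features ranked highest in this way would be the ones passed on to hypothesis generation.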

3.1. Chemical intuition for solubility, energy levels

In the chemistry experiment, we used two prototypical target properties: the water–octanol partition coefficient, which describes the solubility of molecules in water (polar) vs. octanol (non-polar), and the energy of the highest occupied molecular orbital (HOMO). Both properties are of high relevance for the application of molecules as pharmaceuticals or in electronic devices, e.g. for organic solar cells, organic light-emitting diode (OLED) displays or organic flow batteries. We furthermore analysed existing application-specific data sets, namely a data set of thermally activated delayed fluorescent (TADF) molecules as emitter molecules for OLEDs [ 27 ], the Harvard Clean Energy project data set [ 28 , 29 ] and a data set of non-fullerene acceptor molecules for organic solar cells [ 23 ]. Solubility and energy levels are relatively well understood, and for both properties there exist several widely known rules of thumb, often described as chemical intuition, which describe how certain functional groups influence them. Our experiment aims to test whether the automated hypothesis generation method can 'rediscover' those rules and potentially add new or refined rules. For frontier orbital gaps reported in the Harvard Clean Energy data set and the non-fullerene acceptor data set as well as for singlet–triplet energy splittings reported in the TADF data set, there exists less chemical intuition on how to influence and tune them.

Figure 2 shows two solubility-related hypotheses that were generated using our workflow. Without prior knowledge, the algorithm identifies two widely known chemical groups/motifs, one for increasing solubility in polar solvents (carbonyl group in figure 2(a)) and one for increasing solubility in non-polar solvents (conjugated carbon chain in figure 2(b)). Figure 3 shows an overview of molecular subgraphs that positively and negatively influence the HOMO energy of a molecule. To our surprise, five of the nine groups shown in the figure can directly be found in chemistry textbooks or Wikipedia when searching for electrophilic aromatic directing groups, which can change the energy levels of molecules through the inductive effect and the mesomeric effect. Specifically, the oxido (O⁻) group that shows the strongest positive influence on the HOMO is well known for a strong resonance-donating and a strong inductive effect, both of which lead to an increase in HOMO energy. Furthermore, heterocycles that contain nitrogen, as well as amine (NH₂) groups, are also known for lifting the HOMO level to higher energies. On the other hand, the nitrile group (C≡N) is one of the most widely known electron-withdrawing groups that lowers the HOMO energy of molecules due to its resonance-withdrawing and inductively withdrawing nature.

Figure 2. Hypotheses about molecular solubility. (a) Lower log P values (better solubility in water compared to octanol) can be achieved using carbonyl groups, while (b) conjugated carbon chains lead to higher log P values.

Figure 3.  Hypotheses about molecular energy levels. Molecular subgraphs with a positive (left) and negative (right) influence on the HOMO energy. The groups 'discovered' by our automated workflow are widely known activating (resonance donating or electron donating) and deactivating groups, such as oxido/amino groups and nitrile groups.

The patterns found to be relevant for small HOMO–LUMO gaps in the Harvard Clean Energy data set as well as in the non-fullerene acceptor data set are mostly related to extended aromatic systems and fused aromatic rings (see figures 5(a) and S1(a), available online at stacks.iop.org/MLST/2/025027/mmedia ). This finding is well understood by chemists due to the widely known relation between the size of an aromatic system (i.e. the degree of delocalization of π-electrons) and the frontier orbital gap [ 30 ]. In the limit of infinite delocalization (e.g. in graphene), the HOMO–LUMO gap closes completely. This relation was also exploited in the development of conductive polymers, which was awarded the Nobel Prize in Chemistry in 2000 and which created the field of organic electronics [ 31 ].

However, we additionally found several interesting and surprising patterns both in the photovoltaic data sets (figures 5 (b) and (c)) and in the TADF dataset (figure 4 ). In case of the Harvard Clean Energy data set, we find that aromatic heterocycles with sulfur (e.g. thiophene rings) as well as silicon heteroatoms (e.g. silole rings) significantly reduce the HOMO–LUMO gap. While the former are widely used in organic electronics to control energy levels and reduce HOMO–LUMO gaps, silole rings are more unusual.

Figure 4.

In the non-fullerene acceptor data set (see figure 5(c)) we found that thiophene rings connected by double bonds (i.e. forming a quinoid structure instead of aromatic systems) also significantly reduce the HOMO–LUMO gap, which is a known relation first described by Brédas [ 32 ]. However, such systems require a specific functionalization in the periphery of the molecule to enforce the quinoid structure of the two thiophene rings, which intrinsically is less stable and thus higher in energy than the aromatic structure.

Figure 5.  Hypotheses about HOMO–LUMO gaps in the Harvard Clean Energy data set [ 28 , 29 ] and a non-fullerene acceptor data set [ 23 ]. (a) The automated hypotheses generation protocol rediscovers the widely known relation between extended aromatic systems (containing e.g. nitrogen heteroatoms) and reduced HOMO–LUMO gaps. (b) Thiophene but also more uncommon silole rings are found to correlate with small HOMO–LUMO gaps. (c) Thiophene rings bridged with double bonds (quinoid structures) are found to decrease the HOMO–LUMO gap in the non-fullerene acceptor data set. (Note the different scale in panel (c) compared to (a, b), due to differences in the data sets.)

In the case of the TADF data set (see figure 4 ), we found expected patterns, such as triarylamines that correlate with decreased singlet–triplet gaps (S1–T1 gaps), as well as rather unexpected patterns (e.g. conjugated bridges) that are identified by our workflow as chemical groups that highly correlate with large singlet–triplet gaps. Low singlet–triplet splittings in TADF molecules are typically achieved by decoupling electron-donating and electron-accepting parts of a molecule to reduce the exchange interaction between the frontier orbitals, which would otherwise lower the triplet state compared to the singlet state and open an undesired singlet–triplet splitting. The decoupling of the fragments can be achieved by introducing twist angles close to 90° between the fragments. One way to accomplish this is to use triarylamine bridges between the fragments. We expect that the conjugated bridges between fragments have precisely the opposite effect: they lead to a planar alignment of the adjacent fragments and thus an enhanced exchange interaction, reduced triplet energies and finally increased singlet–triplet splittings.

3.2. Physical intuitions for quantum experiments

As a second example, we use quantum optical experiments for producing high-dimensional, multipartite quantum entanglement [ 33 , 34 ]. These experiments grow in interest as they allow the investigation of fundamental physical properties — such as local realism [ 35 ] — in laboratories. Furthermore, such quantum states are the key resources for large and complex quantum communication networks [ 36 , 37 ], which are on the edge of commercial availability. The experimental setups that we consider consist of standard optical components that are used in labs, such as non-linear crystals for the creation of photon pairs, single-photon detectors, beam splitters, holograms or Dove prisms. Under approximations that are closely resembled in experiments, the final emergent quantum state can be reliably calculated [ 38 ].

A key challenge lies in the design of experiments that create certain desired quantum systems. The difficulty arises from counter-intuitive quantum phenomena, which raise the question of whether human intuition is the best way to design new experiments. Several studies have therefore developed automated and machine-learning augmented approaches for the design of experiments [ 39 – 44 ]. The goal in our approach is to tackle this challenge in a completely different way, namely by improving the scientist's intuition about these systems.

Figure 6.  Hypotheses about quantum optical experiments. Experimental substructures leading to a decrease in the overall size of the Hilbert space of involved qubits ( n Q ) are shown in (a) while substructures with positive influence are shown in (b).

3.3. Logically combined features

Figure 7.

4. Conclusion and outlook

We presented a data-driven machine learning workflow for automated generation and verification of hypotheses about observations in natural sciences. We presented examples from chemistry and physics, but our method is directly applicable to most applications where structures can be represented as graphs, e.g. to DNA/RNA data in biology [ 50 , 51 ], chemical reaction networks [ 52 , 53 ] or graphs in social sciences. In chemistry, the workflow 'rediscovers' widely known relations regarding solubility and electronic properties of molecules (often referred to as chemical intuition). In physics, the algorithm discovers rules to generate highly entangled three-photon states in quantum optical experiments. These rules are interpretable by human experts in retrospect, yet were not known or postulated before, and even contradict some of the field's current understanding. Finding such rules will not only help researchers to understand complex scientific relationships and thus design better experiments, but also reduce unavoidable and often undetectable bias generated by prior knowledge and anticipations.

4.1. Hypothesis testing

In addition to automated hypothesis generation, protocols for testing of the postulated hypotheses would be beneficial. In case of the chemistry experiment, a possible hypothesis testing protocol would generate mutations of each molecule in the training set to test the hypotheses on molecules with similar representations, where (ideally) only the relevant feature is changed. In case of the quantum optical experiments, not all random mutations will lead to maximally entangled states between all photons, which is a requirement to compute the entanglement of the quantum state. We currently see two options for automated hypothesis verification both of which we are currently implementing. The first follows the same procedure of mutation and computation as in the chemistry experiment, with the caveat that only a small fraction of the mutations will lead to useful results, potentially making the procedure computationally costly. The second option is based on finding other experimental setups within the whole database that are as similar to the reference experiment as possible, with the exception of the feature that is currently analysed. This procedure is computationally costly as well but does not require new computations.
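The second, database-based option can be sketched as a matched-pair comparison in Python. This is a simplified illustration under stated assumptions and not the authors' protocol: the function name and toy database are hypothetical, and "as similar as possible" is narrowed here to entries that are identical in every feature except the one under test.

```python
def matched_pair_effect(database, feature):
    """Estimate the effect of one binary feature from matched entries.

    database: list of (bits, target) pairs, where bits is a tuple of 0/1
    feature:  index of the feature whose hypothesis is being tested

    Entries are grouped by all remaining features. Only groups containing
    the feature both present and absent contribute a difference.
    Returns (mean difference, number of matched groups), or (None, 0).
    """
    by_rest = {}
    for bits, y in database:
        rest = bits[:feature] + bits[feature + 1:]
        by_rest.setdefault(rest, {0: [], 1: []})[bits[feature]].append(y)

    diffs = []
    for groups in by_rest.values():
        if groups[0] and groups[1]:
            mean_absent = sum(groups[0]) / len(groups[0])
            mean_present = sum(groups[1]) / len(groups[1])
            diffs.append(mean_present - mean_absent)

    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)


# Toy database (hypothetical values): three features, one target property.
database = [
    ((0, 0, 1), 1.0),
    ((1, 0, 1), 3.0),
    ((0, 1, 0), 2.0),
    ((1, 1, 0), 5.0),
    ((1, 1, 1), 9.0),  # no partner with feature 0 absent, so it is skipped
]

effect, n_groups = matched_pair_effect(database, feature=0)
print(effect, n_groups)
```

A positive mean difference supports a hypothesis of the form "feature i leads to an increase of the target property". No new computations are needed, but the estimate rests only on the entries that happen to have a matching partner in the database.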

Acknowledgments

PF acknowledges funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 795206 (MolDesign). MK acknowledges support from the Austrian Science Fund (FWF) through the Erwin Schrödinger fellowship No. J4309. IT acknowledges NSERC and performed work at the NRC under the auspices of the AI4D and MCF Programs. AA-G thanks Anders G Frøseth for his generous support. AA-G acknowledges the generous support of Natural Resources Canada and the Canada 150 Research Chairs program.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

An automated framework for hypotheses generation using literature

  • Department of Public Health Sciences
  • Department of Neurology
  • Penn State Neuroscience Institute

Research output : Contribution to journal › Article › peer-review

Background: In bio-medicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their association with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to corroborate whether their hypothesis is consistent with existing literature. It is a daunting task to keep abreast of so much being published and also to remember all combinations of direct and indirect associations. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort invested in harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds crisp semantic associations among entities of interest, which is a step towards bridging such gaps. Methodology: The proposed HGF shares similar end goals with SWAN but is more holistic in nature, and it was designed and implemented using scalable and efficient computational models of disease-disease interaction. The integration of mapping ontologies with latent semantic analysis is critical in capturing domain-specific direct and indirect crisp associations, and in making assertions about entities (such as disease X is associated with a set of factors Z). Results: Pilot studies were performed using two diseases. A comparative analysis of the computed associations and assertions with curated expert knowledge was performed to validate the results. It was observed that the HGF is able to capture crisp direct and indirect associations, and to provide knowledge discovery on demand. Conclusions: The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by the domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF.

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Access to Document

  • 10.1186/1756-0381-5-13

T1 - An automated framework for hypotheses generation using literature

AU - Abedi, Vida

AU - Zand, Ramin

AU - Yeasin, Mohammed

AU - Faisal, Fazle Elahi

N1 - Funding Information: This work was supported by the Electrical and Computer Engineering Department and Bioinformatics Program at the University of Memphis, by the University of Tennessee Health Science Center (UTHSC), as well as by NSF grant NSF-IIS-0746790. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding institution.

UR - http://www.scopus.com/inward/record.url?scp=84865434310&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865434310&partnerID=8YFLogxK

U2 - 10.1186/1756-0381-5-13

DO - 10.1186/1756-0381-5-13

M3 - Article

AN - SCOPUS:84865434310

SN - 1756-0381

JO - BioData Mining

JF - BioData Mining

29 Sep 2021 · Uchenna Akujuobi, Xiangliang Zhang, Sucheendra Palaniappan, Michael Spranger

In this paper, we study the automatic hypothesis generation (HG) problem, focusing on explainability. Given pairs of biomedical terms, we focus on link prediction to explain how the prediction was made. This more transparent process encourages trust in the biomedical community for automatic hypothesis generation systems. We use a reinforcement learning strategy to formulate the HG problem as a guided node-pair embedding-based link prediction problem via a directed graph walk. Given nodes in a node-pair, the model starts a graph walk, simultaneously aggregating information from the visited nodes and their neighbors for an improved node-pair representation. Then at the end of the walk, it infers the probability of a link from the gathered information. This guided walk framework allows for explainability via the walk trajectory information. By evaluating our model on predicting the links between millions of biomedical terms in both transductive and inductive settings, we verified the effectiveness of our proposed model on obtaining higher prediction accuracy than baselines and understanding the reason for a link prediction.

Explainable Automatic Hypothesis Generation via High-order Graph Walks

Front Bioeng Biotechnol

Automated Hypothesis Generation to Identify Signals Relevant in the Development of Mammalian Cell and Tissue Bioprocesses, With Validation in a Retinal Culture System

1 Department of Comparative Biology and Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada

2 Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, AB, Canada

Abdullah Al-Ani

3 Biomedical Engineering Graduate Program, University of Calgary, Calgary, AB, Canada

4 Alberta Diabetes Institute, University of Alberta, Edmonton, AB, Canada

5 Leaders in Medicine Program, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada

Qing Yun (Victor) Tong

Matthew Workentine

Mark Ungrin

Associated Data

The datasets analyzed for this study can be found in the GEO database (https://www.ncbi.nlm.nih.gov/geo/), using GSM accession numbers detailed in Supplementary Table S1.

We have developed an accessible software tool (receptoR) to predict potentially active signaling pathways in one or more cell type(s) of interest from publicly available transcriptome data. As proof-of-concept, we applied it to mouse photoreceptors, yielding the previously untested hypothesis that activin signaling pathways are active in these cells. Expression of the type 2 activin receptor ( Acvr2a ) was experimentally confirmed by both RT-qPCR and immunochemistry, and activation of this signaling pathway with recombinant activin A significantly enhanced the survival of magnetically sorted photoreceptors in culture. Taken together, we demonstrate that our approach can be easily used to mine publicly available transcriptome data and generate hypotheses around receptor expression that can be used to identify novel signaling pathways in specific cell types of interest. We anticipate that receptoR (available at https://www.ucalgary.ca/ungrinlab/receptoR ) will enable more efficient use of limited research resources.

Introduction

The ability to understand and manipulate cellular behavior is critical to conventional small-molecule pharmaceutical therapies as well as the rapidly growing fields of tissue engineering, regenerative medicine and cell-based therapy ( Alimbetov et al., 2018 ; Dalby et al., 2018 ; Loebel and Burdick, 2018 ; Zhang et al., 2018 ; Manoukian et al., 2019 ). Aside from the direct manipulation of transcription factors, much of our capacity to control this behavior comes via intervention in cellular signaling cascades. In this way, the expansion of human pluripotent stem cells (hPSCs) may be enhanced ( Lipsitz et al., 2018 ) and their differentiation directed to fates as diverse as retinal pigment epithelial cells ( da Cruz et al., 2018 ) or insulin producing beta cells ( Rezania et al., 2014 ). Chimeric antigen receptor (CAR) T cells function via an engineered signaling cascade that can redirect T cells toward a specific antigen (reviewed in Jackson et al., 2016 ), while cytokine traps that remove targeted ligands ( Economides et al., 2003 ) have shown promise as treatments for macular edema caused by an overgrowth of endothelial cells ( Heier et al., 2012 ). These diverse applications share the common mechanism of intervening in pre-existing signaling pathways within the relevant cell types.

While pathways that are active in a given type of cell may be inferred from previous developmental or functional studies ( Zhou et al., 2015 ; Fan et al., 2016 ; Yoon et al., 2018 ; van der Kant et al., 2019 ), such research may not yet have been completed for a given tissue of interest. Even if it has, there is no guarantee that all relevant interactions have been identified. While efforts such as the Human Cell Atlas ( Rozenblatt-Rosen et al., 2017 ) promise to facilitate data sharing, it is challenging for researchers who have identified a role for a particular cell type in their disease of interest to enter new areas and acquire the depth of specialist knowledge required to predict potential interventions.

In attempting to manipulate the behavior of a given cell type, an important starting point is simply “What receptors does this cell express?” There is no point in adding a particular factor to the culture medium if the cell lacks the machinery required to respond to it ( Miyajima et al., 1992 ; Krebs and Hilton, 2000 ; Uings and Farrow, 2000 ). Conversely, while expression of a certain receptor does not guarantee its downstream signaling function ( Ris-Stalpers et al., 1990 ; Robbins et al., 1993 ), it does immediately provide us with a pair of easily testable hypotheses: firstly, that the cell will respond to its activation in some way; and secondly that this response already occurs at some point within the range of environments to which that cell type is normally exposed (niche) – and potentially in the culture system of interest as well. Additional knowledge about the function of that signaling pathway in other contexts may be informative ( Cai et al., 2016 ; Harskamp et al., 2016 ; Schwartz et al., 2016 ; Pavlos and Friedman, 2017 ) but is not required (and is not necessarily complete in any case) ( Cendrowski et al., 2016 ). The cells may then be exposed to activators and inhibitors of the receptor identified, and the impact assessed. Should a response be observed (for example, on function or proliferation), if it is a desirable one then the ligand concentration can be optimized and routinely incorporated into the bioprocess under development; if it is undesirable then antagonism of that pathway may similarly be employed.

Where resources are available, receptor expression may be characterized specifically in a system of interest. However, funding agencies are often unwilling to support expensive “fishing expeditions” and even if funding is available, it could be employed elsewhere if less expensive approaches could be identified. We therefore sought to automate the generation of hypotheses about the presence and function of receptors using the significant quantity of existing gene transcript resources. While increasingly ceding ground to RNA sequencing (RNA-seq) approaches, over half the data series available on the publicly-accessible Gene Expression Omnibus (GEO) database derive from expression profiling by microarray, with high throughput sequencing platforms making up less than one quarter. Furthermore, both technologies offer high throughput assessment of gene expression with similar quantitation accuracy and high technical reproducibility (reviewed in Lowe et al., 2017 ). Despite a more limited dynamic range and lower sensitivity than either RNA-seq or quantitative PCR, microarrays have proven to be a reliable technology for detecting significantly enriched genes between tissue types, and robust expression patterns between all three technologies correlate well ( Larkin et al., 2005 ; Allanach et al., 2008 ; Marioni et al., 2008 ; Su et al., 2014 ). Microarray data has been collected on platforms with a high degree of uniformity, and for which minimum information standards already exist ( Brazma et al., 2001 ; Ball et al., 2004 ). Originally obtained to test specific hypotheses, in aggregate they contain a tremendous amount of information on, e.g., untreated control groups that may not otherwise have been previously assessed.

The most straightforward way to access these data is with open source tools that can query the GEO database ( Gentleman et al., 2004 ; Davis and Meltzer, 2007 ; R Development Core Team, 2008 ; Huber et al., 2015 ); however, these typically make use of the command line and can have a steep learning curve. Several bioinformatics tools have been developed to provide a graphical user interface to this data ( Dumas et al., 2016 ; Nelson et al., 2017 ), although these are limited to analyzing a single experimental series at a time. We therefore developed a software tool – receptoR – to enable non-bioinformaticians to access and then aggregate this data, and rapidly generate hypotheses about signaling pathways that may be relevant to the cell and tissue types they are studying. We validated receptoR’s performance in retinal photoreceptor cells, as their survival and function in vitro and in vivo are highly influenced by cytokines derived from their niche ( Strauss, 2005 ; Jindal, 2015 ). As the first component of our visual pathway they are essential for sight, and their loss therefore has a major impact on human health and quality of life.

In the United States alone, visual impairment not due to a refractive error affects 2% of the adult population, over six million people, with associated costs exceeding $5.5 billion annually ( Congdon et al., 2004 ; Frick et al., 2007 ; Chou et al., 2013 ). When looking at age-related macular degeneration (AMD) specifically, the most common cause of visual dysfunction in industrialized countries, incidence increases to 20% of people over 65 years of age ( Mitchell et al., 1995 ; Vingerling et al., 1995 ; Huang et al., 2003 ; Kashani et al., 2018 ). Absent injury or infection of the retina, most visual dysfunction qualifies as inherited retinal degeneration, a genetically heterogeneous group of disorders affecting the viability and function of rod and cone photoreceptors that can have autosomal, X-linked, and mitochondrial patterns of inheritance ( Farrar et al., 2015 ; Thompson et al., 2015 ). Over 200 causative genes have been identified that affect multiple pathways and mechanisms associated with vision dysfunctions ( Thompson et al., 2015 ). Retinal degenerative diseases can also be a consequence of genetic dysfunction in the underlying retinal pigment epithelium (RPE) or vasculature that support the retina ( Bhutto and Lutty, 2012 ; Alexander et al., 2015 ; Farrar et al., 2015 ). Given the complexity and scope of the underlying causes, curative treatments are not currently available, with most clinical interventions aiming to slow the progression of the disorders ( Rolling, 2004 ). 
This approach has seen some success, and studies have demonstrated attenuation of photoreceptor loss in animal models of retina degeneration using exogenous delivery of signaling molecules including pigment epithelium-derived factor (PEDF), brain-derived neurotrophic factor (BDNF), ciliary neurotrophic factor (CNTF) and several fibroblast growth factors (FGFs) ( LaVail et al., 1992 ; Cayouette and Gravel, 1997 ; Caffé et al., 2001 ; Green et al., 2001 ; Liang et al., 2001 ; Azadi et al., 2007 ; Moeller and Neubert, 2013 ; Kimura et al., 2016 ; Comitato et al., 2018 ). Understanding signaling pathways that maintain healthy photoreceptors is therefore critical to the development of new approaches to maintain existing photoreceptor cells, as well as potentially curative future cell-based therapies to replace them ( Pearson et al., 2012 ; Santos-Ferreira et al., 2014 ).

In the present study, we used our bioinformatics tool, receptoR, to identify activin receptor 2A ( Acvr2a ) as a target present in post-mitotic photoreceptors that can be activated to increase their in vitro survival.

Our overall approach comprises the identification and importing of relevant datasets; normalization to allow comparisons; initial automated analysis; and finally, user-interactive analysis to identify and extract specific information of interest ( Figure 1A ). As we have regular access to murine retinal cells via a secondary-use ethics approval, we elected to focus on mouse transcript data, although receptoR is able to work with both human and mouse data. This pipeline was developed with the non-bioinformatician user in mind, and our web-based graphical user interface facilitates the mining of datasets from the GEO database, categorization of the retrieved samples, and downstream analysis. Results presented here thus make use of the receptoR app except where explicitly stated; details of how the data is obtained and processed can be found in Section “Bioinformatics.”

Figure 1.

Digitized gene expression data can be mined to predict receptor pathways. Experimental microarray data is publicly available for download and reanalysis, and can be used to make informed hypotheses about cell- or tissue-specific receptor expression. (A) The pipeline for mining, categorizing and analyzing the microarray data. User interaction steps are outlined in red. (B) Our web application, receptoR, allows users to analyze microarray experiments by searching public experimental series for specific sample expression data (1) and categorizing each sample according to their experimental design (2). After retrieving and processing the expression data, predictions can be filtered by genes coding for specific receptor types (3), individual gene data can be sorted (4) and visualized (5). Absolute expression levels of receptor-coding genes can be clustered based on assigned categories and filtered based on differential expression between groups (6).

We began by acquiring appropriate data records to examine the transcriptome of photoreceptors and RPE. The GEO database was searched for GEO samples; each sample record (assigned a unique accession number beginning with ‘GSM’) is the digitized image of the microarray after sample hybridization and represents the transcriptome of a single biological sample. For clarity we will refer to these samples as ‘microarrays’ or simply ‘arrays’ throughout the text. Typically, these arrays will have been deposited as part of a larger GEO experimental data series containing up to tens of GSM array records. This process is summarized in Figure 1B with a detailed step-by-step manual included as Supplementary Methods S1 .

From these search results, we selected 78 microarrays from 15 unique series, falling into one of three categories of interest: photoreceptors ( n = 30), RPE ( n = 11), or whole retina ( n = 37) for downstream analysis. Arrays were composed of purified cell populations and isolated tissues, with a median age of 7.5 days. Assignment to these categories was based on data included in the record and associated publications ( Supplementary Table S1 – this data is made available to the receptoR user during the search process). Of note, while photoreceptor maturation continues in this time window, they are consistently post-mitotic and represent a widely used model of photoreceptor behavior ( MacLaren et al., 2006 ; Brzezinski and Reh, 2015 ; Unachukwu et al., 2016 ; Waldron et al., 2018 ). By pooling arrays obtained in multiple experiments across multiple laboratories into a single biological category, we enhance statistical power and reduce technical bias (‘batch effect’) to which all high throughput transcriptome data are vulnerable ( Turnbull et al., 2012 ). Importantly, receptoR will generate a warning if all arrays in a group come from the same series, as technical and biological differences will then be confounded. After assigning each array to a category, receptoR will then download the full raw data files from the final array list from GEO. Transparently to the user (see section “Normalization and Differential Gene Expression”), array data is normalized and significant differentially expressed genes (DEGs) among groups are predicted, before this data is returned to be analyzed based both on high relative expression and differential expression among groups.
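The single-series warning described above can be illustrated with a minimal Python sketch. This is not receptoR's code (receptoR is written in R); the function name, the tuple layout and the accession numbers are hypothetical and chosen only to show the logic of the check.

```python
from collections import defaultdict


def single_series_groups(assignments):
    """Return the categories whose arrays all come from one GEO series.

    `assignments` is a list of (gsm_id, gse_id, category) tuples
    (an assumed layout, for illustration only).
    """
    series_by_category = defaultdict(set)
    for _gsm, gse, category in assignments:
        series_by_category[category].add(gse)
    # A category drawn from a single series cannot separate batch
    # effects from biology, so it is flagged.
    return sorted(c for c, series in series_by_category.items()
                  if len(series) == 1)


# Hypothetical accession numbers, not real GEO records.
example = [
    ("GSM0000001", "GSE000001", "photoreceptor"),
    ("GSM0000002", "GSE000002", "photoreceptor"),
    ("GSM0000003", "GSE000003", "RPE"),
    ("GSM0000004", "GSE000003", "RPE"),
]

for category in single_series_groups(example):
    print(f"Warning: all '{category}' arrays come from a single series; "
          "technical and biological differences are confounded.")
```

In this toy input only the RPE group is flagged, because both of its arrays belong to the same series.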

Because of the known role of cytokine signaling in preventing photoreceptor degeneration ( Tombran-Tink and Barnstable, 2003 ; Bradford et al., 2005 ; Dilly and Rajala, 2008 ) we decided to look more closely at this mode of signaling and filter those genes annotated to code for cytokine receptors. Interestingly, the activin receptor type 2A ( Acvr2a ) was predicted to be highly transcribed in photoreceptors, with comparable levels to those receptors whose ligands have been shown to be important for photoreceptor survival, including PEDF, CNTF, PDGF, IGF-1, and FGF-2 ( Tombran-Tink and Barnstable, 2003 ; Bradford et al., 2005 ; Dilly and Rajala, 2008 ; Di Pierdomenico et al., 2018 ; Li et al., 2018 ). To our knowledge the role of activin has not been studied in this cell type. Indeed, Acvr2a was predicted to be the second most highly transcribed cytokine receptor in photoreceptors ( Figure 2A ). To validate our bioinformatic hypotheses about receptor expression in photoreceptors, 14 genes predicted to be differentially expressed between photoreceptors and RPE or highly expressed in photoreceptors ( Table 1 ) were assayed by RT-qPCR in both tissues. When we compared the ΔC T values against the predicted gene transcription profiles we observed a significant correlation ( P = 0.047) ( Figure 2B ).
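The comparison between predicted expression and RT-qPCR can be sketched as a simple correlation. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the values below are made up for illustration. Because a lower ΔC T means higher expression, agreement between the two methods appears as a negative coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values: normalized array signal (log2) and RT-qPCR delta-Ct
# for five hypothetical genes.
array_log2 = [6.0, 8.0, 10.0, 12.0, 9.0]
delta_ct = [9.0, 7.5, 4.0, 2.5, 6.5]

r = pearson_r(array_log2, delta_ct)
print(f"r = {r:.2f}")  # negative: high array signal pairs with low delta-Ct
```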

ReceptoR prediction for highly transcribed cytokine receptors in photoreceptor and RPE.

Figure 2.

Prediction of receptor gene expression in photoreceptors. (A) Predicted levels of the 20 most highly expressed receptor protein-coding genes in photoreceptors that are differentially expressed between photoreceptors and RPE. Receptor type is indicated by color and plots represent the distribution of all samples ( n = 30) across all probes. (B) Fourteen genes representing varying expression levels in photoreceptors and RPE were assayed using RT-qPCR ( n = 5). Pooled archival microarray data ( n = 30, photoreceptors; n = 11, RPE) showed a moderate, yet significant correlation with ΔC T values ( R = –0.38, P = 0.047). RT-qPCR expression was normalized to three endogenous control genes ( Hprt , Polr2a , and Tbp ). (C) Reduced gene ontology map showing photoreceptor-enriched functional categories for all DEGs between photoreceptors and RPE. Circle size is based on the frequency of the term within all UniProt annotations (smaller = less common and more specific) while the color of each circle displays the P -value associated with the photoreceptor enrichment of terms in that category (redder = lower). The x- and y-axes represent semantic distances between annotation keywords in arbitrary units.

To further explore our prediction that activin signaling may play an important role in photoreceptors, we exported the list of all predicted DEGs between photoreceptors and RPE from receptoR ( Figure 1B and Supplementary Table S2 ). Then, we annotated each transcript on the list to a gene ontology term to identify enriched pathways based on a ranked list of significant DEGs (adjusted P < 0.05; highest expression to lowest) subtracted from background transcription in the mouse ( Eden et al., 2007 , 2009 ). What we observed was a significant enrichment in transcripts coding for growth factor binding and extracellular compound binding proteins, including those binding transforming growth factor beta (TGF β; Figure 2C ). This finding is consistent with photoreceptors’ role in receiving supportive signals from the RPE and with the composition of the interphotoreceptor matrix to which photoreceptors bind ( Strauss, 2005 ; Ishikawa et al., 2015 ). Activin is a member of the TGF β superfamily, whose signaling is known to play a role in cell survival and growth ( Chen et al., 2002 ), and enrichment of this pathway supported the high predicted levels of Acvr2a in photoreceptors.

To assess the plausibility of the hypothetical activin signaling implied by the receptoR predictions, we examined the expression of type 2 activin receptors at the protein level in the mouse retina at post-natal day 4. We detected Acvr2a throughout the retina, with a particularly intense staining in the outer region of rhodopsin-positive cells ( Figure 3A ) – interestingly this is the region immediately adjacent to the RPE, which is a source for many photoreceptor-supportive signals. Acvr2b showed a similar staining pattern, with strong staining at the photoreceptor-RPE margin, as well as in the ganglion cell layer ( Figure 3B ).

Figure 3.

Immunohistochemistry reveals that both predicted activin receptors (Acvr2a and Acvr2b) are expressed in photoreceptors. To validate the cytokine receptor prediction by receptoR, sections of post-natal day 4 mouse retina (without RPE) were immunostained for activin type 2 receptors and well-known photoreceptor/retina markers. (A) Co-localization of rhodopsin (green) and Acvr2a (magenta) indicates the expression of this activin receptor in photoreceptor cells. (B) In contrast, Acvr2b (green) shows weaker, less specific staining throughout the retina, similar to the expression of Otx2 (magenta). Sections are 20 μm thick and were counterstained with DAPI (gray).

With activin receptor expression confirmed at the protein level in photoreceptors, we sought to elucidate the role of activin signaling in these cells. Mouse retinas were dissociated, and photoreceptors were magnetically separated based on the expression of the photoreceptor-specific surface marker Cd73 ( Koso et al., 2009 ; Eberle et al., 2011 ) and cultured in well plates in a minimal, defined medium. Recombinant activin A was added at 10 ng/ml and cell viability was determined after 72 h in culture. Significantly more cells remained alive in activin-treated cultures compared to untreated controls ( Figures 4A,B ). As canonical activin signaling involves a heterodimer of type 1 and type 2 receptors, we treated parallel photoreceptor cultures with activin A in combination with the Tgfbr1/Acvr1b/Acvr1c inhibitor SB-431542 ( Inman, 2002 ), which negated this beneficial effect ( Figure 4C ). Treatment of photoreceptor cultures with the Acvrl1/Acvr1/BmpR1a/BmpR1b inhibitor LDN 193189 ( Sanvitale et al., 2013 ) did not attenuate the effect of activin A on photoreceptor survival (data not shown). This supports the hypothesis that activin A signaling in photoreceptors is mediated by canonical receptor signaling involving type 2 receptor complexes together with Acvr1b ( Pangas and Woodruff, 2000 ).

Figure 4.

Activin signaling increases photoreceptor viability. To assess activin A function in vitro, enriched post-mitotic photoreceptor precursors were cultured alone (A), in the presence of activin A only (B), and with both activin A and the inhibitor SB-431542 (C) for 72 h. Representative images of live/dead cell staining (FDA, green, alive; PI, red, dead) are shown. Quantitative automated image analysis was carried out to determine the relative number of living cells (D) for each condition ( N = 11, 16, 14 culture wells each for control, activin treated, and activin + SB groups, respectively). A median of 361 cells were counted per image (averaging 19,000 total cells per treatment). Different letters represent a significant difference, P < 0.05.

Interestingly, while activin A signaling significantly increased survival in magnetically enriched photoreceptor cultures, subsequent immunostaining for the rod photoreceptor protein rhodopsin was negative (data not shown). To interrogate this finding further and assess whether the result was due to enhanced survival of some other cell type (which would be somewhat unexpected, as even pre-enrichment the retinal cell population is over 70% photoreceptors) ( Akimoto et al., 2006 ), we analyzed these cultures by RT-qPCR. Intriguingly, while transcript levels of the photoreceptor-specific transcription factor Nr2e3 ( Haider et al., 2000 ; Cheng et al., 2004 ; Peng et al., 2005 ) remained high, we observed significant decreases in the mature rod and cone markers Rho and Pde6h, respectively ( Figure 5 ), which suggests substantial new areas for future investigation in light of the role for activin signaling in retinal development (see below).

Figure 5.

Activin A suppresses the transcription of mature photoreceptor genes. The impact of activin A signaling on transcription was assessed following 72 h culture without activin, with activin A only, and with activin A and the inhibitor SB-431542. While high levels of the photoreceptor-specific transcription factor Nr2e3 do not appear to be affected by activin signaling, the mature rod and cone markers Rho and Pde6h show statistically significant reductions. RT-qPCR data was normalized to three endogenous control genes ( Hprt , Polr2a , and Tbp ) and is shown as ΔCt plotted on a negative Y-axis (higher expression at the top).
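The ΔCt normalization used in these plots can be written out as a short sketch. It assumes the reference value is the arithmetic mean of the Ct values of the three endogenous control genes; the text does not specify how the three controls were combined, and the Ct values below are made up.

```python
from statistics import mean

# Endogenous control genes named in the text.
REFERENCE_GENES = ("Hprt", "Polr2a", "Tbp")


def delta_ct(ct_values, target, references=REFERENCE_GENES):
    """Delta-Ct = Ct(target) - mean Ct(reference genes).

    A lower delta-Ct means higher expression, which is why the figure
    plots it on a negative axis. Averaging the reference Ct values is
    an assumption made for this sketch.
    """
    return ct_values[target] - mean(ct_values[g] for g in references)


# Made-up Ct values for one hypothetical culture well.
sample = {"Hprt": 24.0, "Polr2a": 25.0, "Tbp": 26.0, "Rho": 21.0}
print(delta_ct(sample, "Rho"))
```

With these invented numbers the target gene crosses threshold four cycles before the reference mean, giving a ΔCt of -4.0.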

Our receptoR tool allows for the interrogation of previously captured transcriptome data, pooled across multiple experiments and research groups, that can be arbitrarily organized to identify patterns and differential expression. It has been estimated that at least 10 expression sets are necessary to establish the profile of a tissue ( Chang et al., 2011 ) and combining data from multiple laboratories has been shown to improve reproducibility of preclinical animal studies more effectively than increasing sample size alone ( Voelkl et al., 2018 ). Our approach facilitates transcriptome analysis, taking into account both of these considerations, and provides for significantly more efficient use of scarce research resources when compared to the time and effort required to design and implement an experiment to obtain similar results.

Although RNA sequencing (RNA-seq) is rapidly entering widespread use for transcriptome profiling and has the ability to identify transcripts without a priori knowledge of them, the large body of microarray data accumulated over the past decades is still a tremendously valuable resource. Unlike most publicly available RNA-seq data, available in raw sequence formats that require alignment before meta analyses can be run ( Lachmann et al., 2018 ), microarray probes are well-annotated and easily converted to gene expression information. Microarrays also offer a lower “barrier to entry” for researchers new to bioinformatics, particularly where the focus is on high-level behavior rather than rare transcripts or splice variants. The a priori design of a microarray experiment, while unsuited to the discovery of new transcripts, offers the ability to quickly parse expression data and filter for well-annotated RNA species. These in turn, more likely code for receptor proteins with known ligands. However, RNA-seq offers a much greater detection power, both in its range and freedom from fixed probes. Our approach, detailed here, should work equally well with any ‘omics data type, with adjustments for input signal type and corresponding annotations. We anticipate that ongoing developments in RNA-seq data processing ( Bray et al., 2016 ) and decreasing costs of computing power will make on-the-fly RNA-seq processing of large numbers of datasets more practical for non-specialist laboratories in the near future, and plan to incorporate RNA-seq data analysis into future versions of receptoR.

While there are existing tools which make use of publicly available datasets stored on the GEO, to our knowledge receptoR is the first to allow users to assign arrays from multiple series records into customizable groups for downstream analysis (e.g., DEG prediction, cluster analysis). Recently, shinyGEO was developed with the purpose of examining gene expression and association with cancer survival ( Dumas et al., 2016 ), but is limited to a single data series and is focused on gene association as opposed to gene discovery. Also making use of the shiny framework, Shiny Transcriptome Analysis Resource Tool (START) allows for visualization and analysis of RNA-seq data; however, it does not query the GEO database ( Nelson et al., 2017 ). Finally, GEO2R is GEO’s own tool and has related functionality in that groups can be redefined, and analysis undertaken on a single GEO dataset. However, this analysis is not particularly straightforward for non-bioinformaticians as expression is restricted to probe-level data with several probes associated with one gene. Most importantly, GEO2R does not incorporate multiple series in the analysis.
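The difference between probe-level and gene-level data can be made concrete with a small sketch. It collapses several probes that map to one gene into a single value by taking their mean; this summary rule, the function name and the probe identifiers are assumptions for illustration and do not describe how receptoR or GEO2R handle probes.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expression, probe_to_gene, summarize=mean):
    """Summarize probe-level values into one value per gene.

    Probes without a gene annotation are dropped. The default summary
    (the mean) is an illustrative choice, not a documented method.
    """
    by_gene = defaultdict(list)
    for probe, value in probe_expression.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: summarize(values) for gene, values in by_gene.items()}


# Hypothetical probe identifiers and made-up expression values.
expression = {"probe_1": 8.0, "probe_2": 10.0, "probe_3": 5.0, "probe_4": 1.0}
annotation = {"probe_1": "Acvr2a", "probe_2": "Acvr2a", "probe_3": "Rho"}

print(collapse_probes(expression, annotation))
```

Passing `summarize=max` instead would keep the strongest probe per gene, another common convention.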

We employed receptoR to generate hypotheses about cytokine receptor expression in mouse photoreceptors as compared to RPE to validate our bioinformatics approach. Of the multiple predictions, we chose to focus on activin signaling, confirming the presence of transcript (via RT-qPCR, Figure 2 ) and protein (via immunofluorescence, Figure 3 ). Given previous reports of a role for activin earlier in retinal development ( Bertacchi et al., 2015 ), we assessed whether activin plays a role in photoreceptor survival. We determined that activin treatment significantly enhances the survival of enriched primary photoreceptor cultures, and that this effect is precluded by pharmacological inhibition of the activin type 1 receptor b (Acvr1b, also known as Alk4), consistent with previous reports that activin acts through SMAD2/3-mediated pathways with no activation of either ERK or AKT during photoreceptor differentiation from embryonic stem cells ( Lu et al., 2017 ). This is not surprising as activin signaling has been long known for its neuroprotective effect ( Kupershmidt et al., 2007 ).

To the best of our knowledge, this effect of activin signaling on nominally post-mitotic mammalian photoreceptor precursors has not been previously identified. However, activin signaling is known to play an important role in patterning the optic vesicle in early eye development. At that developmental stage, activin signaling promotes the expression of RPE-specific genes at the expense of retina-specific genes ( Fuhrmann et al., 2000 ). Subsequently, activin signaling has been reported to promote cell cycle exit and differentiation into post-mitotic precursors in rodents ( Davis et al., 2000 ). During directed differentiation from mouse embryonic stem cells to photoreceptor precursors, Lu and colleagues showed increasing levels of Inhba , Acvr2a , and Acvr1b throughout culture, while the addition of exogenous activin A upregulated Otx2 and Crx ( Lu et al., 2017 ), which is in line with our findings (potentially increased Otx2 at P = 0.055, Figure 5 ). Our results showing that activin A-treated photoreceptors downregulate the rod and cone photoreceptor markers Rho and Pde6h, respectively ( Figure 5 ) ( Swaroop et al., 2010 ; Brzezinski and Reh, 2015 ), likely reflect temporal changes in the role of activin signaling during development. Our findings are consistent with studies in chick retina cultures, where the role of activin during photoreceptor culture has been more extensively studied; treatment of photoreceptors with activin downregulates visual pigment genes including rhodopsin ( Belecky-Adams et al., 1999 ; Bradford et al., 2005 ). Activin has also been reported to have an inhibitory effect on the differentiation of cultured chick photoreceptors ( Belecky-Adams et al., 1999 ). The authors did not detect statistically significant increases in the number of live cells per dish in activin-treated groups, although the mean number of live cells was higher than controls at each time point from 48 h after plating onwards ( Belecky-Adams et al., 1999 ). 
Our observations that activin represses later photoreceptor markers suggest regression to an earlier developmental stage and could hold potential as a new source of photoreceptors ( Klassen et al., 2004 ), although significant further work will be required to confirm or refute this speculation.

In the nearer term, the ability of activin signaling to enhance photoreceptor survival has significant research potential, as the in vitro culture of primary photoreceptors is technically challenging, with the majority of cells dying shortly after initiating culture ( LaVail et al., 1998 ; Traverso et al., 2003 ). This can be attributed to the fact that photoreceptors are very dependent on incompletely understood signals present in their niche, which are lost following their isolation. As a result, many in vitro studies of primary mouse photoreceptors use early post-natal photoreceptors that have exited mitosis and are undergoing maturation ( Brzezinski and Reh, 2015 ; Unachukwu et al., 2016 ; Waldron et al., 2018 ). The inability to culture these cells efficiently has limited the ability of vision researchers to interrogate their behavior and to test the impact of various interventions such as neuroprotective molecules to combat retinal degeneration ( Skaper, 2012 ). Importantly, the fact that our array data was pooled from tissues isolated at a range of ages ( Supplementary Table S1 ) may have contributed to differences observed in gene expression between our array analysis and RT-qPCR ( Figure 2 ). While receptoR is dependent on the data available, our results demonstrate that analysis of large publicly-available datasets can reveal important new biological information.

Our meta-analysis approach to expression analysis is related to those used extensively in cancer research, where diverse transcriptomics datasets can be brought together to identify mutations and chromosomal duplications that increase patient susceptibility ( Newton and Wernisch, 2015 ). A similar approach has also been used to identify two genes that increase susceptibility to choroidal neovascularization in AMD, including a newly confirmed gene expressed in the retina ( Akagi-Kurashige et al., 2015 ). Notably, Unachukwu et al. (2016) used the popular expression analysis software Ingenuity Pathway Analysis (IPA; Qiagen) to cross-match ligands present in the light-damaged retinal microenvironment with receptors expressed in photoreceptor precursors. While their work elegantly confirmed a known signaling mechanism (SDF-1α-CXCR4) involved in axon growth and guidance in photoreceptors, the IPA platform is closed and based on manually curated datasets ( Unachukwu et al., 2016 ). Consequently, although IPA is a useful tool, its subscription fees limit its availability. By contrast, the platform we present here makes use of publicly available datasets to generate hypotheses based on receptor signaling for non-experts in bioinformatics. Our tool allows for the straightforward analysis of cell communication pathways and makes use of multiple datasets to minimize laboratory or experimental bias.

While we cannot confirm the potential for activin signaling to affect photoreceptor differentiation, the identification of a new target to promote photoreceptor survival also has therapeutic implications, as photoreceptor degeneration is the leading cause of blindness in adults over 55 ( de Jong, 2006 ). In dry age-related macular degeneration (AMD), photoreceptor loss is secondary to diseased and degenerated RPE and photoreceptor degeneration can be delayed by supplementing the retina with key RPE secreted factors ( Tombran-Tink and Barnstable, 2003 ; Jayakody et al., 2015 ). It would be very interesting to test if supplementing the retina with activin A can delay photoreceptor degeneration in AMD animal models. A photoreceptor pro-survival factor could buy AMD patients time, reducing or delaying photoreceptor degeneration, and the activin signaling pathway has already been identified as a promising druggable target for other therapeutic indications (reviewed in Tsuchida et al., 2009 ). Such a factor could also be valuable in combination with cell-replacement therapies for diseased RPE ( da Cruz et al., 2018 ). While the work presented here is an interesting first step, characterization of the mechanism by which activin improves photoreceptor survival warrants further investigation.

In summary, our receptoR tool was able to raise the specific, testable hypothesis that activin signaling is active in post-mitotic photoreceptors and/or maturing precursors, which we confirmed with subsequent experiments. In an era of restricted research resources, we hope to increase research efficiency by facilitating re-use of existing datasets by non-specialists. We anticipate this tool will be particularly useful for non-bioinformaticians wishing to mine transcriptome data to generate hypotheses regarding cytokine signaling in their cell type of interest. We have made the source code for receptoR freely available at https://github.com/derektoms/receptoR and a live version can be accessed at https://www.ucalgary.ca/ungrinlab/receptoR .

Materials and Methods

Bioinformatics, Querying Public Datasets

Our bioinformatics platform utilizes the open source software suite Bioconductor ( Gentleman et al., 2004 ; Huber et al., 2015 ) based on the R programming language ( R Development Core Team, 2008 ). Using the R package GEOquery ( Davis and Meltzer, 2007 ), we are able to import and process raw dataset files from the Gene Expression Omnibus (GEO).

To allow for straightforward integration of various datasets, the software makes use of only non-competitive (i.e., single-color) arrays, where each array contains a single biological sample that has been hybridized and its corresponding signals digitized. Data obtained from such arrays has been shown to be of the same quality as that obtained from two-color, competitive arrays ( Patterson et al., 2006 ). We also chose to initially limit microarray data to the two most common in situ oligonucleotide array platforms, ensuring consistency and simple quality control between experimental samples. This permits the pooling of multiple arrays, one per sample, as would be conducted in a wet lab microarray experiment. For mouse data, we use the GeneChip Mouse Genome 430 2.0 Array (GPL1260) platform, while human data is collected from the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570). As of December 2018, these two arrays contained 53 460 and 144 134 sample records, respectively.

Normalization and Differential Gene Expression

Retrieved expression data was then normalized using the log scale robust multi-array analysis (RMA) algorithm ( Irizarry et al., 2003 ). Briefly, arrays were background corrected, normalized using quantile normalization, and log transformed. Following array normalization, we analyzed gene expression profiles among groups by fitting a multiple linear model based on probe level expression, with contrasts set between all our defined biological groups ( Smyth, 2004 ). To predict DEGs between groups in a biologically meaningful way, we performed significance testing relative to a threshold, namely a log-fold change of greater than one ( McCarthy and Smyth, 2009 ). We chose a relatively low threshold because the purpose of our application is an exploratory analysis of many expression datasets, and a larger number of false positives was permissible given the requirement for external validation of these predictions.
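
The quantile normalization and log transformation steps described above can be illustrated with a short sketch. This is not the receptoR implementation, which relies on R/Bioconductor; it is a minimal standard-library Python example, the function names are hypothetical, the intensities are invented, and background correction and probe-set summarization are omitted.

```python
import math
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array).

    Every value is replaced by the mean of the values that share its rank
    across all arrays. Ties are broken by probe position (a simplification).
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # For each array, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n_probes), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean intensity at each rank across arrays.
    reference = [
        mean(a[order[rank]] for a, order in zip(arrays, orders))
        for rank in range(n_probes)
    ]
    normalized = [[0.0] * n_probes for _ in arrays]
    for out, order in zip(normalized, orders):
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
    return normalized


def log2_transform(arrays):
    """Log2-transform every (strictly positive) intensity."""
    return [[math.log2(v) for v in a] for a in arrays]


# Hypothetical intensities: two arrays, three probes each.
raw = [[5, 2, 3], [4, 1, 6]]
normalized = log2_transform(quantile_normalize(raw))
```

After normalization, every array shares the same set of intensity values, so differences between arrays reflect only the rank of each probe within its array.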

Normalized expression data was analyzed by sparse partial least squares discriminant analysis (sPLS-DA) to determine membership for each observation of gene expression across all arrays ( Lê Cao et al., 2011 ; González et al., 2012 ; Rohart et al., 2017 ). In other words, differences in expression in these genes were rated in terms of their abilities to discriminate groups. This allows for the identification and selection of relevant genes from each biological group.

Gene Ontology

The lists of receptor types were generated using KEGG and PANTHER gene annotations to ensure full coverage of biological pathways ( Mi et al., 2010 ; Kanehisa et al., 2016 ) for both mouse and human genes, and are found in Supplementary Table S3 . These gene lists are used to filter expression data, restricting the search to receptor-coding genes.

Differential expression results were exported from receptoR and used to generate a list of enriched gene ontology (GO) terms using GOrilla ( Eden et al., 2007 , 2009 ). All probes available on the microarray were used to generate enriched GO terms relating to biological function. From the input list, 8,266 of 8,478 gene terms were recognized, of which 5,024 had an associated GO term. All expression values and enrichment analysis are found in Supplementary Table S4 . For visualization ( Figure 2C ), results were summarized by removing redundant GO terms using REVIGO ( Supek et al., 2011 ).
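
To give a sense of what an enrichment p-value measures, the sketch below computes a plain hypergeometric upper-tail probability for a single GO term. It is an illustration only: GOrilla applies its own statistic to ranked gene lists, which is not reproduced here, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, hits):
    """Probability of seeing at least `hits` annotated genes.

    population: number of background genes
    annotated:  background genes carrying the GO term
    selected:   genes in the list of interest
    hits:       genes in the list of interest carrying the GO term
    """
    upper = min(annotated, selected)
    if not 0 <= hits <= upper:
        raise ValueError("hits must lie between 0 and min(annotated, selected)")
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, selected)


# Hypothetical counts: 10 background genes, 4 annotated, 3 selected, 2 hits.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

A small p-value indicates that the term appears in the selected list more often than expected by chance, given how common it is in the background.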

Reverse Transcription Quantitative PCR (RT-qPCR)

RNA was isolated using the Norgen Total RNA Purification kit (Norgen Biotek cat. no. 37500) and quantified on an Implen spectrophotometer. Between 300 and 1000 ng of RNA was used for each reverse transcription reaction (iScript, BioRad cat. no. 1708841). PCR reactions were assembled using PowerUp SYBR Green Master Mix (Applied Biosystems, cat. no. A25742) and run on a Step One Plus Real-Time PCR System (Applied Biosystems). cDNA was tested using known reference primers before being used in quantitative experiments. Primers were also tested and found to have efficiencies of 100 ± 10%. Primer sequences are listed in Table 2 . Values for non-detects were imputed from reaction values in the same biological category (e.g., other photoreceptor samples) ( McCall et al., 2014 ) and relative expression (ΔCt) was calculated using an average of three stable endogenous controls: Polr2a , Tbp , Hprt ( Vandesompele et al., 2002 ). Statistical differences were determined by a Mann–Whitney U -test.
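
The ΔCt calculation against the average of the three endogenous controls can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the Ct values and the target gene chosen are hypothetical, and converting ΔCt to a fold value with 2^-ΔCt is an added assumption that presumes amplification efficiencies close to 100%.

```python
from statistics import mean

# Endogenous controls named in the text.
REFERENCE_GENES = ("Polr2a", "Tbp", "Hprt")


def delta_ct(ct_values, target, references=REFERENCE_GENES):
    """Ct of the target gene minus the mean Ct of the reference genes."""
    return ct_values[target] - mean(ct_values[gene] for gene in references)


def relative_expression(d_ct):
    """Expression relative to the controls, 2^-ΔCt (assumes ~100% efficiency)."""
    return 2.0 ** (-d_ct)


# Hypothetical Ct values for one sample.
sample = {"Acvr2a": 26.0, "Polr2a": 22.0, "Tbp": 24.0, "Hprt": 23.0}
d_ct = delta_ct(sample, "Acvr2a")
fold = relative_expression(d_ct)
```

Averaging several stable controls reduces the influence of any single reference gene on the normalized value.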

List of RT-qPCR primers.

Mouse Retina Dissection, Photoreceptor Precursor Enrichment, and Culture

All experiments involving animals were carried out in accordance with the recommendations of the Canadian Council on Animal Care’s “Guide to the Care and Use of Experimental Animals.” The protocol was approved by the Animal Care Committee at the University of Calgary. Retinal tissues were accessed under a secondary-use protocol, from animals freshly euthanized as controls in other experiments in neighboring laboratories, where the retinal tissue would otherwise be discarded. Many tissue types can be easily and regularly obtained in this way, and we encourage researchers to assess how these might reduce their own need for animals, and the associated costs. Retinas from euthanized mice at post-natal day (PN)4 were dissected from the eyes and dissociated in DPBS without calcium or magnesium (DPBS) containing 0.125% trypsin (Sigma cat. no. T1005) and 0.3 mg/ml DNaseI (EMD Millipore cat. no. 260913) for 4–6 min in a shaking waterbath at 37°C ( f = 120 rpm). The enzymatic digestion was stopped by adding an equal volume of DPBS containing 20% FBS. The cell suspension was triturated with a fire-polished glass Pasteur pipette before being spun at 333 rcf for 5 min. Supernatant was removed and cells were resuspended in 500 μl EasySep buffer (StemCell cat. no. 20144) before being magnetically enriched on an EasySep system using the Mouse PE Positive Selection kit (StemCell cat. no. 18554) according to the manufacturer’s directions. Anti-CD73 antibody conjugated to PE (BD cat. no. 550741) was used at 3 μg/ml to select a positive photoreceptor fraction.

Prior to culture, black-walled 96 well plates (Greiner cat. no. 655090) were coated with 50 μg/ml poly-D-lysine (Corning cat. no. 354210) for at least an hour before being washed twice with sterile distilled water and allowed to dry. Photoreceptors were plated at a density of 3.1 × 10⁵ cells/cm² in a volume of 200 μl per well (media depth of 6 mm), and cultured in DMEM/F12 (70:30) supplemented with 2% B-27 and 1% antibiotic-antimycotic (all Thermo/Gibco cat. no. 11965, 11765, 17504, 15240).
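
As a rough consistency check on these plating conditions, the well area implied by 200 μl of medium at a depth of 6 mm can be combined with the stated seeding density. The sketch assumes a flat-bottomed well of uniform cross-section and that the density refers to the area of the well bottom; the function names are illustrative only.

```python
def well_area_cm2(volume_ul, depth_mm):
    """Cross-sectional area implied by a liquid volume and depth.

    1 μl equals 1 mm^3, so volume / depth gives the area in mm^2.
    """
    area_mm2 = volume_ul / depth_mm
    return area_mm2 / 100.0  # 100 mm^2 per cm^2


def cells_per_well(density_per_cm2, area_cm2):
    """Number of cells seeded in one well at the given density."""
    return density_per_cm2 * area_cm2


area = well_area_cm2(200.0, 6.0)     # values stated in the text
cells = cells_per_well(3.1e5, area)  # density stated in the text
```

Under these assumptions the well area is about 0.33 cm², which corresponds to roughly 1 × 10⁵ cells per well.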

Photoreceptor Viability Assay

After 72 h in culture, photoreceptor media was replaced with DPBS containing 5 μg/ml Hoechst 33342 (Invitrogen cat. no. H21492), 50 ng/ml fluorescein diacetate (FDA; Sigma cat. no. F7378), and 2.5 μg/ml propidium iodide (PI; Sigma cat. no. P4170) to detect nuclei, live and dead cells, respectively. After 5 min incubation at 37°C, this staining solution was removed and the cells were washed with DPBS. Plates were imaged on an Olympus IX83 Microscope at 200X magnification using MicroManager software ( Edelstein et al., 2014 ). Four or five non-overlapping images per well were taken.

Images were processed using a CellProfiler pipeline that assessed co-localization of a nucleus with either the live or dead stains to determine the percentage of cells alive ( Carpenter et al., 2006 ). Images were pre-processed using ImageJ by subtracting the background from the captured images. Primary objects were identified and subsequently related to establish a parent–child relationship between blue–red objects and blue–green objects. Finally, the filter objects module was used to quantify the number of colocalized objects that were both blue and red, and objects that were both blue and green; the blue–red objects were considered “dead” cells and the blue–green objects were considered “live” cells. Viability was calculated as the number of “live” cells divided by the sum of “live” and “dead” cells.

Results from all images of a single well were averaged, and this value represents a biological replicate. Experiments were repeated three times with four litters of mice, representing a total of N = 11, 16, and 14 culture wells for the control, activin-treated, and activin + SB groups, respectively. Differences in viability between groups were determined using a one-way ANOVA test followed by a Tukey Honest Significant Difference test.
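
The viability calculation and per-well averaging described above can be summarized in a short sketch. The cell counts are hypothetical and the function names are illustrative; the statistical testing (ANOVA and Tukey HSD) is not reproduced here.

```python
from statistics import mean


def viability(live, dead):
    """Fraction of counted cells that are live: live / (live + dead)."""
    total = live + dead
    if total == 0:
        raise ValueError("no cells counted in this image")
    return live / total


def well_viability(image_counts):
    """Average the per-image viability over all images of one well.

    image_counts: iterable of (live, dead) counts, one pair per image.
    The returned value is treated as one biological replicate.
    """
    return mean(viability(live, dead) for live, dead in image_counts)


# Hypothetical counts from four non-overlapping images of one well.
counts = [(80, 20), (60, 40), (90, 10), (70, 30)]
replicate = well_viability(counts)
```

Averaging per-image fractions, rather than pooling counts, gives each image equal weight regardless of how many cells it contains.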

Immunostaining

Whole eye sections were prepared by fixing dissected PN4 mouse eyes with the RPE removed in 4% paraformaldehyde (PFA) in DPBS overnight at 4°C. Fixed eyes were then transferred to a 15% sucrose solution for 24 h and a 30% sucrose solution for 24 h at 4°C. Cryoprotected eyes were then embedded in clear frozen section compound (VWR cat. no. 95057-838) before being frozen in a dry ice and 2-propanol slurry. Slides were prepared by cutting 20 μm sections and mounting them to charged slides. The following antibodies were used to detect the various epitopes: activin receptor type 2A (Abcam, cat. no. ab96793), activin receptor type 2B (Abcam, cat. no. ab76940), OTX2 (Abcam, cat. no. ab114138), and rhodopsin (Abcam, cat. no. 98887).

Cultured photoreceptors were fixed by adding 100 μl of 4% PFA in DPBS to each well, and incubated for 5 min at room temperature (RT; 22°C). The wells were then washed three times with DPBS. Photoreceptors were permeabilized by incubating each well with 100 μl of 0.1% Triton X-100 (Amresco cat. no. M143) for 5 min at RT, followed by three more washes in DPBS. Each well was blocked with 1% (w/v) bovine serum albumin (BSA; Sigma cat. no. A3294) in DPBS for 10 min at RT followed by another three washes. Primary antibodies were then added to the samples (1/500) in 0.5% BSA in DPBS (100 μl/well) and incubated overnight at 4°C. Samples were then washed three times in DPBS and blocked with 1% BSA for 10 min at RT. Secondary antibody was then added (1/1000) in 0.1% BSA (100 μl/well) and incubated for 1 h at RT. Cells were then washed (3X) with DPBS and stained with 100 μl of (1/2000) 4′,6-diamidino-2-phenylindole (DAPI; Thermo Fisher Scientific cat. no. D1306) solution for 5 min at RT. The samples were then washed (3X) with PBS and stored in the dark at 4°C until they were imaged.

Data Availability Statement

Ethics Statement

This study was carried out in accordance with the recommendations of the Canadian Council on Animal Care’s Guide to the Care and Use of Experimental Animals. The protocol was approved by the Animal Care Committee at the University of Calgary.

Author Contributions

DT, AA-A, and MU designed the experiments and interpreted the results. DT and AA-A wrote the manuscript. AA-A, DT, SS, and QT conducted the biological validation experiments and analyzed the data. DT and MW developed the web tool. AA-A and DT contributed to the mathematical and statistical methods. MU originated the concept and edited the manuscript. All authors reviewed and approved the final manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors would like to thank the Centre for Genome Engineering for providing secondary-use mice and Valentyna Maslieieva for her assistance with preparing retina sections.

Funding. This work was funded by a CNIB Barbara Tuck MacPhee Research Grant (DT and MU); NSERC RGPIN-201404874 (MU); and an Alberta Children’s Hospital Research Institute post-doctoral fellowship (DT) and graduate studentship (AA-A).

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2020.00534/full#supplementary-material

Annotation and accession information for all datasets used in this study.

Differentially expressed gene lists.

Receptor type gene lists.

GO enrichment analysis.

Step-by-step instructions for using the online version of receptoR.

  • Akagi-Kurashige Y., Yamashiro K., Gotoh N., Miyake M., Morooka S., Yoshikawa M., et al. (2015). MMP20 and ARMS2/HTRA1 are associated with neovascular lesion size in age-related macular degeneration. Ophthalmology 122, 2295.e2–2302.e2. doi: 10.1016/j.ophtha.2015.07.032
  • Akimoto M., Cheng H., Zhu D., Brzezinski J. A., Khanna R., Filippova E., et al. (2006). Targeting of GFP to newborn rods by Nrl promoter and temporal expression profiling of flow-sorted photoreceptors. Proc. Natl. Acad. Sci. U.S.A. 103, 3890–3895. doi: 10.1073/pnas.0508214103
  • Alexander P., Thomson H. A. J., Luff A. J., Lotery A. J. (2015). Retinal pigment epithelium transplantation: concepts, challenges and future prospects. Eye 29, 992–1002. doi: 10.1038/eye.2015.89
  • Alimbetov D., Askarova S., Umbayev B., Davis T., Kipling D. (2018). Pharmacological targeting of cell cycle, apoptotic and cell adhesion signaling pathways implicated in chemoresistance of cancer cells. Int. J. Mol. Sci. 19:1690. doi: 10.3390/ijms19061690
  • Allanach K., Mengel M., Einecke G., Sis B., Hidalgo L. G., Mueller T., et al. (2008). Comparing microarray versus RT-PCR assessment of renal allograft biopsies: similar performance despite different dynamic ranges. Am. J. Transpl. 8, 1006–1015. doi: 10.1111/j.1600-6143.2008.02199.x
  • Azadi S., Johnson L. E., Paquet-Durand F., Perez M.-T. R., Zhang Y., Ekström P. A. R., et al. (2007). CNTF+BDNF treatment and neuroprotective pathways in the rd1 mouse retina. Brain Res. 1129, 116–129. doi: 10.1016/j.brainres.2006.10.031
  • Ball C., Brazma A., Causton H., Chervitz S., Edgar R., Hingamp P., et al. (2004). Standards for microarray data: an open letter. Environ. Health Perspect. 112, A666–A667. doi: 10.1289/ehp.112-1277123
  • Belecky-Adams T. L., Scheurer D., Adler R. (1999). Activin family members in the developing chick retina: expression patterns, protein distribution, and in vitro effects. Dev. Biol. 210, 107–123. doi: 10.1006/dbio.1999.9268
  • Bertacchi M., Lupo G., Pandolfini L., Casarosa S., D’Onofrio M., Pedersen R. A., et al. (2015). Activin/nodal signaling supports retinal progenitor specification in a narrow time window during pluripotent stem cell neuralization. Stem Cell Rep. 5, 532–545. doi: 10.1016/j.stemcr.2015.08.011
  • Bhutto I., Lutty G. (2012). Understanding age-related macular degeneration (AMD): relationships between the photoreceptor/retinal pigment epithelium/Bruch’s membrane/choriocapillaris complex. Mol. Aspects Med. 33, 295–317. doi: 10.1016/j.mam.2012.04.005
  • Bradford R. L., Wang C., Zack D. J., Adler R. (2005). Roles of cell-intrinsic and microenvironmental factors in photoreceptor cell differentiation. Dev. Biol. 286, 31–45. doi: 10.1016/j.ydbio.2005.07.002
  • Bray N. L., Pimentel H., Melsted P., Pachter L. (2016). Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527. doi: 10.1038/nbt0816-888d
  • Brazma A., Hingamp P., Quackenbush J., Sherlock G., Spellman P., Stoeckert C., et al. (2001). Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 29, 365–371. doi: 10.1038/ng1201-365
  • Brzezinski J. A., Reh T. A. (2015). Photoreceptor cell fate specification in vertebrates. Development 142, 3263–3273.
  • Caffé A. R., Söderpalm A. K., Holmqvist I., Van Veen T. (2001). A combination of CNTF and BDNF rescues rd photoreceptors but changes rod differentiation in the presence of RPE in retinal explants. Investig. Ophthalmol. Vis. Sci. 42, 275–282.
  • Cai H., Dong L. Q., Liu F. (2016). Recent advances in adipose mTOR signaling and function: therapeutic prospects. Trends Pharmacol. Sci. 37, 303–317. doi: 10.1016/j.tips.2015.11.011
  • Carpenter A. E., Jones T. R., Lamprecht M. R., Clarke C., Kang I. H., Friman O., et al. (2006). CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7:R100. doi: 10.1186/gb-2006-7-10-r100
  • Cayouette M., Gravel C. (1997). Adenovirus-mediated gene transfer of ciliary neurotrophic factor can prevent photoreceptor degeneration in the retinal degeneration (rd) mouse. Hum. Gene Ther. 8, 423–430. doi: 10.1089/hum.1997.8.4-423
  • Cendrowski J., Mamińska A., Miaczynska M. (2016). Endocytic regulation of cytokine receptor signaling. Cytokine Growth Fact. Rev. 32, 63–73. doi: 10.1016/j.cytogfr.2016.07.002
  • Chang C.-W., Cheng W.-C., Chen C.-R., Shu W.-Y., Tsai M.-L., Huang C.-L., et al. (2011). Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PLoS One 6:e22859. doi: 10.1371/journal.pone.0022859
  • Chen Y.-G., Lui H. M., Lin S.-L., Lee J. M., Ying S.-Y. (2002). Regulation of cell proliferation, apoptosis, and carcinogenesis by activin. Exp. Biol. Med. 227, 75–87. doi: 10.1177/153537020623100507
  • Cheng H., Khanna H., Oh E. C. T., Hicks D., Mitton K. P., Swaroop A. (2004). Photoreceptor-specific nuclear receptor NR2E3 functions as a transcriptional activator in rod photoreceptors. Hum. Mol. Genet. 13, 1563–1575. doi: 10.1093/hmg/ddh173
  • Chou C. F., Frances Cotch M., Vitale S., Zhang X., Klein R., Friedman D. S., et al. (2013). Age-related eye diseases and visual impairment among U.S. adults. Am. J. Prev. Med. 45, 29–35. doi: 10.1016/j.amepre.2013.02.018
  • Comitato A., Subramanian P., Turchiano G., Montanari M., Becerra S. P., Marigo V. (2018). Pigment epithelium-derived factor hinders photoreceptor cell death by reducing intracellular calcium in the degenerating retina. Cell Death Dis. 9:560. doi: 10.1038/s41419-018-0613-y
  • Congdon N., O’Colmain B., Klaver C. C. (2004). Causes and prevalence of visual impairment among adults in the United States. Arch. Ophthalmol. 122, 477–485.
  • da Cruz L., Fynes K., Georgiadis O., Kerby J., Luo Y. H., Ahmado A., et al. (2018). Phase 1 clinical study of an embryonic stem cell-derived retinal pigment epithelium patch in age-related macular degeneration. Nat. Biotechnol. 36, 328–337. doi: 10.1038/nbt.4114
  • Dalby M. J., García A. J., Salmeron-Sanchez M. (2018). Receptor control in mesenchymal stem cell engineering. Nat. Rev. Mater. 3:17091.
  • Davis A. A., Matzuk M. M., Reh T. A. (2000). Activin A promotes progenitor differentiation into photoreceptors in rodent retina. Mol. Cell. Neurosci. 15, 11–21. doi: 10.1006/mcne.1999.0806
  • Davis S., Meltzer P. S. (2007). GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 14, 1846–1847. doi: 10.1093/bioinformatics/btm254
  • de Jong P. T. V. (2006). Mechanisms of disease age-related macular degeneration. N. Engl. J. Med. 355, 1474–1485. doi: 10.1016/j.ophtha.2018.03.034
  • Di Pierdomenico J., Scholz R., Valiente-Soriano F. J., Sánchez-Migallón M. C., Vidal-Sanz M., Langmann T., et al. (2018). Neuroprotective effects of FGF2 and minocycline in two animal models of inherited retinal degeneration. Investig. Ophthalmol. Vis. Sci. 59:4392. doi: 10.1167/iovs.18-24621
  • Dilly A. K., Rajala R. V. S. (2008). Insulin growth factor 1 receptor/PI3K/AKT survival pathway in outer segment membranes of rod photoreceptors. Investig. Ophthalmol. Vis. Sci. 49, 4765–4773. doi: 10.1167/iovs.08-2286
  • Dumas J., Gargano M. A., Dancik G. M. (2016). shinyGEO: a web-based application for analyzing gene expression omnibus datasets. Bioinformatics 32, 3679–3681. doi: 10.1093/bioinformatics/btw519
  • Eberle D., Schubert S., Postel K., Corbeil D., Ader M. (2011). Increased integration of transplanted CD73-positive photoreceptor precursors into adult mouse retina. Investig. Ophthalmol. Vis. Sci. 52, 6462–6471. doi: 10.1167/iovs.11-7399
  • Economides A. N., Carpenter L. R., Rudge J. S., Wong V., Koehler-Stec E. M., Hartnett C., et al. (2003). Cytokine traps: multi-component, high-affinity blockers of cytokine action. Nat. Med. 9, 47–52. doi: 10.1038/nm811
  • Edelstein A. D., Tsuchida M. A., Amodaj N., Pinkard H., Vale R. D., Stuurman N. (2014). Advanced methods of microscope control using μManager software. J. Biol. Methods 1:e10. doi: 10.14440/jbm.2014.36
  • Eden E., Lipson D., Yogev S., Yakhini Z. (2007). Discovering motifs in ranked lists of DNA sequences. PLoS Comput. Biol. 3:e39. doi: 10.1371/journal.pcbi.0030039
  • Eden E., Navon R., Steinfeld I., Lipson D., Yakhini Z. (2009). GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10:48. doi: 10.1186/1471-2105-10-48
  • Fan F., He Z., Kong L.-L., Chen Q., Yuan Q., Zhang S., et al. (2016). Pharmacological targeting of kinases MST1 and MST2 augments tissue repair and regeneration. Sci. Transl. Med. 8:352ra108. doi: 10.1126/scitranslmed.aaf2304
  • Farrar G. J., Millington-Ward S., Palfi A., Chadderton N., Kenna P. F. (2015). Gene Therapy for Dominantly Inherited Retinal Degeneration. Berlin: Springer, 43–60.
  • Frick K. D., Gower E. W., Kempen J. H., Wolff J. L. (2007). Socioeconomics and health services economic impact of visual impairment and blindness in the United States. Arch. Ophthalmol. 125, 544–550.
  • Fuhrmann S., Levine E. M., Reh T. A. (2000). Extraocular mesenchyme patterns the optic vesicle during early eye development in the embryonic chick. Development 127, 4599–4609.
  • Gentleman R. C., Carey V. J., Bates D. M., Bolstad B., Dettling M., Dudoit S., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5:R80. doi: 10.1186/gb-2004-5-10-r80
  • González I., Cao K.-A. L., Davis M. J., Déjean S. (2012). Visualising associations between paired “omics” data sets. BioData Min. 5:19. doi: 10.1186/1756-0381-5-19
  • Green E. S., Rendahl K. G., Zhou S., Ladner M., Coyne M., Srivastava R., et al. (2001). Two animal models of retinal degeneration are rescued by recombinant adeno-associated virus-mediated production of FGF-5 and FGF-18. Mol. Ther. 3, 507–515. doi: 10.1006/mthe.2001.0289
  • Haider N. B., Jacobson S. G., Cideciyan A. V., Swiderski R., Streb L. M., Searby C., et al. (2000). Mutation of a nuclear receptor gene, NR2E3, causes enhanced S cone syndrome, a disorder of retinal cell fate. Nat. Genet. 24, 127–131. doi: 10.1038/72777
  • Harskamp L. R., Gansevoort R. T., van Goor H., Meijer E. (2016). The epidermal growth factor receptor pathway in chronic kidney diseases. Nat. Rev. Nephrol. 12, 496–506. doi: 10.1038/nrneph.2016.91
  • Heier J. S., Brown D. M., Chong V., Korobelnik J.-F., Kaiser P. K., Nguyen Q. D., et al. (2012). Intravitreal aflibercept (VEGF Trap-Eye) in wet age-related macular degeneration. Ophthalmology 119, 2537–2548. doi: 10.1016/j.ophtha.2012.09.006
  • Huang G. H., Klein R., Klein B. E. K., Tomany S. C. (2003). Birth cohort effect on prevalence of age-related maculopathy in the Beaver Dam Eye Study. Am. J. Epidemiol. 157, 721–729. doi: 10.1093/aje/kwg011
  • Huber W., Carey V. J., Gentleman R., Anders S., Carlson M., Carvalho B. S., et al. (2015). Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121. doi: 10.1038/nmeth.3252
  • Inman G. J. (2002). SB-431542 is a potent and specific inhibitor of transforming growth factor-beta superfamily type I activin receptor-like kinase (ALK) receptors ALK4, ALK5, and ALK7. Mol. Pharmacol. 62, 65–74. doi: 10.1124/mol.62.1.65
  • Irizarry R. A., Hobbs B., Collin F., Beazer-Barclay Y. D., Antonellis K. J., Scherf U., et al. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264. doi: 10.1093/biostatistics/4.2.249
  • Ishikawa M., Sawada Y., Yoshitomi T. (2015). Structure and function of the interphotoreceptor matrix surrounding retinal photoreceptor cells. Exp. Eye Res. 133, 3–18. doi: 10.1016/j.exer.2015.02.017
  • Jackson H. J., Rafiq S., Brentjens R. J. (2016). Driving CAR T-cells forward. Nat. Rev. Clin. Oncol. 13, 370–383.
  • Jayakody S. A., Gonzalez-Cordero A., Ali R. R., Pearson R. A. (2015). Cellular strategies for retinal repair by photoreceptor replacement. Prog. Retin. Eye Res. 46, 31–66. doi: 10.1016/j.preteyeres.2015.01.003
  • Jindal V. (2015). Neurodegeneration as a primary change and role of neuroprotection in diabetic retinopathy. Mol. Neurobiol. 51, 878–884. doi: 10.1007/s12035-014-8732-7
  • Kanehisa M., Sato Y., Kawashima M., Furumichi M., Tanabe M. (2016). KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462. doi: 10.1093/nar/gkv1070
  • Kashani A. H., Lebkowski J. S., Rahhal F. M., Avery R. L., Salehi-Had H., Dang W., et al. (2018). A bioengineered retinal pigment epithelial monolayer for advanced, dry age-related macular degeneration. Sci. Transl. Med. 10, 1–11. doi: 10.1126/scitranslmed.aao4097
  • Kimura A., Namekata K., Guo X., Harada C., Harada T. (2016). Neuroprotection, growth factors and BDNF-TRKB signalling in retinal degeneration. Int. J. Mol. Sci. 17:1584. doi: 10.3390/ijms17091584
  • Klassen H. J., Ng T. F., Kurimoto Y., Kirov I., Shatos M., Coffey P., et al. (2004). Multipotent retinal progenitors express developmental markers, differentiate into retinal neurons, and preserve light-mediated behavior. Investig. Ophthalmol. Vis. Sci. 45, 4167–4173. doi: 10.1167/iovs.04-0511
  • Koso H., Minami C., Tabata Y., Inoue M., Sasaki E., Satoh S., et al. (2009). CD73, a novel cell surface antigen that characterizes retinal photoreceptor precursor cells. Investig. Ophthalmol. Vis. Sci. 50, 5411–5418. doi: 10.1167/iovs.08-3246
  • Krebs D. L., Hilton D. J. (2000). SOCS: physiological suppressors of cytokine signaling. J. Cell Sci. 113 (Pt 1), 2813–2819.
  • Kupershmidt L., Amit T., Bar-Am O., Youdim M. B. H., Blumenfeld Z. (2007). The neuroprotective effect of Activin A and B: implication for neurodegenerative diseases. J. Neurochem. 103, 962–971. doi: 10.1111/j.1471-4159.2007.04785.x
  • Lachmann A., Torre D., Keenan A. B., Jagodnik K. M., Lee H. J., Wang L., et al. (2018). Massive mining of publicly available RNA-seq data from human and mouse. Nat. Commun. 9:1366. doi: 10.1038/s41467-018-03751-6
  • Larkin J. E., Frank B. C., Gavras H., Sultana R., Quackenbush J. (2005). Independence and reproducibility across microarray platforms. Nat. Methods 2, 337–344. doi: 10.1038/nmeth757
  • LaVail M. M., Unoki K., Yasumura D., Matthes M. T., Yancopoulos G. D., Steinberg R. H. (1992). Multiple growth factors, cytokines, and neurotrophins rescue photoreceptors from the damaging effects of constant light. Proc. Natl. Acad. Sci. U.S.A. 89, 11249–11253. doi: 10.1073/pnas.89.23.11249
  • LaVail M. M., Yasumura D., Matthes M. T., Lau-Villacorta C., Unoki K., Sung C. H., et al. (1998). Protection of mouse photoreceptors by survival factors in retinal degenerations. Investig. Ophthalmol. Vis. Sci. 39, 592–602.
  • Lê Cao K.-A., Boitard S., Besse P. (2011). Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253. doi: 10.1186/1471-2105-12-253
  • Li S., Sato K., Gordon W. C., Sendtner M., Bazan N. G., Jin M. (2018). Ciliary neurotrophic factor (CNTF) protects retinal cone and rod photoreceptors by suppressing excessive formation of the visual pigments. J. Biol. Chem. 293, 15256–15268. doi: 10.1074/jbc.RA118.004008
  • Liang F.-Q., Dejneka N. S., Cohen D. R., Krasnoperova N. V., Lem J., Maguire A. M., et al. (2001). AAV-mediated delivery of ciliary neurotrophic factor prolongs photoreceptor survival in the rhodopsin knockout mouse. Mol. Ther. 3, 241–248. doi: 10.1006/mthe.2000.0252
  • Lipsitz Y. Y., Woodford C., Yin T., Hanna J. H., Zandstra P. W. (2018). Modulating cell state to enhance suspension expansion of human pluripotent stem cells. Proc. Natl. Acad. Sci. U.S.A. 115:201714099. doi: 10.1073/pnas.1714099115
  • Loebel C., Burdick J. A. (2018). Engineering stem and stromal cell therapies for musculoskeletal tissue repair. Cell Stem Cell 22, 325–339. doi: 10.1016/j.stem.2018.01.014
  • Lowe R., Shirley N., Bleackley M., Dolan S., Shafee T. (2017). Transcriptomics technologies. PLoS Comput. Biol. 13:e1005457. doi: 10.1371/journal.pcbi.1005457
  • Lu A. Q., Popova E. Y., Barnstable C. J. (2017). Activin signals through SMAD2/3 to increase photoreceptor precursor yield during embryonic stem cell differentiation. Stem Cell Rep. 9, 838–852. doi: 10.1016/j.stemcr.2017.06.021
  • MacLaren R. E., Pearson R. A., MacNeil A., Douglas R. H., Salt T. E., Akimoto M., et al. (2006). Retinal repair by transplantation of photoreceptor precursors. Nature 444, 203–207. doi: 10.1051/medsci/2007233240
  • Manoukian O. S., Arul M. R., Rudraiah S., Kalajzic I., Kumbar S. G. (2019). Aligned microchannel polymer-nanotube composites for peripheral nerve regeneration: small molecule drug delivery. J. Control. Release 296, 54–67. doi: 10.1016/j.jconrel.2019.01.013
  • Marioni J. C., Mason C. E., Mane S. M., Stephens M., Gilad Y. (2008). RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517. doi: 10.1101/gr.079558.108
  • McCall M. N., McMurray H. R., Land H., Almudevar A. (2014). On non-detects in qPCR data. Bioinformatics 30, 2310–2316. doi: 10.1093/bioinformatics/btu239
  • McCarthy D. J., Smyth G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25 765–771. 10.1093/bioinformatics/btp053 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mi H., Dong Q., Muruganujan A., Gaudet P., Lewis S., Thomas P. D. (2010). PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the gene ontology consortium. Nucleic Acids Res. 38 D204–D210. 10.1093/nar/gkp1019 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mitchell P., Smith W., Attebo K., Wang J. J. (1995). Prevalence of age-related maculopathy in Australia. Ophthalmology 102 1450–1460. 10.1016/s0161-6420(95)30846-9 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Miyajima A., Kitamura T., Harada N., Yokota T., Arai K. (1992). Cytokine receptors and signal transduction. Annu. Rev. Immunol. 10 295–331. [ PubMed ] [ Google Scholar ]
  • Moeller H. V., Neubert M. G. (2013). Habitat damage, marine reserves, and the value of spatial management. Ecol. Appl. 23 959–971. 10.1890/12-0447.1 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Nelson J. W., Sklenar J., Barnes A. P., Minnier J. (2017). The START App: a web-based RNAseq analysis and visualization resource. Bioinformatics 33 : 447 . 10.1093/bioinformatics/btw624 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Newton R., Wernisch L. (2015). Investigating inter-chromosomal regulatory relationships through a comprehensive meta-analysis of matched copy number and transcriptomics data sets. BMC Genomics 16 : 967 . 10.1186/s12864-015-2100-5 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pangas S. A., Woodruff T. K. (2000). Activin signal transduction pathways. Trends Endocrinol. Metab. 11 309–314. [ PubMed ] [ Google Scholar ]
  • Patterson T. A., Lobenhofer E. K., Fulmer-Smentek S. B., Collins P. J., Chu T.-M., Bao W., et al. (2006). Performance comparison of one-color and two-color platforms within the microarray quality control (MAQC) project. Nat. Biotechnol. 24 1140–1150. 10.1038/nbt1242 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pavlos N. J., Friedman P. A. (2017). GPCR signaling and trafficking: the long and short of it. Trends Endocrinol. Metab. 28 213–226. 10.1016/j.tem.2016.10.007 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Pearson R. A., Barber A. C., Rizzi M., Hippert C., Xue T., West E. L., et al. (2012). Restoration of vision after transplantation of photoreceptors. Nature 485 99–103. 10.1038/nature10997 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Peng G.-H., Ahmad O., Ahmad F., Liu J., Chen S. (2005). The photoreceptor-specific nuclear receptor Nr2e3 interacts with Crx and exerts opposing effects on the transcription of rod versus cone genes. Hum. Mol. Genet. 14 747–764. 10.1093/hmg/ddi070 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • R Development Core Team (2008). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. [ Google Scholar ]
  • Rezania A., Bruin J. E., Arora P., Rubin A., Batushansky I., Asadi A., et al. (2014). Reversal of diabetes with insulin-producing cells derived in vitro from human pluripotent stem cells. Nat. Biotechnol. 32 1121–1133. 10.1038/nbt.3033 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ris-Stalpers C., Kuiper G. G., Faber P. W., Schweikert H. U., van Rooij H. C., Zegers N. D., et al. (1990). Aberrant splicing of androgen receptor mRNA results in synthesis of a nonfunctional receptor protein in a patient with androgen insensitivity. Proc. Natl. Acad. Sci. U.S.A. 87 7866–7870. 10.1073/pnas.87.20.7866 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Robbins L. S., Nadeau J. H., Johnson K. R., Kelly M. A., Roselli-Rehfuss L., Baack E., et al. (1993). Pigmentation phenotypes of variant extension locus alleles result from point mutations that alter MSH receptor function. Cell 72 827–834. 10.1016/0092-8674(93)90572-8 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rohart F., Gautier B., Singh A., Cao K.-A. L. (2017). mixOmics: an R package for ’omics feature selection and multiple data integration. PLoS Comput. Biol. 13 : e1005752 . 10.1371/journal.pcbi.1005752 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rolling F. (2004). Recombinant AAV-mediated gene transfer to the retina: gene therapy perspectives. Gene Ther. 11 ( Suppl. 1 ), S26–S32. 10.1038/sj.gt.3302366 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rozenblatt-Rosen O., Stubbington M. J. T., Regev A., Teichmann S. A. (2017). The human cell atlas: from vision to reality. Nature 550 451–453. 10.1038/550451a [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Santos-Ferreira T., Postel K., Stutzki H., Kurth T., Zeck G., Ader M. (2014). Daylight vision repair by cell transplantation. Stem Cells 33 79–90. 10.1002/stem.1824 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sanvitale C. E., Kerr G., Chaikuad A., Ramel M.-C., Mohedas A. H., Reichert S., et al. (2013). A new class of small molecule inhibitor of BMP signaling. PLoS One 8 : e62721 . 10.1371/journal.pone.0062721 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schwartz D. M., Bonelli M., Gadina M., O’Shea J. J. (2016). Type I/II cytokines, JAKs, and new strategies for treating autoimmune diseases. Nat. Rev. Rheumatol. 12 25–36. 10.1038/nrrheum.2015.167 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Skaper S. D. (2012). “ Isolation and culture of rat cone photoreceptor cells ,” in Neurotrophic Factors: Methods and Protocols , ed. Skaper S. D. (Totowa, NJ: Humana Press; ), 147–158. 10.1007/978-1-61779-536-7_13 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smyth G. K. (2004). Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3 1–25. 10.2202/1544-6115.1027 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Strauss O. (2005). The retinal pigment epithelium in visual function. Physiol. Rev. 85 845–881. 10.1152/physrev.00021.2004 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Su Z., Fang H., Hong H., Shi L., Zhang W., Zhang W., et al. (2014). An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era. Genome Biol. 15 : 523 . 10.1186/s13059-014-0523-y [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Supek F., Bošnjak M., Škunca N., Šmuc T. (2011). REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6 : e21800 . 10.1371/journal.pone.0021800 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Swaroop A., Kim D., Forrest D. (2010). Transcriptional regulation of photoreceptor development and homeostasis in the mammalian retina. Nat. Rev. Neurosci. 11 563–576. 10.1038/nrn2880 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thompson D. A., Ali R. R., Banin E., Branham K. E., Flannery J. G., Gamm D. M., et al. (2015). Advancing therapeutic strategies for inherited retinal degeneration: recommendations from the monaciano symposium. Investig. Ophthalmol. Vis. Sci. 56 918–931. 10.1167/iovs.14-16049 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tombran-Tink J., Barnstable C. J. (2003). PEDF: a multifaceted neurotrophic factor. Nat. Rev. Neurosci. 4 628–636. 10.1038/nrn1176 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Traverso V., Kinkl N., Grimm L., Sahel J., Hicks D. (2003). Basic fibroblast and epidermal growth factors stimulate survival in adult porcine photoreceptor cell cultures. Investig. Ophthalmol. Vis. Sci. 44 4550–4558. 10.1167/iovs.03-0460 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tsuchida K., Nakatani M., Hitachi K., Uezumi A., Sunada Y., Ageta H., et al. (2009). Activin signaling as an emerging target for therapeutic interventions. Cell Commun. Signal. 7 : 15 . 10.1186/1478-811X-7-15 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Turnbull A. K., Kitchen R. R., Larionov A. A., Renshaw L., Dixon J. M., Sims A. H. (2012). Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis. BMC Med. Genomics 5 : 35 . 10.1186/1755-8794-5-35 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Uings I. J., Farrow S. N. (2000). Cell receptors and cell signalling. Mol. Pathol. 53 295–299. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Unachukwu U. J., Warren A., Li Z., Mishra S., Zhou J., Sauane M., et al. (2016). Predicted molecular signaling guiding photoreceptor cell migration following transplantation into damaged retina. Sci. Rep. 6 : 22392 . 10.1038/srep22392 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • van der Kant R., Langness V. F., Herrera C. M., Williams D. A., Fong L. K., Leestemaker Y., et al. (2019). Cholesterol metabolism is a druggable axis that independently regulates tau and amyloid-β in iPSC-derived Alzheimer’s disease neurons. Cell Stem Cell 24 363.e9–375.e9. 10.1016/j.stem.2018.12.013 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Vandesompele J., De Preter K., Pattyn F., Poppe B., Van Roy N., De Paepe A., et al. (2002). Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3 : RESEARCH0034 . 10.1186/gb-2002-3-7-research0034 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Vingerling J. R., Dielemans I., Hofman A., Grobbee D. E., Hijmering M., Kramer C. F., et al. (1995). The prevalence of age-related maculopathy in the Rotterdam Study. Ophthalmology 102 205–210. 10.1016/s0161-6420(95)31034-2 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Voelkl B., Vogt L., Sena E. S., Würbel H. (2018). Reproducibility of preclinical animal research improves with heterogeneity of study samples. PLoS Biol. 16 : e2003693 . 10.1371/journal.pbio.2003693 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Waldron P. V., Di Marco F., Kruczek K., Ribeiro J., Graca A. B., Hippert C., et al. (2018). Transplanted donor- or stem cell-derived cone photoreceptors can both integrate and undergo material transfer in an environment-dependent manner. Stem Cell Rep. 10 406–421. 10.1016/j.stemcr.2017.12.008 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yoon C., Song H., Yin T., Bausch-Fluck D., Frei A. P., Kattman S., et al. (2018). FZD4 marks lateral plate mesoderm and signals with NORRIN to increase cardiomyocyte induction from pluripotent stem cell-derived cardiac progenitors. Stem Cell Rep. 10 87–100. 10.1016/j.stemcr.2017.11.008 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zhang Y.-J., Chen X., Li G., Chan K.-M., Heng B. C., Yin Z., et al. (2018). Concise review: stem cell fate guided by bioactive molecules for tendon regeneration. Stem Cells Transl. Med. 7 404–414. 10.1002/sctm.17-0206 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zhou S., Flamier A., Abdouh M., Tétreault N., Barabino A., Wadhwa S., et al. (2015). Differentiation of human embryonic stem cells into cone photoreceptors through simultaneous inhibition of BMP, TGFβ and Wnt signaling. Development 142 3294–3306. 10.1242/dev.125385 [ PubMed ] [ CrossRef ] [ Google Scholar ]

Literature mining and automated hypothesis generation with MeTeOR

MeTeOR, or the MeSH Term Objective Reasoning network, is a network representation of genes, drugs, and diseases built on MeSH terms curated by the National Library of Medicine.

Example for MeTeOR Clusters: LCA5, CEP83, MKKS, TMEM138, B9D1

Example for MeTeOR Entities: EGFR, 1956

You can search for a gene by Entrez ID, a chemical by PubChem CID, a disease by MeSH ID, or any of them by name.

MeTeOR mines the PubMed literature, revealing knowledge previously hidden in a sea of information.

The scientific literature is vast, and valuable information connecting findings from disparate works is easily missed. Teams of collaborators address this problem up to a point but could still benefit from systematic “big data” approaches that mine the entire literature to generate testable hypotheses on a large scale.

Given one biological entity (a gene, drug, or disease), MeTeOR returns a ranked list of its associations with other biological entities and can highlight papers pertaining to any two biological entities.
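The two operations described here, ranking the associations of one entity and listing the papers shared by two entities, can be illustrated with a small sketch. It assumes a plain co-occurrence count over a toy corpus; the paper IDs, the annotations, and the function names are invented for illustration, and nothing on this page confirms that MeTeOR scores associations this way.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: each paper maps to the set of entities it is annotated with.
# Paper IDs and annotations are invented for illustration only.
PAPERS = {
    "paper_1": {"EGFR", "gefitinib", "lung neoplasms"},
    "paper_2": {"EGFR", "gefitinib"},
    "paper_3": {"EGFR", "lung neoplasms", "KRAS"},
    "paper_4": {"KRAS", "lung neoplasms"},
    "paper_5": {"EGFR", "gefitinib", "KRAS"},
}


def build_cooccurrence(papers):
    """Count, for every unordered entity pair, the papers that mention both."""
    counts = Counter()
    for entities in papers.values():
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts


def ranked_associations(counts, query):
    """Return (entity, shared_paper_count) pairs for `query`, strongest first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == query:
            scores[b] += n
        elif b == query:
            scores[a] += n
    # Sort by count (descending), then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def supporting_papers(papers, first, second):
    """List the papers that mention both entities."""
    return sorted(pid for pid, ents in papers.items() if {first, second} <= ents)


if __name__ == "__main__":
    counts = build_cooccurrence(PAPERS)
    print(ranked_associations(counts, "EGFR"))
    print(supporting_papers(PAPERS, "EGFR", "KRAS"))
```

Raw co-occurrence is the simplest possible association score; a real system would likely normalize for how often each entity appears overall, which this sketch does not attempt.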

SELECT QUERY

Type your query into the search bar. You can use a MeSH term, a MeSH ID, or a gene symbol.

SELECT ENTITY

Select the entity of interest by clicking the select button. The MeSH ID column links out to the NLM's page for that MeSH term.

INTERPRETING THE RESULT

Your search results will be displayed in two formats, as a network and as a table. The network view shows the different types of entities that are connected to your search term (EGFR in this case), including genes, drugs, and diseases.

Copyright © Lichtarge Lab. All Rights Reserved.

Automated hypothesis generation based on mining scientific literature

  • Head & Neck Surgery

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Keeping up with the ever-expanding flow of data and publications is untenable and poses a fundamental bottleneck to scientific progress. Current search technologies typically find many relevant documents, but they do not extract and organize the information content of these documents or suggest new scientific hypotheses based on this organized content. We present an initial case study on KnIT, a prototype system that mines the information contained in the scientific literature, represents it explicitly in a queriable network, and then further reasons upon these data to generate novel and experimentally testable hypotheses. KnIT combines entity detection with neighbor-text feature analysis and with graph-based diffusion of information to identify potential new properties of entities that are strongly implied by existing relationships. We discuss a successful application of our approach that mines the published literature to identify new protein kinases that phosphorylate the protein tumor suppressor p53. Retrospective analysis demonstrates the accuracy of this approach and ongoing laboratory experiments suggest that kinases identified by our system may indeed phosphorylate p53. These results establish proof of principle for automated hypothesis generation and discovery based on text mining of the scientific literature.
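The abstract names graph-based diffusion as one ingredient of KnIT. The sketch below shows the general idea on a toy graph, assuming a simple label-propagation update; it is not KnIT's actual algorithm, and the kinase names, edge weights, `alpha`, and iteration count are all hypothetical.

```python
# Hypothetical similarity graph: edge weights stand in for text-derived
# similarity between kinases. Names and weights are invented.
SIMILARITY = {
    ("kinase_A", "kinase_B"): 0.9,
    ("kinase_B", "kinase_C"): 0.8,
    ("kinase_C", "kinase_D"): 0.1,
    ("kinase_D", "kinase_E"): 0.7,
}
KNOWN_POSITIVE = {"kinase_A"}  # entities already known to have the property


def neighbors(similarity):
    """Build a symmetric adjacency map: node -> {neighbor: weight}."""
    adj = {}
    for (u, v), w in similarity.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def diffuse(similarity, seeds, alpha=0.8, iterations=50):
    """Spread seed labels over the graph.

    Each step mixes a node's seed label with the weighted average of its
    neighbors' current scores:
        score = (1 - alpha) * seed + alpha * weighted_average(neighbors)
    """
    adj = neighbors(similarity)
    seed = {n: 1.0 if n in seeds else 0.0 for n in adj}
    score = dict(seed)
    for _ in range(iterations):
        updated = {}
        for node, nbrs in adj.items():
            total = sum(nbrs.values())
            avg = sum(w * score[m] for m, w in nbrs.items()) / total
            updated[node] = (1 - alpha) * seed[node] + alpha * avg
        score = updated
    return score


def rank_candidates(similarity, seeds):
    """Rank unlabeled nodes by diffused score, highest first."""
    scores = diffuse(similarity, seeds)
    candidates = [(n, s) for n, s in scores.items() if n not in seeds]
    return sorted(candidates, key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for name, score in rank_candidates(SIMILARITY, KNOWN_POSITIVE):
        print(f"{name}: {score:.3f}")
```

Nodes strongly connected to a known positive end up with higher scores than nodes reachable only through weak edges, which is the sense in which a property can be "strongly implied by existing relationships".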

Publication series

  • hypothesis generation
  • scientific discovery
  • text mining

ASJC Scopus subject areas

  • Information Systems

Access to Document

  • 10.1145/2623330.2623667

T1 - Automated hypothesis generation based on mining scientific literature

AU - Spangler, Scott

AU - Wilkins, Angela D.

AU - Bachman, Benjamin J.

AU - Nagarajan, Meena

AU - Dayaram, Tajhal

AU - Haas, Peter

AU - Regenbogen, Sam

AU - Pickering, Curtis R.

AU - Comer, Austin

AU - Myers, Jeffrey N.

AU - Stanoi, Ioana

AU - Kato, Linda

AU - Lelescu, Ana

AU - Labrie, Jacques J.

AU - Parikh, Neha

AU - Lisewski, Andreas Martin

AU - Donehower, Lawrence

AU - Chen, Ying

AU - Lichtarge, Olivier

KW - hypothesis generation

KW - scientific discovery

KW - text mining

UR - http://www.scopus.com/inward/record.url?scp=84907033471&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907033471&partnerID=8YFLogxK

U2 - 10.1145/2623330.2623667

DO - 10.1145/2623330.2623667

M3 - Conference contribution

AN - SCOPUS:84907033471

SN - 9781450329569

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

BT - KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

T2 - 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014

Y2 - 24 August 2014 through 27 August 2014

Hypothesis Maker Online

Looking for a hypothesis maker? This online tool for students will help you formulate a beautiful hypothesis quickly, efficiently, and for free.

Are you looking for an effective hypothesis maker online? Worry no more; try our online tool for students and formulate your hypothesis within no time.

  • 🔎 How to Use the Tool?
  • ⚗️ What Is a Hypothesis in Science?
  • 👍 What Does a Good Hypothesis Mean?
  • 🧭 Steps to Making a Good Hypothesis
  • 🔗 References

📄 Hypothesis Maker: How to Use It

Our hypothesis maker is a simple and efficient tool you can access online for free.

If you want to create a research hypothesis quickly, you should fill out the research details in the given fields on the hypothesis generator.

Below are the fields you should complete to generate your hypothesis:

  • Who or what is your research based on? For instance, the subject can be research group 1.
  • What does the subject (research group 1) do?
  • What does the subject affect? - This shows the predicted outcome, which is the object.
  • Who or what will be compared with research group 1? (research group 2).

Once you fill in the fields, you can click the ‘Make a hypothesis’ tab and get your results.
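The four fields above map naturally onto a fill-in-the-blanks template. The sketch below is one way such a generator could work, assuming a fixed if-then sentence pattern; the class name, the field names, and the wording of the template are hypothetical and are not taken from the tool itself.

```python
from dataclasses import dataclass


@dataclass
class HypothesisFields:
    subject: str     # who or what the research is based on (research group 1)
    action: str      # what the subject does
    outcome: str     # what the subject affects (the predicted outcome)
    comparison: str  # who or what is compared with group 1 (research group 2)


def make_hypothesis(fields: HypothesisFields) -> str:
    """Compose the four fields into one if-then sentence (hypothetical template)."""
    parts = {name: value.strip() for name, value in vars(fields).items()}
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    return (
        f"If {parts['subject']} {parts['action']}, then they will show "
        f"{parts['outcome']} compared with {parts['comparison']}."
    )


if __name__ == "__main__":
    example = HypothesisFields(
        subject="students",
        action="study every day",
        outcome="higher grades",
        comparison="students who study only before exams",
    )
    print(make_hypothesis(example))
```

Rejecting empty fields up front keeps the generator from producing a grammatical but meaningless sentence when one of the inputs is left blank.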

⚗️ What Is a Hypothesis in the Scientific Method?

A hypothesis is a statement describing an expectation or prediction of your research through observation.

It is similar to academic speculation and reasoning that discloses the outcome of your scientific test. An effective hypothesis, therefore, should be crafted carefully and with precision.

A good hypothesis should have dependent and independent variables. These variables are the elements you will test in your research method; each can be a concept, an event, or an object, as long as it is observable.

You change the independent variables during the experiment and observe how the dependent variables respond.

In a nutshell, a hypothesis directs and organizes the research methods you will use, forming a large section of research paper writing.

Hypothesis vs. Theory

A hypothesis is a realistic expectation that researchers make before any investigation. It is formulated and tested to prove whether the statement is true. A theory, on the other hand, is a factual principle supported by evidence. Thus, a theory is more fact-backed compared to a hypothesis.

Another difference is that a hypothesis is presented as a single statement, while a theory can be an assortment of things. Hypotheses are based on future possibilities toward a specific projection, but the results are uncertain. Theories, by contrast, are substantiated by repeatedly verified results.

When it comes to data, a hypothesis relies on limited information, while a theory is established on an extensive data set tested under various conditions.

You should observe the stated assumption to prove its accuracy.

Since hypotheses have observable variables, their outcome is usually based on a specific occurrence. Conversely, theories are grounded on a general principle involving multiple experiments and research tests.

This general principle can apply to many specific cases.

The primary purpose of formulating a hypothesis is to present a tentative prediction for researchers to explore further through tests and observations. Theories, in their turn, aim to explain plausible occurrences in the form of a scientific study.

It would help to rely on several criteria to establish a good hypothesis. Below are the parameters you should use to analyze the quality of your hypothesis.

🧭 6 Steps to Making a Good Hypothesis

Writing a hypothesis becomes way simpler if you follow a tried-and-tested algorithm. Let’s explore how you can formulate a good hypothesis in a few steps:

Step #1: Ask Questions

The first step in hypothesis creation is asking real questions about the surrounding reality.

Why do things happen as they do? What are the causes of some occurrences?

Your curiosity will trigger great questions that you can use to formulate a stellar hypothesis. So, ensure you pick a research topic of interest to scrutinize the world’s phenomena, processes, and events.

Step #2: Do Initial Research

Carry out preliminary research and gather essential background information about your topic of choice.

The extent of the information you collect will depend on what you want to prove.

Your initial research can be complete with a few academic books or a simple Internet search for quick answers with relevant statistics.

Still, keep in mind that in this phase, it is too early to prove or disprove your hypothesis.

Step #3: Identify Your Variables

Now that you have a basic understanding of the topic, choose the dependent and independent variables.

Take note that independent variables are the ones you deliberately change, while dependent variables are the ones you measure, so understand the limitations of your test before settling on a final hypothesis.

Step #4: Formulate Your Hypothesis

You can write your hypothesis as an ‘if – then’ expression. Presenting any hypothesis in this format is reliable since it describes the cause-and-effect you want to test.

For instance: If I study every day, then I will get good grades.

Step #5: Gather Relevant Data

Once you have identified your variables and formulated the hypothesis, you can start the experiment. Remember, the conclusion you reach will either support or refute your initial assumption.

So, gather relevant information, whether for a simple or statistical hypothesis, because you need to back your statement.

Step #6: Record Your Findings

Finally, write down your conclusions in a research paper.

Outline in detail whether the test has proved or disproved your hypothesis.

Edit and proofread your work, using a plagiarism checker to ensure the authenticity of your text.

We hope that the above tips will be useful for you. Note that if you need to conduct business analysis, you can use the free templates we’ve prepared: SWOT, PESTLE, VRIO, SOAR, and Porter’s 5 Forces.

❓ Hypothesis Formulator FAQ

Updated: Oct 25th, 2023

  • How to Write a Hypothesis in 6 Steps - Grammarly
  • Forming a Good Hypothesis for Scientific Research
  • The Hypothesis in Science Writing
  • Scientific Method: Step 3: HYPOTHESIS - Subject Guides
  • Hypothesis Template & Examples - Video & Lesson Transcript

Use our hypothesis maker whenever you need to formulate a hypothesis for your study. We offer a very simple tool where you just need to provide basic info about your variables, subjects, and predicted outcomes. The rest is on us. Get a perfect hypothesis in no time!

COMMENTS

  1. Hypotheses devised by AI could find 'blind spots' in research

    Researchers have automated hypothesis generation in particle physics, materials science, biology, chemistry and other fields. An AI revolution is brewing in medicine. What will it look like?

  2. Hypothesis Maker

    How to use Hypothesis Maker. Visit the tool's page. Enter your research question into the provided field. Click the 'Generate' button to let the AI generate a hypothesis based on your research question. Review the generated hypothesis and adjust it as necessary to fit your research context and objectives. Copy and paste the hypothesis into your ...

  3. Can ChatGPT be used to generate scientific hypotheses?

    Therefore, the scenario of prompted AI hypothesis generation → human curating → automated experimentation loop (e.g. active-learning [19]) → peer reviews (often adversarial) ... AI hypothesis machines and automated experimental workflows could be used to accelerate scientific discoveries and enhance the common good.

  4. [2402.14424] Automating Psychological Hypothesis Generation with AI

    Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using a LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 ...

  5. Data-Driven Hypothesis Generation in Clinical Research: What We Learned

    Hypothesis generation is an early and critical step in any hypothesis-driven clinical research project. Because it is not yet a well-understood cognitive process, the need to improve the process goes unrecognized. Without an impactful hypothesis, the significance of any research project can be questionable, regardless of the rigor or diligence applied in other steps of the study, e.g., study ...

  6. PDF Automated Hypothesis Generation Based on Mining Scientific Literature

    MEDLINE data [30], hypothesis generation from unstructured text has been a hit-or-miss manual process [31] that is heavily dependent upon serendipity. Our approach leverages mining techniques for unstructured text to automatically discover hidden similarities between entities based on a corpus of scientific articles.

  7. Automated Hypothesis Generation

    Automated hypothesis generation: when machine-learning systems produce ideas, not just test them. Testing ideas at scale. Fast. While algorithms are mostly used as tools to number-crunch and test-drive ideas, they have yet to be used to generate the ideas themselves. Let alone at scale. Rather than thinking up one idea at a time and testing it ...

  8. Scientific intuition inspired by machine learning ...

    In addition to automated hypothesis generation, protocols for testing of the postulated hypotheses would be beneficial. In case of the chemistry experiment, a possible hypothesis testing protocol would generate mutations of each molecule in the training set to test the hypotheses on molecules with similar representations, where (ideally) only ...

  9. An automated framework for hypotheses generation using literature

    The proposed hypothesis generation framework (HGF) finds crisp semantic associations among entities of interest, a step towards bridging such gaps. Methodology. The proposed HGF shares similar end goals with SWAN but is more holistic in nature and was designed and implemented using scalable and efficient computational models of ...

  10. Automated cognome construction and semi-automated hypothesis generation

    Semi-automated hypothesis generation. We introduce two methods for semi-automated hypothesis generation. The first relies on a simple "friend-of-a-friend should be a friend" concept wherein we assume that two terms that each strongly relate to a parent term should relate to one another.

  11. Explainable Automatic Hypothesis Generation via High-order Graph Walks

    This more transparent process encourages trust in the biomedical community for automatic hypothesis generation systems. We use a reinforcement learning strategy to formulate the HG problem as a guided node-pair embedding-based link prediction problem via a directed graph walk. Given nodes in a node-pair, the model starts a graph walk ...

  12. Explainable Automatic Hypothesis Generation via High-order ...

    In this paper, we study the automatic hypothesis generation (HG) problem, focusing on explainability. Given pairs of biomedical terms, we focus on link prediction to explain how the prediction was made. This more transparent process encourages trust in the biomedical community for automatic hypothesis generation systems. We use a reinforcement learning strategy to formulate the HG problem as a ...

  13. MOLIERE: Automatic Biomedical Hypothesis Generation System

    Automated hypothesis generation based on mining scientific literature. KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. Keeping up with the ever-expanding flow of data and publications is untenable and poses a fundamental bottleneck to scientific progress. Current search technologies ...

  14. Automated Hypothesis Generation to Identify Signals Relevant in the

    Automated Hypothesis Generation to Identify Signals Relevant in the Development of Mammalian Cell and Tissue Bioprocesses, With Validation in a Retinal Culture System ... This supports the hypothesis that activin A signaling in photoreceptors is mediated by canonical receptor signaling involving type 2 receptor complexes together with Acvr1b ...

  15. Literature-based discovery

    An example diagram of Swanson linking, using the ABC paradigm. Literature-based discovery (LBD), also called literature-related discovery (LRD), is a form of knowledge extraction and automated hypothesis generation that uses papers and other academic publications (the "literature") to find new relationships between existing knowledge (the "discovery"). Literature-based discovery aims to ...

  16. [PDF] Automated hypothesis generation based on mining scientific

    Automated hypothesis generation based on mining scientific literature. An initial case study on KnIT, a prototype system that mines the information contained in the scientific literature, represents it explicitly in a queriable network, and then further reasons upon these data to generate novel and experimentally testable hypotheses.

  17. Automated hypothesis generation based on mining scientific literature

    Retrospective analysis demonstrates the accuracy of this approach and ongoing laboratory experiments suggest that kinases identified by our system may indeed phosphorylate p53. These results establish proof of principle for automated hypothesis generation and discovery based on text mining of the scientific literature.

  18. Literature mining and automated hypothesis-generation with

    Literature mining and automated hypothesis-generation with MeTeOR. MeTeOR, or the MeSH Term Objective Reasoning network, is a network representation of genes, drugs, and diseases built on MeSH terms curated by the National Library of Medicine ... up to a point but could still benefit from systematic "big data" approaches that mine the ...

  19. Hypothesis Generation from Literature for Advancing Biological

    Hypothesis Generation is a literature-based discovery approach that utilizes existing literature to automatically generate implicit biomedical associations and provide reasonable predictions for future research. Despite its potential, current hypothesis generation methods face challenges when applied to research on biological mechanisms.

  20. Free AI Hypothesis Maker

    It's easy to get started: (1) Create a free account. (2) Once you've logged in, find the Hypothesis Maker template amongst our 200+ templates. (3) Fill out Research Topic. For example: The effect of light on plant growth.

  21. Automated hypothesis generation based on mining scientific literature

    Retrospective analysis demonstrates the accuracy of this approach and ongoing laboratory experiments suggest that kinases identified by our system may indeed phosphorylate p53. These results establish proof of principle for automated hypothesis generation and discovery based on text mining of the scientific literature.

  22. Hypothesis Maker

    Our hypothesis maker is a simple and efficient tool you can access online for free. If you want to create a research hypothesis quickly, you should fill out the research details in the given fields on the hypothesis generator. Below are the fields you should complete to generate your hypothesis:

  23. Understanding and Perception of Automated Text Generation among the

    Automated text generation (ATG) technology has evolved rapidly in the last several years, enabling the spread of content produced by artificial intelligence (AI). In addition, with the release of ChatGPT, virtually everyone can now create naturally sounding text on any topic. To optimize future use and understand how humans interact with these technologies, it is essential to capture people ...
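
The "friend-of-a-friend should be a friend" idea quoted in item 10 can be illustrated with a minimal sketch. This is not the method from that paper: the function name `friend_of_a_friend`, the `threshold` parameter, the term names and the strength scores are all hypothetical, chosen only to show the shape of the reasoning. Two terms that each relate strongly to the same parent term, and that have no recorded direct relation, are proposed as a candidate pair.

```python
from itertools import combinations


def friend_of_a_friend(relations, threshold=0.5):
    """Propose term pairs that share a strongly related parent term.

    relations: dict mapping (term, parent) -> strength in [0, 1].
    The strengths and the threshold are illustrative assumptions.
    Returns a sorted list of (term_1, term_2, shared_parent) tuples.
    """
    # Group terms under each parent, keeping only strong relations.
    by_parent = {}
    for (term, parent), strength in relations.items():
        if strength >= threshold:
            by_parent.setdefault(parent, set()).add(term)

    # Pairs that already have a recorded relation are not new hypotheses.
    known = {frozenset(pair) for pair in relations}

    candidates = set()
    for parent, terms in by_parent.items():
        for first, second in combinations(sorted(terms), 2):
            if frozenset((first, second)) not in known:
                candidates.add((first, second, parent))
    return sorted(candidates)


# Toy input with made-up terms and scores.
toy_relations = {
    ("term_a", "parent_b"): 0.9,
    ("term_c", "parent_b"): 0.8,
    ("term_d", "parent_b"): 0.2,  # too weak to count as a "friend"
}
print(friend_of_a_friend(toy_relations))
```

A real system would derive the relation strengths from a literature corpus and would rank the candidates rather than apply a single cutoff; the fixed threshold here only keeps the sketch short.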