
Historical Research – Types, Methods and Examples

Historical Research

Definition:

Historical research is the process of investigating and studying past events, people, and societies using a variety of sources and methods. This type of research aims to reconstruct and interpret the past based on the available evidence.

Types of Historical Research

There are several types of historical research, including:

Descriptive Research

This type of historical research focuses on describing events, people, or cultures in detail. It can involve examining artifacts, documents, or other sources of information to create a detailed account of what happened or existed.

Analytical Research

This type of historical research aims to explain why events, people, or cultures occurred in a certain way. It involves analyzing data to identify patterns, causes, and effects, and making interpretations based on this analysis.

Comparative Research

This type of historical research involves comparing two or more events, people, or cultures to identify similarities and differences. This can help researchers understand the unique characteristics of each and how they interacted with each other.

Interpretive Research

This type of historical research focuses on interpreting the meaning of past events, people, or cultures. It can involve analyzing cultural symbols, beliefs, and practices to understand their significance in a particular historical context.

Quantitative Research

This type of historical research involves using statistical methods to analyze historical data. It can involve examining demographic information, economic indicators, or other quantitative data to identify patterns and trends.

Qualitative Research

This type of historical research involves examining non-numerical data such as personal accounts, letters, or diaries. It can provide insights into the experiences and perspectives of individuals during a particular historical period.

Data Collection Methods

Data Collection Methods are as follows:

  • Archival research: This involves analyzing documents and records that have been preserved over time, such as government records, diaries, letters, newspapers, and photographs. Archival research is often conducted in libraries, archives, and museums.
  • Oral history: This involves conducting interviews with individuals who have lived through a particular historical period or event. Oral history can provide a unique perspective on past events and can help to fill gaps in the historical record.
  • Artifact analysis: This involves examining physical objects from the past, such as tools, clothing, and artwork, to gain insights into past cultures and practices.
  • Secondary sources: This involves analyzing published works, such as books, articles, and academic papers, that discuss past events and cultures. Secondary sources can provide context and insights into the historical period being studied.
  • Statistical analysis: This involves analyzing numerical data from the past, such as census records or economic data, to identify patterns and trends.
  • Fieldwork: This involves conducting on-site research in a particular location, such as visiting a historical site or conducting ethnographic research in a particular community. Fieldwork can provide a firsthand understanding of the culture and environment being studied.
  • Content analysis: This involves analyzing the content of media from the past, such as films, television programs, and advertisements, to gain insights into cultural attitudes and beliefs.

Data Analysis Methods

  • Content analysis: This involves analyzing the content of written or visual material, such as books, newspapers, or photographs, to identify patterns and themes. Content analysis can be used to identify changes in cultural values and beliefs over time.
  • Textual analysis: This involves analyzing written texts, such as letters or diaries, to understand the experiences and perspectives of individuals during a particular historical period. Textual analysis can provide insights into how people lived and thought in the past.
  • Discourse analysis: This involves analyzing how language is used to construct meaning and power relations in a particular historical period. Discourse analysis can help to identify how social and political ideologies were constructed and maintained over time.
  • Statistical analysis: This involves using statistical methods to analyze numerical data, such as census records or economic data, to identify patterns and trends. Statistical analysis can help to identify changes in population demographics, economic conditions, and other factors over time (see the sketch after this list).
  • Comparative analysis: This involves comparing data from two or more historical periods or events to identify similarities and differences. Comparative analysis can help to identify patterns and trends that may not be apparent from analyzing data from a single historical period.
  • Qualitative analysis: This involves analyzing non-numerical data, such as oral history interviews or ethnographic field notes, to identify themes and patterns. Qualitative analysis can provide a rich understanding of the experiences and perspectives of individuals in the past.
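As a rough illustration of the statistical analysis described above, decade-over-decade change can be computed from census-style counts in a few lines. Here is a minimal Python sketch; the population figures are invented placeholders, not real census data.

```python
# Minimal sketch: decade-over-decade growth from census-style counts.
# The figures are hypothetical placeholders, not real census data.
decennial_population = {
    1900: 76_000, 1910: 92_000, 1920: 106_000,
    1930: 123_000, 1940: 132_000, 1950: 151_000,
}

previous = None
for year, population in sorted(decennial_population.items()):
    if previous is None:
        print(f"{year}: {population:,}")
    else:
        growth = (population - previous) / previous * 100
        print(f"{year}: {population:,} ({growth:+.1f}% vs. prior decade)")
    previous = population
```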

Historical Research Methodology

Here are the general steps involved in historical research methodology:

  • Define the research question: Start by identifying a research question that you want to answer through your historical research. This question should be focused, specific, and relevant to your research goals.
  • Review the literature: Conduct a review of the existing literature on the topic of your research question. This can involve reading books, articles, and academic papers to gain a thorough understanding of the existing research.
  • Develop a research design: Develop a research design that outlines the methods you will use to collect and analyze data. This design should be based on the research question and should be feasible given the resources and time available.
  • Collect data: Use the methods outlined in your research design to collect data on past events, people, and cultures. This can involve archival research, oral history interviews, artifact analysis, and other data collection methods.
  • Analyze data: Analyze the data you have collected using the methods outlined in your research design. This can involve content analysis, textual analysis, statistical analysis, and other data analysis methods.
  • Interpret findings: Use the results of your data analysis to draw meaningful insights and conclusions related to your research question. These insights should be grounded in the data and should be relevant to the research goals.
  • Communicate results: Communicate your findings through a research report, academic paper, or other means. This should be done in a clear, concise, and well-organized manner, with appropriate citations and references to the literature.

Applications of Historical Research

Historical research has a wide range of applications in various fields, including:

  • Education: Historical research can be used to develop curriculum materials that reflect a more accurate and inclusive representation of history. It can also be used to provide students with a deeper understanding of past events and cultures.
  • Museums: Historical research is used to develop exhibits, programs, and other materials for museums. It can provide a more accurate and engaging presentation of historical events and artifacts.
  • Public policy: Historical research is used to inform public policy decisions by providing insights into the historical context of current issues. It can also be used to evaluate the effectiveness of past policies and programs.
  • Business: Historical research can be used by businesses to understand the evolution of their industry and to identify trends that may affect their future success. It can also be used to develop marketing strategies that resonate with customers’ historical interests and values.
  • Law: Historical research is used in legal proceedings to provide evidence and context for cases involving historical events or practices. It can also be used to inform the development of new laws and policies.
  • Genealogy: Historical research can be used by individuals to trace their family history and to understand their ancestral roots.
  • Cultural preservation: Historical research is used to preserve cultural heritage by documenting and interpreting past events, practices, and traditions. It can also be used to identify and preserve historical landmarks and artifacts.

Examples of Historical Research

Examples of Historical Research are as follows:

  • Examining the history of race relations in the United States: Historical research could be used to explore the historical roots of racial inequality and injustice in the United States. This could help inform current efforts to address systemic racism and promote social justice.
  • Tracing the evolution of political ideologies: Historical research could be used to study the development of political ideologies over time. This could help to contextualize current political debates and provide insights into the origins and evolution of political beliefs and values.
  • Analyzing the impact of technology on society: Historical research could be used to explore the impact of technology on society over time. This could include examining the impact of previous technological revolutions (such as the industrial revolution) on society, as well as studying the current impact of emerging technologies on society and the environment.
  • Documenting the history of marginalized communities: Historical research could be used to document the history of marginalized communities (such as LGBTQ+ communities or indigenous communities). This could help to preserve cultural heritage, promote social justice, and promote a more inclusive understanding of history.

Purpose of Historical Research

The purpose of historical research is to study the past in order to gain a better understanding of the present and to inform future decision-making. Some specific purposes of historical research include:

  • To understand the origins of current events, practices, and institutions: Historical research can be used to explore the historical roots of current events, practices, and institutions. By understanding how things developed over time, we can gain a better understanding of the present.
  • To develop a more accurate and inclusive understanding of history: Historical research can be used to correct inaccuracies and biases in historical narratives. By exploring different perspectives and sources of information, we can develop a more complete and nuanced understanding of history.
  • To inform decision-making: Historical research can be used to inform decision-making in various fields, including education, public policy, business, and law. By understanding the historical context of current issues, we can make more informed decisions about how to address them.
  • To preserve cultural heritage: Historical research can be used to document and preserve cultural heritage, including traditions, practices, and artifacts. By understanding the historical significance of these cultural elements, we can work to preserve them for future generations.
  • To stimulate curiosity and critical thinking: Historical research can be used to stimulate curiosity and critical thinking about the past. By exploring different historical perspectives and interpretations, we can develop a more critical and reflective approach to understanding history and its relevance to the present.

When to use Historical Research

Historical research can be useful in a variety of contexts. Here are some examples of when historical research might be particularly appropriate:

  • When examining the historical roots of current events: Historical research can be used to explore the historical roots of current events, practices, and institutions. By understanding how things developed over time, we can gain a better understanding of the present.
  • When examining the historical context of a particular topic: Historical research can be used to explore the historical context of a particular topic, such as a social issue, political debate, or scientific development. By understanding the historical context, we can gain a more nuanced understanding of the topic and its significance.
  • When exploring the evolution of a particular field or discipline: Historical research can be used to explore the evolution of a particular field or discipline, such as medicine, law, or art. By understanding the historical development of the field, we can gain a better understanding of its current state and future directions.
  • When examining the impact of past events on current society: Historical research can be used to examine the impact of past events (such as wars, revolutions, or social movements) on current society. By understanding the historical context and impact of these events, we can gain insights into current social and political issues.
  • When studying the cultural heritage of a particular community or group: Historical research can be used to document and preserve the cultural heritage of a particular community or group. By understanding the historical significance of cultural practices, traditions, and artifacts, we can work to preserve them for future generations.

Characteristics of Historical Research

The following are some characteristics of historical research:

  • Focus on the past: Historical research focuses on events, people, and phenomena of the past. It seeks to understand how things developed over time and how they relate to current events.
  • Reliance on primary sources: Historical research relies on primary sources such as letters, diaries, newspapers, government documents, and other artifacts from the period being studied. These sources provide firsthand accounts of events and can help researchers gain a more accurate understanding of the past.
  • Interpretation of data: Historical research involves interpretation of data from primary sources. Researchers analyze and interpret data to draw conclusions about the past.
  • Use of multiple sources: Historical research often involves using multiple sources of data to gain a more complete understanding of the past. By examining a range of sources, researchers can cross-reference information and validate their findings.
  • Importance of context: Historical research emphasizes the importance of context. Researchers analyze the historical context in which events occurred and consider how that context influenced people’s actions and decisions.
  • Subjectivity: Historical research is inherently subjective, as researchers interpret data and draw conclusions based on their own perspectives and biases. Researchers must be aware of their own biases and strive for objectivity in their analysis.
  • Importance of historical significance: Historical research emphasizes the importance of historical significance. Researchers consider the historical significance of events, people, and phenomena and their impact on the present and future.
  • Use of qualitative methods: Historical research often uses qualitative methods such as content analysis, discourse analysis, and narrative analysis to analyze data and draw conclusions about the past.

Advantages of Historical Research

There are several advantages to historical research:

  • Provides a deeper understanding of the past: Historical research can provide a more comprehensive understanding of past events and how they have shaped current social, political, and economic conditions. This can help individuals and organizations make informed decisions about the future.
  • Helps preserve cultural heritage: Historical research can be used to document and preserve cultural heritage. By studying the history of a particular culture, researchers can gain insights into the cultural practices and beliefs that have shaped that culture over time.
  • Provides insights into long-term trends: Historical research can provide insights into long-term trends and patterns. By studying historical data over time, researchers can identify patterns and trends that may be difficult to discern from short-term data.
  • Facilitates the development of hypotheses: Historical research can facilitate the development of hypotheses about how past events have influenced current conditions. These hypotheses can be tested using other research methods, such as experiments or surveys.
  • Helps identify root causes of social problems: Historical research can help identify the root causes of social problems. By studying the historical context in which these problems developed, researchers can gain a better understanding of how they emerged and what factors may have contributed to their development.
  • Provides a source of inspiration: Historical research can provide a source of inspiration for individuals and organizations seeking to address current social, political, and economic challenges. By studying the accomplishments and struggles of past generations, researchers can gain insights into how to address current challenges.

Limitations of Historical Research

Some Limitations of Historical Research are as follows:

  • Reliance on incomplete or biased data: Historical research is often limited by the availability and quality of data. Many primary sources have been lost, destroyed, or are inaccessible, making it difficult to get a complete picture of historical events. Additionally, some primary sources may be biased or represent only one perspective on an event.
  • Difficulty in generalizing findings: Historical research is often specific to a particular time and place and may not be easily generalized to other contexts. This makes it difficult to draw broad conclusions about human behavior or social phenomena.
  • Lack of control over variables: Historical research often lacks control over variables. Researchers cannot manipulate or control historical events, making it difficult to establish cause-and-effect relationships.
  • Subjectivity of interpretation: Historical research is often subjective because researchers must interpret data and draw conclusions based on their own biases and perspectives. Different researchers may interpret the same data differently, leading to different conclusions.
  • Limited ability to test hypotheses: Historical research is often limited in its ability to test hypotheses. Because the events being studied have already occurred, researchers cannot manipulate variables or conduct experiments to test their hypotheses.
  • Lack of objectivity: Historical research is often subjective, and researchers must be aware of their own biases and strive for objectivity in their analysis. However, it can be difficult to maintain objectivity when studying events that are emotionally charged or controversial.
  • Limited generalizability: Historical research is often limited in its generalizability, as the events and conditions being studied may be specific to a particular time and place. This makes it difficult to draw broad conclusions that apply to other contexts or time periods.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Iran J Nurs Midwifery Res. 2015 Mar-Apr; 20(2).

Data analysis in oral history: A new approach in historical research

Mohammadreza Firouzkouhi

1 Department of Medical Surgical Nursing, Faculty of Nursing and Midwifery, Zabol University of Medical Sciences, Zabol, Iran

Ali Zargham-Boroujeni

2 Nursing and Midwifery Care Research Center, Faculty of Nursing and Midwifery, Isfahan University of Medical Sciences, Isfahan, Iran

Background:

Historical research has limitations in applying a proper and credible chronology to clarify the data. In this methodology, the application of oral history is one of the ways in which answers to questions addressed by the research theme are elicited. Oral history, as a clear and transparent tool, needs guidelines for qualitative researchers regarding the limitations of analyzing data from oral evidence and face-to-face contact. Therefore, the development of a systematic method for data analysis is needed to obtain accurate answers, on which a credible narration can be based. The aim of this study was to introduce an ethical and objective approach to the analysis of data obtained from oral history.

Materials and Methods:

This is a methodological article that suggests an analysis method based on qualitative approach and experiences of the authors.

Results:

A systematic method of data analysis for oral history research, based on common qualitative data analysis methods, is suggested as the result of this article.

Conclusions:

This new technique is equipped with measures that would assist qualitative researchers in the nursing field and other disciplines regarding analysis of qualitative data resulting from oral history studies.

INTRODUCTION

Nurses have implemented historical research methodology in order to explain the history of nursing. Historical methodology guides us in making decisions based on past experiences.[ 1 ]

Nursing history can be a source of cultural identity. It reveals and defines both the scientific and artistic dimensions of nursing. In spite of the validity of historical research, there is a paucity of guidance on analyzing data to produce credible results. In addition, there are no classic references to show the novice how to analyze a huge amount of data.[ 2 ]

Historians believe that using theories and models introduces bias into the investigation. If this is true, how do novice researchers become familiar with the robust techniques and processes of inquiry required in research?[ 3 ] Hamilton stated that novice historians can understand the importance of historical techniques by immersing themselves in good historiography.[ 4 ] But this strategy seems too difficult for novice researchers to follow.

History means the complete documentation of people's ups and downs. History is not merely a list of events, but an impartial evaluation of the entirety of human interrelationships in time and space.[ 5 ] Historical research stages consist of the following: Identifying the area of interest, raising questions, formulating a title, reviewing the literature, data gathering and analysis, interpreting data, and writing the narrative. Due to the importance of the data gathering and analysis stage in qualitative historical studies, and the interconnection between these two stages, it is vital that the researcher selects a proper research method at this point.[ 6 ] One of the methods used in the qualitative historical studies approach is the oral history method,[ 7 ] and this was applied in a project related to the nature of nursing practice during the Iran–Iraq war. Since oral history lacks a clear and well-developed analysis method, the authors decided to develop a method based on the experiences they have had during their research, also considering the strengths and weaknesses of the methods that have been used in other oral history projects. The aim of this study is to develop an analysis method for oral history.

MATERIALS AND METHODS

In this methodological article, a method for data analysis suitable for oral history research is suggested, based on the qualitative research tradition and the experiences of the authors.

Suggested data analysis method in oral history

In order to clarify the data obtained by the researcher through oral history interviews, a historical research methodology is required to produce a good narrative, along with the application of a proper analysis. For this purpose, a four-stage method is introduced and adopted. Each stage is connected and related to the previous one, while the final stage connects to the first and closes the circuit, which means that all data analysis stages, in a sense, are complementary to one another.

These analysis stages are:

  • Data gathering through interviews with the oral witness and first-level coding
  • Second-level coding and determining the sub-categories
  • Third-level coding and determining the main categories
  • Connecting the main categories to each other and writing the narrative.
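As a rough illustration of how these four stages fit together, here is a minimal Python sketch. The article describes the stages only in prose, so the data structures and example codes below are illustrative assumptions, not part of the published method.

```python
# Illustrative sketch of the four coding stages (all codes are hypothetical).
first_level_codes = {"code A", "code B", "code C", "code D"}   # stage 1

sub_categories = {                      # stage 2: group related codes
    "sub-category 1": ["code A", "code B"],
    "sub-category 2": ["code C", "code D"],
}

main_categories = {                     # stage 3: group related sub-categories
    "main category I": ["sub-category 1", "sub-category 2"],
}

# Sanity check: every grouped code should exist among the first-level codes.
assert all(c in first_level_codes for codes in sub_categories.values() for c in codes)

# Stage 4: connect the main categories into a narrative outline; in practice
# the outline feeds back into further interviews, closing the circuit.
for main, subs in main_categories.items():
    print(f"Narrative thread: {main}")
    for sub in subs:
        print(f"  {sub}: {', '.join(sub_categories[sub])}")
```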

The authors hope that by applying this method, researchers in the field of nursing would be able to analyze oral history data in a new form.

The authors are of the opinion that this method is applicable in all disciplines which are concerned with oral history. This method has been practically applied in two articles titled “Nurses experiences in chemical emergency departments: Iran–Iraq war, 1980–1988” and “The wartime experience of civilian nurses in the Iran–Iraq war, 1980–1988: An historical research.”[ 8 , 9 ] It should be noted that most of the information in recent decades concerning the digital oral history paradigm have applied the related computer software.[ 10 ]

Stage one: Data gathering and analysis

At this stage, the data collected through interviews with oral witnesses undergoes first-level coding, which involves the following two tasks:

  • Familiarity with the data and its organization
  • Extraction of the initial codes.

Familiarity begins as the interviews proceed; therefore, the proper selection of participants is essential. In keeping with the digital oral history paradigm, the interviews are recorded with a voice recorder, and other primary sources like personal notes, photos, etc. are evaluated. Since there is a close correlation between the data, well-conducted interviews contribute to the next stage's analysis. The interview must be objective, and proper arrangement of the questions is essential in order to obtain good results. The interviews consist of four sets of questions: Warm-up, memory, judgment, and follow-up.

The researcher, through an in-depth interview, encourages the participants to recall the events and related details and to relate them clearly in the interview.[ 11 ]

According to Thomson, a voice-recorded interview not only allows the participant to verbally express their views, but also preserves and illustrates the spoken words.[ 12 ]

The interviews take place in three stages. The first interview encourages the participants to recollect and arrange their experience. The second interview represents the experience in detail and the context in which it happened, while the third interview explores the meaning that their experience holds for them.[ 13 ]

After each interview, the researcher listens to the responses many times, in order to determine the applicable portions for planning and addressing the next interview to achieve the final objective.[ 13 ]

The recorded interviews are transcribed by hand or using a computer program, and then the data may be transferred to Computer Assisted Qualitative Data Analysis Software (CAQDAS) for textual analysis. The interviewer takes notes when the expression of the participant changes. These notes help the researcher recall the expressions and emotions of the interviewee when writing the narrative.

The codes used are labels that contain the interview briefing or the researcher's impressions regarding the interview. Coding takes place in parallel with data compilation, by constant comparison of the codes against one another and by setting them next to other codes. Symbols or abbreviations categorize words or expressions in the data, and coding facilitates data retrieval.[ 14 ]

An example of coding at this stage:

“… in some military operations like Kheibar, 1985, the conditions were such that we worked for three days nonstop due to the excessive number of wounded. There was no time to rest, eat or pray properly. I remember that the young doctors and nurses couldn’t resist and just lay on the floor next to the patient's bed .”

Code assigned to speech participant 2: (p2i1c156) nonstop nursing activities
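Researchers who accumulate many such labels sometimes unpack them programmatically. A minimal Python sketch follows; reading the label as participant, interview, and code number is an assumption (only "participant 2" is confirmed by the text above), so treat the format as hypothetical.

```python
import re

# Hypothetical reading of labels like "p2i1c156": participant 2, interview 1,
# code 156. Only the participant number is confirmed by the article.
def parse_label(label: str) -> dict:
    match = re.fullmatch(r"p(\d+)i(\d+)c(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized label format: {label}")
    participant, interview, code = map(int, match.groups())
    return {"participant": participant, "interview": interview, "code": code}

print(parse_label("p2i1c156"))
# -> {'participant': 2, 'interview': 1, 'code': 156}
```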

According to Speziale (2010), the researcher should begin coding and analyzing data as he compiles it. This helps him to be innovative in his attitude regarding the next interviews.[ 14 ]

The first stage of coding summarizes the data along with organizing it to facilitate labeling of the categories.

Stage two: Second-level coding and determining the sub-categories

Coding is considered the first interpretation stage and is applied to capture the participants’ lived experiences. The obtained codes are a practical link in the chain of events in historical research.[ 15 ]

Lived experiences are what the participant has been through so far. At this stage, the following two things evolve:

Formation of sub-categories

Initial codes with closely related concepts are extracted, studied, and set next to one another to form the sub-categories.

Review of the obtained related codes

Here the selected sub-categories that represent the initial codes are reviewed. It should be mentioned that a sub-category may contain many codes, which are determined at stage one of the coding.[ 16 ]

Stage three: Third-level coding and determining the main categories

At this point, the main categories are defined, labeled, and coded. The emphasis here is put on the key concepts that are going to be applied in the fourth stage, where the study starts to take shape.

In fact, at this stage, the pieces of the puzzle are arranged in a manner in which the whole picture is revealed. The credibility of the oral history method and this critical stage is determined by analyzing the approved documentation. To assure coding validity at different stages, the codes are sent to the participants.

Stage three sharpens the focus of the study with regard to how the data relate to it: the main categories are formed from similar sub-categories, and the interconnections between them appear.

Stage four: Connecting the main categories to one another and writing the narrative

The narrative is the outcome of the words spoken by the participants, woven together according to the main categories. In the narrative, the historical picture of the participants’ perspective on the historical events can be depicted. To write the narrative, the following are used: Books, primary and secondary sources, newspaper articles, poems, songs, and other items that have played a role in the reconstruction of historical events such as wars.

Such historical narratives are of special importance: Revealing the events, projecting the findings, answering questions addressed in the study, exposing categories, clarifying ambiguities, and preventing bias in researcher's accounts.

Historic narration attracts readers’ attention to the event by revealing a complete perception of the experience. Here, the authors respond to the questions raised in the process of the study, determine the primary and secondary sources, combine the ideas, and share the views of the participants. All these enable the reader to have a meaningful and inspiring assessment of the presented issue through this model [ Figure 1 ].

Figure 1. Analyzing oral history.

Judging the study and its success is related to joining the categories to one another and the acceptance of the work for publication as a final product.[ 17 ]

C ONCLUSION

Historical research, as a research methodology, is supported by oral history (a research method), which is a valuable tool for researchers in the nursing field. Over time, the lack of an objective method to assist the analysis of data obtained through oral evidence interviews has limited researchers' efforts.

Here, an attempt has been made to introduce a model for better analysis of oral history data that would facilitate better research approaches where oral history is the main source. Through systematic application of this method, the nursing researcher can produce a correct and reliable narrative that illustrates the abilities of the nursing discipline and assists in the production of knowledge in this realm.

Source of Support: Nil.

Conflict of Interest: None declared.



Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of research in data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is summarization and categorization, which together reduce the data and help find patterns and themes for easy identification and linking. The third and last is the analysis itself, which researchers perform in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “the data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research and data analysis.”

Why analyze data in research?

Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? Well, it is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Irrelevant to the type of data researchers explore, their mission and audiences’ vision guide them to find the patterns to shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes, data analysis tells the most unforeseen yet exciting stories that were not expected when initiating data analysis. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research. 


Types of data in research

Every kind of data describes things only once a specific value has been assigned to it. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can be in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, then we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation or using open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, etc. all produce this type of data. You can present such data in graphical format or charts, or apply statistical analysis methods to this data. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: It is data presented in groups. However, an item included in the categorical data cannot belong to more than one group. Example: A person responding to a survey by telling his living style, marital status, smoking habit, or drinking habit comes under the categorical data. A chi-square test is a standard method used to analyze this data.


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from analysis of numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process; hence it is typically used for exploratory research and data analysis.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
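As a rough illustration, this word-based counting can be approximated in a few lines of Python; the responses below are invented placeholders.

```python
from collections import Counter
import re

# Invented example responses standing in for interview or survey text.
responses = [
    "Food prices keep rising and hunger is spreading",
    "Without rain the food harvest failed and hunger followed",
]

stopwords = {"and", "the", "is", "a", "in", "keep", "without"}
words = re.findall(r"[a-z]+", " ".join(responses).lower())
counts = Counter(w for w in words if w not in stopwords)
print(counts.most_common(3))  # e.g. [('food', 2), ('hunger', 2), ('prices', 1)]
```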


Keyword-in-context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
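A keyword-in-context pass can likewise be sketched in a few lines of Python; the transcript fragment and window size below are invented for illustration.

```python
# Minimal keyword-in-context sketch: show the words surrounding each
# occurrence of a keyword. The transcript line is an invented placeholder.
def keyword_in_context(text: str, keyword: str, window: int = 3):
    tokens = text.lower().split()
    for i, token in enumerate(tokens):
        if token.strip(".,") == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield f"... {left} [{keyword}] {right} ..."

transcript = "My mother managed her diabetes with diet, and diabetes never stopped her."
for line in keyword_in_context(transcript, "diabetes"):
    print(line)
```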

The scrutiny-based technique is another highly recommended text analysis method used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, used to determine how specific texts are similar to or different from each other.

For example: to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


Methods used for data analysis in qualitative research

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis: It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. When and where to use this method depends on the research questions.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and  surveys . The majority of times, stories, or opinions shared by people are focused on finding answers to the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, using grounded theory to analyze qualitative data is the best resort. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they might alter explanations or produce new ones until they arrive at some conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample conforms to pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent has answered all the questions in an online survey or, in an interview, that the interviewer asked all the questions devised in the questionnaire.
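A rough sketch of the screening and completeness checks in Python; the field names and eligibility criteria below are invented for illustration.

```python
# Invented survey records; "q1" and "q2" stand in for required questions.
responses = [
    {"id": 1, "age": 34, "consented": True, "q1": "yes", "q2": "often"},
    {"id": 2, "age": 17, "consented": True, "q1": "no", "q2": None},
]

REQUIRED = ("q1", "q2")

def is_complete(record: dict) -> bool:
    # Completeness: every required question has an answer.
    return all(record.get(q) is not None for q in REQUIRED)

def meets_criteria(record: dict) -> bool:
    # Screening: respondent matches the (invented) research criteria.
    return record["consented"] and record["age"] >= 18

valid = [r for r in responses if is_complete(r) and meets_criteria(r)]
print([r["id"] for r in valid])  # -> [1]
```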

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They need to conduct necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish the respondents by age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
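A minimal sketch of this kind of coding, with invented ages and bracket boundaries.

```python
# Assign each respondent to an age bracket so the analysis can work with a
# few buckets instead of raw ages. Brackets here are invented examples.
def age_bracket(age: int) -> str:
    if age < 25:
        return "18-24"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"

ages = [22, 31, 47, 68, 39]  # invented sample
print([age_bracket(a) for a in ages])
# -> ['18-24', '25-44', '45-64', '65+', '25-44']
```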


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical techniques are the most favored for analyzing numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The methods fall into two groups: first, ‘Descriptive Statistics’, used to describe data; second, ‘Inferential Statistics’, which help in comparing the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not go beyond the data at hand to make broader conclusions; any conclusions are based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to demonstrate distribution by various points.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • Variance and standard deviation are based on the differences between the observed scores and the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is; it helps them identify how far the data is spread out and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.
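Python's standard statistics module covers each of the descriptive measures listed above. A minimal sketch with an invented set of scores:

```python
import statistics
from collections import Counter

scores = [72, 85, 85, 90, 78, 64, 85, 70]  # invented sample

print("frequency:", Counter(scores).most_common(2))     # frequency
print("mean:     ", statistics.mean(scores))            # central tendency
print("median:   ", statistics.median(scores))
print("mode:     ", statistics.mode(scores))
print("range:    ", max(scores) - min(scores))          # dispersion
print("stdev:    ", round(statistics.stdev(scores), 2))
print("quartiles:", statistics.quantiles(scores, n=4))  # position
```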

For quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are never sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to think of the method for research and data analysis best suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate the students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare average voting done in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100-odd audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and uses them to say something about the population parameter.
  • Hypothesis testing: It’s about using sample research data to answer the survey research questions. For example, researchers might be interested to understand if a newly launched shade of lipstick is good or not, or if multivitamin capsules help children perform better at games.
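The movie-theater example can be made concrete with a normal-approximation confidence interval for a sample proportion. A minimal sketch; the counts are invented.

```python
import math

# Invented counts: 85 of 100 sampled moviegoers say they like the film.
liked, n = 85, 100
p_hat = liked / n
se = math.sqrt(p_hat * (1 - p_hat) / n)       # standard error of a proportion
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimated share: {p_hat:.0%}, 95% CI: {low:.0%} to {high:.0%}")
# -> estimated share: 85%, 95% CI: 78% to 92%
```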

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns; a two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category (see the sketch after this list).
  • Regression analysis: For understanding the strength of the relationship between two variables, researchers rely on the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable and one or more independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free random manner.
  • Frequency tables: A frequency table records how often each value or category occurs in the data, which makes it easy to compare distributions across groups or over time.
  • Analysis of variance: This statistical procedure is used for testing the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
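A minimal sketch of cross-tabulation and correlation using pandas and NumPy; the survey records below are invented placeholders.

```python
import numpy as np
import pandas as pd

# Invented survey records.
df = pd.DataFrame({
    "gender":    ["F", "M", "F", "M", "F", "M"],
    "age_group": ["18-24", "18-24", "25-44", "25-44", "25-44", "45-64"],
    "hours_tv":  [2.0, 3.5, 1.0, 2.5, 1.5, 4.0],
    "hours_web": [5.0, 4.0, 6.5, 4.5, 6.0, 2.0],
})

# Cross-tabulation: how many males and females fall in each age category.
print(pd.crosstab(df["age_group"], df["gender"]))

# Correlation between two numeric variables (negative here: more TV, less web).
print(np.corrcoef(df["hours_tv"], df["hours_web"])[0, 1])
```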
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers must possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps design a survey questionnaire, select data collection methods, and choose samples.


  • The primary aim of data research and analysis is to derive ultimate insights that are unbiased. Any mistake in collecting data, keeping a biased mind while collecting it, selecting the wrong analysis method, or choosing a biased audience sample will lead to a biased inference.
  • No degree of sophistication in research data analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity might mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data alteration, data mining, and developing graphical representations.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.



Chapter 16. Archival and Historical Research

Introduction

The British sociologist John Goldthorpe (2000) once remarked, “Any sociologist who is concerned with a theory that can be tested in the present should so test it, in the first place; for it is, in all probability, in this way that it can be tested most rigorously” (33). Testing can be done through either qualitative or quantitative methods or some mixture of the two. But sometimes a theory cannot be tested in the present at all. What happens when the persons or phenomena we are interested in happened in the past? It’s hardly possible to interview the people involved in abolishing the slave trade, for example. Does this mean social scientists have no role to play in understanding past phenomena? Not at all. People leave traces behind, and although these traces may not be exactly as we would like them to be had we ordered them (as, in a way, we do when we construct an interview guide or a survey with the questions we want answered), they are nevertheless full of potential for exploration and analysis. For examining traces left by persons, we turn to archival methods, the subject of this chapter.


Things happening in the past are not the only reason we turn to archival methods. Sometimes, the people we are interested in are inaccessible to us for other reasons. For example, we are probably not going to be able to sit down and ask Mark Zuckerberg, Bill Gates, and Jeff Bezos a long list of questions about what it is like to be wealthy. And it is even more unlikely that we can get into the boardrooms of Facebook (Meta), Microsoft, or Amazon to watch how corporate decisions are made. But these men and these companies still leave traces, through public records, media reportage, and public meeting minutes. We can use archival methods here too. They might not be quite as good as face-to-face interviews with billionaires or deep ethnographies of corporate culture, but they are nevertheless valid forms of research with much to tell us.

This chapter introduces archival methods of data collection. We begin by exploring in more detail why and when archival methods should be employed and with what limitations. We then discuss the importance of special collections and archives as potential gold mines for social science research. We will explain how to access these places, for what purposes, and how to begin to make sense of what you find.

Disciplinary Segue: Why Social Scientists Don’t Leave Archives to the Historians

One might suppose that only historians look at the past and that historical archives are no place for social scientists. Goldthorpe (2000) even suggested this. But it would be a mistake to leave historical analyses entirely to historians because historians “typically do not understand our [social science] intellectual and organizational projects.…Social scientists must learn to use the materials that historians have staked out traditionally as their own” (Hill 1993:4). The key difference for our purposes between history and social science is how each discipline understands the goals of its work and how to understand social life. Historians are (mostly) committed to an idiographic approach, where each case is explored to understand itself (this is the “idios” part, where ιδιοs is Ancient Greek for single self). [1] As an example of an idiographic approach, a historian might study the events of January 6, 2021, to understand how a violent mob attempted to stop the electoral count. This might mean tracing motivations back to beliefs in fanciful conspiracies, measuring the impact of Donald Trump’s rhetoric on the violence, or any number of interesting facts and circumstances about that day and what led up to it. But the focus would remain on understanding this case itself. In contrast, social scientists are (mostly) committed to nomothetic research, in which generalizations about the social world are made to understand large-scale social patterns. [2] Whether this generalization is statistical, as quantitative research produces (e.g., we can predict this outcome in other cases and places based on measurable relationships among variables), or theoretical, as qualitative research produces (e.g., we can expect to find similar patterns between conspiratorial belief and action), the point of (most) social science research is to explain the world in such a way that we can possibly expect (if not outright predict) what will happen or be believed in a different place and time. Social scientists are engaged in this “scientific” project of prediction (loosely understood), while historians are (usually) not. It is for this reason that social scientists should not leave archival research to the historians!

When to Use Archival Materials

As mentioned above, sometimes the people we want to hear from or observe are simply not available to us. This may be because they are no longer living or because they are unwilling or unable to be part of a research study, as in the case of elites (e.g., CEOs of Fortune 500 companies, political leaders and other public figures, the very wealthy). In both cases, you might wonder about the ethics of studying people who have not given written consent to be studied. But using archival and historical sources as your research data is not the same thing as studying persons (“human subjects”). When we use archival and historical sources, we are examining the traces that people and institutions have left. Institutional review boards (IRBs) do not have jurisdiction in this area, although we still want to consider the ethics of our research and try to respect privacy and confidentiality when warranted.

In addition to using archival and historical sources when people are inaccessible, there are other reasons we might want to collect this data. First, we may want to explore the generalized discourse about a phenomenon. [3] For example, perhaps we want to understand the historical context of the 2016 US presidential election, so we think it is important to go back in time and collect data that will more vividly paint a picture of how people at the time were evaluating and experiencing the election. We might use archives to collect data about what people were saying about the third presidential debate in 2016 between candidates Hillary Clinton and Donald Trump. There are many ways we could go about doing that. We could sample local and national newspapers and collect op-eds and letters to the editor about the debate. Perhaps we can pull archived tweets tagged #thirddebate, or perhaps some librarian in 2016 collected oral histories of people’s reactions the day after. Unlike previous person-focused qualitative research strategies, where we carefully create a research design that allows us to construct data through questioning and observing, we will spend our time tracking down data and finding out what possibly exists.

A second (or third) reason to employ these archival and historical sources is that we are interested in the historical “record” as the phenomenon itself. We want to know what was written down by Acme Company in letters to its shareholders from 1945 to 1960 about its Acme Pocket Sled (which had the unfortunate habit of accelerating and hurling its bearers off cliffs). [4] Our interest here is not in any particular human subject but in the record left by the company. If we were forced to employ interviews or observational methods to get this record, we could interview current and former employees of Acme or shareholders who received letters from the company, but all of this would actually be second best because what the employees and shareholders remember would probably be nowhere near as accurate as what the records reflect. I once did a study of the development of US political party platforms over the course of the nineteenth century, using a huge volume I randomly found in the library (Hurst 2010b). The volume recorded each party’s platform by election year, so I could trace how parties talked about and included “class” and “class inequality” in their platforms. This allowed me to show how third parties pushed the two major parties toward some recognition of labor rights over time. There was obviously no way to get at this information through interviews or observations.

Finally, archival and historical sources are often used to supplement other qualitative data collection as a form of verification through triangulation. Perhaps you interviewed several Starbucks employees in 2021 about their experiences working for the company, particularly how the company responded to labor organizing attempts. You might also search official Starbucks company records to compare and contrast the official line with the experience of workers. Alternatively, you could collect media coverage of local organizing campaigns that might include quotes or statements from Starbucks representatives. The best and most convincing qualitative researchers often employ archival and historical material in this way. In addition to providing verification through triangulation, supplementing your data with these sources can deepen contextualization. I encourage you to think about what possible archival and historical sources could strengthen any interview- or observation-based study you are designing. [5]

How to Find Archival and Historical Sources

People and institutions leave traces in a variety of ways. This section documents some of those ways with the hope that the possibilities listed here will inspire you to explore further.

It might help to distinguish between public and private sources. Many public archives have dedicated web addresses so you can search them from anywhere. More on those below. Private individuals are more likely to have donated personal information to particular archives, perhaps the archival center associated with the college they attended. Famous and not-so-famous people’s diaries and letters are often searchable in particular university archives. Each former US president has his (!) own dedicated national archive. Towns and cities often house interesting historical records in their public libraries. Archivists and librarians at special archives have often done monumental work creating and curating collections of various kinds. Oregon State University’s Special Collections and Archives Research Center (SCARC) is no exception. In addition to a ton of material related to the history of the university, including private diaries of students, financial aid records, and photographs of carpentry classes from the nineteenth century, the librarians have documented the experiences of LGBTQ people within OSU and Corvallis, the history of hops and brewing in the Northwest, and the history of natural resources in the Pacific Northwest, especially around agriculture and forestry.

[Photo: The Douglas Strain Reading Room, Oregon State University Special Collections and Archives]

It can be overwhelming to think about where to start. Being strategic about your use of archival and historical material is often a large part of an effective research plan. Here are some options for kinds of materials to explore:

Public archives include the following:

  • Commercial media accounts. These are anything written, drawn, or recorded that is produced for a general audience, including newspapers, books, magazines, television program transcripts, drawn comics, and so on.
      • Where to find these: special collections, online newspaper/magazine databases, collected publications [6]
      • Examples: Time Magazine Vault is completely free and covers everything the magazine published from 1923 to today; Harper’s Magazine archives go back to 1859; Internet Archive’s Ebony collection is a wealth of historically important images and stories about African American life in the twentieth century and covers the magazine from 1945 to 2015.
  • Actuarial and military records. These include birth and death records, records of marriages and divorces, applications for insurance and credit, military service records, and cemeteries (gravestones).
      • Where to find these: state archives/state vital records offices, US Census / government agencies, US National Archives
      • Examples: USAgov/genealogy will help you walk through the ordering of various vital records related to ancestry; US Census 1950 includes information on household size and occupation for all persons living in the US in 1950; [7] your local historical cemetery will have lots of information recorded on gravestones of possible historical use, as in cases where deaths are clustered around a particular point in time or where military service is involved.
  • Official and quasi-official documentary records. These include organization meeting minutes, reports to shareholders, interoffice memos, company emails, company newsletters, and so on.
      • Where to find these: Historical records are often donated to a special collection or are even included in an official online database. More recent records may have been “leaked” to the public, as in the case of the Democratic National Committee’s emails in 2016 or the Panama (2016) and Pandora (2021) Papers leaks. The National Archives are also a great source for official documentary records of the US and its various organizations and branches (e.g., Supreme Court, US Patent Office).
      • Examples: The Forest History Society’s Weyerhaeuser Collection holds correspondence, director and executive files, branch and region files, advertising materials, oral histories, scrapbooks, publications, photographs, and audio/visual items documenting the activities of the Pacific Northwest timber company from its inception in 1864 through to 2010; the National Archives’ Lewis and Clark documents include presidential correspondences and a list of “presents” received from Native Americans.
  • Governmental and legislative documentary records.
      • Where to find these: National Archives, state archives, Library of Congress, governmental agency records (often available in public libraries)
      • Example: Records of the Supreme Court of the United States are housed in the National Archives and include scrapbooks from 1880 to 1935 on microfilm, sound recordings, and case files going back to 1792.

Private archives include the following:

  • Autobiographies and memoirs. These might have been published, but they are just as likely to have been written for oneself or one’s family, with no intention of publication. Some of these have been digitized, but others will require an actual visit to the site to see the physical object itself.
      • Where to find these: if not published, special collections and archives
      • Example: John Adger McCrary graduated from Clemson University in 1898, where he received a degree in mechanical and electrical engineering. After graduation, he was stationed at the Washington Navy Yard as senior mechanical engineer. He donated a 1939 unpublished memoir regarding the early days of Clemson College, which includes a description of the first dormitory being built by convict labor.
  • Diaries and letters. These are probably not intended for publication; rather, they are contemporaneous private accounts and correspondences. Some of these have been digitized, but others will require an actual visit to the site to see the physical object itself.
      • Where to find these: special collections and archives, Library of Congress for notable persons’ diaries and letters
      • Examples: Abraham Lincoln’s Papers housed in the Library of Congress; Diary of Ella Mae Cloake, an OSU student, from 1941 to 1944, documenting her daily activities as a high school and college student in Oregon during World War II, located in OSU Special Collections and Archives
  • Home movies, videos, photographs of various kinds. These include drawings and sketches, recordings of places seen and visited, scrapbooks, and other ephemera. People leave traces in various forms, so it is best not to confine yourself solely to what has been written.
      • Where to find these: special collections and archives, Library of Congress, Smithsonian
      • Example: The McMenamins Brewery Collection at OSU SCARC includes digitized brew sheets, digital images, brochures, coasters, decals, event programs, flyers, newspaper clippings, tap handles, posters, labels, a wooden cask, and a six-pack of Hammerhead beer.
  • Oral histories. Oral histories are recorded and often transcribed interviews of various persons for purposes of historic documentation. To the untrained eye, they appear to be qualitative “interviews,” but they are in fact specifically excluded from IRB jurisdiction because their purpose is documentation, not research.
      • Where to find these: special collections and archives; Smithsonian
      • Examples: Many archivists and librarians are involved in the collecting of such oral histories, often with a particular theme in mind or to strengthen a particular collection. For example, OSU’s SCARC has an Oregon Multicultural Archive, which includes oral histories that document the experiences and perspectives of people of color in Oregon. The Smithsonian is another great resource on a wide variety of historical events and persons.

How to Find Special Collections and Archives

Although much material has been “digitized” and is thus searchable online, the vast majority of private archival material, including ephemera like scrapbooks and beer coasters, is only available “on site.” Qualitative researchers who employ archival and historical sources must often travel to special collections to find the material they are interested in. Often, the material they want has never really been looked at by another researcher. It may belong to a general catalog entry (such as “Student Scrapbooks, 1930–1950”). For official records at the city or county level, travel to the records office or local public library is often required to access the desired material. You will want to consider what kinds of material are available and what kinds of access are required for that material in your research plan.

The good news is that, even if much material has not been digitized, there are general searchable databases for most archives. If you have a particular topic of interest, you can run a general web search and include the topic and “archives” or “special collection.” The more public and well known the entity, the more likely you will find digitally available material or special collections dedicated to the person or phenomenon. Or you might find an archive housed one thousand miles away that is happy to work with you on a visit. Some researchers become very familiar with a particular collection or database and tend to rely on that in their research. As you gain experience with historical documents, you will find it easier to narrow down your searches. One great place to start, though, is your college or university archives. And the librarians who work there will be more than happy to help answer your questions about both the particular collections housed there and how to do archival research in general.

What to Do with All That Content

Once you have found a collection or body of material, what do you do with that? Analyzing content will be discussed in some detail in chapter 17, but for now, let’s think about what can be made of this kind of material and what cannot. As Goldthorpe (2000) suggested, using historical material or traces left by people is sometimes second best to actually talking to people or observing them in action. We have to be very clear about recognizing the limitations of what we find in the archives.

First, not everything produced manages to survive the ravages of time. Without digitization, historical records are vulnerable to a host of destroyers. Some vital records get destroyed when the local registry burns down, for example. Some memoirs or diaries are destroyed by mildew while sitting in a box in the basement. Photographs get torn up. Boxes of records get accidentally thrown in the garbage. We call this the historical-trace problem. What we have in front of us is thus probably not the entire record of whatever it is we are looking for.

Second, what gets collected is itself often related to who has power and who is perceived as being worthy of recording and collection. This is why projects like OSU’s multicultural archives are so important, as librarians intervene to ensure that it is not only the stories (diaries, papers) of the powerful that are found in the archives. If one were to read all the newspaper editorials from the nineteenth century, one would learn a lot about particular White men’s thoughts on current events but very little from women or people of color or working-class people. This is the power problem of archives, and we need to be aware of it, especially when we are using historical material to build a context of what a time or place was like. What it was like for whom always needs to be properly addressed.

Third, there are issues related to truth telling and audience. There are no at-the-moment credibility checks on the materials you find in archives. Although we think people tend to write honestly in their personal journals, we don’t actually know if this is the case—what about the person who expected to be famous and writes for an imagined posterity? There could be significant gaps and even falsehoods in such an account. People can lie to themselves too, which is something qualitative researchers know well (and partly the reason ethnographers favor observation over interviews). Despite the absence of credibility checks, historical documents sometimes appear more honest simply by having survived for so long. It is important to remember that they are prone to all the same problems as contemporaneously collected data. A diary by a planter in South Carolina in the 1840s is no more and often less truthful to the facts than an interview would have been had it been possible. Newspapers and magazines have always targeted particular audiences—a fact we understand about our own media (e.g., Fox News is hardly “fair and balanced” toward Democrats) but something we are prone to overlook when reading historic media stories.

Whenever using archival or historical sources, then, it is important to clearly identify and state the limitations of their use and any intended audience. In the case of diaries of Southern planters from the 1840s, “This is the story we get told from the point of view of relatively elite White men whose work was collected and safeguarded (and not destroyed) for posterity.” Or in the case of a Harper’s Magazine story from the 1950s, “This is an understanding of Eisenhower politics by a liberal magazine read by a relatively well-educated and affluent audience.”

Collecting the data for an archival-based study is just the beginning. Once you have downloaded all the advertisements from Men’s Health or compiled all the tweets put out on January 6 or scanned all the photographs of the childcare center in the 1950s, you will need to start “analyzing” it. What does that analysis entail? That is the subject of our next several chapters.

Further Readings

Baker, Alan R. H. 1997. “‘The Dead Don’t Answer Questionnaires’: Researching and Writing Historical Geography.” Journal of Geography in Higher Education 21(2):231–243. Among other things, this article discusses the problems associated with making geographical interpretations from historical sources.

Benzecry, Claudio, Andrew Deener, and Armando Lara-Millán. 2020. “Archival Work as Qualitative Sociology.” Qualitative Sociology 43:297–303. An editorial foreword to an issue of Qualitative Sociology dedicated to archival research briefly describing included articles (many of which you may want to read). Distinguishes the “heroic moment of data accumulation” from the “ascetic and sober exercise of distancing oneself from the data, analyzing it, and communicating the meaning to others.” For advanced students only.

Bloch, Marc. 1954. The Historian’s Craft . Manchester: Manchester University Press. A classic midcentury statement of what history is and does from a research perspective. Bloch’s particular understanding and approach to history has resonance for social science too.

Fones-Wolf, Elizabeth A. 1994. Selling Free Enterprise: The Business Assault on Labor and Liberalism, 1945–60 . Urbana: University of Illinois Press.* Using corporate records, published advertisements, and congressional testimony (among other sources), Fones-Wolf builds an impressive account of a coordinated corporate campaign against labor unions and working people in the postwar years.

Hill, Michael R. 1993. Archival Strategies and Techniques . Thousand Oaks, CA: SAGE. Guidebook to archival research. For advanced students.

Moore, Niamh, Andrea Salter, Liz Stanley, and Maria Tamboukou. 2017. The Archive Project: Archival Research in the Social Sciences . London: Routledge. An advanced collection of essays on various methodological ideas and debates in archival research.

Stoler, Ann Laura. 2009. Along the Archival Grain: Epistemic Anxieties and Colonial Common Sense . Princeton, NJ: Princeton University Press.* A difficult but rewarding read for advanced students. Using archives in Indonesia, Stoler explores the history of colonialism and the making of racialized classes while also proposing and demonstrating innovative archival methodologies.

Wilder, Craig Steven. 2014. Ebony and Ivy: Race, Slavery, and the Troubled History of America’s Universities . London: Bloomsbury.* Although perhaps more history than social science, this is a great example of using university archival data to tell a story about national development, racism, and the role of universities.

  • This is where the word idiot comes from as well; in Ancient Greece, failing to participate in collective democracy making was seen as “idiotic”—or, put another way, selfish. ↵
  • This word also comes from Greek roots, although it was created recently (we often rummage around in Ancient Greek and Latin when we come up with new concepts!). In Greek, nomos (νομος) means “law.” The use here makes much of the generation of laws or regularities about the social world in the sense of Newton’s “law” of gravity. ↵
  • If this is your interest, see also chapter 17, “Content Analysis”! ↵
  • For those of you too young to remember, this was a standard plot of Looney Tunes cartoons featuring Wile E. Coyote (Frazier 1990). ↵
  • Note that this would be an example of strength through multiple methods rather than strength through mixed methods (chapter 15). The former deepens the contextualization, while the latter increases the overall validity of the findings. ↵
  • Such as that volume of party platforms I stumbled across in the library! ↵
  • US Census material becomes available to the public seventy-two years after collection; data from the 1950 Census recently became available for the very first time. ↵

Nomothetic research: a form of social science research that generally follows the scientific method as established in the natural sciences. In contrast to idiographic research, the nomothetic researcher looks for general patterns and “laws” of human behavior and social relationships. Once discovered, these patterns and laws will be expected to be widely applicable. Quantitative social science research is nomothetic because it seeks to generalize findings from samples to larger populations. Most qualitative social science research is also nomothetic, although generalizability is here understood to be theoretical in nature rather than statistical. Some qualitative researchers, however, espouse the idiographic research paradigm instead.

Institutional review board (IRB): an administrative body established to protect the rights and welfare of human research subjects recruited to participate in research activities conducted under the auspices of the institution with which it is affiliated. The IRB is charged with the responsibility of reviewing all research involving human participants. The IRB is concerned with protecting the welfare, rights, and privacy of human subjects. The IRB has the authority to approve, disapprove, monitor, and require modifications in all research activities that fall within its jurisdiction as specified by both the federal regulations and institutional policy.

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


Comprehensive Guide to Historical Research Methods

Historical research is a meticulous process that involves the exploration and analysis of past events, societies, and cultures. This comprehensive guide aims to assist researchers and historians in navigating the intricate landscape of historical research. From understanding different methodologies to utilizing archival resources effectively, this guide provides valuable insights into the diverse approaches to historical inquiry.

Section 1: Overview of Historical Research Methodologies

1.1 Historiography and Approaches:

  • Explanation of major historiographical approaches (e.g., social history, cultural history, political history) and their significance in shaping historical research.

1.2 Comparative History:

  • Exploration of the comparative method in historical research, highlighting its uses in identifying patterns, differences, and causal relationships across different contexts.

1.3 Quantitative and Qualitative Methods:

  • Differentiation between quantitative and qualitative historical research methods, with examples of how each can be applied to historical inquiries.

Section 2: Conducting Archival Research

2.1 Introduction to Archival Research:

  • Definition and importance of archival research in historical studies, emphasizing the role of primary sources in reconstructing the past.

2.2 Selecting Archives and Repositories:

  • Guidance on choosing the right archives and repositories based on research goals, with a focus on both physical and digital collections.

2.3 Effective Use of Primary Sources:

  • Tips on extracting valuable information from primary sources, including letters, diaries, newspapers, and official documents.

2.4 Paleography and Manuscript Analysis:

  • Introduction to paleography (the study of ancient handwriting) and techniques for analyzing handwritten documents and manuscripts.

Section 3: Analyzing Historical Documents

3.1 Critical Source Analysis:

  • Techniques for critically evaluating historical sources, considering factors such as authorship, context, bias, and reliability.

3.2 Content Analysis:

  • Overview of content analysis methods for systematically examining and interpreting the content of historical documents.

3.3 Historical GIS (Geographic Information Systems):

  • Introduction to GIS applications in historical research, illustrating how spatial analysis can enhance the understanding of historical events and patterns.

Section 4: Writing and Presenting Historical Research

4.1 Structure of Historical Research Papers:

  • Guidelines for structuring a historical research paper, including the introduction, literature review, methodology, analysis, and conclusion.

4.2 Citation Styles in Historical Research:

  • Overview of common citation styles used in historical research, such as Chicago, MLA, and APA, with examples and guidelines.

4.3 Effective Presentation of Historical Findings:

  • Tips for presenting historical research in various formats, including academic articles, presentations, and exhibitions.

Conclusion:

This comprehensive guide aims to empower researchers and historians with the tools and knowledge needed to conduct robust historical research. By understanding the diverse methodologies and utilizing archival resources effectively, scholars can contribute meaningfully to the ongoing dialogue about the past.

Related Guides


Methods of historical data analysis and criticism in historical research

  • Vo Van That, Saigon University, Ho Chi Minh City, Vietnam
  • Pham Phuc Vinh, Saigon University, Ho Chi Minh City, Vietnam
  • Mai Quoc Dung, University of Food Industry, Ho Chi Minh City, Vietnam

In historical research, historical sources play a decisive role in the quality of historical research products. History is reflected in historical sources, and through historical documents historians can learn about history. In order to obtain an objective and reliable source of historical data to reconstruct the past, historians must adhere to the principles and methods of analyzing and criticizing historical data. This research article describes methods and techniques for analyzing and criticizing historical data in historical research.


Data analysis in oral history: A new approach in historical research

Affiliations.

  • 1 Department of Medical Surgical nursing, Faculty of Nursing and Midwifery, Zabol University of Medical Sciences, Zabol, Iran.
  • 2 Nursing and Midwifery Care Research Center, Faculty of Nursing and Midwifery, Isfahan University of Medical Sciences, Isfahan, Iran.
  • PMID: 25878689
  • PMCID: PMC4387636

Background: Historical research has limitations in applying proper and credit-worthy chronology to clarify the data. In this methodology, the application of oral history is one of the ways in which answers to questions addressed by the research theme are elicited. Oral history, as a clear and transparent tool, needs to be applied with guidelines for qualitative researchers regarding data analysis limitations from oral evidence and face-to-face contact. Therefore, the development of a systematic method for data analysis is needed to obtain accurate answers, based on which a credit-worthy narration can be produced. The aim of this study was to introduce an ethical and objective approach for the analysis of data obtained from oral history.

Materials and methods: This is a methodological article that suggests an analysis method based on qualitative approach and experiences of the authors.

Results: A systematic method of data analysis for oral history research, based on common qualitative data analysis methods, has been suggested as the result of this article.

Conclusions: This new technique is equipped with measures that would assist qualitative researchers in the nursing field and other disciplines regarding analysis of qualitative data resulting from oral history studies.

Keywords: Data analysis; Iran; nursing; oral history; qualitative research.

Your Modern Business Guide To Data Analysis Methods And Techniques


Table of Contents

1) What Is Data Analysis?

2) Why Is Data Analysis Important?

3) What Is The Data Analysis Process?

4) Types Of Data Analysis Methods

5) Top Data Analysis Techniques To Apply

6) Quality Criteria For Data Analysis

7) Data Analysis Limitations & Barriers

8) Data Analysis Skills

9) Data Analysis In The Big Data Environment

In our data-rich age, understanding how to analyze and extract true meaning from our business’s digital insights is one of the primary drivers of success.

Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery, improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a vast amount of data.

With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield – but online data analysis is the solution.

In science, data analysis uses a more complex approach with advanced techniques to explore and experiment with data. On the other hand, in a business context, data is used to make data-driven decisions that will enable the company to improve its overall performance. In this post, we will cover the analysis of data from an organizational point of view while still going through the scientific and statistical foundations that are fundamental to understanding the basics of data analysis. 

To put all of that into perspective, we will answer a host of important analytical questions, explore analytical methods and techniques, and demonstrate how to perform analysis in the real world with a blueprint of 17 essential methods.

What Is Data Analysis?

Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.

All these various methods are largely based on two core areas: quantitative and qualitative research.

To explain the key differences between qualitative and quantitative research, here’s a video for your viewing pleasure:

Gaining a better understanding of different techniques and methods in quantitative research as well as qualitative insights will give your analyzing efforts a more clearly defined direction, so it’s worth taking the time to allow this particular knowledge to sink in. Additionally, you will be able to create a comprehensive analytical report that will skyrocket your analysis.

Apart from qualitative and quantitative categories, there are also other types of data that you should be aware of before diving into complex data analysis processes. These categories include: 

  • Big data: Refers to massive data sets that need to be analyzed using advanced software to reveal patterns and trends. It is considered to be one of the best analytical assets as it provides larger volumes of data at a faster rate. 
  • Metadata: Putting it simply, metadata is data that provides insights about other data. It summarizes key information about specific data that makes it easier to find and reuse for later purposes. 
  • Real time data: As its name suggests, real time data is presented as soon as it is acquired. From an organizational perspective, this is the most valuable data as it can help you make important decisions based on the latest developments. Our guide on real time analytics will tell you more about the topic. 
  • Machine data: This is more complex data that is generated solely by a machine such as phones, computers, or even websites and embedded systems, without previous human interaction.

Why Is Data Analysis Important?

Before we go into detail about the categories of analysis along with its methods and techniques, you must understand the potential that analyzing data can bring to your organization.

  • Informed decision-making : From a management perspective, you can benefit from analyzing your data as it helps you make decisions based on facts and not simple intuition. For instance, you can understand where to invest your capital, detect growth opportunities, predict your income, or tackle uncommon situations before they become problems. Through this, you can extract relevant insights from all areas in your organization, and with the help of dashboard software , present the data in a professional and interactive way to different stakeholders.
  • Reduce costs : Another great benefit is to reduce costs. With the help of advanced technologies such as predictive analytics, businesses can spot improvement opportunities, trends, and patterns in their data and plan their strategies accordingly. In time, this will help you save money and resources on implementing the wrong strategies. And not just that, by predicting different scenarios such as sales and demand you can also anticipate production and supply. 
  • Target customers better : Customers are arguably the most crucial element in any business. By using analytics to get a 360° vision of all aspects related to your customers, you can understand which channels they use to communicate with you, their demographics, interests, habits, purchasing behaviors, and more. In the long run, it will drive success to your marketing strategies, allow you to identify new potential customers, and avoid wasting resources on targeting the wrong people or sending the wrong message. You can also track customer satisfaction by analyzing your client’s reviews or your customer service department’s performance.

What Is The Data Analysis Process?

[Figure: the five stages of the data analysis process]

When we talk about analyzing data, there is an order to follow to extract the needed conclusions. The analysis process consists of 5 key stages. We will cover each of them in more detail later in the post, but to start providing the needed context to understand what is coming next, here is a rundown of the 5 essential steps of data analysis. 

  • Identify: Before you get your hands dirty with data, you first need to identify why you need it in the first place. The identification is the stage in which you establish the questions you will need to answer. For example, what is the customer's perception of our brand? Or what type of packaging is more engaging to our potential customers? Once the questions are outlined you are ready for the next step. 
  • Collect: As its name suggests, this is the stage where you start collecting the needed data. Here, you define which sources of data you will use and how you will use them. The collection of data can come in different forms such as internal or external sources, surveys, interviews, questionnaires, and focus groups, among others.  An important note here is that the way you collect the data will be different in a quantitative and qualitative scenario. 
  • Clean: Once you have the necessary data, it is time to clean it and leave it ready for analysis. Not all the data you collect will be useful; when collecting big amounts of data in different formats, it is very likely that you will find yourself with duplicate or badly formatted records. To avoid this, before you start working with your data, make sure to erase any stray white space, duplicate records, and formatting errors (see the short code sketch after this list). This way you avoid hurting your analysis with bad-quality data. 
  • Analyze : With the help of various techniques such as statistical analysis, regressions, neural networks, text analysis, and more, you can start analyzing and manipulating your data to extract relevant conclusions. At this stage, you find trends, correlations, variations, and patterns that can help you answer the questions you first thought of in the identify stage. Various technologies in the market assist researchers and average users with the management of their data. Some of them include business intelligence and visualization software, predictive analytics, and data mining, among others. 
  • Interpret: Last but not least you have one of the most important steps: it is time to interpret your results. This stage is where the researcher comes up with courses of action based on the findings. For example, here you would understand if your clients prefer packaging that is red or green, plastic or paper, etc. Additionally, at this stage, you can also find some limitations and work on them. 
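To make the clean stage concrete, here is a minimal sketch in Python using pandas. Everything in it is hypothetical (the column names, the data, and the cleaning rules); it simply illustrates stripping stray white space, normalizing formats, and removing duplicate records as described above.

```python
import pandas as pd

# Hypothetical raw export with the problems described above: stray
# whitespace, inconsistent formatting, and duplicate records.
raw = pd.DataFrame({
    "customer": ["  Ana ", "Ben", "Ben", "Cleo"],
    "channel":  ["Email", "email ", "email ", "WEB"],
    "spend":    ["100", "250", "250", "80"],
})

clean = (
    raw.assign(
        customer=raw["customer"].str.strip(),           # erase stray white space
        channel=raw["channel"].str.strip().str.lower(), # normalize formatting
        spend=pd.to_numeric(raw["spend"]),              # fix badly typed numbers
    )
    .drop_duplicates()                                  # remove duplicate records
)
print(clean)
```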

Now that you have a basic understanding of the key data analysis steps, let’s look at the top 17 essential methods.

17 Essential Types Of Data Analysis Methods

Before diving into the 17 essential types of methods, it is important that we quickly go over the main analysis categories. Moving from descriptive analysis up to prescriptive analysis, the complexity and effort of data evaluation increase, but so does the added value for the company.

a) Descriptive analysis - What happened.

The descriptive analysis method is the starting point for any analytic reflection, and it aims to answer the question: What happened? It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your organization.

Performing descriptive analysis is essential, as it enables us to present our insights in a meaningful way. On its own, this analysis will not allow you to predict future outcomes or tell you why something happened, but it will leave your data organized and ready for further investigation.

b) Exploratory analysis - How to explore data relationships.

As its name suggests, the main aim of exploratory analysis is to explore. Before it is carried out, there is still no settled notion of the relationships between the variables in the data. Once the data is investigated, exploratory analysis helps you find connections and generate hypotheses and solutions for specific problems. A typical area of application for it is data mining.

c) Diagnostic analysis - Why it happened.

Diagnostic data analytics empowers analysts and executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge.

Designed to provide direct and actionable answers to specific questions, this is one of the most important methods in research, and it also serves key organizational functions in areas such as retail analytics.

d) Predictive analysis - What will happen.

The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analysis, in addition to machine learning (ML) and artificial intelligence (AI). Through this, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data.

With predictive analysis, you can unfold and develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge over the competition. If you understand why a trend, pattern, or event happened through data, you will be able to develop an informed projection of how things may unfold in particular areas of the business.

e) Prescriptive analysis - What should happen.

Prescriptive analysis is another of the most effective types of analysis methods in research. Prescriptive data techniques cross over from predictive analysis in that they revolve around using patterns or trends to develop responsive, practical business strategies.

By drilling down into prescriptive analysis, you will play an active role in the data consumption process by taking well-arranged sets of visual data and using it as a powerful fix to emerging issues in a number of key areas, including marketing, sales, customer experience, HR, fulfillment, finance, logistics analytics , and others.

Top 17 data analysis methods

As mentioned at the beginning of the post, data analysis methods can be divided into two big categories: quantitative and qualitative. Each of these categories holds a powerful analytical value that changes depending on the scenario and type of data you are working with. Below, we will discuss 17 methods that are divided into qualitative and quantitative approaches. 

Without further ado, here are the 17 essential types of data analysis methods with some use cases in the business world: 

A. Quantitative Methods 

To put it simply, quantitative analysis refers to all methods that use numerical data, or data that can be turned into numbers (e.g., category variables like gender or age), to extract valuable insights. It is used to draw conclusions about relationships and differences and to test hypotheses. Below we discuss some of the key quantitative methods. 

1. Cluster analysis

The action of grouping a set of data elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups – hence the term ‘cluster.’ Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.

Let’s look at it from an organizational perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best personalized service, but let’s face it, with a large customer base, it is practically impossible to do that. That’s where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.
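To make this concrete, here is a minimal clustering sketch in Python using scikit-learn. The customer features and the choice of three clusters are invented for illustration; in practice you would select features and the number of clusters based on your own data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers described by [age, annual_spend, orders_per_year]
customers = np.array([
    [23,  300,  4], [25,  350,  5], [41, 2200, 18],
    [39, 2500, 20], [62,  900,  2], [58,  850,  3],
])

# Scale the features first so no single unit dominates the distance measure
scaled = StandardScaler().fit_transform(customers)

# Group the customers into three clusters (k is chosen here only as an example)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(labels)  # cluster assignment for each customer
```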

2. Cohort analysis

This type of data analysis approach uses historical data to examine and compare the behavior of a given segment of users, who can then be grouped with others sharing similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.

Cohort analysis can be really useful for performing analysis in marketing as it will allow you to understand the impact of your campaigns on specific groups of customers. To exemplify, imagine you send an email campaign encouraging customers to sign up for your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign for a longer period of time and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.  

A useful tool to start performing cohort analysis method is Google Analytics. You can learn more about the benefits and limitations of using cohorts in GA in this useful guide . In the bottom image, you see an example of how you visualize a cohort in this tool. The segments (devices traffic) are divided into date cohorts (usage of devices) and then analyzed week by week to extract insights into performance.

[Figure: cohort analysis chart from Google Analytics, showing device-traffic cohorts analyzed week by week]
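Outside of a tool like Google Analytics, the bare bones of cohort logic can be sketched in Python with pandas: group users by the month of their first activity, then count how many are still active in later months. The event log below is entirely made up.

```python
import pandas as pd

# Hypothetical event log: one row per user activity
events = pd.DataFrame({
    "user":  ["a", "a", "b", "b", "b", "c"],
    "month": ["2021-01", "2021-02", "2021-01", "2021-03", "2021-04", "2021-02"],
})
events["month"] = pd.PeriodIndex(events["month"], freq="M")

# Each user's cohort is the month of their first activity
events["cohort"] = events.groupby("user")["month"].transform("min")
events["age"] = (events["month"] - events["cohort"]).apply(lambda d: d.n)

# Rows: signup cohort; columns: months since signup; values: active users
retention = events.pivot_table(index="cohort", columns="age",
                               values="user", aggfunc="nunique")
print(retention)
```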

3. Regression analysis

Regression uses historical data to understand how a dependent variable's value is affected when one (linear regression) or more independent variables (multiple regression) change or stay the same. By understanding each variable's relationship and how it developed in the past, you can anticipate possible outcomes and make better decisions in the future.

Let’s break it down with an example. Imagine you did a regression analysis of your sales in 2019 and discovered that variables like product quality, store design, customer service, marketing campaigns, and sales channels affected the overall result. Now you want to use regression to analyze which of these variables changed or if any new ones appeared during 2020. For example, you couldn’t sell as much in your physical store due to COVID lockdowns. Therefore, your sales could’ve either dropped in general or increased in your online channels. Through this, you can understand which independent variables affected the overall performance of your dependent variable, annual sales.

If you want to go deeper into this type of analysis, check out this article and learn more about how you can benefit from regression.
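As a minimal illustration, here is what a simple linear regression could look like in Python with scikit-learn. The variables (marketing spend and store visits predicting monthly sales) and all the numbers are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly data: [marketing_spend, store_visits] -> sales
X = np.array([[10, 500], [12, 520], [15, 610], [18, 700], [20, 690]])
y = np.array([100, 110, 135, 160, 158])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # estimated effect of each variable
print(model.predict([[16, 640]]))     # anticipate sales for a new month
```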

4. Neural networks

The neural network forms the basis for the intelligent algorithms of machine learning. It is a form of analytics that attempts, with minimal intervention, to understand how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and advance over time.

A typical area of application for neural networks is predictive analytics. There are BI reporting tools that have this feature implemented within them, such as the Predictive Analytics Tool from datapine. This tool enables users to quickly and easily generate all kinds of predictions. All you have to do is select the data to be processed based on your KPIs, and the software automatically calculates forecasts based on historical and current data. Thanks to its user-friendly interface, anyone in your organization can manage it; there’s no need to be an advanced scientist. 

Here is an example of how you can use the predictive analysis tool from datapine:

[Figure: example of datapine’s predictive analytics tool]

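datapine's tool is proprietary, so the sketch below only approximates the underlying idea with an off-the-shelf neural network, scikit-learn's MLPRegressor. The ad-spend and revenue figures are invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical daily history: [ad_spend, site_sessions] -> revenue (in $1,000s)
X = np.array([[5, 900], [6, 1000], [8, 1300], [9, 1450], [11, 1700]])
y = np.array([2.1, 2.4, 3.1, 3.4, 4.0])

# A small feed-forward network that learns the historical relationship
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, y)
print(net.predict([[10, 1600]]))  # forecast revenue for a new day
```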

5. Factor analysis

Factor analysis, also called “dimension reduction,” is a type of data analysis used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. The aim here is to uncover independent latent variables, making it an ideal method for streamlining specific segments.

A good way to understand this data analysis method is a customer evaluation of a product. The initial assessment is based on different variables like color, shape, wearability, current trends, materials, comfort, the place where they bought the product, and frequency of usage. The list can be endless, depending on what you want to track. In this case, factor analysis comes into the picture by summarizing all of these variables into homogenous groups, for example, by grouping the variables color, materials, quality, and trends into a broader latent variable of design.

If you want to start analyzing data using factor analysis we recommend you take a look at this practical guide from UCLA.
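Here is a compact sketch of factor analysis using scikit-learn; the survey ratings are invented, and in practice you would inspect the loadings to decide what the latent factors mean (e.g., a “design” factor, as in the example above).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical 1-10 customer ratings on six observed product variables:
# [color, materials, quality, trendiness, comfort, price_fairness]
ratings = np.array([
    [8, 7, 8, 9, 4, 5], [7, 8, 7, 8, 5, 4], [9, 9, 8, 9, 3, 4],
    [3, 4, 3, 2, 8, 9], [2, 3, 4, 3, 9, 8], [4, 3, 3, 2, 8, 9],
])

# Summarize six correlated variables into two latent factors
fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)
print(fa.components_)  # loadings: how each observed variable maps onto a factor
```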

6. Data mining

Data mining is an umbrella term for engineering metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge. When considering how to analyze data, adopting a data mining mindset is essential to success - as such, it’s an area that is worth exploring in greater detail.

An excellent use case of data mining is datapine intelligent data alerts . With the help of artificial intelligence and machine learning, they provide automated signals based on particular commands or occurrences within a dataset. For example, if you’re monitoring supply chain KPIs , you could set an intelligent alarm to trigger when invalid or low-quality data appears. By doing so, you will be able to drill down deep into the issue and fix it swiftly and effectively.

In the following picture, you can see how the intelligent alarms from datapine work. By setting up ranges on daily orders, sessions, and revenues, the alarms will notify you if the goal was not completed or if it exceeded expectations.

[Figure: example of datapine’s intelligent data alerts, with ranges set on daily orders, sessions, and revenue]
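The alerting feature shown above is proprietary to datapine, but the basic logic of a range-based data alert can be approximated in a few lines of Python. The metric names and thresholds here are hypothetical.

```python
import pandas as pd

# Hypothetical daily KPI snapshot
today = pd.Series({"orders": 42, "sessions": 180, "revenue": 950.0})

# Acceptable ranges, e.g., set by hand or learned from historical data
ranges = {"orders": (50, 200), "sessions": (150, 600), "revenue": (1000, 5000)}

for metric, (low, high) in ranges.items():
    value = today[metric]
    if not low <= value <= high:  # trigger an alert outside the expected range
        print(f"ALERT: {metric} = {value} outside expected range [{low}, {high}]")
```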

7. Time series analysis

As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. Although analysts use this method to monitor data points over a continuous interval rather than just intermittently, time series analysis is not used merely to collect data over time. Instead, it allows researchers to understand whether variables changed during the study, how the different variables depend on one another, and how the series reached its end result. 

In a business context, this method is used to understand the causes of different trends and patterns to extract valuable insights. Another way of using this method is with the help of time series forecasting. Powered by predictive technologies, businesses can analyze various data sets over a period of time and forecast different future events. 

A great use case to put time series analysis into perspective is seasonality effects on sales. By using time series forecasting to analyze sales data of a specific product over time, you can understand if sales rise over a specific period of time (e.g. swimwear during summertime, or candy during Halloween). These insights allow you to predict demand and prepare production accordingly.  
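As a brief sketch of the seasonality example, statsmodels can decompose a sales series into trend and seasonal components. The monthly swimwear figures below are fabricated to show a summer peak.

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Two years of hypothetical monthly swimwear sales with a summer peak
sales = pd.Series(
    [20, 22, 30, 45, 70, 95, 100, 90, 55, 35, 25, 21] * 2,
    index=pd.date_range("2020-01-01", periods=24, freq="MS"),
)

result = seasonal_decompose(sales, model="additive", period=12)
print(result.seasonal.head(12))      # the recurring within-year pattern
print(result.trend.dropna().head())  # the longer-run direction of sales
```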

8. Decision Trees 

The decision tree analysis aims to act as a support tool to make smart and strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and company users can easily evaluate all factors involved and choose the best course of action. Decision trees are helpful to analyze quantitative data and they allow for an improved decision-making process by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.

But how does a decision tree actually work? This method works like a flowchart that starts with the main decision that you need to make and branches out based on the different outcomes and consequences of each decision. Each outcome will outline its own consequences, costs, and gains, and, at the end of the analysis, you can compare each of them and make the smartest decision. 

Businesses can use them to understand which project is more cost-effective and will bring more earnings in the long run. For example, imagine you need to decide if you want to update your software app or build a new app entirely.  Here you would compare the total costs, the time needed to be invested, potential revenue, and any other factor that might affect your decision.  In the end, you would be able to see which of these two options is more realistic and attainable for your company or research.
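In its decision-support form, this comes down to expected-value arithmetic over the branches. Here is a tiny sketch comparing the two hypothetical options from the example above; every probability and dollar figure is invented.

```python
# Each branch: upfront cost plus (probability, payoff) pairs for its outcomes
options = {
    "update existing app": {
        "cost": 50_000,
        "outcomes": [(0.7, 120_000), (0.3, 60_000)],  # (probability, revenue)
    },
    "build new app": {
        "cost": 200_000,
        "outcomes": [(0.5, 450_000), (0.5, 100_000)],
    },
}

for name, branch in options.items():
    expected = sum(p * payoff for p, payoff in branch["outcomes"]) - branch["cost"]
    print(f"{name}: expected value = {expected:,.0f}")
# The branch with the higher expected value wins (here, building the new app)
```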

9. Conjoint analysis 

Last but not least, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods for extracting consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more feature-focused, and others might focus on sustainability. Whatever your customers' preferences are, you can find them with conjoint analysis. Through this, companies can define pricing strategies, packaging options, subscription packages, and more. 

A great example of conjoint analysis is in marketing and sales. For instance, a cupcake brand might use conjoint analysis and find that its clients prefer gluten-free options and cupcakes with healthier toppings over super sugary ones. Thus, the cupcake brand can turn these insights into advertisements and promotions to increase sales of this particular type of product. And not just that, conjoint analysis can also help businesses segment their customers based on their interests. This allows them to send different messaging that will bring value to each of the segments. 
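One common way to estimate how much each attribute level contributes to preference (its “part-worth”) is a dummy-coded regression of respondents' ratings on the attributes. This minimal sketch uses invented cupcake profiles.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical survey: respondents rated cupcake profiles from 1 to 10
profiles = pd.DataFrame({
    "gluten_free": ["yes", "yes", "no", "no", "yes", "no"],
    "topping":     ["fruit", "sugar", "fruit", "sugar", "fruit", "fruit"],
    "rating":      [9, 6, 7, 3, 8, 6],
})

# Dummy-code the categorical attributes (dropping one level as the baseline)
X = pd.get_dummies(profiles[["gluten_free", "topping"]], drop_first=True)
model = LinearRegression().fit(X, profiles["rating"])

# Part-worths: how much each attribute level shifts the rating vs. the baseline
print(dict(zip(X.columns, model.coef_)))
```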

10. Correspondence Analysis

Also known as reciprocal averaging, correspondence analysis is a method used to analyze the relationship between categorical variables presented within a contingency table. A contingency table is a table that displays two (simple correspondence analysis) or more (multiple correspondence analysis) categorical variables across rows and columns that show the distribution of the data, which is usually answers to a survey or questionnaire on a specific topic. 

This method starts by calculating an “expected value” for each cell, which is done by multiplying the cell’s row total by its column total and dividing by the grand total of the table. The expected value is then subtracted from the observed value, resulting in a “residual,” which is what allows you to extract conclusions about relationships and distribution. The results of this analysis are later displayed using a map that represents the relationships between the different values. The closer two values are on the map, the stronger the relationship. Let’s put it into perspective with an example. 

Imagine you are carrying out a market research analysis about outdoor clothing brands and how they are perceived by the public. For this analysis, you ask a group of people to match each brand with a certain attribute which can be durability, innovation, quality materials, etc. When calculating the residual numbers, you can see that brand A has a positive residual for innovation but a negative one for durability. This means that brand A is not positioned as a durable brand in the market, something that competitors could take advantage of. 
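The expected-value and residual computation described above can be sketched directly with numpy. The brand-by-attribute counts are invented to mirror the example: brand A over-associated with innovation and under-associated with durability.

```python
import numpy as np

# Hypothetical contingency table: rows = brands A-C, columns = how often each
# brand was matched with [durability, innovation, quality_materials]
observed = np.array([
    [10, 40, 25],  # brand A
    [35, 15, 30],  # brand B
    [20, 20, 20],  # brand C
])

total = observed.sum()
# Expected cell count = (row total * column total) / grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total

residuals = observed - expected  # positive: over-associated; negative: under
print(np.round(residuals, 1))    # brand A: positive for innovation, negative for durability
```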

11. Multidimensional Scaling (MDS)

MDS is a method used to observe the similarities or disparities between objects, which can be colors, brands, people, geographical coordinates, and more. The objects are plotted using an “MDS map” that positions similar objects together and disparate ones far apart. The (dis)similarities between objects are represented using one or more dimensions that can be observed using a numerical scale. For example, if you want to know how people feel about the COVID-19 vaccine, you can use 1 for “don’t believe in the vaccine at all” and 10 for “firmly believe in the vaccine,” with 2 to 9 for in-between responses. When analyzing an MDS map, the only thing that matters is the distance between the objects; the orientation of the dimensions is arbitrary and has no meaning at all. 

Multidimensional scaling is a valuable technique for market research, especially when it comes to evaluating product or brand positioning. For instance, if a cupcake brand wants to know how they are positioned compared to competitors, it can define 2-3 dimensions such as taste, ingredients, shopping experience, or more, and do a multidimensional scaling analysis to find improvement opportunities as well as areas in which competitors are currently leading. 

Another business example is in procurement when deciding on different suppliers. Decision makers can generate an MDS map to see how the different prices, delivery times, technical services, and more of the different suppliers differ and pick the one that suits their needs the best. 

A final example comes from a research paper titled "An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment Word of Movie Review Data". The researchers used a two-dimensional MDS map to display the distances and relationships between different sentiments in movie reviews. They took 36 sentiment words and distributed them based on their emotional distance, as shown in the image below, where the words "outraged" and "sweet" sit on opposite sides of the map, marking the distance between the two emotions very clearly.

[Figure: Example of a multidimensional scaling analysis map]

Aside from being a valuable technique to analyze dissimilarities, MDS also serves as a dimension-reduction technique for high-dimensional data.
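
If you want to experiment with the technique, scikit-learn ships an MDS implementation. The sketch below feeds it a hypothetical matrix of pairwise brand dissimilarities and prints 2-D map coordinates; remember that only the distances between points are meaningful, never the orientation of the axes:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical pairwise dissimilarities between four brands
# (0 = perceived as identical, 10 = perceived as completely different).
dissimilarities = np.array([
    [0, 2, 6, 9],
    [2, 0, 5, 8],
    [6, 5, 0, 3],
    [9, 8, 3, 0],
], dtype=float)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=42)
coords = mds.fit_transform(dissimilarities)
for brand, (x, y) in zip("ABCD", coords):
    print(f"Brand {brand}: ({x:+.2f}, {y:+.2f})")
```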

B. Qualitative Methods

Qualitative data analysis methods are defined as the observation of non-numerical data that is gathered and produced using methods such as interviews, focus groups, and questionnaires. As opposed to quantitative methods, qualitative data is more subjective, and it is highly valuable for analyzing customer retention and product development.

12. Text analysis

Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes it easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.

Modern software accelerates the application of text analytics. Thanks to the combination of machine learning and intelligent algorithms, you can perform advanced analytical processes such as sentiment analysis. This technique allows you to understand the intentions and emotions behind a text, for example, whether it is positive, negative, or neutral, and then give it a score based on factors and categories that are relevant to your brand. Sentiment analysis is often used to monitor brand and product reputation and to understand how successful your customer experience is. To learn more about the topic, check out this insightful article.
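
As a hands-on taste of sentiment analysis, here is a minimal sketch using NLTK's off-the-shelf VADER analyzer (one of many available libraries; the reviews are made up). It assigns each text a compound score from -1 (very negative) to +1 (very positive):

```python
import nltk
nltk.download("vader_lexicon", quiet=True)  # lexicon fetched on first run
from nltk.sentiment import SentimentIntensityAnalyzer

reviews = [
    "Absolutely love these cupcakes, best I've ever had!",
    "Delivery was late and the frosting was a mess.",
    "It was okay, nothing special.",
]

analyzer = SentimentIntensityAnalyzer()
for review in reviews:
    score = analyzer.polarity_scores(review)["compound"]
    label = "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"
    print(f"{label:>8} ({score:+.2f}): {review}")
```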

By analyzing data from various word-based sources, including product reviews, articles, social media communications, and survey responses, you will gain invaluable insights into your audience, as well as their needs, preferences, and pain points. This will allow you to create campaigns, services, and communications that meet your prospects’ needs on a personal level, growing your audience while boosting customer retention. There are various other “sub-methods” that are an extension of text analysis. Each of them serves a more specific purpose and we will look at them in detail next. 

13. Content Analysis

This is a straightforward and very popular method that examines the presence and frequency of certain words, concepts, and subjects in different content formats such as text, image, audio, or video. For example, the number of times the name of a celebrity is mentioned on social media or online tabloids. It does this by coding text data that is later categorized and tabulated in a way that can provide valuable insights, making it the perfect mix of quantitative and qualitative analysis.

There are two types of content analysis. The first is conceptual analysis, which focuses on explicit data, for instance, the number of times a concept or word is mentioned in a piece of content. The second is relational analysis, which focuses on the relationship between different concepts or words and how they are connected within a specific context.
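
A conceptual analysis in its simplest form is just counting coded mentions. The toy sketch below tallies how often predefined concepts appear in a handful of invented social media posts:

```python
import re
from collections import Counter

# Hypothetical posts and the concepts we want to count.
posts = [
    "The new album is amazing, the singer outdid herself.",
    "Saw the singer live last night. Amazing show!",
    "Ticket prices are getting ridiculous.",
]
concepts = ["singer", "amazing", "ticket"]

# Tokenize everything to lowercase words, then tally the concept mentions.
tokens = Counter(re.findall(r"[a-z']+", " ".join(posts).lower()))
for concept in concepts:
    print(f"{concept}: mentioned {tokens[concept]} time(s)")
```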

Content analysis is often used by marketers to measure brand reputation and customer behavior, for example, by analyzing customer reviews. It can also be used to analyze customer interviews and find directions for new product development. It is also important to note that, in order to extract the maximum potential out of this analysis method, it is necessary to have a clearly defined research question.

14. Thematic Analysis

Very similar to content analysis, thematic analysis also helps in identifying and interpreting patterns in qualitative data, with the main difference being that content analysis can also be applied to quantitative data. The thematic method analyzes large pieces of text data, such as focus group transcripts or interviews, and groups them into themes or categories that come up frequently within the text. It is a great method when trying to figure out people’s views and opinions about a certain topic. For example, if you are a brand that cares about sustainability, you can survey your customers to analyze their views and opinions about sustainability and how they apply it to their lives. You can also analyze customer service call transcripts to find common issues and improve your service.

Thematic analysis is a very subjective technique that relies on the researcher’s judgment. Therefore, to avoid bias, it follows six steps: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. It is also important to note that, because it is a flexible approach, the data can be interpreted in multiple ways, and it can be hard to decide which data to emphasize.
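
Real thematic analysis is an iterative, judgment-driven process, but the coding step can be illustrated mechanically. The sketch below tags hypothetical survey responses with candidate themes via simple keyword matching, a deliberately crude stand-in for a researcher's codebook:

```python
# Candidate themes and the keywords that signal them (all hypothetical).
themes = {
    "packaging": ["plastic", "packaging", "wrapper"],
    "sourcing": ["organic", "fair trade", "local"],
    "recycling": ["recycle", "reuse", "compost"],
}

responses = [
    "I try to buy organic and fair trade whenever I can.",
    "Too much plastic packaging on everything these days.",
    "We compost at home and recycle all our glass.",
]

# Tag each response with every theme whose keywords it contains.
for response in responses:
    text = response.lower()
    codes = [t for t, kws in themes.items() if any(kw in text for kw in kws)]
    print(f"{codes or ['uncoded']}: {response}")
```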

15. Narrative Analysis 

A bit more complex in nature than the two previous ones, narrative analysis is used to explore the meaning behind the stories that people tell and most importantly, how they tell them. By looking into the words that people use to describe a situation you can extract valuable conclusions about their perspective on a specific topic. Common sources for narrative data include autobiographies, family stories, opinion pieces, and testimonials, among others. 

From a business perspective, narrative analysis can be useful for analyzing customer behaviors and feelings towards a specific product, service, or feature. It provides unique and deep insights that can be extremely valuable. However, it has some drawbacks.

The biggest weakness of this method is that the sample sizes are usually very small due to the complexity and time-consuming nature of the collection of narrative data. Plus, the way a subject tells a story will be significantly influenced by his or her specific experiences, making it very hard to replicate in a subsequent study. 

16. Discourse Analysis

Discourse analysis is used to understand the meaning behind any type of written, verbal, or symbolic discourse based on its political, social, or cultural context. It combines the analysis of language with the analysis of the situation in which it is used. This means that the way the content is constructed and the meaning behind it are significantly influenced by the culture and society it takes place in. For example, if you are analyzing political speeches, you need to consider different context elements such as the politician’s background, the current political context of the country, the audience to which the speech is directed, and so on.

From a business point of view, discourse analysis is a great market research tool. It allows marketers to understand how the norms and ideas of the specific market work and how their customers relate to those ideas. It can be very useful to build a brand mission or develop a unique tone of voice. 

17. Grounded Theory Analysis

Traditionally, researchers decide on a method and hypothesis and then collect data to test that hypothesis. Grounded theory works differently: it doesn’t require an initial research question or hypothesis, as its value lies in the generation of new theories. With the grounded theory method, you go into the analysis process with an open mind and explore the data to generate new theories through tests and revisions. In fact, it is not necessary to finish collecting the data before starting to analyze it; researchers usually begin to find valuable insights as they are gathering the data.

All of these elements make grounded theory a very valuable method, as the resulting theories are fully backed by data instead of initial assumptions. It is a great technique for analyzing poorly researched topics or finding the causes behind specific company outcomes. For example, product managers and marketers might use grounded theory to investigate high levels of customer churn, looking into customer surveys and reviews to develop new theories about the causes.

How To Analyze Data? Top 17 Data Analysis Techniques To Apply

[Figure: 17 top data analysis techniques by datapine]

Now that we’ve answered the questions “what is data analysis?” and “why is it important?”, and covered the different data analysis types, it’s time to dig deeper into how to perform your analysis by working through these 17 essential techniques.

1. Collaborate your needs

Before you begin analyzing or drilling down into any techniques, it’s crucial to sit down collaboratively with all key stakeholders within your organization and decide on your primary campaign or strategic goals. You should also gain a fundamental understanding of the types of insights that will best benefit your progress or provide you with the level of vision you need to evolve your organization.

2. Establish your questions

Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important techniques as it will shape the very foundations of your success.

To ensure your data works for you, you have to ask the right data analysis questions.

3. Data democratization

After giving your data analytics methodology some real direction, and knowing which questions need answering to extract optimum value from the information available to your organization, you should continue with democratization.

Data democratization is an action that aims to connect data from various sources efficiently and quickly so that anyone in your organization can access it at any given moment. You can extract data in text, images, videos, numbers, or any other format, and then perform cross-database analysis to achieve more advanced insights to share with the rest of the company interactively.

Once you have decided on your most valuable sources, you need to take all of this into a structured format to start collecting your insights. For this purpose, datapine offers an easy all-in-one data connectors feature to integrate all your internal and external sources and manage them at your will. Additionally, datapine’s end-to-end solution automatically updates your data, allowing you to save time and focus on performing the right analysis to grow your company.

[Figure: data connectors from datapine]

4. Think of data governance

When collecting data in a business or research context, you always need to think about security and privacy. With data breaches becoming a growing concern for businesses, the need to protect your clients’ or subjects’ sensitive information becomes critical.

To ensure that all this is taken care of, you need to think of a data governance strategy. According to Gartner, this concept refers to “the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” In simpler words, data governance is a collection of processes, roles, and policies that ensure the efficient use of data while still achieving the main company goals. It ensures that clear roles are in place for who can access the information and how they can access it. In time, this not only ensures that sensitive information is protected but also allows for a more efficient analysis as a whole.

5. Clean your data

After harvesting data from so many sources, you will be left with a vast amount of information that can be overwhelming to deal with. At the same time, you may be faced with incorrect data that can mislead your analysis. The smartest thing you can do to avoid dealing with this later is to clean the data. This is fundamental before visualizing it, as it ensures that the insights you extract from it are correct.

There are many things you need to look for in the cleaning process. The most important one is to eliminate duplicate observations, which usually appear when using multiple internal and external sources of information. You can also add any missing codes, fix empty fields, and eliminate incorrectly formatted data.
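
To ground those steps, here is a minimal pandas sketch over a made-up customer table that removes duplicate observations, fills empty fields, and normalizes inconsistent formatting:

```python
import pandas as pd

# Hypothetical raw data with a duplicate row, a missing name and spend,
# and inconsistently formatted country codes.
df = pd.DataFrame({
    "customer": ["Ada", "Ada", "Grace", "Alan", None],
    "country":  ["UK", "UK", "usa", "UK", "USA"],
    "spend":    [120.0, 120.0, None, 80.0, 45.0],
})

df = df.drop_duplicates()                          # remove duplicate observations
df["customer"] = df["customer"].fillna("unknown")  # fill empty fields
df["country"] = df["country"].str.upper()          # fix inconsistent formatting
df["spend"] = df["spend"].fillna(df["spend"].median())  # impute missing numbers
print(df)
```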

Another usual form of cleaning is done with text data. As we mentioned earlier, most companies today analyze customer reviews, social media comments, questionnaires, and several other text inputs. In order for algorithms to detect patterns, text data needs to be revised to avoid invalid characters or any syntax or spelling errors. 

Most importantly, the aim of cleaning is to prevent you from arriving at false conclusions that can damage your company in the long run. By using clean data, you will also help BI solutions to interact better with your information and create better reports for your organization.

6. Set your KPIs

Once you’ve set your sources, cleaned your data, and established clear-cut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

KPIs are critical to both qualitative and quantitative analysis research. This is one of the primary methods of data analysis you certainly shouldn’t overlook.

To help you set the best possible KPIs for your initiatives and activities, here is an example of a relevant logistics KPI: transportation-related costs. If you want to see more, explore our collection of key performance indicator examples.

[Figure: Transportation costs logistics KPI]

7. Omit useless data

Having bestowed your data analysis tools and techniques with true purpose and defined your mission, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for chopping out any information you deem to be useless.

Trimming the informational fat is one of the most crucial methods of analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.

8. Build a data management roadmap

While, at this point, this particular step is optional (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data management roadmap will help your data analysis methods and techniques become successful on a more sustainable basis. These roadmaps, if developed properly, are also built so they can be tweaked and scaled over time.

Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional – one of the most powerful types of data analysis methods available today.

9. Integrate technology

There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.

Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that offer you actionable insights; they will also present that data in a digestible, visual, interactive format from one central, live dashboard. A data methodology you can count on.

By integrating the right technology within your data analysis methodology, you’ll avoid fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum value from your business’s most valuable insights.

For a look at the power of software for the purpose of analysis, and to enhance your methods of analyzing, glance over our selection of dashboard examples.

10. Answer your questions

By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most burning business questions. Arguably, the best way to make your data concepts accessible across the organization is through data visualization.

11. Visualize your data

Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the organization to extract meaningful insights that aid business evolution – and it covers all the different ways to analyze data.

The purpose of analyzing is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this is simpler than you think, as demonstrated by our marketing dashboard.

[Figure: An executive dashboard example showcasing high-level marketing KPIs such as cost per lead, MQL, SQL, and cost per customer]

This visual, dynamic, and interactive online dashboard is a data analysis example designed to give Chief Marketing Officers (CMOs) an overview of relevant metrics to help them understand whether they achieved their monthly goals.

In detail, this example, generated with a modern dashboard creator, displays interactive charts for monthly revenues, costs, net income, and net income per customer; all of them are compared with the previous month so that you can understand how the data fluctuated. In addition, it shows a detailed summary of the number of users, customers, SQLs, and MQLs per month to visualize the whole picture and extract relevant insights or trends for your marketing reports.

The CMO dashboard is perfect for c-level management as it can help them monitor the strategic outcome of their marketing efforts and make data-driven decisions that can benefit the company exponentially.

12. Be careful with the interpretation

We already dedicated an entire post to data interpretation, as it is a fundamental part of the process of data analysis. It gives meaning to the analytical information and aims to draw concise conclusions from the analysis results. Since companies are most of the time dealing with data from many different sources, the interpretation stage needs to be done carefully and properly in order to avoid misinterpretations.

To help you through the process, here we list three common practices that you need to avoid at all costs when looking at your data:

  • Correlation vs. causation: The human brain is wired to find patterns. This behavior leads to one of the most common mistakes when performing interpretation: confusing correlation with causation. Although these two aspects can exist simultaneously, it is not correct to assume that because two things happened together, one provoked the other. A piece of advice to avoid falling into this mistake is never to trust intuition alone; trust the data. If there is no objective evidence of causation, then always stick to correlation.
  • Confirmation bias: This phenomenon describes the tendency to select and interpret only the data necessary to prove one hypothesis, often ignoring the elements that might disprove it. Even if it's not done on purpose, confirmation bias can represent a real problem, as excluding relevant information can lead to false conclusions and, therefore, bad business decisions. To avoid it, always try to disprove your hypothesis instead of proving it, share your analysis with other team members, and avoid drawing any conclusions before the entire analytical project is finalized.
  • Statistical significance: In short, statistical significance helps analysts understand whether a result is actually meaningful or whether it happened because of a sampling error or pure chance. The level of statistical significance needed might depend on the sample size and the industry being analyzed. In any case, ignoring the significance of a result when it might influence decision-making can be a huge mistake (see the sketch after this list).
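
The following sketch ties the first and last points together with the classic textbook example (the numbers are invented): ice cream sales and drowning incidents correlate strongly and significantly, yet neither causes the other; both rise with summer weather:

```python
from scipy import stats

# Hypothetical monthly figures.
ice_cream_sales = [120, 150, 200, 260, 310, 400, 430]
drownings = [2, 3, 4, 5, 7, 9, 10]

r, p_value = stats.pearsonr(ice_cream_sales, drownings)
print(f"r = {r:.2f}, p = {p_value:.4f}")
# The correlation is strong and statistically significant, but banning
# ice cream would not prevent drownings. Significance only tells you the
# association is unlikely to be chance; it says nothing about causation.
```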

13. Build a narrative

Now, we’re going to look at how you can bring all of these elements together in a way that will benefit your business - starting with a little something called data storytelling.

The human brain responds incredibly well to strong stories or narratives. Once you’ve cleansed, shaped, and visualized your most invaluable data using various BI dashboard tools , you should strive to tell a story - one with a clear-cut beginning, middle, and end.

By doing so, you will make your analytical efforts more accessible, digestible, and universal, empowering more people within your organization to use your discoveries to their actionable advantage.

14. Consider autonomous technology

Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a significant role in the advancement of understanding how to analyze data more effectively.

Gartner predicts that by the end of this year, 80% of emerging technologies will be developed with AI foundations. This is a testament to the ever-growing power and value of autonomous technologies.

At the moment, these technologies are revolutionizing the analysis industry. Some examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment analysis.

15. Share the load

If you work with the right tools and dashboards, you will be able to present your metrics in a digestible, value-driven format, allowing almost everyone in the organization to connect with and use relevant data to their advantage.

Modern dashboards consolidate data from various sources, providing access to a wealth of insights in one centralized location, no matter if you need to monitor recruitment metrics or generate reports that need to be sent across numerous departments. Moreover, these cutting-edge tools offer access to dashboards from a multitude of devices, meaning that everyone within the business can connect with practical insights remotely - and share the load.

Once everyone is able to work with a data-driven mindset, you will catalyze the success of your business in ways you never thought possible. And when it comes to knowing how to analyze data, this kind of collaborative approach is essential.

16. Data analysis tools

In order to perform high-quality analysis of data, it is fundamental to use tools and software that will ensure the best results. Here we leave you a small summary of four fundamental categories of data analysis tools for your organization.

  • Business Intelligence: BI tools allow you to process significant amounts of data from several sources in any format. Through this, you can not only analyze and monitor your data to extract relevant insights but also create interactive reports and dashboards to visualize your KPIs and put them to work for your company. datapine is an amazing online BI software focused on delivering powerful online analysis features that are accessible to beginner and advanced users alike. As such, it offers a full-service solution that includes cutting-edge data analysis, KPI visualization, live dashboards, reporting, and artificial intelligence technologies to predict trends and minimize risk.
  • Statistical analysis: These tools are usually designed for scientists, statisticians, market researchers, and mathematicians, as they allow them to perform complex statistical analyses with methods like regression analysis, predictive analysis, and statistical modeling. A good tool for this type of analysis is RStudio, as it offers powerful data modeling and hypothesis testing features that can cover both academic and general data analysis. This tool is an industry favorite due to its capabilities for data cleaning, data reduction, and advanced analysis with several statistical methods. Another relevant tool to mention is SPSS from IBM. The software offers advanced statistical analysis for users of all skill levels. Thanks to a vast library of machine learning algorithms, text analysis, and a hypothesis testing approach, it can help your company find relevant insights to drive better decisions. SPSS is also available as a cloud service that enables you to run it anywhere.
  • SQL Consoles: SQL is a programming language often used to handle structured data in relational databases. Tools like these are popular among data scientists as they are extremely effective at unlocking the value of these databases. Undoubtedly, one of the most widely used SQL tools on the market is MySQL Workbench. It offers several features such as a visual tool for database modeling and monitoring, complete SQL optimization, administration tools, and visual performance dashboards to keep track of KPIs (a minimal SQL example is sketched after this list).
  • Data Visualization: These tools are used to represent your data through charts, graphs, and maps, allowing you to find patterns and trends in it. datapine’s already mentioned BI platform also offers a wealth of powerful online data visualization tools with several benefits, including compelling data-driven presentations to share with your entire company, the ability to see your data online from any device wherever you are, an interactive dashboard design feature that enables you to showcase your results in an interactive and understandable way, and online self-service reports that several people can work on simultaneously to enhance team productivity.
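
For readers who have never touched SQL, here is a self-contained sketch using Python's built-in sqlite3 module and a throwaway in-memory database. The table and figures are invented, but the GROUP BY aggregation is exactly the kind of query an SQL console is used for:

```python
import sqlite3

# Create a throwaway in-memory database with a small orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ada", 120.0), ("Grace", 75.5), ("Ada", 30.0)],
)

# Aggregate spend per customer, highest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
for customer, total in conn.execute(query):
    print(customer, total)
```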

17. Refine your process constantly 

Last is a step that might seem obvious to some people, but it can be easily ignored if you think you are done. Once you have extracted the needed results, you should always take a retrospective look at your project and think about what you can improve. As you saw throughout this long list of techniques, data analysis is a complex process that requires constant refinement. For this reason, you should always go one step further and keep improving. 

Quality Criteria For Data Analysis

So far we’ve covered a list of methods and techniques that should help you perform efficient data analysis. But how do you measure the quality and validity of your results? This is done with the help of some science quality criteria. Here we will go into a more theoretical area that is critical to understanding the fundamentals of statistical analysis in science. However, you should also be aware of these steps in a business context, as they will allow you to assess the quality of your results in the correct way. Let’s dig in. 

  • Internal validity: The results of a survey are internally valid if they measure what they are supposed to measure and thus provide credible results. In other words, internal validity measures the trustworthiness of the results and how they can be affected by factors such as the research design, operational definitions, how the variables are measured, and more. For instance, imagine you are conducting an interview to ask people if they brush their teeth twice a day. While most of them will answer yes, you may notice that their answers correspond to what is socially acceptable, which is to brush your teeth at least twice a day. In this case, you can’t be 100% sure whether respondents actually brush their teeth twice a day or just say that they do; therefore, the internal validity of this interview is very low.
  • External validity: Essentially, external validity refers to the extent to which the results of your research can be applied to a broader context. It basically aims to prove that the findings of a study can be applied in the real world. If the research can be applied to other settings, individuals, and times, then the external validity is high. 
  • Reliability: If your research is reliable, it means that it can be reproduced. If your measurements were repeated under the same conditions, they would produce similar results; in other words, your measuring instrument produces consistent results. For example, imagine a doctor building a symptoms questionnaire to detect a specific disease in a patient. If various other doctors use this questionnaire but end up diagnosing the same patient with a different condition, the questionnaire is not reliable at detecting the initial disease. Another important note here is that in order for your research to be reliable, it also needs to be objective. If the results of a study are the same independent of who assesses or interprets them, the study can be considered reliable. Let’s see the objectivity criterion in more detail now.
  • Objectivity: In data science, objectivity means that the researcher needs to stay fully objective in their analysis. The results of a study need to be determined by objective criteria and not by the beliefs, personality, or values of the researcher. Objectivity needs to be ensured when gathering the data; for example, when interviewing individuals, the questions need to be asked in a way that doesn’t influence the answers. Paired with this, objectivity also needs to be considered when interpreting the data. If different researchers reach the same conclusions, then the study is objective. For this last point, you can set predefined criteria for interpreting the results to ensure all researchers follow the same steps (a quick way to quantify such agreement is sketched after this list).
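
One common way to quantify the reliability and objectivity points above is inter-rater agreement. In this sketch, two hypothetical researchers code the same ten interview excerpts, and Cohen's kappa measures how much they agree beyond chance (1.0 is perfect agreement, 0 is chance level):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned independently by two researchers.
rater_a = ["pos", "pos", "neu", "neg", "pos", "neu", "neg", "neg", "pos", "neu"]
rater_b = ["pos", "neu", "neu", "neg", "pos", "neu", "neg", "pos", "pos", "neu"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")
```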

The quality criteria discussed above mostly cover potential influences in a quantitative context. Analysis in qualitative research has, by default, additional subjective influences that must be controlled in a different way. Therefore, there are other quality criteria for this kind of research, such as credibility, transferability, dependability, and confirmability. You can see each of them in more detail in this resource.

Data Analysis Limitations & Barriers

Analyzing data is not an easy task. As you’ve seen throughout this post, there are many steps and techniques that you need to apply in order to extract useful information from your research. While a well-performed analysis can bring various benefits to your organization it doesn't come without limitations. In this section, we will discuss some of the main barriers you might encounter when conducting an analysis. Let’s see them more in detail. 

  • Lack of clear goals: No matter how good your data or analysis might be, if you don’t have clear goals or a hypothesis, the process might be worthless. While we mentioned some methods that don’t require a predefined hypothesis, it is always better to enter the analytical process with some clear guidelines about what you are expecting to get out of it, especially in a business context in which data is utilized to support important strategic decisions.
  • Objectivity: Arguably one of the biggest barriers when it comes to data analysis in research is to stay objective. When trying to prove a hypothesis, researchers might find themselves, intentionally or unintentionally, directing the results toward an outcome that they want. To avoid this, always question your assumptions and avoid confusing facts with opinions. You can also show your findings to a research partner or external person to confirm that your results are objective. 
  • Data representation: A fundamental part of the analytical procedure is the way you represent your data. You can use various graphs and charts to represent your findings, but not all of them will work for all purposes. Choosing the wrong visual can not only damage your analysis but also mislead your audience; therefore, it is important to understand when to use each type of visual depending on your analytical goals. Our complete guide on the types of graphs and charts lists 20 different visuals with examples of when to use them.
  • Flawed correlation : Misleading statistics can significantly damage your research. We’ve already pointed out a few interpretation issues previously in the post, but it is an important barrier that we can't avoid addressing here as well. Flawed correlations occur when two variables appear related to each other but they are not. Confusing correlations with causation can lead to a wrong interpretation of results which can lead to building wrong strategies and loss of resources, therefore, it is very important to identify the different interpretation mistakes and avoid them. 
  • Sample size: A very common barrier to a reliable and efficient analysis process is the sample size. In order for the results to be trustworthy, the sample size should be representative of what you are analyzing. For example, imagine you have a company of 1,000 employees and you ask 50 of them “do you like working here?”, of which 49 say yes, which means 98%. Now, imagine you ask the same question to all 1,000 employees and 980 say yes, which also means 98%. Saying that 98% of employees like working at the company when the sample size was only 50 is not a representative or trustworthy conclusion. The significance of the results is far more accurate when surveying a bigger sample size (see the sketch after this list).
  • Privacy concerns: In some cases, data collection can be subjected to privacy regulations. Businesses gather all kinds of information from their customers from purchasing behaviors to addresses and phone numbers. If this falls into the wrong hands due to a breach, it can affect the security and confidentiality of your clients. To avoid this issue, you need to collect only the data that is needed for your research and, if you are using sensitive facts, make it anonymous so customers are protected. The misuse of customer data can severely damage a business's reputation, so it is important to keep an eye on privacy. 
  • Lack of communication between teams : When it comes to performing data analysis on a business level, it is very likely that each department and team will have different goals and strategies. However, they are all working for the same common goal of helping the business run smoothly and keep growing. When teams are not connected and communicating with each other, it can directly affect the way general strategies are built. To avoid these issues, tools such as data dashboards enable teams to stay connected through data in a visually appealing way. 
  • Innumeracy : Businesses are working with data more and more every day. While there are many BI tools available to perform effective analysis, data literacy is still a constant barrier. Not all employees know how to apply analysis techniques or extract insights from them. To prevent this from happening, you can implement different training opportunities that will prepare every relevant user to deal with data. 
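
To put numbers on the sample size point from the list above, here is a quick margin-of-error sketch for the employee survey example (using a normal approximation, which is admittedly rough for proportions this close to 100%):

```python
import math

# 95% confidence margin of error for a proportion: 1.96 * sqrt(p(1-p)/n).
for n, yes in [(50, 49), (1000, 980)]:
    p = yes / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    print(f"n={n}: {p:.0%} +/- {margin:.1%}")
# The same 98% "yes" rate is far less certain with 50 respondents than
# with 1,000, which is why the smaller sample is not trustworthy.
```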

Key Data Analysis Skills

As you've learned throughout this lengthy guide, analyzing data is a complex task that requires a lot of knowledge and skills. That said, thanks to the rise of self-service tools the process is way more accessible and agile than it once was. Regardless, there are still some key skills that are valuable to have when working with data, we list the most important ones below.

  • Critical and statistical thinking: To successfully analyze data you need to be creative and think outside the box. Yes, that might sound like a strange statement considering that data is often tied to hard facts. However, a great level of critical thinking is required to uncover connections, come up with a valuable hypothesis, and extract conclusions that go a step beyond the surface. This, of course, needs to be complemented by statistical thinking and an understanding of numbers.
  • Data cleaning: Anyone who has ever worked with data will tell you that the cleaning and preparation process accounts for 80% of a data analyst’s work; therefore, the skill is fundamental. But that’s not all: failing to clean the data adequately can also significantly damage the analysis, which can lead to poor decision-making in a business scenario. While there are multiple tools that automate the cleaning process and eliminate the possibility of human error, it is still a valuable skill to master.
  • Data visualization: Visuals make the information easier to understand and analyze, not only for professional users but especially for non-technical ones. Having the necessary skills to not only choose the right chart type but know when to apply it correctly is key. This also means being able to design visually compelling charts that make the data exploration process more efficient. 
  • SQL: The Structured Query Language or SQL is a programming language used to communicate with databases. It is fundamental knowledge as it enables you to update, manipulate, and organize data from relational databases which are the most common databases used by companies. It is fairly easy to learn and one of the most valuable skills when it comes to data analysis. 
  • Communication skills: This is a skill that is especially valuable in a business environment. Being able to clearly communicate analytical outcomes to colleagues is incredibly important, especially when the information you are trying to convey is complex for non-technical people. This applies to in-person communication as well as written format, for example, when generating a dashboard or report. While this might be considered a “soft” skill compared to the other ones we mentioned, it should not be ignored as you most likely will need to share analytical findings with others no matter the context. 

Data Analysis In The Big Data Environment

Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

To inspire your efforts and put the importance of big data into context, here are some insights that you should know:

  • By 2026, the big data industry is expected to be worth approximately $273.4 billion.
  • 94% of enterprises say that analyzing data is important for their growth and digital transformation.
  • Companies that exploit the full potential of their data can increase their operating margins by 60%.
  • We already discussed the benefits of artificial intelligence earlier in this article; this industry's financial impact is expected to reach $40 billion by 2025.

Data analysis concepts may come in many forms, but fundamentally, any solid methodology will help to make your business more streamlined, cohesive, insightful, and successful than ever before.

Key Takeaways From Data Analysis 

As we reach the end of our data analysis journey, we leave a small summary of the main methods and techniques to perform excellent analysis and grow your business.

17 Essential Types of Data Analysis Methods:

  • Cluster analysis
  • Cohort analysis
  • Regression analysis
  • Factor analysis
  • Neural Networks
  • Data Mining
  • Text analysis
  • Time series analysis
  • Decision trees
  • Conjoint analysis 
  • Correspondence Analysis
  • Multidimensional Scaling 
  • Content analysis 
  • Thematic analysis
  • Narrative analysis 
  • Grounded theory analysis
  • Discourse analysis 

Top 17 Data Analysis Techniques:

  • Collaborate your needs
  • Establish your questions
  • Data democratization
  • Think of data governance 
  • Clean your data
  • Set your KPIs
  • Omit useless data
  • Build a data management roadmap
  • Integrate technology
  • Answer your questions
  • Visualize your data
  • Interpretation of data
  • Consider autonomous technology
  • Build a narrative
  • Share the load
  • Data Analysis tools
  • Refine your process constantly 

We’ve pondered the data analysis definition and drilled down into the practical applications of data-centric analytics, and one thing is clear: by taking measures to arrange your data and making your metrics work for you, it’s possible to transform raw information into action - the kind that will push your business to the next level.

Yes, good data analytics techniques result in enhanced business intelligence (BI). To help you understand this notion in more detail, read our exploration of business intelligence reporting.

And, if you’re ready to perform your own analysis, drill down into your facts and figures while interacting with your data on astonishing visuals, you can try our software for a free, 14-day trial.

Historical Research Method: Home

What is historical research?

Historical research, or historiography, “attempts to systematically recapture the complex nuances, the people, meanings, events, and even ideas of the past that have influenced and shaped the present” (Berg & Lune, 2012, p. 305).

Historical research relies on a wide variety of sources, both primary and secondary, including unpublished material.

Primary Sources

  • Eyewitness accounts of events
  • Can be oral or written testimony
  • Found in public records and legal documents, minutes of meetings, corporate records, recordings, letters, diaries, journals, and drawings.
  • Located in university archives, libraries, or privately run collections such as a local historical society.

Secondary Sources

  • Can be oral or written
  • Secondhand accounts of events
  • Found in textbooks, encyclopedias, journal articles, newspapers, biographies and other media such as films or tape recordings.

Steps in Historical Research

Historical research involves the following steps:

  • Identify an idea, topic or research question
  • Conduct a background literature review
  • Refine the research idea and questions
  • Determine that historical methods will be the method used
  • Identify and locate primary and secondary data sources
  • Evaluate the authenticity and accuracy of source materials
  • Analyze the data and develop a narrative exposition of the findings.

(Berg & Lune, 2012, p.311)

Locating Information: Libraries

In addition to raw data and unpublished manuscripts, libraries also hold back copies of journals and newspapers.

  • Western Australia
  • ECU Library
  • Curtin University
  • Murdoch University
  • Notre Dame University
  • State Library of W.A.
  • Trove: Books, images, historic newspapers, maps, music, archives and more
  • WorldCat: Can limit results to archival and downloadable material

Locating Information: Archives

  • National Archives of Australia
  • UK Government Web Archive
  • National Archives (U.S.)
  • Nursing History: Historical Methodology Produced by the AAHN

Key Sources

  • PANDORA: Australia's Web Archive was established by the National Library in 1996 and is a collection of historic online publications relating to Australia and Australians. Online publications and web sites are selected for inclusion in the collection with the purpose of providing long-term and persistent access to them.
  • Directory of Archives in Australia
  • RSL Living History: The Listening Post is the official organ of the RSL in Western Australia and was first published in December 1921. The first two decades of the Listening Post are now available online for viewing, with more releases scheduled throughout the year.
  • Internet Archive Digital library of Internet sites and other cultural artifacts in digital form. Free access to researchers, historians, scholars, and the general public.
  • Repositories of primary sources
  • The national union catalog of manuscript collections (United States)
  • National Technical Information Service (U.S.) Provides access to a large collection of historical and current government technical reports that exists in many academic, public, government, and corporate libraries.
  • A history of nursing Four vols available online
  • British Journal of Nursing The journal contains a wide range of information about hospitals, wards, staff, patients, illness and diseases, medicine and treatments, hospital equipment and events.
  • Directory of history of medicine collections: U.S. National Library of Medicine, National Institutes of Health
  • The Australian Nursing and Midwifery History Project
  • Nursing Studies Index An annotated guide to reported studies, research methods, and historical and biographical materials in periodicals, books, and pamphlets published in English. Prepared by Yale University.
  • Early experiences in Australasia: Primary sources and personal narratives, 1788-1901. Provides a unique and personal view of events in the region from the arrival of the first settlers through to Australian Federation at the close of the nineteenth century. Includes first-person accounts, including letters and diaries, narratives, and other primary source materials.

Locating Information: Museums

  • Alfred Archives Alfred Hospital, Melbourne
  • Florence Nightingale Museum (U.K.) London
  • London Museums of Health & Medicine
  • National Museum Australia. Research Centre National Museum Australia
  • Nursing Museum Royal Brisbane & Women's Hospital
  • Virtual Museum South Australian Medical Heritage Society
  • W.A. Medical Museum Harvey House, King Edward Memorial Hospital

Locating Information: Historical Societies

  • Directory of Australian Historical Societies at Society Hill: Extensive list of historical societies throughout Australia
  • Royal Australian Historical Society Sydney
  • Royal Western Australian Historical Society



  • Semiconductor and Mesoscopic Physics
  • Browse content in Psychology
  • Affective Sciences
  • Clinical Psychology
  • Cognitive Psychology
  • Cognitive Neuroscience
  • Criminal and Forensic Psychology
  • Developmental Psychology
  • Educational Psychology
  • Evolutionary Psychology
  • Health Psychology
  • History and Systems in Psychology
  • Music Psychology
  • Neuropsychology
  • Organizational Psychology
  • Psychological Assessment and Testing
  • Psychology of Human-Technology Interaction
  • Psychology Professional Development and Training
  • Research Methods in Psychology
  • Social Psychology
  • Browse content in Social Sciences
  • Browse content in Anthropology
  • Anthropology of Religion
  • Human Evolution
  • Medical Anthropology
  • Physical Anthropology
  • Regional Anthropology
  • Social and Cultural Anthropology
  • Theory and Practice of Anthropology
  • Browse content in Business and Management
  • Business Strategy
  • Business Ethics
  • Business History
  • Business and Government
  • Business and Technology
  • Business and the Environment
  • Comparative Management
  • Corporate Governance
  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic Systems
  • Economic History
  • Economic Methodology
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Natural Disasters (Environment)
  • Social Impact of Environmental Issues (Social Science)
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • International Political Economy
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Theory
  • Politics and Law
  • Public Administration
  • Public Policy
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Developmental and Physical Disabilities Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

Historical Research

4 Historical Data Sources

Published: September 2008

This chapter compares and contrasts various historical research sources and their appropriate uses. Although effective research requires significant use of primary sources, secondary sources will help contextualize, explicate, and defend the study’s hypothesis. The sources for historical research are many and varied, and their geographic spread varies with the scope of the study. The study’s hypothesis guides researchers as they decide on their prospective sources including primary, secondary, interview subjects, and collections of realia. The best project includes as wide a sampling of sources as possible — not just the oldest or best-known, but anyone and anything that helps formulate a reasoned response to the historical problem.


Performance Analysis of Handwritten Text Augmentation on Style-Based Dating of Historical Documents

  • Original Research
  • Open access
  • Published: 04 April 2024
  • Volume 5, article number 397 (2024)

Authors: Lisa Koopmans (ORCID: orcid.org/0000-0001-6556-2600), Maruf A. Dhali (ORCID: orcid.org/0000-0002-7548-3858), and Lambert Schomaker (ORCID: orcid.org/0000-0003-2351-930X)

One of the main questions paleographers aim to answer while studying historical manuscripts is when they were produced. Automated methods provide tools that can aid in a more accurate and objective date estimation. Many of these methods are based on the hypothesis that handwriting styles change over time. However, the sparse availability of digitized historical manuscripts makes it difficult to obtain robust systems. The presented research extends previous work that explored the effects of data augmentation by elastic morphing on the dating of historical manuscripts. Linear support vector machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from the Medieval Paleographical Scale, early Aramaic manuscripts, the Dead Sea Scrolls, and volumes of the French Royal Chancery collection. Results indicate that training models with augmented data can improve the performance of historical manuscript dating by 1–3% in cumulative scores, but it can also diminish performance. Data augmentation by elastic morphing should therefore be applied with care. Moreover, further enhancements are possible by considering models tuned to the features and to the documents' scripts.


Introduction

Historical manuscripts, i.e., handwritten accounts, letters, and similar documents, contain essential information for understanding historical events. Paleographers study such documents, seeking to understand their social and cultural contexts. They specifically seek to identify the script(s), author(s), location, and production date of historical manuscripts, often through the study of handwriting styles, writing materials, and contents. Doing so requires specific domain knowledge. Moreover, these methods are time-consuming and lead to subjective estimates, and the repeated physical handling of the documents contributes to their further degradation.

Digitizing historical manuscripts, e.g., by scanning, contributes to their preservation and has allowed machine learning to be applied to help paleographers answer the aforementioned questions more objectively. Historical manuscript dating, in particular, can benefit from such automated methods, since the alternative can be physical dating techniques that are destructive and of limited reliability.

Automated methods are commonly based on the hypothesis that handwriting styles change over time [1], allowing a document's date to be estimated. These methods thus aim to estimate dates by identifying characteristics of the handwriting that are specific to a time period.

Due to the limited availability of historical manuscripts, research has mainly focused on statistical feature-extraction techniques. These statistical methods capture the handwriting style through attributes such as curvature or slant, or by representing the general character shapes in the documents [2]. For reliable results, however, a manuscript needs a sufficient amount of handwriting from which to extract the style.

Both traditional and automated methods must deal with data sparsity and the degradation of ancient materials; new data can only be obtained by digitizing or discovering more manuscripts. Data augmentation is a potential solution to this issue: it is commonly used in machine learning to generate additional realistic training data from existing data and thereby obtain more robust models. However, augmentation must be done carefully to prevent the loss of information about the handwriting styles, as happens with standard techniques such as rotating or mirroring the images. Data augmentation at the character level, on the other hand, can generate realistic samples that simulate an author's natural variability in handwriting.

In previous research [3], we investigated the effects of character-level data augmentation on the style-based dating of historical manuscripts. The current research extends this work by introducing another medieval data set, containing images from the French Royal Chancery (FRC) registers, and applying the previously presented techniques to it. Specifically, manuscript images taken from the Medieval Paleographical Scale (MPS) collections, the FRC (HIMANIS) collection, the Bodleian Libraries of the University of Oxford, the Khalili Collections, and the Dead Sea Scrolls were augmented with an elastic rubber-sheet algorithm [5]. The first collection, MPS, contains medieval charters produced between 1300 and 1550 CE in four cities: Arnhem, Leiden, Leuven, and Groningen. Additionally, 840 images from 28 books of the FRC collection were used, dating between 1302 and 1330 CE. A number of early Aramaic, Aramaic, and Hebrew manuscripts were taken from the last three collections. Several statistical feature-extraction methods at the textural and character level were used to train linear support vector machines (SVMs), once with only non-augmented images and once with both non-augmented and augmented images.

Related Works

Scripts have their own characteristics, and some features represent these better than others. Consequently, feature selection is one of the challenges in style-based dating. Digital collections of historical manuscripts are available in various languages and scripts, such as the Medieval Paleographical Scale [6] and the Svenskt Diplomatariums huvudkartotek (SDHK) data sets, which are in Dutch and Swedish, respectively. Additionally, the early Aramaic and Dead Sea Scrolls collections [7] contain ancient texts in Hebrew, Aramaic, Greek, and Arabic, dating from the fifth century BCE (Before the Common Era) until the Crusader Period (12th–13th centuries CE).

Statistical feature-extraction techniques are commonly divided into textural-based and grapheme-based features, which capture information about the handwriting across an entire image at the textural and character level, respectively. One example of a statistical feature is the 'Hinge' feature, which captures a handwriting sample's slant and curvature. It describes the joint probability distribution of two hinged edge fragments and forms the basis of the family of 'Hinge' features [2]. The feature has been extended to, e.g., the co-occurrence features QuadHinge and CoHinge, which emphasize the handwriting's curvature and shape information, respectively [8].

Besides Hinge features, other textural features have been used for historical manuscript dating, such as curvature-free and chain code features [9, 10]. Features can also be combined: the authors of Ref. [11], for instance, investigated combinations of Gabor filters [12], local binary patterns [13], and gray-level co-occurrence matrices [14], showing improved performance over the individual features on the MPS data set.

Grapheme-based features represent character shapes. Graphemes are extracted from a set of documents and used to train a clustering method. The cluster representations form a codebook, from which a probability distribution of grapheme usage is computed for each document to represent its handwriting style.

One grapheme-based feature is Connected Component Contours (CO3) [15], which describes the shape of a fully connected contour fragment. This was extended to Fraglets [2], which split the connected contours at their minima to deal with cursive handwriting. Other extensions are k stroke fragments and k contour fragments, which partition CO3 into k stroke and contour fragments, respectively [16]. In He et al. [17], Junclets was proposed, which represents the junctions in writing contours.

The MPS data set is often used in research on historical manuscript dating, and He et al. have proposed several methods for it. In He et al. [1], dates were predicted through a combination of local and global Support Vector Regression on extracted Fraglets and Hinge features. The authors later extended their work and proposed, among others, the grapheme-based features k contour fragments and Junclets. Moreover, the temporal pattern codebook was proposed in He et al. [18] to maintain the temporal information lost with the Self-Organizing Map (SOM) that was commonly used for training codebooks. Finally, various statistical feature-extraction methods for historical manuscript dating were compared in He et al. [19].

A reason the MPS data set is often used is that it is relatively clean. However, it is not representative of other collections of historical manuscripts, which are usually more degraded due to aging, as is the case with the Dead Sea Scrolls. An initial framework for the style-based dating of these manuscripts using both statistical and grapheme-based features was proposed in Dhali et al. [20]. The Dead Sea Scrolls posed a challenge because they are fragmented and contain eroded ink traces. Moreover, few labeled manuscripts are available.

Besides hand-crafted statistical features, deep learning approaches have been proposed. These apply transfer learning, i.e., fine-tuning neural networks, pre-trained on a different task, using new data. Transfer learning has potential for historical manuscript dating as it requires less data than training deep networks from scratch. In Wahlberg et al. [21], a network pre-trained on ImageNet was fine-tuned on the SDHK collection. However, this was done using 11,000 images, which is large for a data set of historical manuscripts. Additionally, a group of pre-trained neural networks was fine-tuned on the 3267 images of the MPS data set in Hamid et al. [22], where the best-performing model outperformed statistical methods.

While deep learning approaches show promising results, statistical methods remain relevant. To train a neural network, the manuscript images need to be partitioned into patches, possibly leading to loss of information. To address this, Hamid et al. [22] ensured that each patch contained "3–4 lines of text with 1.5–2 words per line" to capture the handwriting style. While this was a solution for the MPS data set, it may not be for smaller and more degraded collections, such as the Dead Sea Scrolls. In contrast, statistical feature extraction does not require image resizing and considers the handwriting style over the entire image.

More recently, studies have fused statistical features with features extracted by neural networks. Hierarchical fusion was proposed by Adam et al. [23]: they combined Gabor filters, Histogram of Oriented Gradients, and Hinge with features extracted from a pre-trained ResNet through joint sparse representation. This method outperformed the individual statistical and ResNet features on a collection of Arabic manuscripts.

This section presents the dating model along with the data description, image processing, and feature-extraction techniques.

The initial research used two data sets: MPS and EAA [3]. The current article extends these with the FRC data set. All three data sets are briefly presented in the following subsections.

Medieval Paleographical Scale (MPS) Collection

The current research uses the MPS data set [1, 6, 16, 24]. Non-text content, such as seals, supporting backgrounds, and color calibrators, has been removed. The data set consequently provides relatively clean images, although it also includes more degraded images and images where parts of a seal or ribbon remain present. The data set is publicly available via Zenodo.

The MPS data set contains 3267 images of charters collected from four cities signifying the four corners of the medieval Dutch language area. Charters were commonly used to document legal or financial transactions or actions, and the manuscripts' production dates have been recorded. The charters were usually written on parchment and sometimes on paper. Figure 1 shows an example image.

Figure 1: A document image from the Medieval Paleographical Scale (MPS) collection. Image taken from Koopmans et al. [3].

The medieval charters date from 1300 CE to 1550 CE. Because handwriting evolves slowly and gradually, documents from 11 quarter-century key years with a margin of ±5 years were included in the data set. Hence, the data set consists of images of charters from the medieval Dutch language area in the periods 1300 ± 5, 1325 ± 5, 1350 ± 5, up to 1550 ± 5. Table 1 contains the number of charters in each key year.
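For illustration, the key-year windows can be expressed as a small Python rule; the function name and the None fallback for out-of-window years are our own, not part of the MPS tooling:

    def key_year_class(year, key_years=range(1300, 1551, 25), margin=5):
        """Map a charter's production year to its quarter-century key year.
        Returns None when the year falls outside every key-year window."""
        for key_year in key_years:
            if abs(year - key_year) <= margin:
                return key_year
        return None

    # A charter from 1327 falls in the 1325 +/- 5 window; 1340 falls in none.
    assert key_year_class(1327) == 1325
    assert key_year_class(1340) is None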

Early Aramaic and Additional (EAA) Manuscripts

In addition to the MPS data set, 30 images of early Aramaic, Aramaic, and Hebrew manuscripts were used. For ease of reference, this second data set is called EAA in the rest of the article, even though it contains Aramaic and Hebrew in addition to early Aramaic scripts. A list of the EAA images used in this study can be found in the appendix (see Table 7). These images are publicly available through the Bodleian Libraries, University of Oxford, the Khalili Collections, and the Leon Levy Dead Sea Scrolls Digital Library. For the selected manuscripts from the EAA data set, the dates were directly inferred from dates or events recorded in the manuscripts (i.e., they are internally dated); they span from 456 BCE to 133 CE. In addition, the data set contains several degraded manuscripts with missing ink traces or only two or three lines of text. An example image is shown in Fig. 2.

Figure 2: An early Aramaic (EA) manuscript from the Bodleian Libraries, University of Oxford (Pell. Aram. I). Image taken from Koopmans et al. [3].

French Royal Chancery (FRC) Registers

As a third data set, extending our previous work in Koopmans et al. [3], the FRC data set was used [25]. This is a subset of 840 images of the French Royal Chancery collection, also known as the Registres du Trésor des Chartes. A total of 28 volumes from this collection were used; specifically, the first 30 pages of each. This was to avoid the increasing number of drawings, crossed-out texts, marginal notes, etc., often observed in the later pages of the volumes. The list of volumes used in the current research can be found in the appendix (see Table 8).

Most volumes contained the years in which they were written (the ground truth) on the first page. However, these pages also contained many artifacts and little handwriting, so they were removed after the ground truths were determined. Empty pages were also removed. This led to the removal of 48 images, resulting in a data set of 792 images. An example image from the FRC collection is shown in Fig. 3.

Figure 3: A document image from the FRC collection (JJ035, scan 14).

Preprocessing

Label Refinement

EAA The set of images from the EAA collections did not contain sufficient samples for each year. Therefore, the samples were manually classified into historical periods identified by historians. The time periods and the corresponding number of samples are shown in Table 2.

The Persian period contained two groups of samples more than 30 years apart. Under the assumption that handwriting styles changed during this time, these samples were split into two periods: the Early and Late Persian Periods. These were based on the samples' production years rather than on defined historical periods. Images from the upper bound of the year ranges in Table 2 were included in the classes. The manuscripts from the Roman Period were excluded as there were too few samples. The images were relabeled with the median of their corresponding year ranges.

FRC The set of images from the FRC collection, in contrast to the EAA collections, contained sufficient samples for each year. However, many of the volumes overlapped in the time ranges in which they were written, or were too close together for distinguishable changes in handwriting. For this reason, the images were manually classified into 4 categories chosen to contain sufficient variability to capture handwriting style changes over a period. The images were relabeled with the median of their corresponding year ranges. The resulting categories and their distributions are shown in Table 3.

Data Augmentation

To augment the data such that new samples simulate a realistic variability of an author's handwriting, the Imagemorph program [26] was used. The program applies random elastic rubber-sheet transforms to the data through local non-uniform distortions, meaning that transformations occur on the components of characters. Consequently, the Imagemorph algorithm can generate a large number of unique samples. For the augmented data to be realistic, a smoothing radius of 8 and a displacement factor of 1 were used, measured in pixels. As the images of the MPS data set required much memory, three augmented images were generated per image. Since the EAA data sets were small, 15 images were generated per image. The FRC images were each augmented 10 times for the same reason.
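Imagemorph itself is an external program [26]; purely for illustration, the following numpy/scipy sketch approximates a random elastic rubber-sheet transform with the reported parameter values. The rescaling of the displacement field to a maximum of one pixel is our assumption:

    import numpy as np
    from scipy.ndimage import gaussian_filter, map_coordinates

    def elastic_morph(gray, smoothing_radius=8.0, displacement=1.0, seed=None):
        """Random elastic rubber-sheet transform of a grayscale image:
        smooth a random per-pixel displacement field, then resample."""
        rng = np.random.default_rng(seed)
        h, w = gray.shape
        dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), smoothing_radius)
        dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), smoothing_radius)
        dx *= displacement / np.abs(dx).max()   # cap distortion at 1 pixel
        dy *= displacement / np.abs(dy).max()
        ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        return map_coordinates(gray, [ys + dy, xs + dx], order=1, mode="reflect")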

Binarization

To extract only the handwriting, the ink traces in the images were isolated through binarization. This resulted in images with a white background representing the writing surface and a black foreground representing the ink of the handwriting.

MPS Otsu thresholding [28] was used for binarizing the MPS images, as the MPS data set is relatively clean and Otsu thresholding has been used successfully in previous research with this data set [1, 16, 19]. Otsu thresholding is an intensity-based technique that chooses the threshold maximizing the separability between the resulting gray values (black and white). Figure 4 shows Fig. 1 after binarization.
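For illustration, Otsu binarization is a single call in OpenCV; the file name below is hypothetical:

    import cv2

    gray = cv2.imread("charter.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
    # Otsu picks the global threshold that maximizes between-class separability.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)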

Figure 4: The binarized version of the image from Fig. 1 using Otsu thresholding. Image taken from Koopmans et al. [3].

EAA The EAA images were more difficult to binarize using threshold-based techniques. For these images we therefore used BiNet, a deep learning-based method designed specifically to binarize historical manuscripts [27]. Figure 5 shows Fig. 2 after binarization.

Figure 5: The binarized version of the image from Fig. 2 using BiNet [27]. Image taken from Koopmans et al. [3].

FRC The FRC images were binarized partly using BiNet and partly using adaptive thresholding. In adaptive thresholding, the threshold value is a Gaussian-weighted sum of the neighborhood minus a constant C. The neighborhood size was set to \(45\times 45\), and a constant of \(C=23\) was used after experimentation. Because the resulting binarized documents contained many unwanted artifacts, further operations were applied for noise reduction. First, a median filter of size \(5\times 5\) was applied. Next, dilation with a kernel size of \(7\times 7\) was applied to emphasize the locations of the handwriting. After the dilation, a bitwise AND operator was applied to the dilated and the initial binarized image, which removed the majority of the artifacts. However, the ink still contained many gaps in the letters, so a closing operation and another median filter were applied, with kernel sizes \(2\times 2\) and \(3\times 3\), respectively. For simplicity, we refer to this binarization process as adaptive thresholding.
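Purely as a sketch, this pipeline can be written in OpenCV as below. The kernel sizes and the constant C are taken from the text; the foreground/background polarity handling and the function name are our assumptions:

    import cv2
    import numpy as np

    def binarize_frc(gray):
        """Adaptive-threshold pipeline described above; ink is kept as white
        foreground during the morphological steps and inverted at the end."""
        raw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY_INV, 45, 23)
        denoised = cv2.medianBlur(raw, 5)            # remove speckle artifacts
        mask = cv2.dilate(denoised, np.ones((7, 7), np.uint8))
        # Keep the original ink only where it lies near real handwriting.
        cleaned = cv2.bitwise_and(mask, raw)
        closed = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE,
                                  np.ones((2, 2), np.uint8))
        smoothed = cv2.medianBlur(closed, 3)         # close gaps, smooth edges
        return 255 - smoothed                        # white background, black ink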

A total of 398 images were binarized using adaptive thresholding and 394 with BiNet. For 3 images (2 of category 1311 and 1 of 1319), the binarization quality was too low to extract features, leaving 789 images of the FRC data set for feature extraction. When adaptive thresholding was used, the images were cropped before binarization, as this enhanced the binarization quality; for the BiNet images, cropping was done after binarization to decrease the whitespace. From the images, 5% was cut from the top and left sides, and 10% and 20% from the bottom and right sides, respectively. Images were further cropped manually when much noise was present on the outer edges. Examples of images binarized with BiNet and adaptive thresholding are shown in Figs. 6 and 7, respectively.

Figure 6: Example image from the FRC collection binarized using the BiNet model [27].

Figure 7: Binarized version of Fig. 3 using adaptive thresholding.

Feature Extraction

The handwriting styles of the manuscripts were described by five textural features and one grapheme-based feature. Since the MPS, FRC, and EAA data sets are written in different scripts and languages, features were chosen that are robust to such differences.

Textural Features

Textural-based feature-extraction methods capture statistical information about the handwriting in a binarized image by considering its texture. They capture handwriting attributes such as slant, curvature, and the author's pen grip, represented as a probability distribution.

He et al. proposed the joint feature distribution (JFD) principle in He and Schomaker [ 19 ], describing how new, more robust features can be created. Two groups of such features were identified: the spatial joint feature distribution (JFD-S) and the attribute joint feature distribution (JFD-A). The JFD-S principle derives new features by combining the same feature at adjacent locations, consequently capturing a larger area. The JFD-A principle derives new features from different features at the same location, consequently capturing multiple properties.

Hinge [2] is obtained by taking the orientations \(\alpha\) and \(\beta\) (with \(\alpha < \beta\)) of two contour fragments attached at one pixel and computing their joint probability distribution. The Hinge feature thus captures the curvature and orientation of the handwriting. Here, 23 angle bins were used for \(\alpha\) and \(\beta\).
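A minimal sketch of such a Hinge histogram is given below. The contour extraction via OpenCV and the leg length of 10 pixels are our assumptions; the original implementation [2] differs in its details:

    import cv2
    import numpy as np

    def hinge_histogram(ink, leg=10, bins=23):
        """Joint histogram of the orientations (alpha < beta) of two contour
        legs hinged at a common pixel; ink is a binary uint8 image with
        white ink on a black background."""
        contours, _ = cv2.findContours(ink, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
        hist = np.zeros((bins, bins))
        edges = np.linspace(-np.pi, np.pi, bins + 1)
        for contour in contours:
            pts = contour.reshape(-1, 2).astype(float)
            for i in range(leg, len(pts) - leg):
                v1 = pts[i - leg] - pts[i]          # first hinge leg
                v2 = pts[i + leg] - pts[i]          # second hinge leg
                a = np.arctan2(v1[1], v1[0])
                b = np.arctan2(v2[1], v2[0])
                lo, hi = min(a, b), max(a, b)       # enforce alpha < beta
                ia = min(np.searchsorted(edges, lo, side="right") - 1, bins - 1)
                ib = min(np.searchsorted(edges, hi, side="right") - 1, bins - 1)
                hist[ia, ib] += 1
        total = hist.sum()
        return (hist / total).ravel() if total else hist.ravel()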

CoHinge [8] follows the JFD-S principle, combining two Hinge kernels at two different points \(x_i, x_j\) with a Manhattan distance l:

\[ \textrm{CoHinge}(x_i, x_j) = p\big(\alpha(x_i), \beta(x_i), \alpha(x_j), \beta(x_j)\big), \qquad \Vert x_i - x_j \Vert_1 = l. \]

The CoHinge kernel over contour fragments can thus be quantized into a 4D histogram. The number of bins for each orientation \(\alpha\) and \(\beta\) was set to 10.

QuadHinge [8] follows the JFD-A principle, combining the Hinge kernel with the fragment curvature measurement \(C(F_{\textrm{c}})\). Although Hinge also captures curvature information, it focuses on orientation due to the small lengths of the contour fragments (the hinge edges). For a contour fragment \(F_{\textrm{c}}\) of length s on an ink trace with endpoints \((x_1, y_1)\) and \((x_2, y_2)\), the fragment curvature measurement is defined as:

\[ C(F_{\textrm{c}}) = \frac{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}}{s}. \]

The QuadHinge feature is scale-invariant due to aggregating the kernel over multiple scales. The QuadHinge kernel combines the Hinge kernel and the fragment curvature measurement on the two contour fragments \(F_1, F_2\) joined at the hinge:

\[ \textrm{QuadHinge}(F_1, F_2) = p\big(\alpha, \beta, C(F_1), C(F_2)\big). \]

The number of bins for the orientations was set to 12, and that for the curvature to 6, resulting in a dimensionality of \(12 \times 12 \times 6 \times 6 = 5184\).

DeltaHinge [29] is a rotation-invariant feature that generalizes the Hinge feature by computing the derivative of the Hinge kernel over a sequence of pixels along a contour, thereby capturing the curvature information of the handwriting contours. The Delta-n-Hinge kernel is the n-th finite difference of the Hinge kernel along the contour:

\[ \Delta^{n}\textrm{Hinge}(x_i) = \Delta^{n-1}\textrm{Hinge}(x_{i+1}) - \Delta^{n-1}\textrm{Hinge}(x_i), \]

where n is the order of the derivative of the Hinge kernel. When used for writer identification, performance decreased for \(n > 1\), implying that the feature's ability to capture writing styles decreased. Hence, the current research used \(n = 1\).

Triple Chain Code (TCC) [10] captures the curvature and orientation of the handwriting by combining chain codes at three different locations along a contour fragment. The chain code represents the direction of the next pixel, indicated by a number between 1 and 8. TCC is defined as:

\[ \textrm{TCC}(x_i, x_j, x_k) = p\big({\text{CC}}(x_i), {\text{CC}}(x_j), {\text{CC}}(x_k)\big), \qquad \Vert x_i - x_j \Vert_1 = \Vert x_j - x_k \Vert_1 = l, \]

where \({\text{CC}}(x_i)\) is the chain code at location \(x_i\), and the Manhattan distance \(l = 7\).

Grapheme-Based Features

Grapheme-based features are allograph-level features that partially or fully overlap with allographs in the handwriting, described by a statistical distribution. The handwriting style is represented by the probability distribution of grapheme usage across a document, computed with a common codebook.

Junclets [17] represents the crossing points, i.e., junctions, in handwriting. Junctions are categorized into 'L', 'T', and 'X' junctions with 2, 3, and 4 branches, respectively. Across time periods, the angles between the branches, the number of branches, and the lengths of the branches can differ, which is captured by the junction representations. Compared to other grapheme-based features, this feature needs no segmentation or line-detection methods. A junction is represented as the normalized stroke-length distribution around a reference point in the ink over a set of \(N = 120\) directions; the stroke lengths are computed as the Euclidean distance from the reference point in each direction \(n \in N\) to the edge of the ink. The feature is scale-invariant and captures the ink width and stroke lengths.
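The descriptor itself can be sketched as follows; detecting the junction pixels is a separate step that we omit here, and the ray cut-off max_len is our assumption:

    import numpy as np

    def junction_descriptor(ink, y, x, n_dirs=120, max_len=100):
        """Normalized stroke-length distribution around the ink pixel (y, x):
        cast a ray in each of n_dirs directions and record how far it stays
        inside the ink; ink is a boolean array (True = ink)."""
        h, w = ink.shape
        lengths = np.zeros(n_dirs)
        for k in range(n_dirs):
            theta = 2.0 * np.pi * k / n_dirs
            dy, dx = np.sin(theta), np.cos(theta)
            r = 0.0
            while r < max_len:                      # walk outwards in steps
                yy, xx = int(round(y + r * dy)), int(round(x + r * dx))
                if not (0 <= yy < h and 0 <= xx < w) or not ink[yy, xx]:
                    break
                r += 0.5
            lengths[k] = r
        total = lengths.sum()
        return lengths / total if total else lengths  # scale-invariant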

Previous research commonly used the Self-Organizing Map (SOM) [30], an unsupervised clustering method, to train the codebook [19]. Using SOM, however, means losing the temporal information of the input patterns. The partially supervised Self-Organizing Time Map (SOTM) [31] maintains this information, and in He et al. [18] it improved performance for a grapheme-based feature compared to SOM. Hence, the codebook was trained with SOTM.

SOTM trains a sub-codebook \(D_t\) for each time period using the standard SOM [30], with handwriting patterns \(\Omega(t)\) from key year y(t). The key years for the MPS (in CE), EAA (in BCE), and FRC (in CE) data sets were defined as \(y(t) = \{1300, 1325, 1350, \ldots, 1550\}\), \(y(t) = \{470, 365, 198\}\), and \(y(t) = \{1305, 1311, 1319, 1326\}\), respectively. The final codebook D is the concatenation of the sub-codebooks: \(D = \{D_1, D_2, \ldots, D_n\}\), with n key years. To maintain the temporal information, the sub-codebooks are trained in ascending order. The initial sub-codebook \(D_1\) is randomly initialized, as no prior information exists in the data set; each succeeding sub-codebook is initialized with \(D_{t-1}\) and then trained. Algorithm 1 shows the pseudo-code obtained from He et al. [18].

The Euclidean distance measure was used to train the sub-codebooks, as it significantly decreased training times compared to the commonly used \(\chi^2\) distance. Each sub-codebook was trained for 500 epochs, ensuring sufficient training. The learning rate \(\alpha^*\) decayed from \(\alpha = 0.99\) following Eq. (6). The sub-codebooks were trained on a computer cluster.

A historical manuscript's feature vector was obtained by mapping its extracted graphemes to their most similar elements in the trained codebook (computed via the Euclidean distance) and forming a histogram. The normalized histogram is the final feature vector.

Algorithm 1: SOTM [3, 18].
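A compact sketch of this scheme and of the histogram mapping is given below. It strongly simplifies Algorithm 1: the SOM update keeps only the winning node (no neighborhood kernel), and the decay schedule of Eq. (6) is replaced by an assumed linear decay:

    import numpy as np

    def train_som(patterns, codebook, epochs=500, alpha0=0.99, seed=0):
        """Minimal SOM pass with Euclidean distance and a decaying
        learning rate; patterns is an (m, d) array."""
        rng = np.random.default_rng(seed)
        cb = codebook.copy()
        for epoch in range(epochs):
            alpha = alpha0 * (1.0 - epoch / epochs)      # assumed decay schedule
            for x in patterns[rng.permutation(len(patterns))]:
                winner = np.argmin(((cb - x) ** 2).sum(axis=1))
                cb[winner] += alpha * (x - cb[winner])   # pull winner towards x
        return cb

    def train_sotm(patterns_per_key_year, size, dim, seed=0):
        """Train sub-codebooks in ascending key-year order, initializing each
        from its predecessor so temporal structure is preserved."""
        rng = np.random.default_rng(seed)
        sub = rng.uniform(size=(size, dim))              # random init for D_1
        codebooks = []
        for key_year in sorted(patterns_per_key_year):
            sub = train_som(patterns_per_key_year[key_year], sub, seed=seed)
            codebooks.append(sub)
        return np.vstack(codebooks)                      # D = [D_1, ..., D_n]

    def document_feature(graphemes, codebook):
        """Normalized histogram of nearest-codebook-entry usage: the
        document's grapheme-based feature vector."""
        d2 = ((graphemes[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        hits = np.bincount(d2.argmin(axis=1), minlength=len(codebook))
        return hits / hits.sum()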

Post-processing

The feature vectors of all features consisted of small decimal numbers, varying between \(10^{-2}\) and \(10^{-6}\). To emphasize the differences between feature vectors of the same feature type, the feature vectors were normalized between 0 and 1 based on the range of that feature's vectors. A feature vector f is scaled according to:

\[ f_{\textrm{std}} = \frac{f - \min(f)}{\max(f) - \min(f)}, \qquad f_{\textrm{scaled}} = f_{\textrm{std}} \cdot (\max - \min) + \min. \]

Here, max and min are the maximum and minimum values across the whole set of feature vectors of a certain feature, and \(\max(f)\) and \(\min(f)\) are the maximum and minimum values of the feature vector f [32].
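In numpy the scaling reads as follows (rows of F are the feature vectors of one feature type; the function name is ours):

    import numpy as np

    def scale_feature_vectors(F):
        """Per-vector min-max standardization followed by rescaling with the
        global range of this feature type, as in the equations above."""
        f_min = F.min(axis=1, keepdims=True)
        f_max = F.max(axis=1, keepdims=True)
        f_std = (F - f_min) / (f_max - f_min)
        return f_std * (F.max() - F.min()) + F.min()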

Historical manuscript dating can be regarded as a classification or a regression problem. As the MPS data set is divided into 11 non-overlapping classes (key years), and the EAA and FRC data sets were likewise partitioned into classes, it was treated as a classification problem. Following previous research on the MPS data set [19], linear support vector machines (SVMs) with a one-versus-all strategy were used for date prediction.
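With scikit-learn, such a classifier might look as below; treating key years as class labels follows the setup described above, while the remaining LinearSVC defaults are our assumption:

    from sklearn.svm import LinearSVC

    def fit_dating_svm(X_train, key_years, C=1.0):
        """Linear one-versus-rest SVM over key-year classes; C is the
        hyper-parameter tuned by cross-validation."""
        return LinearSVC(C=C, multi_class="ovr").fit(X_train, key_years)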

The mean absolute error (MAE) and the cumulative score (CS) are two commonly used metrics to evaluate model performance for historical manuscript dating. The MAE is defined as:

\[ \textrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \vert y_i - \bar{y}_i \vert, \]

where \(y_i\) is a query document's ground truth, \(\bar{y}_i\) is its estimated year, and N is the number of test documents. The CS is defined in Geng et al. [33] as:

\[ \textrm{CS}(\alpha) = \frac{N_{e \le \alpha}}{N} \cdot 100\%, \]

where \(N_{e \le \alpha}\) is the number of test images predicted with an absolute error e no higher than \(\alpha\) years. At \(\alpha = 0\) years, the CS is equal to the accuracy.

For all data sets, CS with \(\alpha = 0\) years was used. Since paleographers generally consider an absolute error of 25 years acceptable, and the MPS set has key years spaced 25 years apart, CS with \(\alpha = 25\) years was also used for the MPS data set.
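Both metrics are straightforward to compute; a minimal sketch:

    import numpy as np

    def mae(y_true, y_pred):
        """Mean absolute error in years over the N test documents."""
        return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

    def cumulative_score(y_true, y_pred, alpha=25):
        """Percentage of test documents with absolute error <= alpha years;
        at alpha = 0 this equals the accuracy."""
        errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
        return 100.0 * np.mean(errors <= alpha)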

Experiments

The MPS and FRC images were randomly split into a test and a training set containing 10% and 90% of the data, respectively. The EAA images were split into a test set of 5 images and a training set of 23 images: two samples each from the classes 470 and 365 BCE and one from class 198 BCE. The images were sorted by label, and the first images of each class were selected for testing.

The models were tuned with stratified k-fold cross-validation for all data sets, as they were imbalanced. For the MPS and FRC data sets, \(k = 10\); since the training set of the EAA data set contained only four images from 198 BCE, \(k = 4\) for this set. To prevent the models from being tuned to specific validation sets in each iteration of k-fold cross-validation, hyper-parameters were selected using the mean cross-validation results across six random seeds, ranging from 0 to 250 in steps of 50. The set of values considered for the hyper-parameter was \(2^n, n = -7, -6, -5, \ldots, 10\). During this process, the augmented versions of images in the validation and test sets were excluded from training.
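This selection procedure might be sketched with scikit-learn as below; note that the sketch omits the exclusion of augmented duplicates from the validation folds, which the actual experiments perform:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import LinearSVC

    def select_C(X, y, k=10, seeds=(0, 50, 100, 150, 200, 250)):
        """Pick C by averaging stratified k-fold accuracy over six seeds."""
        candidates = [2.0 ** n for n in range(-7, 11)]
        mean_scores = []
        for C in candidates:
            fold_means = [
                cross_val_score(LinearSVC(C=C), X, y,
                                cv=StratifiedKFold(n_splits=k, shuffle=True,
                                                   random_state=seed)).mean()
                for seed in seeds
            ]
            mean_scores.append(np.mean(fold_means))
        return candidates[int(np.argmax(mean_scores))]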

Models were trained in two conditions. In the non-augmented condition, only non-augmented images were used, and in the augmented condition, both augmented and non-augmented images were used for training.

Codebook Different sub-codebook sizes can result in different model performance. Hence, various sub-codebook sizes were tested to obtain the optimal size for the Junclets feature. A codebook's size is its total number of nodes, i.e., \(n_{\textrm{columns}} \cdot n_{\textrm{rows}}\). The full codebook D is the concatenation of the sub-codebooks \(D_t\), so its size is \({\text{size}}_{D_t} \cdot n_{\textrm{classes}}\). We considered the set of sub-codebook sizes \(s = \{25, 100, 225, 400, 625, 900\}\) with \(n_{\textrm{columns}} = n_{\textrm{rows}}\); these conditions were the same for all data sets. Since different codebook sizes result in different features, the sub-codebook sizes were determined based on the validation results of models trained on only non-augmented images.

The code used for the experiments and the SOTM is publicly available.

Five textural features and one grapheme-based feature were used to explore the effects of data augmentation on the style-based dating of historical manuscripts. Linear SVMs were trained in the 'non-augmented' and 'augmented' conditions and tuned using tenfold (MPS and FRC) and fourfold (EAA) cross-validation. The models were tested on a hold-out set containing only non-augmented data. The test sets of the MPS and FRC data sets contained 10% of the data, and that of the EAA data set contained 17.8% (5 images) of the data.

The models were evaluated with the MAE and CS with \(\alpha = 0\)  years (i.e., accuracy). In addition, the MPS data set was evaluated with CS with \(\alpha = 25\)  years.

Sub-codebook Size

An optimal sub-codebook size had to be selected before investigating Junclets. Results of k-fold cross-validation for sub-codebook sizes 25, 100, 225, 400, 625, and 900 were evaluated on non-augmented data. For the FRC data set, no graphemes could be extracted for a total of 395 images; of these, 96, 138, 126, and 35 images belonged to the 1305, 1311, 1319, and 1326 categories, respectively. Hence, the models for Junclets could only be trained on 394 images.

Figures 8 and 9 show the MAE and CS over sub-codebook size for the MPS data set. The MAE reaches its minimum at a sub-codebook size of 625, and the CS with \(\alpha = 25\) and \(\alpha = 0\) years reaches its maximum at the same size. Therefore, Junclets features were obtained with sub-codebooks of size 625 on the MPS data.

Figure 8: MAE over sub-codebook size on non-augmented MPS data from tenfold cross-validation. Also presented in Koopmans et al. [3].

Figure 9: CS with \(\alpha = 25\) and \(\alpha = 0\) years over sub-codebook size on non-augmented MPS data from tenfold cross-validation. Also presented in Koopmans et al. [3].

Figure 10: MAE over sub-codebook size on non-augmented EAA data from fourfold cross-validation. Also presented in Koopmans et al. [3].

Figure 10 displays the MAE over sub-codebook size on the validation results for the EAA data. The MAE decreases until a sub-codebook size of 225, after which it fluctuates. This is reflected in the CS with \(\alpha = 0\) years (Fig. 11). In addition, the standard deviations of the MAE and CS (\(\alpha = 0\)) are smallest at this size. Hence, a sub-codebook size of 225 was chosen for the EAA data.

Figure 11: CS with \(\alpha = 0\) years over sub-codebook size on non-augmented EAA data from fourfold cross-validation. Also presented in Koopmans et al. [3].

Figures 12 and 13 display the MAE and the CS with \(\alpha = 0\) years over sub-codebook size for the FRC data set. The MAE reaches its minimum at a sub-codebook size of 100, and the CS its maximum. Consequently, Junclets features were obtained using a sub-codebook size of 100 for the FRC data set.

Figure 12: MAE over sub-codebook size on non-augmented FRC data from tenfold cross-validation.

Figure 13: CS with \(\alpha = 0\) years over sub-codebook size on non-augmented FRC data from tenfold cross-validation.

Augmentation

Figure 14 shows the MAE per feature for the augmented and non-augmented conditions. All features except TCC display a decrease in MAE in the augmented condition compared to the non-augmented condition; TCC displays an increase.

Figure 14: MAE on MPS (unseen) test data across non-augmented and augmented conditions. Also presented in Koopmans et al. [3].

Figure 15 shows the CS with \(\alpha = 25\) years for both conditions. All features display an increase in the augmented condition compared to the non-augmented condition, except TCC and Hinge, which show a decrease; Junclets did not change in performance across conditions.

Figure 15: CS with \(\alpha = 25\) years on MPS (unseen) test data across non-augmented and augmented conditions. Also presented in Koopmans et al. [3].

As displayed in Fig. 16, all features except DeltaHinge show an increase in CS with \(\alpha = 0\) years in the augmented condition compared to the non-augmented condition; the DeltaHinge feature showed no change in performance across conditions on the test data.

Figure 16: CS with \(\alpha = 0\) years on MPS (unseen) test data across non-augmented and augmented conditions. Also presented in Koopmans et al. [3].

These results denote an overall increase in performance for all features except TCC. However, the changes in performance are small. This is reflected in the validation results shown in Table 4, where the changes between the non-augmented and augmented conditions are insignificant: the means of the measures in the augmented condition fall within the standard-deviation ranges of the corresponding non-augmented condition.

EAA Collections

Figures 17 and 18 show the MAE and CS with \(\alpha = 0\) years across all features for the EAA data set. Performance increased for Junclets in the augmented condition compared to the non-augmented condition, denoted by a decrease in MAE and an increase in accuracy. QuadHinge also showed an increase in performance, indicated by a decrease in MAE in the augmented condition. A decrease in performance for the TCC, DeltaHinge, and Hinge features is denoted by an increase in MAE and a reduction in accuracy. CoHinge displayed no change across conditions on the test set.

Figure 17: MAE on EAA (unseen) test data across non-augmented and augmented conditions. Also presented in Koopmans et al. [3].

Figure 18: CS with \(\alpha = 0\) years on EAA (unseen) test data across non-augmented and augmented conditions. Also presented in Koopmans et al. [3].

The validation results do not reflect these test results (Table 5): Junclets and TCC displayed a decrease in performance, with an increase in mean MAE and a decrease in mean accuracy in the augmented condition compared to the non-augmented condition, while DeltaHinge, QuadHinge, CoHinge, and Hinge showed the opposite. Additionally, the standard deviations increased considerably in the augmented condition compared to the non-augmented condition.

FRC Registers

Figures 19 and 20 show the MAE and CS with \(\alpha = 0\) years across all features for the FRC data set. In contrast to the MPS and EAA data sets, these results show a decrease in model performance with augmented images for all features except Junclets. The Junclets feature showed an increase in performance, denoted by a decrease in test MAE and an increase in test CS with \(\alpha = 0\).

Figure 19: MAE on FRC (unseen) test data across non-augmented and augmented conditions.

Figure 20: CS with \(\alpha = 0\) years on FRC (unseen) test data across non-augmented and augmented conditions.

These results are partially reflected in the validation results, shown in Table 6. The test results are mainly in line with the validation results, except for the CoHinge feature, which displayed an overall increase in performance. However, this increase, along with that of Junclets, appears insignificant, as the means and corresponding standard deviations of the measures overlap.

Significance

Statistical tests (ANOVA [34]) were performed to determine whether the differences between conditions were significant for each feature.

MPS For the MPS data, the results for the Junclets and DeltaHinge features were statistically significant for both MAE and CS, with p values much smaller than 0.005.

EAA The results on the EAA data did not show any significance for any of the feature extraction techniques.

FRC The results on the FRC data set showed a statistically significant difference (\(p < 0.0005\)) between the augmented and non-augmented conditions for the accuracy of the Hinge feature and the MAE of the CoHinge feature. As noted above, the Hinge feature displayed a decrease in model performance, while CoHinge displayed an increase. The remaining features did not show statistically significant differences.

The current research explored how the style-based dating of historical manuscripts is affected by character-level data augmentation, using images from the MPS, FRC, and EAA collections. Images were binarized and augmented through elastic morphing [26]. Linear SVMs were trained on five textural features and one grapheme-based feature. Experiments were conducted to determine optimal sub-codebook sizes for the grapheme-based Junclets feature, which was obtained by mapping extracted junctions to codebooks trained with SOTM [31]. The SVMs were then trained on only non-augmented images and on both non-augmented and augmented images, and evaluated with the MAE and CS with \(\alpha\) values of 0 and 25 years.

Key Findings

On the MPS test data, linear SVMs in the augmented condition showed an overall increase in performance compared to the non-augmented condition for all features except TCC, which showed a decrease. The changes in validation results were statistically significant for Junclets and DeltaHinge; for the remaining features they were insignificant, with the ranges of the standard deviations and means overlapping across conditions.

The MPS images require much computer memory, and consequently, acquiring the features and models is time-consuming; obtaining the Junclets features alone required several days. Hence, we generated only three augmented images per MPS image. Had more images been generated, the results might have shown a clearer or different picture of the effects of data augmentation on the style-based dating of historical manuscripts.

Another possible explanation for the small changes in performance is that the MPS images were augmented before binarization. Images were augmented using the Imagemorph program, which applies a Gaussian filter over local transformations. Applying this before binarization increases the influence of the image's background and, consequently, leads to less severe distortions. The distortions in the MPS images were noticeable, yet they might have been too light to produce samples with natural within-writer variability. Whether this significantly affected the results is uncertain and should be examined in the future.

Test results of models trained on EAA images showed increased performance in the augmented condition compared to the non-augmented condition for Junclets and QuadHinge. Models for TCC, DeltaHinge, and Hinge showed a decrease, and CoHinge showed no change in performance. This is not reflected in the validation results (Table 5), which instead show decreased performance for Junclets and TCC in the augmented condition and increased performance for all other features.

An explanation for these results is the increase in standard deviations across all features for models trained in the augmented condition compared to the non-augmented condition. This indicates that the models were less robust to new data in the augmented condition, which may have led to the diverging test results. Moreover, given the small size of the data set, overfitting likely occurred, as suggested by the differences between test and validation results within the conditions (e.g., for QuadHinge).

The models trained on EAA data could also have been less robust in the augmented condition because the extracted features may follow nonlinear patterns, while the SVMs used linear kernels. Linear SVMs worked well on the MPS images, but those are written in the Roman script, whereas the EAA data are written in Hebrew. Moreover, handwriting in different geographical locations might change at different rates. Data augmentation could have emphasized the nonlinear patterns in the Hebrew script, making the linear SVMs too rigid.
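One way to probe this hypothesis is to compare a linear kernel against a nonlinear one on the same features. Below is a minimal sketch with scikit-learn [32]; the feature matrix and year labels are random placeholders standing in for extracted feature vectors, so only the comparison pattern is meaningful.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: rows are feature vectors (e.g., Hinge histograms),
# labels are key years treated as classes.
rng = np.random.default_rng(0)
X = rng.random((120, 300))
y = rng.choice([1300, 1325, 1350, 1375], size=120)

for kernel in ("linear", "rbf"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel}: mean CV accuracy = {scores.mean():.3f}")
```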

On the FRC data, the textural features showed a decrease in test performance, while the grapheme-based Junclets feature showed an increase. These findings agree with the validation results, except for CoHinge, which showed a statistically significant difference in accuracy between the non-augmented and augmented conditions, indicating that its increase in performance in the validation results was significant. The Hinge feature, on the other hand, showed a decrease in performance that was also statistically significant between the conditions.

While the MPS and FRC data sets are both written in the Roman script, their results are opposed: the MPS results showed an overall increase in model performance when images were augmented, whereas the FRC results showed an overall decrease. One of the main differences between these data sets lies in the pre-processing. As with the EAA images, the FRC images were augmented after binarization. Additionally, the FRC images were binarized with two different methods, of which the adaptive thresholding method often resulted in thinned or broken characters; the 'n' and 'm' characters, in particular, were often disrupted.
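The contrast between the two binarization strategies can be sketched with OpenCV: global Otsu thresholding [28] picks a single threshold for the whole page, while adaptive thresholding computes a local threshold per neighbourhood, which on degraded pages can thin or break strokes. The file name, block size, and offset below are illustrative assumptions, not the paper's settings.

```python
import cv2

# Hypothetical input: a grayscale scan of a charter page
gray = cv2.imread("charter.png", cv2.IMREAD_GRAYSCALE)

# Global Otsu thresholding: a single threshold for the entire image
_, otsu_bin = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding: a local threshold per 31x31 neighbourhood,
# offset by 10; aggressive settings can erode thin strokes
adaptive_bin = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 10)

cv2.imwrite("charter_otsu.png", otsu_bin)
cv2.imwrite("charter_adaptive.png", adaptive_bin)
```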

Another possible cause for the decreased performance of models trained on augmented FRC data is that the augmentation method might introduce too much variety within time periods. While the elastic morphing algorithm is designed to mimic within-writer variability, the relatively low quality of some of the binarized images might have caused too much distortion at the textural level.

Future Research

Characteristics differ between scripts, possibly leading to differing distributions of extracted features. Similarly, individual features capture different attributes of handwriting. Consequently, features might follow differing temporal trends, linear or nonlinear. The current research used only linear models and did not account for these differences between features and scripts, which could have decreased model performance on augmented data. Therefore, nonlinear models should be studied to optimize performance for individual features and scripts.

Data sets often do not contain equal numbers of manuscripts or writing samples per time period. The current study did not consider data balancing, for which data augmentation is also a common approach. Rather than generating an equal number of samples for each image, choosing the number of generated samples with a focus on data balancing can reduce misclassifications of minority classes and thereby improve overall model performance, as sketched below.
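A minimal sketch of that balancing idea: compute, per class, how many augmented copies of each original image are needed to bring every period up to the size of the largest one. The class counts below are hypothetical.

```python
from collections import Counter

# Hypothetical number of images per key year (class)
class_counts = Counter({1300: 40, 1325: 12, 1350: 25, 1375: 8})
target = max(class_counts.values())

# Augmented copies to generate per original image of each class
# (integer division; any remainder could be spread over a few images)
aug_per_image = {year: (target - n) // n for year, n in class_counts.items()}
print(aug_per_image)  # {1300: 0, 1325: 2, 1350: 0, 1375: 4}
```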

Some data sets intended for historical manuscript dating contain classes in which most samples were produced by a single writer. This poses an issue, as models trained on such data risk learning writer-specific traits rather than traits representative of a particular period or year. Hence, it is important to simulate variability between writers within time periods when augmenting data. This might lead to more robust models than simulating only realistic within-writer variability.

In the “Related works” section, we presented previous research showing that deep learning approaches outperform statistical features on the MPS data set. Moreover, a fusion of statistical features and deep learning approaches has been shown to improve performance [23]. It would be interesting to investigate whether data augmentation positively affects historical manuscript dating with these methods on smaller and more heavily degraded documents, such as the EAA collections. Additionally, applying such approaches to individual characters, analogous to the grapheme-based statistical features, might bypass the issues of limited data and the loss of information caused by image resizing.
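The fusion idea can be illustrated very simply: concatenate a learned embedding with a statistical feature vector and train a single classifier on the result. In the sketch below, the embedding is a random placeholder standing in for the output of any CNN; this shows only the generic late-fusion pattern, not the specific method of [23].

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 120
cnn_embedding = rng.random((n, 128))   # placeholder for a learned CNN embedding
hinge_features = rng.random((n, 300))  # placeholder for a statistical feature
y = rng.choice([1300, 1325, 1350, 1375], size=n)

# Late fusion: simple feature concatenation before classification
fused = np.concatenate([cnn_embedding, hinge_features], axis=1)
print(cross_val_score(SVC(kernel="linear"), fused, y, cv=5).mean())
```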

Data availability

The data sets used in this article are publicly available from their original websites and/or the Zenodo repository. All relevant links are shared in footnotes 1, 3, 4, 5, and 6. For the repository, the DOI is included.

Footnotes

1. French Royal Chancery, Paris, Archives nationales de France, JJ35–JJ211, shared with us in the HIMANIS project (https://eadh.org/projects/himanis) [4].

2. https://sok.riksarkivet.se/SDHK

3. https://zenodo.org/record/1194357#.YrLU-OxBy3I, https://doi.org/10.5281/zenodo.1194357

4. https://digital.bodleian.ox.ac.uk/

5. https://www.khalilicollections.org/all-collections/aramaic-documents/

6. https://www.deadseascrolls.org.il/learn-about-the-scrolls/

7. https://wiki.hpc.rug.nl/peregrine/start

8. https://github.com/Lisa-dk/manuscript-dating-sn.git

References

1. He S, Samara P, Burgers J, Schomaker L. Towards style-based dating of historical documents. In: 14th International conference on frontiers in handwriting recognition. IEEE; 2014. https://doi.org/10.1109/ICFHR.2014.52.

2. Bulacu ML, Schomaker LRB. Text-independent writer identification and verification using textural and allographic features. IEEE Trans Pattern Anal Mach Intell. 2007;29(4):701–17. https://doi.org/10.1109/TPAMI.2007.1009.

3. Koopmans L, Dhali M, Schomaker L. The effects of character-level data augmentation on style-based dating of historical manuscripts. In: Proceedings of the 12th international conference on pattern recognition applications and methods (ICPRAM), vol. 1. SciTePress; 2023, pp. 124–35. https://doi.org/10.5220/0011699500003411.

4. Stutzmann D, Moufflet J-F, Hamel S. La recherche en plein texte dans les sources manuscrites médiévales: enjeux et perspectives du projet HIMANIS pour l'édition électronique. Médiévales. 2017;73:67–96. https://doi.org/10.4000/medievales.8198.

5. Bulacu M, Brink A, van der Zant T, Schomaker L. Recognition of handwritten numerical fields in a large single-writer historical collection. In: 2009 10th International conference on document analysis and recognition (ICDAR). IEEE; 2009, pp. 808–12.

6. He S, Schomaker L, Samara P, Burgers J. MPS data set with images of medieval charters for handwriting-style based dating of manuscripts. Zenodo. https://doi.org/10.5281/zenodo.1194357.

7. Shor P, Manfredi M, Bearman GH, Marengo E, Boydston K, Christens-Barry WA. The Leon Levy Dead Sea Scrolls digital library: the digitization project of the Dead Sea Scrolls. J East Mediterr Archaeol Herit Stud. 2014;2(2):71–89. https://doi.org/10.5325/jeasmedarcherstu.2.2.0071.

8. He S, Schomaker L. Co-occurrence features for writer identification. In: Proceedings of the international conference on frontiers in handwriting recognition (ICFHR). IEEE; 2017, pp. 78–83. https://doi.org/10.1109/ICFHR.2016.0027.

9. He S, Schomaker L. Writer identification using curvature-free features. Pattern Recognit. 2017;63:451–64. https://doi.org/10.1016/j.patcog.2016.09.044.

10. Siddiqi I, Vincent N. Text independent writer recognition using redundant writing patterns with contour-based orientation and curvature features. Pattern Recognit. 2010;43(11):3853–65. https://doi.org/10.1016/j.patcog.2010.05.019.

11. Hamid A, Bibi M, Siddiqi I, Moetesum M. Historical manuscript dating using textural measures. In: 2018 International conference on frontiers of information technology (FIT). 2018, pp. 235–40. https://doi.org/10.1109/FIT.2018.00048.

12. Fogel I, Sagi D. Gabor filters as texture discriminator. Biol Cybern. 1989;61(2):103–13. https://doi.org/10.1007/BF00204594.

13. Heikkilä M, Pietikäinen M, Schmid C. Description of interest regions with local binary patterns. Pattern Recognit. 2009;42(3):425–36. https://doi.org/10.1016/j.patcog.2008.08.014.

14. Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern. 1973;SMC-3(6):610–21. https://doi.org/10.1109/TSMC.1973.4309314.

15. Schomaker L, Bulacu M. Automatic writer identification using connected-component contours and edge-based features of uppercase Western script. IEEE Trans Pattern Anal Mach Intell. 2004;26(6):787–98.

16. He S, Samara P, Burgers J, Schomaker L. Image-based historical manuscript dating using contour and stroke fragments. Pattern Recognit. 2016;58:159–71. https://doi.org/10.1016/j.patcog.2016.03.032.

17. He S, Wiering M, Schomaker L. Junction detection in handwritten documents and its application to writer identification. Pattern Recognit. 2015;48(12):4036–48. https://doi.org/10.1016/j.patcog.2015.05.022.

18. He S, Samara P, Burgers J, Schomaker L. Historical manuscript dating based on temporal pattern codebook. Comput Vis Image Underst. 2016;152:167–75. https://doi.org/10.1016/j.cviu.2016.08.008.

19. He S, Schomaker L. Beyond OCR: multi-faceted understanding of handwritten document characteristics. Pattern Recognit. 2017;63:321–33. https://doi.org/10.1016/j.patcog.2016.09.017.

20. Dhali MA, Jansen CN, de Wit JW, Schomaker L. Feature-extraction methods for historical manuscript dating based on writing style development. Pattern Recognit Lett. 2020;131:413–20. https://doi.org/10.1016/j.patrec.2020.01.027.

21. Wahlberg F, Wilkinson T, Brun A. Historical manuscript production date estimation using deep convolutional neural networks. In: 2016 15th International conference on frontiers in handwriting recognition (ICFHR). 2016, pp. 205–10. https://doi.org/10.1109/ICFHR.2016.0048.

22. Hamid A, Bibi M, Moetesum M, Siddiqi I. Deep learning based approach for historical manuscript dating. In: 2019 International conference on document analysis and recognition (ICDAR). 2019, pp. 967–72. https://doi.org/10.1109/ICDAR.2019.00159.

23. Adam K, Al-Ma'adeed S, Akbari Y. Hierarchical fusion using subsets of multi-features for historical Arabic manuscript dating. J Imaging. 2022;8(3):60. https://doi.org/10.3390/jimaging8030060.

24. He S, Samara P, Burgers J, Schomaker L. A multiple-label guided clustering algorithm for historical document dating and localization. IEEE Trans Image Process. 2016;25(11):5252–65. https://doi.org/10.1109/TIP.2016.2602078.

25. Schomaker L. Monk: search and annotation tools for handwritten manuscripts. 2023. http://monk.hpc.rug.nl/. Accessed 8 July 2023.

26. Bulacu M, Brink A, van der Zant T, Schomaker L. Recognition of handwritten numerical fields in a large single-writer historical collection. In: 2009 10th International conference on document analysis and recognition (ICDAR). 2009, pp. 808–12. https://doi.org/10.1109/ICDAR.2009.8.

27. Dhali MA, de Wit JW, Schomaker L. BiNet: degraded-manuscript binarization in diverse document textures and layouts using deep encoder–decoder networks. arXiv preprint; 2019.

28. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9(1):62–6. https://doi.org/10.1109/TSMC.1979.4310076.

29. He S, Schomaker L. Delta-n hinge: rotation-invariant features for writer identification. In: 22nd International conference on pattern recognition (ICPR). IEEE; 2014, pp. 2023–8. https://doi.org/10.1109/ICPR.2014.353.

30. Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80. https://doi.org/10.1109/5.58325.

31. Sarlin P. Self-organizing time map: an abstraction of temporal multivariate patterns. Neurocomputing. 2013;99:496–508. https://doi.org/10.1016/j.neucom.2012.07.011.

32. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.

33. Geng X, Zhou Z-H, Smith-Miles K. Automatic age estimation based on facial aging patterns. IEEE Trans Pattern Anal Mach Intell. 2007;29(12):2234–40. https://doi.org/10.1109/TPAMI.2007.70733.

34. Cuevas A, Febrero M, Fraiman R. An ANOVA test for functional data. Comput Stat Data Anal. 2004;47(1):111–22.


Acknowledgements

This study made use of several research outcomes from the European Research Council (EU Horizon 2020) project: The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls (HandsandBible 640497), principal investigator: Mladen Popović. Furthermore, for the high-resolution, multi-spectral images of the Dead Sea Scrolls, we are grateful to the Israel Antiquities Authority (IAA), courtesy of the Leon Levy Dead Sea Scrolls Digital Library; photographer: Shai Halevi. Additionally, we express our gratitude to the Bodleian Libraries, the University of Oxford, the Khalili Collections, and the Staatliche Museen zu Berlin (photographer: Sandra Steib) for the early Aramaic images. We also thank Petros Samara for collecting the Medieval Paleographical Scale (MPS) data set for the Dutch NWO project. For the inventories of the French Royal Chancery (FRC) registers, we thank the HIMANIS project (HIstorical MANuscript Indexing for user-controlled Search); photography credits: Paris, Archives Nationales. Finally, we thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high-performance computing cluster.

Author information

Authors and Affiliations

Department of Artificial Intelligence, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands

Lisa Koopmans, Maruf A. Dhali & Lambert Schomaker


Corresponding author

Correspondence to Lisa Koopmans.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Recent Trends on Pattern Recognition Applications and Methods”. Guest edited by Ana Fred, Maria De Marsico and Gabriella Sanniti di Baja.

Appendix

See Tables 7 and 8.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Koopmans, L., Dhali, M.A. & Schomaker, L. Performance Analysis of Handwritten Text Augmentation on Style-Based Dating of Historical Documents. SN COMPUT. SCI. 5 , 397 (2024). https://doi.org/10.1007/s42979-024-02688-6


Received : 12 June 2023

Accepted : 05 February 2024

Published : 04 April 2024

DOI : https://doi.org/10.1007/s42979-024-02688-6


Keywords

  • Data augmentation
  • Document analysis
  • Historical manuscript dating
  • Self-organizing maps
  • Neural networks
  • Support vector machines

