
Time Series Analysis and Forecasting: Examples, Approaches, and Tools

  • Business, Data Science, UX Design
  • 21 Mar, 2022

What are time series forecasting and analysis?

What time series analysis is, and how trends, seasonality, cycles, and irregular components show up in data.

Figure: Medicine sales over time, with trend and seasonality clearly visible. Source: Forecasting: Principles & Practice, Rob J Hyndman, 2014

Time series forecasting and analysis: examples and use cases

Demand forecasting for retail, procurement, and dynamic pricing; price prediction for customer-facing apps and a better user experience.

Figure: Fareboom's price predictor. The engine has 75 percent confidence that fares will rise soon. Source: Fareboom.com

Forecasting pandemic spread, diagnosis, and medication planning in healthcare.

Anomaly detection for fraud detection, cyber security, and predictive maintenance.

Figure: Finding anomalies in time series data. Source: Neptune.ai

Approaches to time series forecasting

“Prediction is very difficult, especially if it’s about the future.”

Niels Bohr, Nobel laureate in Physics

Figure: Bringing stationarity to data

Traditional machine learning methods

Stream learning and ensemble methods.

Figure: Forecast procedure at Google. Source: Our Quest for Robust Time Series Forecasting at Scale, Eric Tassone and Farzan Rohani, 2017

Tools and services used for time series forecasting

Facebook's Prophet.

Figure: Facebook Prophet workflow. Source: Forecasting at Scale, Sean J. Taylor and Benjamin Letham, 2017

Google’s TensorFlow, BigQuery, and Vertex AI

Amazon Forecast.

Figure: Comparison of Amazon Forecast time series algorithms. Source: AWS

Azure Time Series Insights for IoT Data

Time series forecasting will become more automated in the future.

Decoding Time Series Analysis: A Comprehensive Guide

Data Science

Date: 03/18/2024

Explore the fundamentals of time series analysis, including its definition, key components, and practical applications in forecasting trends and detecting outliers.

Authors: Sabhyata Azad (Manager, Data Science) and Soumyadeep Maiti (Director, Data Science)


Table of contents:

  • Components of Time Series
  • Key Properties of Time Series Data
  • Outliers in Time Series Data
  • Example Use Case - Using the Z-Score Method
  • Identifying and Treating Subsequent Outliers
  • Identifying and Treating Real-Life Data Outliers
  • Importance of Domain Knowledge
  • Conclusion

A time series is a sequence of data points collected and ordered chronologically over time. It is characterized by its indexing in time, distinguishing it from other types of datasets.

Time series metrics represent data tracked at regular intervals, such as inventory sold in a store from one day to the next. In investing, time series tracks the movement of chosen data points, like a security's price, over a specified time with regularly recorded intervals.

Statistically, time series data is analyzed in two primary ways: to draw conclusions about the impact of one or more variables on a specific variable of interest across time or to predict future trends.

Components of Time Series:

The components of a time series include trends, seasonal variations, cyclic variations, and random or irregular movements. These elements collectively contribute to the pattern observed in the time series data.

Key Properties of Time Series Data:

  • Sample Tracking Over Time: Time series data tracks a sample over successive periods, providing insights into the evolution of variables.
  • Influence of Factors: Time series allows observing factors influencing specific variables over time, contributing to a comprehensive understanding of patterns.
  • Use in Analysis: Time series analysis is valuable for examining how assets, securities, or economic variables change over time. It aids in both fundamental and technical analysis.
  • Forecasting Methods: Forecasting methods using time series data predict future values, contributing to decision-making processes in various fields.

In essence, time series data captures the evolution of variables over time, and its analysis, including forecasting methods, is instrumental in understanding and predicting patterns in diverse domains such as finance, economics, and inventory management. 

Types of outliers:

Anomalies in time series data, called outliers or novelties, represent data points or patterns that deviate significantly from the expected or normal behavior. These anomalies can arise for various reasons, including errors, unusual events, or changes in the underlying process being monitored. They can be categorized into two broad groups:

Unwanted Data:

Unwanted data in time series refers to unintentional irregularities or disturbances that often stem from technical issues or errors in the data collection process. These anomalies are typically considered noise and can result from various sources, such as measurement errors, sensor malfunctions, or inaccuracies during data entry. Unwanted data is characterized by its randomness and lack of meaningful information. If not addressed appropriately, these anomalies can introduce inaccuracies into time series analysis and interpretation. Detection and removal of unwanted data anomalies are essential, and statistical techniques and filtering methods are commonly employed to ensure the reliability of the time series data for further analysis.

Event of Interest:

In contrast, events of interest involve purposeful deviations in the time series that are actively sought or considered highly meaningful. These anomalies represent occurrences that are specifically targeted for detection and analysis. They may be triggered by planned events, special occurrences, or phenomena that hold particular importance in the time series context. Unlike unwanted data anomalies, events of interest are often crucial for decision-making or gaining insights into the underlying processes being monitored. Detection methods for these anomalies are tailored to identify specific patterns or deviations deemed significant, making them the focal point of the analysis. The pursuit of events of interest is driven by the desire to uncover and understand occurrences within the time series data that have particular significance or relevance to the analysis or application at hand.

1. Point Outlier in Time Series Data:

A point outlier in time series data refers to an individual data point significantly deviating from the expected or normal pattern within the series. This deviation can manifest as an unusually high or low value compared to the surrounding data points. Point outliers are often characterized by their isolated nature, representing a singular instance of divergence from the overall trend.

  • Isolation: Point outliers stand alone in their deviation from the expected pattern, isolated from neighboring data points.
  • Magnitude: These outliers exhibit a substantial magnitude of difference compared to the surrounding values.
  • Single Occurrence: Point outliers are typically singular occurrences, influencing only one specific data point within the time series.

Common detection approaches include:

  • Z-Score: Identify points with z-scores beyond a certain threshold.
  • Grubbs' Test: Apply statistical tests to detect outliers in univariate datasets.
  • Visual Inspection: Plotting the time series allows for visual identification of individual points deviating significantly from the overall trend.

2. Subsequent Outlier in Time Series Data:

A subsequent outlier in time series data refers to a pattern of anomalous behavior that extends beyond a single data point. Unlike point outliers, subsequent outliers involve a sequence of consecutively deviant data points, indicating a sustained deviation from the expected trend.

  • Sequential Nature: Subsequent outliers manifest as a sequence of data points that collectively deviate from the expected pattern.
  • Duration: These outliers may persist for an extended period, influencing a series of consecutive time points.
  • Aggregate Effect: The impact of subsequent outliers is cumulative, affecting the overall trend of the time series.

Common detection approaches include:

  • Moving Average or Exponential Smoothing: Identify deviations from smoothed trends that persist over multiple time points.
  • Change Point Detection Algorithms: Algorithms designed to detect shifts or changes in the underlying distribution of the time series.
  • Cluster Analysis: Identify clusters or patterns of anomalous behavior over consecutive time points.

Identifying and Treating Point Outliers: 

Identifying and treating point outliers in time series data is crucial for ensuring the accuracy and reliability of analyses. Here are a few methods to get started with; a short, illustrative R sketch follows each one.

1. Z-Score Method:

  • Calculate the z-score for each data point using the mean and standard deviation of the time series.
  • Identify points with z-scores beyond a certain threshold (e.g., 2 or 3) as potential outliers.
  • Once identified, assess the context and nature of the outlier.
  • Consider imputing the outlier with a more typical value or removing it, depending on the impact on the analysis and the underlying reasons for the outlier.
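A minimal R sketch of this z-score check, using an illustrative series with one injected spike (all variable names and the threshold of 3 are illustrative choices, not prescriptions):

```r
# Illustrative daily series with one injected spike
set.seed(42)
sales <- c(rnorm(99, mean = 100, sd = 10), 180)

# Z-scores relative to the series mean and standard deviation
z <- (sales - mean(sales)) / sd(sales)

threshold <- 3                      # 2 or 3 are common choices
outliers  <- which(abs(z) > threshold)

sales[outliers]                     # inspect before deciding on treatment

# One simple treatment: impute flagged points with the median of the rest
sales_clean <- sales
sales_clean[outliers] <- median(sales[-outliers])
```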

2. Tukey's Fences (Interquartile Range Method):

  • Calculate the interquartile range (IQR) of the time series.
  • Define lower and upper fences as Q1 - k * IQR and Q3 + k * IQR, respectively (typically, k is set to 1.5).
  • Identify points outside the fences as potential outliers.
  • Similar to the Z-Score method, assess the nature and context of the outlier.
  • Consider replacing or removing the outlier based on the impact and the goals of the analysis.
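A similar sketch for Tukey's fences, again on an illustrative series (k = 1.5 is the conventional multiplier mentioned above):

```r
# Illustrative series with one injected spike
set.seed(42)
x <- c(rnorm(99, mean = 100, sd = 10), 180)

q   <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[2] - q[1]                 # interquartile range
k   <- 1.5                         # conventional multiplier

lower_fence <- q[1] - k * iqr
upper_fence <- q[2] + k * iqr

outliers <- which(x < lower_fence | x > upper_fence)
x[outliers]                        # values falling outside the fences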

3. Moving Average or Exponential Smoothing:

  • Smooth the time series using a moving average or exponential smoothing technique.
  • Compare each data point with its smoothed value to identify deviations.
  • Points significantly deviating from the smoothed trend may be considered outliers.
  • Analyze the outliers in the context of the smoothed trend.
  • Adjust the time series by imputing or removing outliers, considering the impact on the overall analysis.
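A sketch of the smoothing-based check, using a centered moving average from base R's stats::filter; the simulated series and the 3-standard-deviation cutoff are illustrative choices:

```r
# Illustrative seasonal series with one injected point outlier
set.seed(42)
x <- 100 + 10 * sin(seq_len(120) / 10) + rnorm(120, sd = 2)
x[60] <- x[60] + 25

# Centered 7-point moving average as the smoothed trend
smoothed  <- stats::filter(x, rep(1 / 7, 7), sides = 2)
deviation <- as.numeric(x - smoothed)

# Flag points whose deviation from the local trend is unusually large
flagged <- which(abs(deviation) > 3 * sd(deviation, na.rm = TRUE))
flagged
```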

Example Use Case - Using the Z-Score Method:

Consider a financial time series tracking daily stock prices. An unexpected surge in stock price might be identified using the Z-Score method:

  • The stock price experienced a sudden spike that seemed unusual compared to historical data.
  • Calculate the z-score for each daily stock price.
  • Identify days with z-scores exceeding a threshold (e.g., 2 or 3).
  • Investigate the context of the outlier (e.g., news, events).
  • If the surge is deemed anomalous and not justified by known factors, consider adjusting or removing the outlier in the analysis.

In this use case, the Z-Score method helps pinpoint days where stock prices deviate significantly from the expected behavior, enabling a more informed decision on how to treat these point outliers in the financial time series.

Identifying and Treating Subsequent Outliers:

Identifying and treating subsequent outliers in time series data involves detecting patterns of sustained abnormal behavior. Here are a few methods to get started with; a short, illustrative R sketch follows each one.

1. Moving Average or Exponential Smoothing:

  • Analyze deviations of each data point from the smoothed trend over consecutive time points.
  • Identify periods where the deviation persists as subsequent outliers.
  • Examine the duration and impact of the subsequent outliers.
  • Adjust the time series by imputing or removing the outliers based on their influence on the overall trend.

2. Change Point Detection Algorithms:

  • Utilize algorithms designed to detect shifts or changes in the underlying distribution of the time series.
  • Identify points where the distribution changes are significant, which indicates the presence of subsequent outliers.
  • Assess the context of detected change points and their impact on the time series.
  • Modify the time series by addressing or mitigating the effects of the subsequent outliers, depending on their significance.
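One way to sketch this in R is with the changepoint package (assuming it is installed); cpt.mean with the PELT method is a common starting point, and the simulated series below is purely illustrative:

```r
# Change point detection with the 'changepoint' package
# install.packages("changepoint")   # if not already installed
library(changepoint)

set.seed(1)
x <- c(rnorm(100, mean = 10), rnorm(50, mean = 14))  # mean shifts after t = 100

fit <- cpt.mean(x, method = "PELT")  # detect shifts in the mean
cpts(fit)                            # estimated change point locations
plot(fit)                            # series with fitted segments overlaid
```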

3. Cluster Analysis:

  • Apply cluster analysis techniques to identify clusters or patterns of anomalous behavior over consecutive time points.
  • Identify clusters that represent sustained deviations from the expected pattern.
  • Examine the characteristics and duration of the identified clusters.
  • Adjust the time series by addressing or removing the clusters of subsequent outliers based on their impact on the analysis.
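A rough sketch of a clustering-based check, using base R's kmeans on short rolling windows; the window length, number of clusters, and 95th-percentile cutoff are illustrative choices rather than recommendations:

```r
# Cluster short rolling windows; windows far from every cluster centre are
# candidates for sustained (subsequent) anomalous behaviour
set.seed(1)
x <- 100 + rnorm(200, sd = 2)
x[120:130] <- x[120:130] + 15        # injected sustained deviation

w <- 10
windows <- embed(x, w)               # row i covers observations i .. i + w - 1
km <- kmeans(windows, centers = 3, nstart = 20)

# Distance from each window to its assigned cluster centre
dists <- sqrt(rowSums((windows - km$centers[km$cluster, ]) ^ 2))
which(dists > quantile(dists, 0.95)) # windows flagged as anomalous
```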

Example Use Case - using Moving Average or Exponential Smoothing:

Consider energy consumption time series data for a smart building. An extended period of unusually high energy consumption might be identified through moving average or exponential smoothing:

  • Energy consumption shows a sustained increase over several consecutive days.
  • Smooth the time series using a moving average or exponential smoothing.
  • Identify periods where the actual energy consumption consistently deviates from the smoothed trend.
  • Investigate the reasons behind the prolonged increase in energy consumption.
  • Modify the time series by addressing or mitigating the effects of the subsequent outliers, ensuring a more accurate representation of the building's energy usage over time.

In this use case, the moving average or exponential smoothing method helps identify and address a sustained increase in energy consumption, allowing for a more informed treatment of the subsequent outliers in the smart building's time series data.

Identifying and Treating Real-Life Data Outliers:

Real-life data can contain point outliers, subsequent outliers, or a combination of both. Relying on a single methodology often produces ineffective and unreliable outputs. An ensemble approach leverages the collective intelligence of multiple models, making it more adaptive to varied patterns and ensuring more accurate identification of anomalies. This synergistic collaboration improves overall accuracy and increases the reliability of anomaly detection systems, making them better suited to handle the complexity and diversity of real-world time series datasets.

Along with ensemble approaches, there are methods in time series analysis that are versatile in their ability to identify both point outliers and subsequent outliers. These methods are designed to capture deviations from the expected pattern, whether they occur as isolated data points or manifest as sustained abnormal behavior over consecutive time points. Moving Average or Exponential Smoothing and Change Point Detection Algorithms are two such methods, among many.

Importance of Domain Knowledge:

In the realm of time series data analysis, the significance of domain knowledge cannot be overstated, particularly in the treatment of outliers. While statistical methods offer valuable insights, integrating domain expertise elevates the process by providing a contextual understanding of the subject matter. Domain experts bring a wealth of knowledge that aids in distinguishing genuine anomalies from expected variations, assessing data quality, and understanding the impact of outliers on analyses. Their insights extend to recognizing seasonal and cyclical patterns, incorporating awareness of external factors, and optimizing treatment strategies based on the specific goals and constraints within the domain. This collaborative approach ensures that the identification and treatment of outliers align with the intricacies of the domain, resulting in more informed and contextually relevant outlier treatment strategies that contribute to effective decision-making processes.

Conclusion:

In conclusion, a time series is a powerful and versatile tool for capturing the evolution of variables over time, providing insights into patterns, trends, and influencing factors. Its components collectively contribute to the observed pattern, including trends, seasonal variations, cyclic variations, and random movements. Time series data plays a pivotal role in various analyses, aiding in fundamental and technical approaches, and its forecasting methods contribute to decision-making processes in diverse fields. The presence of outliers in time series data, categorized into unwanted data anomalies and events of interest anomalies, necessitates careful identification and treatment. Point outliers, represented by individual data points deviating significantly from the expected pattern, and subsequent outliers, reflecting sustained abnormal behavior, require distinct methodologies for detection and treatment. Utilizing methods such as Z-Score, Tukey's Fences, Moving Average, Exponential Smoothing, Change Point Detection Algorithms, and Cluster Analysis allows a comprehensive approach to identify and treat outliers effectively. Moreover, acknowledging the real-life complexity of data, an ensemble of anomaly detection methods and versatile techniques capable of handling both point and subsequent outliers offers robust solutions for accurate analyses in diverse applications.


Advanced Epidemiological Analysis

Chapter 3: Time Series / Case-Crossover Studies

We’ll start by exploring common characteristics in time series data for environmental epidemiology. In the first half of the class, we’re focusing on a very specific type of study—one that leverages large-scale vital statistics data, collected at a regular time scale (e.g., daily), combined with large-scale measurements of a climate-related exposure, with the goal of estimating the typical relationship between the level of the exposure and risk of a health outcome. For example, we may have daily measurements of particulate matter pollution for a city, measured daily at a set of Environmental Protection Agency (EPA) monitors. We want to investigate how risk of cardiovascular mortality changes in the city from day to day in association with these pollution levels. If we have daily counts of the number of cardiovascular deaths in the city, we can create a statistical model that fits the exposure-response association between particulate matter concentration and daily risk of cardiovascular mortality. These statistical models—and the type of data used to fit them—will be the focus of the first part of this course.

3.1 Readings

The required readings for this chapter are:

  • Bhaskaran et al. ( 2013 ) Provides an overview of time series regression in environmental epidemiology.
  • Vicedo-Cabrera, Sera, and Gasparrini (2019) Provides a tutorial of all the steps for projecting the health impacts of temperature extremes under climate change. One of the steps is to fit the exposure-response association using present-day data (the section on “Estimation of Exposure-Response Associations” in the paper). In this chapter, we will go into details on that step, and that section of the paper is the only required reading for this chapter. Later in the class, we’ll look at other steps covered in this paper. Supplemental material for this paper is available to download by clicking http://links.lww.com/EDE/B504. You will need the data in this supplement for the exercises for class.

The following are supplemental readings (i.e., not required, but may be of interest) associated with the material in this chapter:

  • B. Armstrong et al. ( 2012 ) Commentary that provides context on how epidemiological research on temperature and health can help inform climate change policy.
  • Dominici and Peng ( 2008c ) Overview of study designs for studying climate-related exposures (air pollution in this case) and human health. Chapter in a book that is available online through the CSU library.
  • B. Armstrong ( 2006 ) Covers similar material as Bhaskaran et al. ( 2013 ) , but with more focus on the statistical modeling framework
  • Gasparrini and Armstrong ( 2010 ) Describes some of the advances made to time series study designs and statistical analysis, specifically in the context of temperature
  • Basu, Dominici, and Samet ( 2005 ) Compares time series and case-crossover study designs in the context of exploring temperature and health. Includes a nice illustration of different referent periods, including time-stratified.
  • B. G. Armstrong, Gasparrini, and Tobias ( 2014 ) This paper describes different data structures for case-crossover data, as well as how conditional Poisson regression can be used in some cases to fit a statistical model to these data. Supplemental material for this paper is available at https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-14-122#Sec13 .
  • Imai et al. ( 2015 ) Typically, the time series study design covered in this chapter is used to study non-communicable health outcomes. This paper discusses opportunities and limitations in applying a similar framework for infectious disease.
  • Dominici and Peng ( 2008b ) Heavier on statistics. Describes some of the statistical challenges of working with time series data for air pollution epidemiology. Chapter in a book that is available online through the CSU library.
  • Lu and Zeger ( 2007 ) Heavier on statistics. This paper shows how, under conditions often common for environmental epidemiology studies, case-crossover and time series methods are equivalent.
  • Gasparrini ( 2014 ) Heavier on statistics. This provides the statistical framework for the distributed lag model for environmental epidemiology time series studies.
  • Dunn and Smyth ( 2018 ) Introduction to statistical models, moving into regression models and generalized linear models. Chapter in a book that is available online through the CSU library.
  • James et al. ( 2013 ) General overview of linear regression, with an R coding “lab” at the end to provide coding examples. Covers model fit, continuous, binary, and categorical covariates, and interaction terms. Chapter in a book that is available online through the CSU library.

3.2 Time series and case-crossover study designs

In the first half of this course, we’ll take a deep look at how researchers can study how environmental exposures and health risk are linked using time series studies . Let’s start by exploring the study design for this type of study, as well as a closely linked study design, that of case-crossover studies .

It’s important to clarify the vocabulary we’re using here. We’ll use the terms time series study and case-crossover study to refer specifically to a type of study common for studying air pollution and other climate-related exposures. However, both terms have broader definitions, particularly in fields outside environmental epidemiology. For example, a time series study more generally refers to a study where data is available for the same unit (e.g., a city) for multiple time points, typically at regularly-spaced times (e.g., daily). A variety of statistical methods have been developed to apply to gain insight from this type of data, some of which are currently rarely used in the specific fields of air pollution and climate epidemiology that we’ll explore here. For example, there are methods to address autocorrelation over time in measurements—that is, that measurements taken at closer time points are likely somewhat correlated—that we won’t cover here and that you won’t see applied often in environmental epidemiology studies, but that might be the focus of a “Time Series” course in a statistics or economics department.

In air pollution and climate epidemiology, time series studies typically begin with study data collected for an aggregated area (e.g., city, county, ZIP code) and with a daily resolution. These data are usually secondary data, originally collected by the government or other organizations through vital statistics or other medical records (for the health data) and networks of monitors for the exposure data. In the next section of this chapter, we’ll explore common characteristics of these data. These data are used in a time series study to investigate how changes in the daily level of the exposure is associated with risk of a health outcome, focusing on the short-term period. For example, a study might investigate how risk of respiratory hospitalization in a city changes in relationship with the concentration of particulate matter during the week or two following exposure. The study period for these studies is often very long (often a decade or longer), and while single-community time series studies can be conducted, many time series studies for environmental epidemiology now include a large set of communities of national or international scope.

The study design essentially compares a community with itself at different time points—asking if health risk tends to be higher on days when exposure is higher. By comparing the community to itself, the design removes many challenges that would come up when comparing one community to another (e.g., is respiratory hospitalization risk higher in city A than city B because particulate matter concentrations are typically higher in city A?). Communities differ in demographics and other factors that influence health risk, and it can be hard to properly control for these when exploring the role of environmental exposures. By comparison, demographics tend to change slowly over time (at least, compared to a daily scale) within a community.

One limitation, however, is that the study design is often best-suited to study acute effects, but more limited in studying chronic health effects. This is tied to the design and traditional ways of statistically modeling the resulting data. Since a community is compared with itself, the design removes challenges in comparing across communities, but it introduces new ones in comparing across time. Both environmental exposures and rates of health outcomes can have strong patterns over time, both across the year (e.g., mortality rates tend to follow a strong seasonal pattern, with higher rates in winter) and across longer periods (e.g., over the decade or longer of a study period). These patterns must be addressed through the statistical model fit to the time series data, and they make it hard to disentangle chronic effects of the exposure from unrelated temporal patterns in the exposure and outcome, and so most time series studies will focus on the short-term (or acute) association between exposure and outcome, typically looking at a period of at most about a month following exposure.

The term case-crossover study is a bit more specific than time series study , although there has been a strong movement in environmental epidemiology towards applying a specific version of the design, and so in this field the term often now implies this more specific version of the design. Broadly, a case-crossover study is one in which the conditions at the time of a health outcome are compared to conditions at other times that should otherwise (i.e., outside of the exposure of interest) be comparable. A case-crossover study could, for example, investigate the association between weather and car accidents by taking a set of car accidents and investigating how weather during the car accident compared to weather in the same location the week before.

One choice in a case-crossover study design is how to select the control time periods. Early studies tended to use a simple method for this—for example, taking the day before, or a day the week before, or some similar period somewhat close to the day of the outcome. As researchers applied the study design to large sets of data (e.g., all deaths in a community over multiple years), they noticed that some choices could create bias in estimates. As a result, most environmental epidemiology case-crossover studies now use a time-stratified approach to selecting control days. This selects a set of control days that typically include days both before and after the day of the health outcome, and are a defined set of days within a “stratum” that should be comparable in terms of temporal trends. For daily-resolved data, this stratum typically will include all the days within a month, year, and day of week. For example, one stratum of comparable days might be all the Mondays in January of 2010. These strata are created throughout the study period, and then days are only compared to other days within their stratum (although, fortunately, there are ways you can apply a single statistical model to fit all the data for this approach rather than having to fit the model stratum by stratum over many years).

When this is applied to data at an aggregated level (e.g., city, county, or ZIP code), it is in spirit very similar to a time series study design, in that you are comparing a community to itself at different time points. The main difference is that a time series study uses statistical modeling to control for potential confounding from temporal patterns, while a case-crossover study of this type instead controls for this potential confounding by only comparing days that should be “comparable” in terms of temporal trends, for example, comparing a day only to other days in the same month, year, and day of week. You will often hear that case-crossover studies therefore address potential confounding from temporal patterns “by design” rather than “statistically” (as in time series studies). However, in practice (and as we’ll explore in this class), in environmental epidemiology, case-crossover studies are often applied to aggregated community-level data, rather than individual-level data, with exposure assumed to be the same for everyone in the community on a given day. Under these assumptions, time series and case-crossover studies have been determined to be essentially equivalent (and, in fact, can use the same study data), only with slightly different terms used to control for temporal patterns in the statistical model fit to the data. Several interesting papers have been written to explore differences and similarities in these two study designs as applied in environmental epidemiology (Basu, Dominici, and Samet 2005; B. G. Armstrong, Gasparrini, and Tobias 2014; Lu and Zeger 2007).

These types of study designs in practice use similar datasets. In earlier presentations of the case-crossover design, these data would be set up a bit differently for statistical modeling. More recent work, however, has clarified how they can be modeled similarly to when using a time series study design, allowing the data to be set up in a similar way ( B. G. Armstrong, Gasparrini, and Tobias 2014 ) .

Several excellent commentaries or reviews are available that provide more details on these two study designs and how they have been used specifically to investigate the relationship between climate-related exposures and health (Bhaskaran et al. 2013; B. Armstrong 2006; Gasparrini and Armstrong 2010). Further, these designs are just two tools in a wider collection of study designs that can be used to explore the health effects of climate-related exposures. Dominici and Peng (2008c) provides a nice overview of this broader set of designs.

3.3 Time series data

Let’s explore the type of dataset that can be used for these time series–style studies in environmental epidemiology. In the examples in this chapter, we’ll be using data that comes as part of the Supplemental Material in one of this chapter’s required readings, ( Vicedo-Cabrera, Sera, and Gasparrini 2019 ) . Follow the link for the supplement for this article and then look for the file “lndn_obs.csv.” This is the file we’ll use as the example data in this chapter.

These data are saved in a csv format (that is, a plain text file, with commas used as the delimiter), and so they can be read into R using the read_csv function from the readr package (part of the tidyverse). For example, you can use the following code to read in these data, assuming you have saved them in a “data” subdirectory of your current working directory:
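A sketch of that call, assuming the tidyverse is installed and the file was saved as data/lndn_obs.csv:

```r
library(tidyverse)  # loads readr (read_csv), dplyr, and ggplot2

# Read the example London time series data into an object named obs
obs <- read_csv("data/lndn_obs.csv")
```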

This example dataset shows many characteristics that are common for datasets for time series studies in environmental epidemiology. Time series data are essentially a sequence of data points repeatedly taken over a certain time interval (e.g., day, week, month etc). General characteristics of time series data for environmental epidemiology studies are:

  • Observations are given at an aggregated level. For example, instead of individual observations for each person in London, the obs data give counts of deaths throughout London. The level of aggregation is often determined by geopolitical boundaries, for example, counties or ZIP codes in the US.
  • Observations are given at regularly spaced time steps over a period. In the obs dataset, the time interval is day. Typically, values will be provided continuously over that time period, with observations for each time interval. Occasionally, however, the time series data may only be available for particular seasons (e.g., only warm season dates for an ozone study), or there may be some missing data on either the exposure or health outcome over the course of the study period.
  • Observations are available at the same time step (e.g., daily) for (1) the health outcome, (2) the environmental exposure of interest, and (3) potential time-varying confounders. In the obs dataset, the health outcome is mortality (from all causes; sometimes, the health outcome will focus on a specific cause of mortality or other health outcomes such as hospitalizations or emergency room visits). Counts are given for everyone in the city for each day ( all column), as well as for specific age categories ( all_0_64 for all deaths among those up to 64 years old, and so on). The exposure of interest in the obs dataset is temperature, and three metrics of this are included ( tmean , tmin , and tmax ). Day of the week is one time-varying factor that could be a confounder, or at least help explain variation in the outcome (mortality). This is included through the dow variable in the obs data. Sometimes, you will also see a marker for holidays included as a potential time-varying confounder, or other exposure variables (temperature is a potential confounder, for example, when investigating the relationship between air pollution and mortality risk).
  • Multiple metrics of an exposure and / or multiple health outcome counts may be included for each time step. In the obs example, three metrics of temperature are included (minimum daily temperature, maximum daily temperature, and mean daily temperature). Several counts of mortality are included, providing information for specific age categories in the population. The different metrics of exposure will typically be fit in separate models, either as a sensitivity analysis or to explore how exposure measurement affects epidemiological results. If different health outcome counts are available, these can be modeled in separate statistical models to determine an exposure-response function for each outcome.

3.4 Exploratory data analysis

When working with time series data, it is helpful to start with some exploratory data analysis. This type of time series data will often be secondary data—it is data that was previously collected, as you are re-using it. Exploratory data analysis is particularly important with secondary data like this. For primary data that you collected yourself, following protocols that you designed yourself, you will often be very familiar with the structure of the data and any quirks in it by the time you are ready to fit a statistical model. With secondary data, however, you will typically start with much less familiarity about the data, how it was collected, and any potential issues with it, like missing data and outliers.

Exploratory data analysis can help you become familiar with your data. You can use summaries and plots to explore the parameters of the data, and also to identify trends and patterns that may be useful in designing an appropriate statistical model. For example, you can explore how values of the health outcome are distributed, which can help you determine what type of regression model would be appropriate, and to see if there are potential confounders that have regular relationships with both the health outcome and the exposure of interest. You can see how many observations have missing data for the outcome, the exposure, or confounders of interest, and you can see if there are any measurements that look unusual. This can help in identifying quirks in how the data were recorded—for example, in some cases ground-based weather monitors use -99 or -999 to represent missing values, definitely something you want to catch and clean-up in your data (replacing with R’s NA for missing values) before fitting a statistical model!

The following applied exercise will take you through some of the questions you might want to answer through this type of exploratory analysis. In general, the tidyverse suite of R packages has loads of tools for exploring and visualizing data in R. The lubridate package from the tidyverse , for example, is an excellent tool for working with date-time data in R, and time series data will typically have at least one column with the timestamp of the observation (e.g., the date for daily data). You may find it worthwhile to explore this package some more. There is a helpful chapter in Wickham and Grolemund ( 2016 ) , https://r4ds.had.co.nz/dates-and-times.html , as well as a cheatsheet at https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_lubridate.pdf . For visualizations, if you are still learning techniques in R, two books you may find useful are Healy ( 2018 ) (available online at https://socviz.co/ ) and Chang ( 2018 ) (available online at http://www.cookbook-r.com/Graphs/ ).

Applied: Exploring time series data

Read the example time series data into R and explore it to answer the following questions:

  • What is the study period for the example obs dataset? (i.e., what dates / years are covered by the time series data?)
  • Are there any missing dates (i.e., dates with nothing recorded) within this time period? Are there any recorded dates where health outcome measurements are missing? Any where exposure measurements are missing?
  • Are there seasonal trends in the exposure? In the outcome?
  • Are there long-term trends in the exposure? In the outcome?
  • Is the outcome associated with day of week? Is the exposure associated with day of week?

Based on your exploratory analysis in this section, talk about the potential for confounding when these data are analyzed to estimate the association between daily temperature and city-wide mortality. Is confounding by seasonal trends a concern? How about confounding by long-term trends in exposure and mortality? How about confounding by day of week?

Applied exercise: Example code

In the obs dataset, the date of each observation is included in a column called date . The data type of this column is “Date”—you can check this by using the class function from base R:
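For example (a sketch, assuming the data were read into an object named obs as above):

```r
class(obs$date)
#> [1] "Date"
```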

Since this column has a “Date” data type, you can run some mathematical function calls on it. For example, you can use the min function from base R to get the earliest date in the dataset and the max function to get the latest.

You can also run the range function to get both the earliest and latest dates with a single call:
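A sketch of those calls, all working directly on the date column:

```r
min(obs$date)    # earliest date in the dataset
max(obs$date)    # latest date
range(obs$date)  # both at once
```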

This provides the range of the study period for these data. One interesting point is that it’s not a round set of years—instead, the data ends during the summer of the last study year. This doesn’t present a big problem, but is certainly something to keep in mind if you’re trying to calculate yearly averages of any values for the dataset. If you’re getting the average of something that varies by season (e.g., temperature), it could be slightly weighted by the months that are included versus excluded in the partial final year of the dataset. Similarly, if you group by year and then count totals by year, the number will be smaller for the last year, since only part of the year’s included. For example, if you wanted to count the total deaths in each year of the study period, it will look like they go down a lot the last year, when really it’s only because only about half of the last year is included in the study period:
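A sketch of that yearly count, using dplyr and lubridate on the all column of daily death counts (total_deaths is just an illustrative name):

```r
library(dplyr)
library(lubridate)

obs %>%
  mutate(year = year(date)) %>%    # pull the year out of the date column
  group_by(year) %>%
  summarize(total_deaths = sum(all))
```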

[Output: total deaths by year; the final, partial year shows a much lower total]

  • Are there any missing dates within this time period? Are there any recorded dates where health outcome measurements are missing? Any where exposure measurements are missing?

There are a few things you should check to answer this question. First (and easiest), you can check to see if there are any NA values within any of the observations in the dataset. This helps answer the second and third parts of the question. The summary function will provide a summary of the values in each column of the dataset, including the count of missing values ( NA s) if there are any:
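For example (a sketch):

```r
summary(obs)  # column-by-column summary, including any NA counts
```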

Based on this analysis, all observations are complete for all dates included in the dataset. There are no listings for NA s for any of the columns, and this indicates no missing values in the dates for which there’s a row in the data.

However, this does not guarantee that every date between the start date and end date of the study period are included in the recorded data. Sometimes, some dates might not get recorded at all in the dataset, and the summary function won’t help you determine when this is the case. One common example in environmental epidemiology is with ozone pollution data. These are sometimes only measured in the warm season, and so may be shared in a dataset with all dates outside of the warm season excluded.

There are a few alternative explorations you can do to check this. Perhaps the easiest is to check the number of days between the start and end date of the study period, and then see if the number of observations in the dataset is the same:
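A sketch of that check:

```r
# Days spanned by the study period versus rows in the data
as.numeric(max(obs$date) - min(obs$date))  # difference in days
nrow(obs)                                  # should be one more than the difference
```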

The number of rows is exactly one more than the difference in days between the first and last dates, which indicates that there is an observation for every date over the study period. In the next question, we'll be plotting observations by time, and typically this will also help you see if there are large chunks of missing dates in the data.

You can use a simple plot to visualize patterns over time in both the exposure and the outcome. For example, the following code plots a dot for each daily temperature observation over the study period. The points are set to a smaller size ( size = 0.5 ) and plotted with some transparency ( alpha = 0.5 ) since there are so many observations.
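A sketch of such a plot with ggplot2 (the axis labels are illustrative):

```r
library(ggplot2)

ggplot(obs, aes(x = date, y = tmean)) +
  geom_point(size = 0.5, alpha = 0.5) +   # small, semi-transparent points
  labs(x = "Date", y = "Daily mean temperature")
```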

[Figure: daily mean temperature observations across the study period]

There is (unsurprisingly) clear evidence here of a strong seasonal trend in mean temperature, with values typically lowest in the winter and highest in the summer.

You can plot the outcome variable in the same way:
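For example (a sketch):

```r
library(ggplot2)

ggplot(obs, aes(x = date, y = all)) +
  geom_point(size = 0.5, alpha = 0.5) +
  labs(x = "Date", y = "Daily mortality count (all ages)")
```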

[Figure: daily mortality counts across the study period]

Again, there are seasonal trends, although in this case they are inversed. Mortality tends to be highest in the winter and lowest in the summer. Further, the seasonal pattern is not equally strong in all years—some years it has a much higher winter peak, probably in conjunction with severe influenza seasons.

Another way to look for seasonal trends is with a heatmap-style visualization, with day of year along the x-axis and year along the y-axis. This allows you to see patterns that repeat around the same time of the year each year (and also unusual deviations from normal seasonal patterns).

For example, here’s a plot showing temperature in each year, where the observations are aligned on the x-axis by time in year. We’re using the doy column, which stands for “day of year” (i.e., Jan 1 = 1; Jan 2 = 2; … Dec 31 = 365, as long as it’s not a leap year), as the measure of time in the year. We’ve reversed the y-axis so that the earliest years in the study period start at the top of the visual and later study years come below—this is a personal style, and it would be no problem to leave the y-axis as-is. We’ve used the viridis color scale for the fill, since it has a number of features that make it preferable to the default R color scale, including that it is perceptible for most types of color blindness and can be printed in grayscale and still be correctly interpreted.
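A sketch of this kind of heatmap with ggplot2, using geom_tile and the viridis fill scale (the year is derived from the date column; label text is illustrative):

```r
library(dplyr)
library(ggplot2)
library(lubridate)

obs %>%
  mutate(year = year(date)) %>%
  ggplot(aes(x = doy, y = year, fill = tmean)) +
  geom_tile() +
  scale_y_reverse() +                            # earliest years at the top
  scale_fill_viridis_c(name = "Mean\ntemperature") +
  labs(x = "Day of year", y = "Year")
```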

[Figure: heatmap of daily mean temperature by day of year and year]

From this visualization, you can see that temperatures tend to be higher in the summer months and lower in the winter months. “Spells” of extreme heat or cold are visible—where extreme temperatures tend to persist over a period, rather than randomly fluctuating within a season. You can also see unusual events, like the extreme heat wave in the summer of 2003, indicated with the brightest yellow in the plot.

We created the same style of plot for the health outcome. In this case, we focused on mortality among the oldest age group, as temperature sensitivity tends to increase with age, so this might be where the strongest patterns are evident.
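A sketch of the same plot for the health outcome; note that all_85plus is a hypothetical column name for the oldest age group—check names(obs) for the grouping actually present in your copy of the data:

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# NOTE: "all_85plus" is a hypothetical column name for the oldest age group;
# substitute the column actually present in your copy of the data.
obs %>%
  mutate(year = year(date)) %>%
  ggplot(aes(x = doy, y = year, fill = all_85plus)) +
  geom_tile() +
  scale_y_reverse() +
  scale_fill_viridis_c(name = "Daily\ndeaths") +
  labs(x = "Day of year", y = "Year")
```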

[Figure: heatmap of daily mortality counts in the oldest age group by day of year and year]

For mortality, there tends to be an increase in the winter compared to the summer. Some winters have stretches with particularly high mortality—these are likely a result of seasons with strong influenza outbreaks. You can also see on this plot the impact of the 2003 heat wave on mortality among this oldest age group—an unusual spot of light green in the summer.

Some of the plots we created in the last section help in exploring whether there are long-term trends in the exposure and the outcome. For example, the following plot shows a clear pattern of decreasing daily mortality counts, on average, over the course of the study period:

[Figure: daily mortality counts across the study period, showing a long-term decline]

It can be helpful to add a smooth line to help detect these longer-term patterns, which you can do with geom_smooth :
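For example (a sketch):

```r
library(ggplot2)

ggplot(obs, aes(x = date, y = all)) +
  geom_point(size = 0.5, alpha = 0.5) +
  geom_smooth() +                     # adds a smoothed long-term trend line
  labs(x = "Date", y = "Daily mortality count (all ages)")
```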

[Figure: daily mortality counts with a smoothed long-term trend line]

You could also take the median mortality count across each year in the study period, although you should take out any years without a full year’s worth of data before you do this, since there are seasonal trends in the outcome:
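A sketch of that summary, dropping the partial final year by requiring at least 365 observations per year (an illustrative rule of thumb):

```r
library(dplyr)
library(ggplot2)
library(lubridate)

obs %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  filter(n() >= 365) %>%               # drop years without a full year of data
  summarize(median_deaths = median(all)) %>%
  ggplot(aes(x = year, y = median_deaths)) +
  geom_point() +
  geom_line() +
  labs(x = "Year", y = "Median daily mortality count")
```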

[Figure: median daily mortality count by year]

Again, we see a clear pattern of decreasing mortality rates in this city over time. This means we need to think carefully about long-term time patterns as a potential confounder. It will be particularly important to think about this if the exposure also has a strong pattern over time. For example, air pollution regulations have meant that, in many cities, there may be long-term decreases in pollution concentrations over a study period.

The data already include day of week as a column ( dow ). However, this column has a character data type, so it doesn’t have the order of weekdays encoded (e.g., Monday comes before Tuesday). This makes it hard to look for patterns related to things like weekend / weekday.

We could convert this to a factor and encode the weekday order when we do it, but it’s even easier to just recreate the column from the date column. We used the wday function from the lubridate package to do this—it extracts weekday as a factor, with the order of weekdays encoded (using a special “ordered” factor type):
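A sketch of that step:

```r
library(dplyr)
library(lubridate)

obs <- obs %>%
  mutate(dow = wday(date, label = TRUE))  # ordered factor, e.g. Sun < Mon < ... < Sat
```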

We looked at the mean, median, and 25th and 75th quantiles of the mortality counts by day of week:
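A sketch of that summary (the summary column names are illustrative):

```r
library(dplyr)

obs %>%
  group_by(dow) %>%
  summarize(mean_deaths   = mean(all),
            median_deaths = median(all),
            q25 = quantile(all, 0.25),
            q75 = quantile(all, 0.75))
```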

Mortality tends to be a bit higher on weekdays than weekends, but it’s not a dramatic difference.

We did the same check for temperature:

In this case, there does not seem to be much of a pattern by weekday.

You can also visualize the association using boxplots:
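For example (a sketch):

```r
library(ggplot2)

ggplot(obs, aes(x = dow, y = all)) +
  geom_boxplot() +
  labs(x = "Day of week", y = "Daily mortality count (all ages)")
```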

[Figure: boxplots of daily mortality counts by day of week]

You can also try violin plots—these show the full distribution better than boxplots, which only show quantiles.

[Figure: violin plots of daily mortality counts by day of week]

All these reinforce that there are some small differences in weekend versus weekday patterns for mortality. There isn’t much pattern by weekday with temperature, so in this case weekday is unlikely to be a confounder (the same is not true with air pollution, which often varies based on commuting patterns and so can have stronger weekend/weekday differences). However, since it does help some in explaining variation in the health outcome, it might be worth including in our models anyway, to help reduce random noise.

Exploratory data analysis is an excellent tool for exploring your data before you begin fitting a statistical model, and you should get in the habit of using it regularly in your research. Dominici and Peng ( 2008a ) provides another walk-through of exploring this type of data, including some more advanced tools for exploring autocorrelation and time patterns.

3.5 Statistical modeling for a time series study

Now that we’ve explored the data typical of a time series study in climate epidemiology, we’ll look at how we can fit a statistical model to those data to gain insight into the relationship between the exposure and acute health effects. Very broadly, we’ll be using a statistical model to answer the question: How does the relative risk of a health outcome change as the level of the exposure changes, after controlling for potential confounders?

In the rest of this chapter and the next chapter, we’ll move step-by-step to build up to the statistical models that are now typically used in these studies. Along the way, we’ll discuss key components and choices in this modeling process. The statistical modeling is based heavily on regression modeling, and specifically generalized linear regression. To help you get the most of this section, you may find it helpful to review regression modeling and generalized linear models. Some resources for that include Dunn and Smyth ( 2018 ) and James et al. ( 2013 ) .

One of the readings for this week, Vicedo-Cabrera, Sera, and Gasparrini ( 2019 ) , includes a section on fitting exposure-response functions to describe the association between daily mean temperature and mortality risk. This article includes example code in its supplemental material, with code for fitting the model to these time series data in the file named “01EstimationERassociation.r.” Please download that file and take a look at the code.

The model in the code may at first seem complex, but it is made up of a number of fairly straightforward pieces:

  • The model framework is a generalized linear model (GLM)
  • This GLM is fit assuming an error distribution and a link function appropriate for count data
  • The GLM is fit assuming an error distribution that is also appropriate for data that may be overdispersed
  • The model includes control for day of the week by including a categorical variable
  • The model includes control for long-term and seasonal trends by including a spline (in this case, a natural cubic spline ) for the day in the study
  • The model fits a flexible, non-linear association between temperature and mortality risk, also using a spline
  • The model fits a flexible, non-linear association between mortality risk on the current day and temperature on the current day and a series of preceding days, using a distributed lag approach
  • The model jointly describes both of the two previous non-linear associations by fitting these two elements through one construct in the GLM, a cross-basis term

In this section and the next chapter, we will work through the elements, building up the code to get to the full model that is fit in Vicedo-Cabrera, Sera, and Gasparrini ( 2019 ) .

Fitting a GLM to time series data

The generalized linear model (GLM) framework unites a number of types of regression models you may have previously worked with. One basic regression model that can be fit within this framework is a linear regression model. However, the framework also allows you to also fit, among others, logistic regression models (useful when the outcome variable can only take one of two values, e.g., success / failure or alive / dead) and Poisson regression models (useful when the outcome variable is a count or rate). This generalized framework brings some unity to these different types of regression models. From a practical standpoint, it has allowed software developers to easily provide a common interface to fit these types of models. In R, the common function call to fit GLMs is glm .

Within the GLM framework, the elements that separate different regression models include the link function and the error distribution. The error distribution encodes the assumption you are enforcing about how the errors after fitting the model are distributed. If the outcome data are normally distributed (a.k.a., follow a Gaussian distribution), after accounting for variance explained in the outcome by any of the model covariates, then a linear regression model may be appropriate. For count data—like numbers of deaths a day—this is unlikely, unless the average daily mortality count is very high (count data tend to come closer to a normal distribution the further their average gets from 0). For binary data—like whether each person in a study population died on a given day or not—normally distributed errors are also unlikely. Instead, in these two cases, it is typically more appropriate to fit GLMs with Poisson and binomial “families,” respectively, where the family designation includes an appropriate specification for the variance when fitting the model based on these outcome types.

The other element that distinguishes different types of regression within the GLM framework is the link function. The link function applies a transformation on the combination of independent variables in the regression equation when fitting the model. With normally distributed data, an identity link is often appropriate—with this link, the combination of independent variables remain unchanged (i.e., keep their initial “identity”). With count data, a log link is often more appropriate, while with binomial data, a logit link is often used.

Finally, data will often not perfectly adhere to assumptions. For example, the Poisson family of GLMs assumes that the counts follow a Poisson distribution (the probability mass function for a Poisson distribution \(X \sim {\sf Poisson}(\mu)\) is \(f(k;\mu)=Pr[X=k]= \displaystyle \frac{\mu^{k}e^{-\mu}}{k!}\), where \(k\) is the number of occurrences and \(\mu\) is the expected number of cases). With this distribution, the variance is equal to the mean (\(\mu=E(X)=Var(X)\)). With real-life data, this assumption is often not valid, and in many cases the variance in real-life count data is larger than the mean. This can be accounted for when fitting a GLM by setting an error distribution that does not require the variance to equal the mean—instead, both a mean value and something like a variance are estimated from the data, assuming an overdispersion parameter \(\phi\) so that \(Var(X)=\phi E(X)\). In environmental epidemiology, time series models are often fit to allow for this overdispersion. This is because if the data are overdispersed but the model does not account for this, the standard errors on the estimates of the model parameters may be artificially small. If the data are not overdispersed (\(\phi=1\)), the model will identify this when being fit to the data, so it is typically better to allow for overdispersion in the model (if the size of the data were small, you might want to be parsimonious and avoid unneeded complexity in the model, but this is typically not the case with time series data).

In the next section, you will work through the steps of developing a GLM to fit the example dataset obs . For now, you will only fit a linear association between mean daily temperature and mortality risk, eventually including control for day of week. In later work, especially the next chapter, we will build up other components of the model, including control for the potential confounders of long-term and seasonal patterns, as well as advancing the model to fit non-linear associations, distributed by time, through splines, a distributed lag approach, and a cross-basis term.

Applied: Fitting a GLM to time series data

In R, the function call used to fit GLMs is glm . Most of you have likely covered GLMs, and ideally this function call, in previous courses. If you are unfamiliar with its basic use, you will want to refresh yourself on this topic—you can use some of the resources noted earlier in this section and in the chapter’s “Supplemental Readings” to do so.

1. Fit a GLM to estimate the association between mean daily temperature (as the independent variable) and daily mortality count (as the dependent variable), first fitting a linear regression. (Since the mortality data are counts, we will want to shift to a different type of regression within the GLM framework, but this step allows you to develop a simple glm call, and to remember where to include the data and the independent and dependent variables within this function call.)
2. Change your function call to fit a regression model in the Poisson family.
3. Change your function call to allow for overdispersion in the outcome data (daily mortality count). How does the estimated coefficient for temperature change between the model fit for #2 and this model? Check both the central estimate and its estimated standard error.
4. Change your function call to include control for day of week.
First, fit a GLM to estimate the association between mean daily temperature (as the independent variable) and daily mortality count (as the dependent variable), starting with a linear regression.

This is the model you are fitting:

\(Y_{t}=\beta_{0}+\beta_{1}X1_{t}+\epsilon\)

where \(Y_{t}\) is the mortality count on day \(t\) , \(X1_{t}\) is the mean temperature for day \(t\) and \(\epsilon\) is the error term. Since this is a linear model we are assuming a Gaussian error distribution \(\epsilon \sim {\sf N}(0, \sigma^{2})\) , where \(\sigma^{2}\) is the variance not explained by the covariates (here just temperature).

To do this, you will use the glm call. If you would like to save model fit results to use later, you assign the output a name as an R object (mod_linear_reg in the example code). If your study data are in a dataframe, you can specify these data in the glm call with the data parameter. Once you do this, you can use column names directly in the model formula. In the model formula, the dependent variable is specified first (all, the column for daily mortality counts for all ages, in this example), followed by a tilde (~), followed by all independent variables (only tmean in this example). If multiple independent variables are included, they are joined using +. We’ll see an example when we start adding control for confounders later.
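Here is a minimal sketch of that call, using the example dataset obs and the column names described above (the object name mod_linear_reg follows the text):

```r
# Fit a linear regression (the default Gaussian family) of daily deaths,
# all ages (`all`), on mean daily temperature (`tmean`)
mod_linear_reg <- glm(all ~ tmean, data = obs)
```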

Once you have fit a model and assigned it to an R object, you can explore it and use resulting values. First, the print method for a regression model gives some summary information. This method is automatically called if you enter the model object’s name at the console:

More information is printed if you run the summary method on the model object:
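(The calls below are a sketch, using the model object created above.)

```r
# Print basic information about the fitted model
mod_linear_reg

# Print fuller information, including coefficient estimates,
# standard errors, and p-values
summary(mod_linear_reg)
```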

Make sure you are familiar with the information provided from the model object, as well as how to interpret values like the coefficient estimates and their standard errors and p-values. These basic elements should have been covered in previous coursework (even if a different programming language was used to fit the model), and so we will not be covering them in great depth here, but instead focusing on some of the more advanced elements of how regression models are commonly fit to data from time series and case-crossover study designs in environmental epidemiology. For a refresher on the basics of fitting statistical models in R, you may want to check out Chapters 22 through 24 of Wickham and Grolemund (2016), a book that is available online, as well as Dunn and Smyth (2018) and James et al. (2013).

Finally, there are some newer tools for extracting information from model fit objects. The broom package extracts different elements from these objects and returns them in a “tidy” data format, which makes it much easier to use the output in further analysis with functions from the “tidyverse” suite of R packages. These tools are popular and powerful, and broom can be very useful when working with output from regression modeling in R.

The broom package includes three main functions for extracting data from regression model objects. First, the glance function returns overall data about the model fit, including the AIC and BIC:
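(The call below is a sketch; the exact values returned will depend on your data.)

```r
library(broom)

# One-row summary of the overall model fit, including AIC and BIC
glance(mod_linear_reg)
```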

The tidy function returns data at the level of the model coefficients, including the estimate for each model parameter, its standard error, test statistic, and p-value.
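A sketch of that call, with broom loaded as above:

```r
# One row per model coefficient: estimate, standard error,
# test statistic, and p-value
tidy(mod_linear_reg)
```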

Finally, the augment function returns data at the level of the original observations, including the fitted value for each observation, the residual between the fitted and true value, and some measures of influence on the model fit.
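A sketch of that call, again with broom loaded:

```r
# One row per original observation: the fitted value, the residual,
# and measures of influence on the model fit
augment(mod_linear_reg)
```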

One way you can use augment is to graph the fitted values for each observation after fitting the model:
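(The plot below is a sketch with ggplot2; the plotting choices are illustrative, not a prescribed figure.)

```r
library(broom)
library(ggplot2)

# Observed daily mortality counts as points, fitted values as a line
fits <- augment(mod_linear_reg)

ggplot(fits, aes(x = tmean)) +
  geom_point(aes(y = all), alpha = 0.2) +
  geom_line(aes(y = .fitted), color = "red") +
  labs(x = "Mean daily temperature", y = "Daily mortality count")
```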

[Figure: fitted values from the linear regression model plotted against mean daily temperature]

For more on the broom package, including some excellent examples of how it can be used to streamline complex regression analyses, see Robinson (2014). There is also a nice example of how it can be used in one of the chapters of Wickham and Grolemund (2016), available online at https://r4ds.had.co.nz/many-models.html .

A linear regression is often not appropriate when fitting a model where the outcome variable provides counts, as with the example data, since such data often don’t follow a normal distribution. A Poisson regression is typically preferred.

For a count distribution where \(Y \sim {\sf Poisson}(\mu)\), we typically fit a model such as

\(g(Y)=\beta_{0}+\beta_{1}X1\), where \(g()\) represents the link function, in this case a log function so that \(log(Y)=\beta_{0}+\beta_{1}X1\). We can also express this as \(Y=exp(\beta_{0}+\beta_{1}X1)\).

In the glm call, you can specify this with the family parameter, for which “poisson” is one choice.
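For example, a sketch of the same model formula fit with a Poisson family (the object name mod_pois is just illustrative):

```r
# Poisson regression of daily deaths on mean temperature (log link by default)
mod_pois <- glm(all ~ tmean, data = obs, family = "poisson")
summary(mod_pois)
```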

One thing to keep in mind with this change is that the model now uses a non-identity link between the combination of independent variable(s) and the dependent variable. You will need to keep this in mind when you interpret the estimates of the regression coefficients. While the coefficient estimate for tmean from the linear regression could be interpreted as the expected increase in mortality counts for a one-unit (i.e., one degree Celsius) increase in temperature, now the estimated coefficient should be interpreted as the expected increase in the natural log-transform of mortality count for a one-unit increase in temperature.

You can see this even more clearly if you look at the association between temperature for each observation and the expected mortality count fit by the model. First, if you look at the fitted values without transforming them, they will still be on the scale of the log-transformed mortality count. You can see from the range of the y-scale that these values are for the log of expected mortality, rather than expected mortality itself (compare, for example, to the similar plot shown for the first model, which was linear), and that the fitted association is linear for that transformation, not for the untransformed mortality counts:

[Figure: fitted values on the log scale plotted against mean daily temperature, showing a linear association on that scale]

You can use exponentiation to transform the fitted values back to just be the expected mortality count based on the model fit. Once you make this transformation, you can see how the link in the Poisson family specification enforced a curved relationship between mean daily temperature and the untransformed expected mortality count.

[Figure: exponentiated fitted values (expected mortality counts) plotted against mean daily temperature, showing a curved association]

For this model, we can interpret the coefficient for the temperature covariate as the expected log relative risk in the health outcome associated with a one-unit increase in temperature. We can exponentiate this value to get an estimate of the relative risk:

If you want to estimate the confidence interval for this estimate, you should calculate that before exponentiating.
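A minimal sketch of those calculations in base R, using the Poisson model object from above:

```r
# Estimated log relative risk for a one-degree increase in temperature
log_rr <- coef(mod_pois)["tmean"]

# Standard error of that coefficient
se <- sqrt(vcov(mod_pois)["tmean", "tmean"])

# Relative risk, with a 95% confidence interval calculated on the
# log scale and then exponentiated
exp(log_rr)
exp(log_rr + c(-1.96, 1.96) * se)
```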

In the R glm call, there is a family that is similar to Poisson (including using a log link), but that allows for overdispersion. You can specify it with the “quasipoisson” choice for the family parameter in the glm call:
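(The call below is a sketch; the object name mod_qpois is illustrative.)

```r
# Quasi-Poisson regression: log link, but the variance is allowed to be
# an estimated multiple (phi) of the mean
mod_qpois <- glm(all ~ tmean, data = obs, family = "quasipoisson")
summary(mod_qpois)
```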

When you use this family, there will be some new information in the summary for the model object. It will now include a dispersion parameter ( \(\phi\) ). If this is close to 1, then the data were close to the assumed variance for a Poisson distribution (i.e., there was little evidence of overdispersion). In the example, the overdispersion is around 5, suggesting the data are overdispersed (this might come down some when we start including independent variables that explain some of the variation in the outcome variable, like long-term and seasonal trends).

If you compare the estimates of the temperature coefficient from the Poisson regression with those when you allow for overdispersion, you’ll see something interesting:
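(One way to make that comparison, sketched with the tidy function from broom and the model objects fit above.)

```r
library(broom)

# Coefficient-level results from the Poisson and quasi-Poisson models
tidy(mod_pois)
tidy(mod_qpois)
```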

The central estimate (the estimate column) is very similar. However, the estimated standard error is larger when the model allows for overdispersion. This indicates that the Poisson model was too simple, and that its inherent assumption that the data were not overdispersed was problematic. If you naively used a Poisson regression in this case, then you would estimate a confidence interval on the temperature coefficient that would be too narrow. This could cause you to conclude that the estimate was statistically significant when you should not have (although in this case, the estimate is statistically significant under both models).

Day of week is included in the data as a categorical variable, using a data type in R called a factor. You are now essentially fitting this model:

\(log(Y)=\beta_{0}+\beta_{1}X1+\gamma^{'}X2\) ,

where \(X2\) is a categorical variable for day of the week and \(\gamma^{'}\) represents a vector of parameters associated with each category.

It is pretty straightforward to include factors as independent variables in calls to glm : you just add the column name to the list of other independent variables with a + . In this case, we need to do one more step: earlier, we added order to dow , so it would “remember” the order of the week days (Monday before Tuesday, etc.). However, we need to strip off this order before we include the factor in the glm call. One way to do this is with the factor call, specifying ordered = FALSE . Here is the full call to fit this model:
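(A sketch of that call, building on the quasi-Poisson model above; the object name is illustrative.)

```r
# Quasi-Poisson regression with control for day of week, stripping the
# ordering from the `dow` factor before it enters the model
mod_ctrl_dow <- glm(all ~ tmean + factor(dow, ordered = FALSE),
                    data = obs, family = "quasipoisson")
summary(mod_ctrl_dow)
```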

When you look at the summary for the model object, you can see that the model has fit a separate parameter for six of the seven days of the week. The one day that isn’t fit (Sunday in this case) serves as a baseline; these estimates specify how the log of the expected mortality count is expected to differ on, for example, Monday versus Sunday (by about 0.03), if the temperature is the same for the two days.

You can also see from this summary that the coefficients for the day of the week are all statistically significant. Even though we didn’t see a big difference in mortality counts by day of week in our exploratory analysis, this suggests that it does help explain some variance in mortality observations and will likely be worth including in the final model.

The model now includes day of week when fitting an expected mortality count for each observation. As a result, if you plot fitted values of expected mortality versus mean daily temperature, you’ll see some “hoppiness” in the fitted line:

[Figure: fitted values versus mean daily temperature after adding control for day of week, showing a “hoppy” fitted line]

This is because each fitted value also incorporates the expected influence of day of week on the mortality count, and that varies across the observations (i.e., you could have two days with the same temperature but different expected mortality from the model, because they fall on different days of the week).

If you plot the model fits separately for each day of the week, you’ll see that the line is smooth across all observations from the same day of the week:
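(As a sketch, one way to draw that faceted plot with broom and ggplot2; the plotting choices are illustrative.)

```r
library(broom)
library(ggplot2)

# Keep the original columns (including `dow`) alongside the fitted values;
# fitted values are on the log (link) scale, so exponentiate them
fits_dow <- augment(mod_ctrl_dow, data = obs)

ggplot(fits_dow, aes(x = tmean, y = exp(.fitted))) +
  geom_line() +
  facet_wrap(~ dow) +
  labs(x = "Mean daily temperature", y = "Expected mortality count")
```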

[Figure: fitted values versus mean daily temperature, plotted separately for each day of the week, each showing a smooth line]

Wrapping up

At this point, the coefficient estimate suggests that the risk of mortality tends to decrease as temperature increases. Do you think this is reasonable? What else might be important to build into the model based on your analysis up to this point?


The Ultimate Guide to Time-Series Analysis (With Examples and Applications)

What is time-series analysis?

Time-series analysis is a statistical technique for working with time-series data, sometimes referred to as trend analysis. It involves the identification of patterns, trends, seasonality, and irregularities in data observed over different time periods. This method is particularly useful for understanding the underlying structure and pattern of the data.

When performing time-series analysis, you will use a mathematical set of tools to look into time-series data and learn not only what happened but also when and why it happened.

Time-series analysis vs. time-series forecasting

While both time-series analysis and time-series forecasting are powerful tools that developers can harness to glean insights from data over time, they each have specific strengths, limitations, and applications.

Time-series analysis isn't about predicting the future; instead, it's about understanding the past. It allows developers to decompose data into its constituent parts—trend, seasonality, and residual components. This can help identify any anomalies or shifts in the pattern over time.

Key methodologies

Key methodologies used in time-series analysis include moving averages, exponential smoothing, and decomposition methods. Methods such as Autoregressive Integrated Moving Average (ARIMA) models also fall under this category—but more on that later.

On the other hand, time-series forecasting uses historical data to make predictions about future events. The objective here is to build a model that captures the underlying patterns and structures in the time-series data to predict future values of the series.

‌‌Use Cases for Time-Series Analysis

The “time” element in time-series data means that the data is ordered by time. Time-series data refers to a sequence of data points or observations recorded at specific time intervals. This data type is commonly used in various fields to analyze trends, patterns, and behaviors over time. Check out our earlier blog post to learn more and see examples of time-series data.

A typical example of time-series data is stock prices or a stock market index. However, even if you’re not into financial and algorithmic trading, you probably interact daily with time-series data.

A time series analysis graph with the bitcoin price in USD

Here are some other examples of time-series data that can be used for time-series analysis:

  • IoT and sensor data: Monitoring and analyzing sensor data from devices, machinery, or infrastructure to predict maintenance needs, optimize performance, and detect anomalies.
  • Weather forecasting: Utilizing historical weather data to forecast future meteorological conditions, such as temperature, precipitation, and wind patterns.
  • E-commerce and retail: Tracking sales data over time to identify seasonal trends, forecast demand, and optimize inventory management and pricing strategies.
  • Healthcare: Analyzing patient vital signs, medical records, and treatment outcomes to improve healthcare delivery, disease surveillance, and patient care.
  • Energy consumption: Studying electricity or energy usage patterns to optimize consumption, forecast demand, and support energy efficiency initiatives.
  • Manufacturing and supply chain: Monitoring production processes, inventory levels, and supply chain data to enhance operational efficiency and demand forecasting.
  • Web traffic and user behavior: Analyzing website traffic, user engagement metrics, and customer behavior patterns to enhance digital marketing strategies and user experience.

As you can see, time-series data is part of many of your daily interactions, whether you're driving your car through a digital toll or receiving smartphone notifications about the weather forecast or that you should walk more. If you're working with observability, monitoring different systems to track their performance and ensure they run smoothly, you're also working with time-series data. And if you have a website where you track customer or user interactions (event data), guess what? You're also a time-series analysis use case.

To illustrate this in more detail, let’s look at the example of health apps—we'll refer back to this example throughout this blog post.

A Real-World Example of Time-Series Analysis

If you open a health app on your phone, you will see all sorts of categories, from step count to noise level or heart rate. By clicking on “show all data” in any of these categories, you will get an almost endless scroll (depending on when you bought the phone) of step counts, which were timestamped when the data was sampled. ‌‌

A smartphone screen representing a step count app, a real use case for time series analysis

This is the raw data of the step count time series. Remember, this is just one of many parameters sampled by your smartphone or smartwatch. While many parameters don’t mean much to most people (yes, I’m looking at you, heart rate variability), when combined with other data, these parameters can give you estimations on overall quantifiers, such as cardio fitness. ‌‌

To achieve this, you need to connect the time-series data into one large dataset with two identifying variables—time and type of measurement. This is called panel data. Separating it by type gives you multiple time series, while picking one particular point in time gives you a snapshot of everything about your health at a specific moment, like what was happening at 7:45 a.m.

Why Should You Use Time-Series Analysis?

Now that you’re more familiar with time-series data, you may wonder what to do with it and why you should care. So far, we’ve been mostly just reading off data—how many steps did I take yesterday? Is my heart rate okay?

But time-series analysis can help us answer more complex or future-related questions, such as forecasting. When did I stop walking and catch the bus yesterday? Is exercise making my heart stronger?

To answer these, we need more than just reading the step counter at 7:45 a.m.—we need time-series analysis. Time-series analysis happens when we consider part or the entire time series to see the “bigger picture.” We can do this manually in straightforward cases: for example, by looking at the graph that shows the days when you took more than 10,000 steps this month. ‌‌

But if you wanted to know how often this occurs or on which days, that would be significantly more tedious to do by hand. Very quickly, we bump into problems that are too complex to tackle without using a computer, and once we have opened that door, a seemingly endless stream of opportunities emerges. We can analyze everything, from ourselves to our business, and make them far more efficient and productive than ever.

The four components of time-series analysis

To correctly analyze time-series data, we need to look to the four components of a time series:

  • Trend: this is a long-term movement of the time series, such as the decreasing average heart rate of workouts as a person gets fitter.
  • Seasonality: regular periodic occurrences within a time interval smaller than a year (e.g., higher step count in spring and autumn because it’s not too cold or too hot for long walks).
  • Cyclicity: repeated fluctuations around the trend that are longer in duration than irregularities but shorter than what would constitute a trend. In our walking example, this would be a one-week sightseeing holiday every four to five months.
  • Irregularity: short-term irregular fluctuations or noise, such as a gap in the sampling of the pedometer or an active team-building day during the workweek.

Time-series analysis visualization with example data in black, trend in red, and the trend with seasonality in blue

Let’s go back to our health app example. One thing you may see immediately, just by looking at a time-series analysis chart, is whether your stats are trending upward or downward. That indicates whether your stats are generally improving or not. By ignoring the short-term variations, it's easier to see if the values rise or decline within a given time range. This is the first of the four components of a time series—trend.

Limitations of Time-Series Analysis

If you’re performing time-series analysis, it can be helpful to decompose the series into these four elements to explain results and make predictions. Trend and seasonality are deterministic, whereas cyclicity and irregularities are not.

Therefore, you first need to eliminate random events to know what can be understood and predicted. Nothing is perfect, and to be able to capture the full power of time-series analysis without abusing the technique and obtaining incorrect results and conclusions, it’s essential to address and understand its limitations. ‌‌

Sample sets

Generalizations from a single or small sample of subjects must be made very carefully (e.g., finding the time of day a customer is most likely to go for a run requires analyzing the run frequencies of many customers). Predicting future values may be impossible if the data hasn’t been prepared well, and even then, there can always be new irregularities in the future.

Forecasting is usually only stable when you consider the near future. Remember how inaccurate the weather forecast can be when you look it up 10 days in advance. Time-series analysis will never allow you to make exact predictions, only probability distributions of specific values. For example, you can never be sure that a health app user will take more than 10,000 steps on Sunday, only that it is highly likely that they will do it or that you’re 95 % certain they will.

‌‌Types of Time-Series Analysis

Time to dive deeper into how time-series analysis can extract information from time-series data. To do this, let’s divide time-series analysis into five distinct types.

Exploratory analysis

An exploratory analysis is helpful when you want to describe what you see and explain why you see it in a given time series. It essentially entails decomposing the data into trend, seasonality, cyclicity, and irregularities. ‌‌

Once the series is decomposed, we can explain what each component represents in the real world and even, perhaps, what caused it. This is not as easy as it may seem and often involves spectral decomposition to find any specific frequencies of recurrences and autocorrelation analysis to see if current values depend on past values.
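As a brief illustration, here is how that kind of exploratory decomposition and autocorrelation check might look in R (a sketch using a monthly example series that ships with base R):

```r
# A monthly series with clear trend and seasonality (built-in example data)
x_ts <- ts(as.numeric(AirPassengers), frequency = 12)

# Split the series into trend, seasonal, and remainder components
decomposed <- stl(x_ts, s.window = "periodic")
plot(decomposed)

# Autocorrelation: do current values depend on past values?
acf(x_ts)
```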

Decomposition of a used car sales data set: four line graphs representing observed, trend, seasonal, and residual data

Curve fitting

Since time series is a discrete set, you can always tell exactly how many data points it contains. But what if you want to know the value of your time-series parameter at a point in time that is not covered by your data? ‌‌

To answer this question, we have to supplement our data with a continuous set—a curve. You can do this in several ways, including interpolation and regression. The former is an exact match for parts of the given time series and is mostly useful for estimating missing data points. On the other hand, the latter is a “best-fit” curve, where you have to make an educated guess about the form of the function to be fitted (e.g., linear) and then vary the parameters until your best-fit criteria are satisfied. ‌‌

Simple linear regression model example

What constitutes a “best-fit” situation depends on the desired outcome and the particular problem. Using regression analysis, you also obtain best-fit function parameters that can have real-world meaning, for example, post-run heart rate recovery as an exponential decay fit parameter. In regression, we get a function that describes the best fit to our data even beyond the last record, opening the door to extrapolation-based predictions.
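To make the distinction concrete, here is a small sketch in R with hypothetical data, showing interpolation at a missing point, a best-fit regression, and an extrapolated prediction:

```r
# Hypothetical series: a time index `t` and a noisy upward-trending value `y`
df <- data.frame(t = 1:100)
df$y <- 50 + 0.3 * df$t + rnorm(100)

# Interpolation: estimate the value at a time not covered by the data
approx(df$t, df$y, xout = 55.5)

# Regression: a "best-fit" straight line whose parameters have meaning
fit <- lm(y ~ t, data = df)
coef(fit)   # intercept and slope

# Extrapolation: predict beyond the last record
predict(fit, newdata = data.frame(t = 101:112))
```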

Forecasting

Statistical inference is the process of generalization from sample to whole. It can be done over time in time-series data, giving way to future predictions or forecasting: from extrapolating regression models to more advanced techniques using stochastic simulations and machine learning. If you want to know more, check out our article about time-series forecasting .

A line graph predicting taxicab pickups in Times Square with TimescaleDB (source)

Classification and segmentation

Time-series classification is the process of identifying the categories or classes of an outcome variable based on time-series data. In other words, it's about associating each time-series data with one label or class.

For instance, you might use time-series classification to categorize server performance into 'Normal' or 'Abnormal' based on CPU usage data collected over time. The goal here is to create a model that can accurately predict the class of new, unseen time-series data.

Classification models commonly used include decision trees, nearest neighbor classifiers, and deep learning models. These models can handle the temporal dependencies present in time-series data, making them ideal for this task.

Time-series segmentation , on the other hand, involves breaking down a time series into a series of segments, each representing a specific event or state. The objective is to simplify the time-series data by representing it as a sequence of more manageable segments.

For example, in analyzing website traffic data, you might segment the data into periods of 'High,' 'Medium,' and 'Low' activity. This segmentation can provide simpler, more interpretable insights into your data.

Segmentation methods can be either top-down, where the entire series is divided into segments, or bottom-up, where individual data points are merged into segments. Each method has its strengths and weaknesses, and the choice depends on the nature of your data and your specific requirements.

As you may have already guessed, problems rarely require just one type of analysis. Still, it is crucial to understand the various types to appreciate each aspect of the problem correctly and formulate a good strategy for addressing it.

Visualization and Examples—Run, Overlapping, and Separated Charts

There are many ways to visualize a time series and certain types of its analysis. A run chart is the most common choice for simple time series with one parameter, essentially just data points connected by lines.

However, there are usually several parameters you would like to visualize at once. You have two options in this case: overlapping or separated charts. Overlapping charts display multiple series on a single pane, whereas separated charts show individual series in smaller, stacked, and aligned charts, as seen below.

Time-series analysis - overlapping chart with two y-axes

Let’s take a look at three different real-world examples illustrating what we’ve learned so far. To keep things simple and best demonstrate the analysis types, the following examples will be single-parameter series visualized by run charts.

Electricity demand in Australia

Stepping away from our health theme, let's explore the time series of Australian monthly electricity demand in the figures below. Visually, it is immediately apparent there is a positive trend, as one would expect with population growth and technological advancement. ‌‌

Second, there is a pronounced seasonality to the data, as demand in winter will not be the same as in summer. An autocorrelation analysis can help us understand this better. Fundamentally, this checks the correlation between two points separated by a time delay or lag. ‌‌

As we can see in the autocorrelation function (ACF) graph, the highest correlation comes with a delay of exactly 12 months (implying a yearly seasonality), and the lowest with a half-year separation since electricity consumption is highly dependent on the time of year (air-conditioning, daylight hours, etc.). ‌‌

Since the underlying data has a trend (it isn’t stationary), as the lag increases, the ACF dies down since the two points are further and further apart, with the positive trend separating them more each year. These conclusions can become increasingly non-trivial when data spans less intuitive variables.

Monthly electricity demand in Australia, showing its seasonality

Boston Marathon winning times

Back to our health theme from the more exploratory previous example, let’s look at the winning times of the Boston Marathon. The aim here is different: we don’t particularly care why the winning times are such. We want to know whether they have been trending and where we can expect them to go. ‌‌

To do this, we need to fit a curve and assess its predictions. But how to know which curve to choose? There is no universal answer to this; however, even visually, you can eliminate a lot of options. In the figure below, we show you four different choices of fitted curves:‌‌

‌‌‌‌1. A linear fit

f(t) = at + b

‌‌2. A piecewise linear fit, which is just several linear fit segments spliced together

3. An exponential fit

f(t) = ae^(bt) + c

4. A cubic spline fit that’s like a piecewise linear fit where the segments are cubic polynomials that have to join smoothly‌

f(t) = at^3 + bt^2 + ct + d

‌‌Looking at the graph, it’s clear that the linear and exponential options aren’t a good fit. It boils down to the cubic spline and the piecewise linear fits. In fact, both are useful, although for different questions. ‌‌

The cubic spline is visually the best historical fit, but in the future (purple section), it trends upward in an intuitively unrealistic way, with the piecewise linear actually producing a far more reasonable prediction. Therefore, one has to be very careful when using good historical fits for prediction, which is why understanding the underlying data is extremely important when choosing forecasting models.

Time-series data on the Boston Marathon’s winning times with different fitted curves and their forecasts (source)

Electrocardiogram analysis

As a final example to illustrate the classification and segmentation types of problems, take a look at the following graph. Imagine wanting to train a machine to recognize certain heart irregularities from electrocardiogram (ECG) readings. ‌‌

First, this is a segmentation problem, as you need to split each ECG time series into sequences corresponding to one heartbeat cycle. The dashed red lines in the diagram are the splittings of these cycles. Having done this on both regular and irregular readings, this becomes a classification problem—the algorithm should now analyze other ECG readouts and search for patterns corresponding to either a regular or irregular heartbeat.

ECG time series segmented into heartbeat cycles (source)

Challenges in Handling Time-Series Data

Although time-series data offers valuable insights, it also presents unique challenges that need to be addressed during analysis.

Dealing with missing values

Time-series data often contains missing or incomplete values, which can adversely affect the accuracy of analysis and modeling. To handle missing values, various techniques like interpolation or imputation can be applied, depending on the nature of the data and the extent of missingness.

Overcoming noise in time-series data

Noise refers to random fluctuations or irregularities in time-series data, which can obscure the underlying patterns and trends. Filtering techniques, such as moving averages or wavelet transforms, can help reduce noise and extract the essential information from the data.
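For instance, a simple moving-average filter can be sketched in R in a few lines (the data here are simulated purely for illustration):

```r
# A noisy hypothetical signal
set.seed(1)
y <- sin(seq(0, 10, length.out = 200)) + rnorm(200, sd = 0.3)

# Centred 7-point moving average to smooth out the noise
y_smooth <- stats::filter(y, rep(1 / 7, 7), sides = 2)

plot(y, type = "l", col = "grey")
lines(y_smooth, col = "red", lwd = 2)
```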

Learn More About Time-Series Analysis

This was just a glimpse of what time-series analysis offers. By now, you should know that time-series data is ubiquitous. To measure the constant change around you for added efficiency and productivity (whether in life or business), you need to go for it and start analyzing it.

I hope this article has piqued your interest, but nothing compares to trying it out yourself. And for that, you need a robust database to handle the massive time-series datasets. Try Timescale, a modern, cloud-native relational database platform for time series that will give you reliability, fast queries, and the ability to scale infinitely to better understand what is changing, why, and when.



Time series forecasting and its practical use cases


  • December 6, 2023
  • Artificial Intelligence
  • Reviewed by Tomas Masek
  • Edited by Andrius Keblas


In a world where the only constant is change, our ability to predict the future is an invaluable asset. Today, we can precisely forecast the ebb and flow of stock markets, anticipate the strain on healthcare systems during a pandemic, or accurately project the demand for the season’s hottest fashion item.

This capability extends far beyond mere guesswork; it’s rooted in the scientific method of time series forecasting. But what exactly is time series forecasting, and why has it become a linchpin in various industries? Let’s look at it in more detail.

What is time series forecasting?

Time series forecasting covers a variety of statistical and data science techniques focused on predicting variables that evolve over time. This involves identifying patterns within the data and using obtained insights to make short- or long-term predictions about these changes based on the established patterns.

For example, in the retail sector, the decision-making process regarding inventory stocking and distribution relies heavily on anticipated regional demand trends. In cloud computing, future service and infrastructure utilisation predictions are key to strategic capacity planning.

Furthermore, workforce allocation depends on projected workloads in settings like warehouses, call centres, and manufacturing plants. The evolution in forecasting methods has been notable in recent years, transitioning from traditional, model-based approaches assisted by computers to modern, automated strategies rooted in data analysis .

Time series forecasting is a technique with use cases spanning a variety of real-world scenarios, including but not limited to:

  • Forecasting COVID-19 spread
  • Predicting Bitcoin prices
  • Estimating airport flight schedules
  • Anticipating daily wind power generation
  • Predicting avalanches at resorts
  • Projecting web traffic for a website, etc.

What is time series analysis?

Moving from forecasting to analysis, let’s clarify the distinction. Time series analysis involves dissecting data to understand the ‘what’ and ‘why’ of historical changes.

It’s less about predicting the future and more about understanding past patterns. In contrast, forecasting is more future-oriented, concentrating on predicting what is likely to occur next.


What is time-series data?

In various domains like business, economics, medicine, and diverse scientific disciplines, time series data commonly display patterns, including trends, seasonal variations, irregular cycles, and sporadic shifts in variability.

The goals of analysing these series typically involve extrapolating the dynamic patterns in the data to forecast future events, assessing the impact of known external interventions, and uncovering any unexpected interventions.

Patterns of time series forecasting

There are several characteristics of time-series data that need to be taken into account when modelling it:

  • A trend component describes the way the variable changes over long periods of time. Often, a trend is described as ‘changing direction’ when it shifts from an increasing to a decreasing pattern or vice versa.
  • A seasonal pattern in time series is evident when the data is influenced by periodic factors, such as specific times of the year or days of the week. A fixed and predictable frequency characterises this seasonality. For example, the monthly sales of anti-diabetic drugs demonstrate such a seasonal trend. This can be partially attributed to variations in drug costs towards the year’s end, reflecting how external factors can induce seasonality in data.
  • Cycles in time series data are long-term patterns characterised by a waveform and recurring nature, related to seasonal patterns but without a fixed duration. Unlike seasonality, cycles don’t adhere to a strict time frame and can vary in length. A classic example of this is seen in business cycles, which typically include phases of growth, recession, and recovery.
  • Irregular components in time series data arise from unforeseen events like natural disasters. These components are typically unpredictable and do not follow a regular pattern or cycle, distinguishing them from other more systematic elements in the data.

Introduction to time series and forecasting: key components

Time series analysis comprises six primary components – these steps are the cornerstone of time series data analysis, resulting in accurate further forecasting.

1. Data Preparation is the foundation of any time series analysis. This stage entails gathering relevant historical time series data and scrutinising it for accuracy. At this point, it is important to identify and address any missing values or outliers in the dataset. Moreover, the data is evaluated for stationarity, a fundamental assumption in time series modelling. If the data is not stationary, appropriate transformations stabilise its statistical properties over time, thereby preparing it for further analysis.

2. Exploratory Data Analysis facilitates an initial understanding of the data’s patterns and characteristics. This phase employs statistical techniques to delve into trends, seasonality, correlations, and cyclical behaviours inherent in the data. Visual tools, such as time series plots, autocorrelation function (ACF), and partial autocorrelation function (PACF) graphs, are extensively utilised for a preliminary examination of these underlying elements. EDA also involves identifying and documenting any unusual observations or anomalies, setting the stage for more detailed analysis.

3. Model Identification involves choosing the most appropriate time series forecasting model that aligns with the data’s characteristics. Candidate models, such as AR (Autoregressive), MA (Moving Average), and ARMA (Autoregressive Moving Average), are selected based on the patterns identified during the exploratory analysis phase. Model identification also involves determining the necessary number of terms and the appropriate lag structure for the model. This stage heavily relies on statistical tests to ascertain the most suitable model that fits the data effectively.


4. Model Forecasting involves applying the identified model to historical data to forecast one or more future periods. The accuracy of these forecasts is critically evaluated using key metrics on test datasets. This evaluation phase helps to assess the model’s performance and reliability in predicting future data points. Additionally, if necessary, supplementary variables can be incorporated into the model to improve the accuracy of the forecast.

5. Model Maintenance , the final step in the time series analysis, focuses on the ongoing assessment of the model’s performance. As fresh actual data becomes available, it is essential to periodically re-evaluate the model to ensure it remains accurate and relevant. This may involve updating the model to accommodate new trends, patterns, or changes in its forecasting environment. Regular maintenance ensures that the model provides reliable and effective forecasts, adapting as necessary to evolving conditions and behaviours over time.

6. Application Domains – the concepts and models developed through time series analysis can be broadly applied across numerous fields that exhibit time dependency. This versatility allows for various applications, such as demand forecasting in business, sales prediction, analysing stock market trends, and numerous other areas. The adaptability of time series analysis to different domains underscores its importance in making informed, predictive decisions based on temporal data patterns.

What are the time series forecasting methods?

Time series analysis employs numerous statistical models and techniques to identify patterns and make predictions using data over time.

  • Moving average is suitable for smoothing out long-term trends and identifying the direction values move.
  • Exponential smoothing works best with univariate data with a systematic trend or seasonal component. It prioritises recent observations, allowing for more dynamic adjustments.
  • Autoregression, good for short-term forecasting, uses past observations as inputs for a regression equation to forecast future values.
  • Decomposition splits a time series into its core components—trend, seasonality, and residuals—to improve understanding and prediction accuracy.
  • Time Series Clustering categorises data points based on similarity, helping to identify archetypes or trends in sequential data.
  • Wavelet Analysis helps to analyse non-stationary time series data and identify patterns across various scales or resolutions.
  • Intervention Analysis evaluates the influence of external events on a time series, such as the result of a policy change or a marketing campaign.

Numerous time series forecasting models have arisen from statistics, econometrics, signal processing, and machine learning.

Linear models such as ARIMA, distinguished by their simplicity and interpretability, offer valuable tools for analysis. At the same time, deep learning models for time series forecasting can cope with complex series that other machine learning techniques usually cannot handle.

This rich diversity of modelling techniques equips practitioners with a versatile toolkit to effectively address a broad spectrum of temporal data challenges.
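To make this concrete, here is a minimal sketch in R of two of the classical approaches mentioned above, exponential smoothing and a seasonal ARIMA model; the example data ship with base R, and the model orders are illustrative rather than tuned:

```r
# A monthly series with trend and seasonality (built-in example data)
y <- AirPassengers

# Exponential smoothing (Holt-Winters) with trend and seasonal components
hw_fit <- HoltWinters(y)
predict(hw_fit, n.ahead = 12)

# A simple seasonal ARIMA model
arima_fit <- arima(y, order = c(1, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
predict(arima_fit, n.ahead = 12)
```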

Examples of time series forecasting

Now, let’s explore how time series forecasting is applied in different sectors.

In light of the prevailing enthusiasm surrounding FinTech and AI, there has been a notable surge in the analysis and prediction of financial time-series data about global financial markets. Thus, businesses employ time series forecasting and analysis for:

  • Stock market assessment : By analysing historical data, models can forecast future stock performance, assisting investors in making informed decisions.
  • Risk management : Time series forecasting helps assess and manage risks by predicting market volatility, credit risk, and liquidity risk, enabling financial institutions to develop more efficient strategies to mitigate potential losses.
  • Algorithmic trading: Automated trading systems use time series analysis to predict future market fluctuations and execute trades accordingly, optimising for maximum profits.
  • Economic forecasting : The prediction of economic indicators such as inflation rates, GDP growth, and unemployment rates assists policymakers and businesses in strategic planning.
  • Portfolio management : Asset managers use time series forecasting to optimise asset allocation, balance risks, and maximise returns for investment portfolios.
  • Loan and credit scoring : Financial institutions predict credit risk by analysing repayment histories and financial behaviours over time, improving the accuracy of loan approvals and interest rate determination.
  • Demand forecasting in banking services : Banks and financial institutions forecast the demand for various financial products and services, which helps in resource allocation and service optimisation.
  • Fraud detection : By analysing transaction time series, fintech companies can detect unusual patterns indicative of fraudulent activity, enhancing security measures.
  • Interest rate prediction : Predicting interest rate changes is essential for various financial activities, including bond investing and mortgage lending.
  • Budgeting and financial planning : Businesses use time series forecasting for revenue and expense predictions, facilitating budgeting and financial planning.

Time series analysis is widely employed in many fields, including retail, to forecast future events based on historical data, which results in:

  • Enhanced inventory management : The precision of sales forecasts empowers the businesses to uphold optimal inventory levels, thereby mitigating the occurrences of stockouts and overstock situations.
  • Improved resource allocation : By leveraging dependable sales forecasts, the company can allocate resources, including personnel and marketing efforts, more efficiently.
  • Informed decision-making : The application of time series analysis yields invaluable insights into sales patterns and trends, furnishing the company with the capability to make informed decisions and proactively adapt to evolving market dynamics.
  • Profitability : Through the optimisation of inventory management and resource allocation, the organisation can curtail expenses and drive increased sales, ultimately resulting in increased profitability.
  • Revenue prediction : Retail businesses forecast revenue trends to plan budgets and make investment decisions.
  • Store performance analysis : Time series data helps compare performance across different time periods and store locations, identifying trends and areas for improvement.
  • Workforce management: Retailers can forecast busy periods and staff accordingly, ensuring efficient customer service without overspending on labour.
  • Promotional planning : By analysing past sales data during promotions, retailers can forecast the impact of future promotional activities on sales and stock levels.
  • Trend analysis : Identifying long-term trends in consumer behaviour, product popularity, and market dynamics helps in strategic planning.

Within medical applications, time series forecasting models have demonstrated notable success, covering:

  • Disease outbreak prediction : Forecasting the outbreak and spread of diseases helps prepare healthcare systems for increased demand and implementing preventive measures.
  • Patient admission rates : Predicting hospital admission rates enables better staffing and resource allocation, ensuring adequate patient care without overburdening healthcare facilities.
  • Drug demand forecasting : Accurately predicting the demand for various medications helps manage pharmacy inventory, reduce wastage, and ensure critical drug availability.
  • Resource allocation : Forecasting the need for medical equipment, beds, and other resources helps hospitals manage their resources more efficiently and effectively.
  • Public health planning : Predicting trends in public health issues (like obesity or smoking rates) aids in planning and implementing public health initiatives.
  • Epidemiological studies : Time series analysis is crucial in tracking the progression of diseases over time, aiding in epidemiological research and public health strategies.
  • Staffing and scheduling : Forecasting patient flow enables hospitals and clinics to optimise staffing schedules, ensuring adequate healthcare provision at all times.
  • Medical research and clinical trials : Time series data can be used to track the progress and outcomes of clinical trials and medical research over time.
  • Preventive healthcare : Analysing trends in patient data helps identify risk factors for diseases, enabling early intervention and preventive healthcare measures.
  • Healthcare expenditure forecasting : Predicting future healthcare costs helps governments and insurance companies in budgeting and policymaking.
  • Telehealth demand prediction : As telehealth becomes more prevalent, forecasting its demand helps in resource allocation and technology investment.

To wrap up, time series forecasting is not just about crunching numbers or plotting graphs; it’s about understanding the past to make informed predictions about the future.

Our journey through the landscape of time series forecasting reveals it has proven to be a key player in strategic planning across diverse industries. From predicting stock market trends to managing retail inventories and forecasting health crises, its applications are as varied as they are impactful.

When used effectively, it’s a tool that can offer significant insights and guide better decision-making.

As a time series analysis and forecasting service provider, Altamira is dedicated to helping you make data-driven decisions.

Our advanced algorithms and niche expertise can be a valuable asset for you in anticipating trends, optimising resource allocation, and making strategic decisions for a future-ready business. Contact us to learn more about predictive analytics opportunities and our cutting-edge time series forecasting solutions.


  • Open access
  • Published: 30 April 2022

A tutorial on the case time series design for small-area analysis

  • Antonio Gasparrini 1 , 2  

BMC Medical Research Methodology volume 22, Article number: 129 (2022)


The increased availability of data on health outcomes and risk factors collected at fine geographical resolution is one of the main reasons for the rising popularity of epidemiological analyses conducted at small-area level. However, this rich data setting poses important methodological issues related to modelling complexities and computational demands, as well as the linkage and harmonisation of data collected at different geographical levels.

This tutorial illustrated the extension of the case time series design, originally proposed for individual-level analyses on short-term associations with time-varying exposures, for applications using data aggregated over small geographical areas. The case time series design embeds the longitudinal structure of time series data within the self-matched framework of case-only methods, offering a flexible and highly adaptable analytical tool. The methodology is well suited for modelling complex temporal relationships, and it provides an efficient computational scheme for large datasets including longitudinal measurements collected at a fine geographical level.

The application of the case time series for small-area analyses is demonstrated using a real-data case study to assess the mortality risks associated with high temperature in the summers of 2006 and 2013 in London, UK. The example makes use of information on individual deaths, temperature, and socio-economic characteristics collected at different geographical levels. The tutorial describes the various steps of the analysis, namely the definition of the case time series structure and the linkage of the data, as well as the estimation of the risk associations and the assessment of vulnerability differences. R code and data are made available to fully reproduce the results and the graphical descriptions.

Conclusions

The extension of the case time series for small-area analysis offers a valuable analytical tool that combines modelling flexibility and computational efficiency. The increasing availability of data collected at fine geographical scales provides opportunities for its application to address a wide range of epidemiological questions.


Introduction

The field of epidemiology has experienced profound changes in the last decade, with the fast development of data science methods and technologies. Modern monitoring devices, for instance remote sensing instruments or mobile wearables [ 1 ], provide real-time measurements of a variety of risk factors with unparalleled coverage, quantity, and precision. Similarly, advancements in linkage procedures [ 2 ], together with improved computational capabilities, storage, and accessibility [ 3 ], offer epidemiologists rich and high-quality data to investigate health risks.

The availability of data on health outcomes and exposures with increased resolution is the main driver of the rising popularity of epidemiological analyses at small-area level [ 4 ]. Originally developed in spatial analysis, small-area methods have been then extended for spatio-temporal data to analyse observations collected longitudinally [ 5 , 6 ]. Similarly to traditional studies based on aggregated data, these investigations often make use of administratively collected information, usually more available to researchers and less sensitive to confidentiality restrictions. Nonetheless, these studies provide a richer data framework, merging information gathered from various sources at multiple geographical levels. The aggregation of information at finer spatial scales makes small-area studies less prone to ecological fallacies affecting traditional investigations using large-scale aggregations, and the availability of more detailed data can inform about more complex epidemiological mechanisms. Still, this context poses non-trivial practical and methodological problems, for instance high computational requirements related to the size of the data, and modelling issues due to their complexity [ 7 ].

The case time series (CTS) design is a methodology recently proposed for epidemiological analyses of short-term risks associated with time-varying exposures [ 8 ]. The design combines the modelling flexibility of time series models with the self-matched structure of case-only methods [ 9 ], providing a suitable framework for complex longitudinal data. Originally illustrated in individual-level analyses, the CTS design can be easily adapted for studies using data aggregated over small areas. This extension makes available a flexible methodology applicable for a wide range of research topics.

In this contribution, we provide a tutorial on the application of the CTS design for the analysis of small-area data. The tutorial describes several steps, including data gathering and linkage, modelling of epidemiological associations, and definition of effect summaries and outputs. These steps are illustrated through a real-data case study on the mortality risks associated with non-optimal temperature in London, United Kingdom. The example is fully reproducible, with data and code in the R software available in a GitHub repository.

The case time series data structure

The real-data example is based on a dataset published by the Office for National Statistics (ONS), reporting the deaths that occurred in London in the summer period (June to August) of two years, 2006 and 2013. The data are aggregated by day of occurrence across 983 middle layer super output areas (MSOAs), small census-based aggregations with approximately 7,200 residents each. The dataset includes death counts for both the age group 0–74 and the group aged 75 and older, which are combined into total numbers of daily deaths for this analysis. The paragraph below describes how these data must be formatted in a CTS structure.

The CTS design is based on the definition of cases , representing observational units for which data are longitudinally collected. The design involves the definition of case-specific series of continuous sequential observations. In the applications of the original article presenting the methodology [ 8 ], cases were represented by subjects, but the design can be extended by defining the observational units as small geographical areas. In this example, the process implies the aggregation of the mortality data in MSOA-specific daily series of mortality counts, including days with no death. It is worth noting that the design is similarly applicable with different types of health outcomes, for instance continuous variables obtained by averaging measurements within each area.
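As a rough illustration of this step, the sketch below shows how such a structure could be assembled in R with tidyverse tools, assuming a hypothetical input table `deaths` with one row per MSOA and per day on which at least one death occurred (columns `msoa`, `date`, `deaths`); the object and column names are illustrative and are not taken from the published code.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw input: one row per MSOA/day combination with at least one death
# deaths <- read.csv("deaths_msoa.csv")   # columns: msoa, date, deaths

# Full set of days covered by the two summer periods
summers <- c(seq(as.Date("2006-06-01"), as.Date("2006-08-31"), by = "day"),
             seq(as.Date("2013-06-01"), as.Date("2013-08-31"), by = "day"))

# Expand to the case time series structure: one row per MSOA and per day,
# filling days with no recorded deaths with 0
cts_data <- deaths %>%
  mutate(date = as.Date(date)) %>%
  complete(msoa, date = summers, fill = list(deaths = 0)) %>%
  arrange(msoa, date)
```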

The mortality series derived for five of the 983 MSOAs in the summer of 2006 are displayed in Fig.  1 (top panel). Each MSOA is characterised by no more than one or a few daily deaths, with most of the days totalling none. The data can be then aggregated further by summing across all MSOAs, thus defining a single daily mortality series for the whole area of London, shown in Fig.  1 (bottom panel). These fully aggregated data will be used later to compare the results of the CTS methodology with a traditional time series analysis.

Figure 1. Daily series of deaths for all causes in the period June–August 2006 in five random MSOAs (top panel) and aggregated across all the 983 MSOAs of London (bottom panel)

The definition of the geographical units depends both on the research question and practical considerations. The areas should be representative of exposure and health risk processes, in addition to being consistent with the resolution of the available data. Choosing finely aggregated areas can better capture underlying associations in the presence of small-scale dependencies, but would pointlessly inflate the computational demand in the presence of low-resolution exposure data or risk mechanisms acting at wider spatial scales.

Linking high-resolution exposure data

In this setting, one of the important advantages of the CTS design is the use of exposure measurements assigned to small areas (each of them representing a case), rather than averaging their values across large regions. The same applies to potential co-exposures or time-varying factors acting as confounders, which can be collected at the same small-area scale. Researchers nowadays have access to a variety of resources to retrieve high-resolution measurements of a multitude of risk factors across large populations. These resources include clinical and health databases, census and administrative data, consumer and marketing company data, and measurement networks, among others [ 3 ].

Environmental studies, for instance, can now rely on climate re-analysis and atmospheric emission-dispersion models that offer full coverage and high-resolution measures for a number of environmental stressors. In this case study, we extracted temperature data from the HadUK-Grid product developed by the Met Office [ 10 ]. This database includes daily values of minimum and maximum temperature on a 1 × 1 km grid across the United Kingdom. These data were averaged to derive mean daily temperature values and linked with the mortality series.

The linkage process consists of spatially aligning the two sources of information, namely the polygons defining the 983 MSOAs and the intersecting grid cells with corresponding temperature data. Figure 2 displays the two spatial structures, with the average summer temperature in the two years in each of the grid cells overlaid by the MSOA boundaries. The maps show the spatial differences in temperature within the areas of London, with higher values in more densely urbanised zones.

Figure 2. Average summer temperature (°C) in 2006 (left) and 2013 (right) on a 1 × 1 km grid of the London area, with the boundaries of the 983 MSOAs superimposed

The alignment procedure is carried out using GIS techniques to compute the area-weighted average of the cells intersecting each MSOA, with weights proportional to the intersection areas. This step creates MSOA-specific daily series of temperatures that can be linked with the mortality data. The results are illustrated in Fig. 3, which shows the temperature distribution in three consecutive days in July 2006, demonstrating the differential temporal changes of temperature across areas of the city. The same linkage process can be applied to other exposures or confounders, each potentially defined over different spatial boundaries.
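A minimal sketch of this area-weighted linkage in R with the sf package is shown below, for a single temperature field (in practice the computation would be repeated for every day in the series). The objects `temp_grid` and `msoa_poly` and the columns `tmean` and `msoa` are assumptions for illustration, not the names used in the published code.

```r
library(sf)
library(dplyr)

# Hypothetical inputs:
#   msoa_poly: sf polygons with an MSOA identifier column (msoa)
#   temp_grid: sf polygons of the 1x1 km cells with a temperature column (tmean)

# Intersect the grid cells with the MSOA boundaries and compute, for each
# MSOA, the mean temperature weighted by the area of each intersection piece
msoa_temp <- st_intersection(temp_grid, msoa_poly) %>%
  mutate(w = as.numeric(st_area(.))) %>%
  st_drop_geometry() %>%
  group_by(msoa) %>%
  summarise(tmean = weighted.mean(tmean, w))
```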

Figure 3. Mean temperature in three consecutive days (13–15 July 2006) across the 983 MSOAs of London

An important advantage of the CTS design is the possibility to use data disaggregated at smaller scales, thus capturing differential changes in exposure across space and time, compared to traditional analyses using a single aggregated series that rely entirely on temporal contrasts. Even in the absence of measurement errors in both disaggregated and aggregated analysis, the former is therefore expected to result in more precise estimates. In this specific example, though, the gain in precision can be limited, as Fig.  3 indicates that the temporal variation seems to dominate compared to spatial differences. The two components of variation can be quantified by the average between-day and between-MSOA standard deviations in temperature, respectively. Results confirm the visual impression, with a temporal deviation of 3.0 °C compared to 0.4 °C of the spatial one.
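For concreteness, these two summaries could be computed along the following lines, continuing with the hypothetical `cts_data` layout sketched earlier and assuming the linked daily temperature has been added as a `tmean` column.

```r
library(dplyr)

# Average between-day SD (temporal variation): SD over days within each MSOA
sd_temporal <- cts_data %>%
  group_by(msoa) %>%
  summarise(s = sd(tmean, na.rm = TRUE)) %>%
  summarise(mean(s)) %>%
  pull()

# Average between-MSOA SD (spatial variation): SD over MSOAs within each day
sd_spatial <- cts_data %>%
  group_by(date) %>%
  summarise(s = sd(tmean, na.rm = TRUE)) %>%
  summarise(mean(s)) %>%
  pull()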

Main analysis

The CTS design allows the application of flexible modelling techniques developed for time series analysis, but without requiring the aggregation of the data in a single series. The modelling framework is based on regression models with the following general form:
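The display equation itself did not survive extraction; based on the terms described in the next paragraph, its general form can be written approximately as

\[
g\left(\mu_{it}\right) = \xi_{i(k)} + f\left(x_{i,t-\ell}\,;\; \ell = 0,\dots,L\right) + s(t) + h\left(z_{it}\right) \qquad \text{(1)}
\]

where \(g\) is a link function and \(\mu_{it}=\operatorname{E}\left({y}_{it}\right)\). This is a reconstruction from the surrounding description, not a verbatim copy of the published Eq. 1.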

The model in Eq. 1  has a classical time series form, with outcomes \({y}_{it}\) collected along time \(t\) modelled through multiple regression terms [ 11 ]. Specific functions can be used to define the association with the exposure of interest \(x\) , potentially including delayed effects through the inclusion of lagged values  \(x_{t-\ell}\)  along lag period  \(\ell=0,\dots,L\) . Other terms can be represented by functions modelling the underlying temporal trends using multiple transformations of \(t\) , and potential time-varying predictors \(z\) . The main difference from traditional time series models is in the presence of multiple series for cases represented by the index \(i\) . In particular, cases define matched risk sets , with intercepts \({\xi }_{i}\) expressing baseline risks varying across observational units. The risk sets can be stratified further by defining different intercepts \({\xi }_{i(k)}\) for each time stratum \(k\) , thus modelling within-case variations in risk. The regression is efficiently performed using fixed-effects estimators available for different outcome families [ 12 , 13 ].

In our illustrative example, \({y}_{it}\) represents daily death counts for each of the \(i=1,\dots ,983\) MSOAs. The risk association with temperature \(x\) is modelled through a distributed lag non-linear model (DLNM) with a cross-basis term [ 14 ]. This bi-dimensional parametrisation is obtained using natural cubic splines defining the exposure–response (two knots at the 50 th and 90 th temperature percentiles) and lag-response (one knot at lag 1 over lag period 0–3) relationships. The other terms are two functions of time \(t\) , specifically natural cubic splines of day of the year with 3 degrees of freedom and an interaction with year indicators to model differential seasonal effects in 2006 and 2013, plus indicators for day of the week. Risk sets are defined by MSOA/year/month strata indicators \({\xi }_{i(k)}\) , allowing within-MSOA variation in baseline risks in addition to common trends captured by the temporal terms in Eq. 1  above. The model is fitted using a fixed-effects regression model with a quasi-Poisson family to account for overdispersion.
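A sketch of how a model of this kind can be specified in R is shown below, using the dlnm package for the cross-basis and the gnm package for the fixed-effects quasi-Poisson regression. The data frame, column names, and exact knot values are illustrative assumptions rather than the authors' script, which is available in the linked GitHub repository.

```r
library(dplyr)
library(splines)
library(dlnm)   # cross-basis for distributed lag non-linear models
library(gnm)    # fixed-effects ("eliminate") regression

# cts_data: one row per MSOA and day, ordered by msoa then date, with
# columns deaths, tmean, date, msoa (hypothetical names)
cts_data <- cts_data %>%
  mutate(doy   = as.numeric(format(date, "%j")),
         year  = factor(format(date, "%Y")),
         dow   = factor(weekdays(date)),
         strat = factor(paste(msoa, format(date, "%Y-%m"), sep = ":")))

# Cross-basis: natural splines for the exposure-response (knots at the 50th
# and 90th percentiles) and the lag-response over lags 0-3 (knot at lag 1);
# 'group' keeps lags from spilling across the MSOA-specific series
cb <- crossbasis(cts_data$tmean, lag = 3,
                 argvar = list(fun = "ns",
                               knots = quantile(cts_data$tmean, c(0.5, 0.9), na.rm = TRUE)),
                 arglag = list(fun = "ns", knots = 1),
                 group = cts_data$msoa)

# Quasi-Poisson regression with MSOA/year/month strata conditioned out
mod <- gnm(deaths ~ cb + ns(doy, df = 3):year + dow,
           eliminate = strat, family = quasipoisson(), data = cts_data)

# Overall cumulative exposure-response curve, centred near the reported MMT
pred <- crosspred(cb, mod, cen = 16, by = 0.5)
plot(pred, "overall")
```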

Results are displayed in Fig. 4, which shows the overall cumulative exposure–response curve (dark gold) expressing the temperature–mortality association. The curve indicates an increase in mortality risks above 16 °C, the optimal value corresponding to the minimum mortality temperature (MMT). The left tail of the curve suggests an increased risk also for relatively cold temperatures experienced during the summer period.

Figure 4. Exposure–response relationships representing the temperature–mortality risk cumulated within lag 0–3, estimated using the CTS model on data disaggregated by MSOAs (dark gold) and from the standard time series model with the aggregated data (green)

The CTS model can be compared to a standard time series analysis performed by aggregating the data into a single mortality series (Fig. 1, bottom panel) and a single temperature series, the latter obtained by averaging the daily values across MSOAs. The model is specified using the same terms and parameterisation as above. The estimated relationship is added to Fig. 4 (green curve). The aggregated analysis reports the association over a narrower range, as local extreme temperatures are averaged out (see Fig. 3), and indicates slightly lower risks, in particular failing to capture the residual cold effects. As anticipated, there seems to be little gain in statistical precision from the CTS model, given that in this example the temperature variation is driven by day-to-day changes more than by spatial differences.

Assessing differentials in vulnerability

The analysis can be extended by introducing additional terms in the model of Eq. 1 , for instance to control for confounders or investigate effect modifications. Associations with time-varying factors can be specified in the usual way through main and interaction terms included directly in the model. In contrast, the conditional framework of fixed-effects regression removes effects associated with time-invariant factors, which are absorbed in the intercepts \({\xi }_{i(k)}\) [ 12 ]. This ensures that potential confounding from such terms is controlled for by design, but has the drawback that their main effects cannot be estimated. Still, interactions with time-invariant terms can be specified to model differential health risks across small areas. In our case study, we apply this method to investigate vulnerability to extreme temperature depending on socio-economic status, represented by the index of multiple deprivation (IMD).

As mentioned above, small-area studies can rely on information collected at different geographical levels, but this requires all the variables to be re-aligned over the same spatial structure, as shown for mortality and temperature above. In this example, IMD scores (defined from 0 as the most deprived to 1 as the least deprived) were originally collected at the smallest census level, the lower super-output areas (LSOAs). Therefore, this information is first re-aligned by averaging the values by MSOA.

The model is then extended by specifying a linear interaction between the cross-basis of temperature and the IMD score. The results are shown in Fig.  5 , which displays the overall cumulative exposure–response curves predicted for low (in blue) and high (red) IMD scores, with values set at the inter-quartile range. The graph suggests little evidence of differential risks by deprivation, as confirmed by the likelihood ratio test (accounting for overdispersion) that returns a p -value of 0.73. It is worth noting, however, that this lack of evidence can be explained by the limited statistical power due to the short study period (two summers).
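One way this kind of interaction can be coded, staying with the hypothetical objects from the sketch above, is to multiply each column of the cross-basis by the (centred) IMD score and add the product terms to the model. The F-test below is one overdispersion-adjusted analogue of the test mentioned in the text, and the `imd` column name is an assumption.

```r
# Effect modification by deprivation: scale the cross-basis by centred IMD
imd_c  <- cts_data$imd - mean(cts_data$imd, na.rm = TRUE)
cb_imd <- cb * imd_c

mod_int <- gnm(deaths ~ cb + cb_imd + ns(doy, df = 3):year + dow,
               eliminate = strat, family = quasipoisson(), data = cts_data)

# Overdispersion-adjusted test for the interaction terms
anova(mod, mod_int, test = "F")
```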

Figure 5. Exposure–response relationships representing the temperature–mortality risk cumulated within lag 0–3, predicted for less (blue) and more (red) deprived areas, defined by the inter-quartile range of the IMD score

This contribution presents a tutorial on the extension of the CTS design for the analysis of small-area data. The tutorial illustrates the analytical steps using a real-data example, and it discusses practical issues, for instance linkage procedures and data analysis, as well as methodological aspects. The case study uses publicly available datasets with data and R code documented and made available in a GitHub repository. The example is therefore fully reproducible and can be easily adapted to other settings for epidemiological analyses using small-area data.

The main feature of the CTS design is the embedding of flexible time series methods within a self-matched framework based on multiple observational units. This setting offers strong control for both time-invariant and time-varying confounding as well as the possibility to model complex temporal relationships using finely disaggregated data. These aspects are demonstrated in the case study illustrated above. Specifically, the stratification of the baseline risk removes structural differences between MSOAs, while allowing control for area-specific temporal variations on top of common trends modelled through interactions between spline terms and year indicators. Likewise, the time series structure lends itself neatly to the application of distributed lag linear and non-linear models to define complex exposure-lag-response relationships. Finally, the design can improve the characterisation of the association of interest by providing both spatial and temporal contrasts. This is demonstrated in the case study example, where we show how the case time series framework can account for local exposure differences, for instance due to heat island effects, and allows investigating geographical variations in vulnerability.

The advantages of small-area studies, when compared to more traditional approaches based on largely aggregated data, are obvious. First, measurements of health outcomes and risk factors at a small scale are expected to represent more appropriately risk association mechanisms and to provide better control for confounding, thus reducing potential biases that affect ecological studies [ 7 ]. Even in the absence of classical measurement error, whereby the aggregated exposure value is a valid proxy of the true population average, small-area studies can reduce the Berkson-type error and therefore increase the statistical power [ 15 ]. As discussed in the example above, the gain in precision is proportional to the geographical differences in exposure across the study area relative to temporal variations.

The CTS design can be compared to other approaches previously used for epidemiological analyses using small-area data. Traditionally, spatial and spatio-temporal analyses are performed using Bayesian hierarchical models [ 6 ]. These methods provide a powerful framework that accounts for spatial correlations and allows geographically-varying risks, but they present high computational demands that pose limits in the analysis of large datasets and/or complex associations. In contrast, the CTS design offers a flexible and computationally efficient scheme to analyse temporal dependencies while removing entirely potential biases linked to between-area comparisons. As an alternative approach, other studies have replicated two-stage designs developed in multi-city investigations to small-area analyses [ 16 , 17 ]. However, this method encounters estimation issues in the presence of sparse information due to finely disaggregated data, and for instance it would be unfeasible for the analysis of MSOAs in the illustrative example (see Fig.  1 ). Conversely, the CTS design sets no limit to data disaggregation, being applicable with the same structure to individual-level analyses. This aspect is shared by the case-crossover design, a popular methodology previously proposed in small-area analysis [ 18 , 19 ]. In fact, the CTS methodology can replicate exactly the matching structure of the case-crossover scheme [ 20 ], while allowing a more flexible control for temporal trends and modelling of temporal relationships, as demonstrated in the illustrative case study.

Some limitations must be acknowledged. First, similarly to traditional time series methods, the CTS design is only applicable to study short-term risk associations with time-varying exposures, and cannot be used to assess long-term health effects. Likewise, its application in small-area studies is still based on aggregated data and it essentially retains an ecological nature. However, the extreme stratification can prevent some of the associated biases, and it is worth noting that the CTS methodology can be seamlessly applied to individual-level data, when these are available. Finally, its time series structure is ideal for modelling complex temporal dependencies and trends, but presents limitations in capturing spatially correlated and varying risks.

In conclusion, the CTS methodology represents a valuable analytical tool for the analysis of small-area data. The framework is highly adaptable to various data settings, and it offers flexible features for modelling complex temporal patterns while controlling for time-varying factors and trends. The availability of data collected at small-area level provides opportunities for its application in a variety of epidemiological investigations of risk associations.

Availability of data and materials

The data, software and code for replicating the analysis and complete set of results are made fully available in a GitHub repository ( https://github.com/gasparrini/CTS-smallarea ). The original data, at the time of writing, were publicly available from online resources. Specifically, the number of daily deaths by MSOAs of London in the summers of 2006 and 2013 was published by ONS ( link ); the geographical boundaries of the MSOAs and the lookup table between LSOAs and MSOAs (for the 2011 census) were available at the Open Geography Portal of ONS ( link ) and the Open Data portal of GOV.UK ( link ); the gridded daily temperature data in the HadUK-Grid database from the Met Office were extracted from the Centre for Environmental Data Analysis (CEDA) archive ( link ); the IMD scores by LSOAs (for the year 2015) were provided at GOV.UK ( link ). Additional information on the linkage procedure with the original resources to obtain the final data, as well as the use of the R scripts, is provided in the GitHub repository.

Abbreviations

CTS: Case time series

DLNM: Distributed lag non-linear model

MSOA: Middle layer super output area

LSOA: Lower layer super output area

IMD: Index of multiple deprivation

ONS: Office for National Statistics

References

1. Reis S, Seto E, Northcross A, Quinn NWT, Convertino M, Jones RL, et al. Integrating modelling and smart sensors for environmental and human health. Environ Model Softw. 2015;74:238–46.
2. Harron KL, Doidge JC, Knight HE, Gilbert RE, Goldstein H, Cromwell DA, et al. A guide to evaluating linkage quality for the analysis of linked data. Int J Epidemiol. 2017;46(5):1699–710.
3. Hodgson S, Fecht D, Gulliver J, Iyathooray Daby H, Piel FB, Yip F, et al. Availability, access, analysis and dissemination of small-area data. Int J Epidemiol. 2020;49(Suppl 1):i4–14.
4. Fecht D, Cockings S, Hodgson S, Piel FB, Martin D, Waller LA. Advances in mapping population and demographic characteristics at small-area levels. Int J Epidemiol. 2020;49(Suppl 1):i15–25.
5. Meliker JR, Sloan CD. Spatio-temporal epidemiology: principles and opportunities. Spat Spatio-Temporal Epidemiol. 2011;2(1):1–9.
6. Blangiardo M, Cameletti M, Baio G, Rue H. Spatial and spatio-temporal models with R-INLA. Spat Spatio-Temporal Epidemiol. 2013;4:33–49.
7. Piel FB, Fecht D, Hodgson S, Blangiardo M, Toledano M, Hansell AL, et al. Small-area methods for investigation of environment and health. Int J Epidemiol. 2020;49(2):686–99.
8. Gasparrini A. The case time series design. Epidemiology. 2021;32(6):829–37.
9. Mostofsky E, Coull BA, Mittleman MA. Analysis of observational self-matched data to examine acute triggers of outcome events with abrupt onset. Epidemiology. 2018;29(6):804–16.
10. Met Office, Hollis D, McCarthy M, Kendon M, Legg T, Simpson I. HadUK-Grid gridded climate observations on a 1 km grid over the UK, v1.0.1.0 (1862–2018). Centre for Environmental Data Analysis; 2019.
11. Bhaskaran K, Gasparrini A, Hajat S, Smeeth L, Armstrong B. Time series regression studies in environmental epidemiology. Int J Epidemiol. 2013;42(4):1187–95.
12. Gunasekara FI, Richardson K, Carter K, Blakely T. Fixed effects analysis of repeated measures data. Int J Epidemiol. 2013;43(1):264–9.
13. Allison PD. Fixed effects regression models. US: SAGE Publications Inc; 2009.
14. Gasparrini A, Armstrong B, Kenward MG. Distributed lag non-linear models. Stat Med. 2010;29(21):2224–34.
15. Armstrong BG. Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med. 1998;55(10):651.
16. Benmarhnia T, Kihal-Talantikite W, Ragettli MS, Deguen S. Small-area spatiotemporal analysis of heatwave impacts on elderly mortality in Paris: a cluster analysis approach. Sci Total Environ. 2017;592:288–94.
17. Zafeiratou S, Analitis A, Founda D, Giannakopoulos C, Varotsos KV, Sismanidis P, et al. Spatial variability in the effect of high ambient temperature on mortality: an analysis at municipality level within the Greater Athens area. Int J Environ Res Public Health. 2019;16(19):3689.
18. Bennett JE, Blangiardo M, Fecht D, Elliott P, Ezzati M. Vulnerability to the mortality effects of warm temperature in the districts of England and Wales. Nat Clim Chang. 2014;4(4):269.
19. Stafoggia M, Bellander T. Short-term effects of air pollutants on daily mortality in the Stockholm county – a spatiotemporal analysis. Environ Res. 2020;188:109854.
20. Armstrong BG, Gasparrini A, Tobias A. Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis. BMC Med Res Methodol. 2014;14(1):122.


Acknowledgements

Not applicable.

Funding

This work was supported by the Medical Research Council-UK (Grant ID: MR/R013349/1).

Author information

Authors and Affiliations

Department of Public Health, Environments and Society, London School of Hygiene and Tropical Medicine (LSHTM), 15-17 Tavistock Place, London, WC1H 9SH, UK

Antonio Gasparrini

Centre for Statistical Methodology, London School of Hygiene & Tropical Medicine (LSHTM), Keppel Street, London, WC1E 7HT, UK


Contributions

AG is the sole author of this article. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Antonio Gasparrini .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Gasparrini, A. A tutorial on the case time series design for small-area analysis. BMC Med Res Methodol 22 , 129 (2022). https://doi.org/10.1186/s12874-022-01612-x


Received : 12 February 2022

Accepted : 12 April 2022

Published : 30 April 2022

DOI : https://doi.org/10.1186/s12874-022-01612-x


Keywords

  • Time series
  • Distributed lag models
  • Study design
  • Temperature


Introduction to Environmental Data Science

16 Time series case studies

16.1 Loney Meadow flux data

At the beginning of this chapter, we looked at an example of a time series in flux tower measurements of northern Sierra meadows, such as in Loney Meadow where during the 2016 season a flux tower was used to capture CO 2 flux and related micrometeorological data.

We also captured multispectral imagery using a drone, allowing us to create high-resolution (5-cm pixel) imagery of the meadow in false color (with NIR as red, red as green, and green as blue), useful for showing healthy vegetation (as red) and water bodies (as black).

Figure 16.1: Loney Meadow False Color image from drone-mounted multispectral camera, 2017

Figure 16.2: Flux tower installed at Loney Meadow, 2016. Photo credit: Darren Blackburn

The flux tower data were collected at a high frequency for eddy covariance processing, where 3D wind speed data are used to model the movement of atmospheric gases, including CO 2 flux driven by photosynthesis and respiration processes. Note the sign convention for CO 2 flux: positive flux is a release to the atmosphere, which might happen when little photosynthesis is occurring but respiration and other CO 2 releases continue, while negative flux indicates net uptake, when photosynthesis is capturing more CO 2 than is released.

A spreadsheet of 30-minute summaries from 17 May to 6 September can be found in the igisci extdata folder as "meadows/LoneyMeadow_30minCO2fluxes_Geog604.xls" , and includes data on photosynthetically active radiation (PAR), net radiation (Qnet), air temperature, relative humidity, soil temperature at 2 and 10 cm depth, wind direction, wind speed, rainfall, and soil volumetric water content (VWC). There’s clearly a lot more we can do with these data (see Blackburn, Oliphant, and Davis ( 2021 ) ), but we’ll look at CO 2 flux and PAR using some of the same methods we’ve just explored.

First we'll read in the data (I'd encourage you to also look at the spreadsheet in Excel [but don't change it] to see how it's organized) …

… and see that, just as with the Bugac, Hungary data, it has half-hour readings and the second line of the file holds the measurement units. There are multiple ways of dealing with that, but this time we'll capture the variable names and then add them back after removing the first two rows:
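A sketch of that read-in step might look like the following; the path lookup and column handling are assumptions about the spreadsheet layout, not the book's exact code.

```r
library(readxl)
library(dplyr)

# Path via the igisci package's external data folder (package assumed installed)
fluxPath <- system.file("extdata",
                        "meadows/LoneyMeadow_30minCO2fluxes_Geog604.xls",
                        package = "igisci")

# Read everything as text with no header, so row 1 (names) and row 2 (units)
# come in as ordinary rows
flux_raw <- read_excel(fluxPath, col_names = FALSE, col_types = "text")

# Capture the variable names, drop the first two rows, re-attach the names,
# then convert the text columns back to numeric where possible
vnames <- unlist(flux_raw[1, ], use.names = FALSE)
Loney  <- flux_raw %>%
  slice(-(1:2)) %>%
  setNames(vnames) %>%
  type.convert(as.is = TRUE)
```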

The time unit we'll want to use for the time series is days. Summarizing by day with a group_by-summarize process gives us a generalized picture of changes over the collection period, reflecting phenological changes from first exposure after snowmelt through the maximum growth period and into the major senescence period of late summer. We'll look at a faceted graph built from a table pivoted to long form with pivot_longer, as sketched below:
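The sketch below assumes the time stamp has been parsed into a POSIXct column called `datetime` and uses illustrative names for the flux and meteorological columns.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)

# Summarize the half-hourly record by day of year (column names are assumptions)
LoneyDaily <- Loney %>%
  mutate(day = yday(datetime)) %>%
  group_by(day) %>%
  summarize(across(c(CO2flux, PAR, Qnet, Tair),
                   ~ mean(.x, na.rm = TRUE)))

# Pivot to long form and facet, with a free y scale for each parameter
LoneyDaily %>%
  pivot_longer(-day, names_to = "parameter", values_to = "value") %>%
  ggplot(aes(day, value)) +
  geom_line() +
  facet_grid(parameter ~ ., scales = "free_y")
```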

Figure 16.3: Facet plot with free y scale of Loney flux tower parameters

Now we’ll build a time series for CO 2 for an 8-day period over the summer solstice, using the start time and frequency (there’s also a time stamp, but this was easier, since I knew the data had no gaps):
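A minimal sketch of that step is below; the day-of-year window is illustrative, and the `datetime` and `CO2flux` names carry over from the earlier (assumed) column naming.

```r
library(dplyr)
library(lubridate)

# Half-hourly data: 48 observations per day. Subset an 8-day window spanning
# the summer solstice and build a ts whose "seasonal" period is the daily cycle
solstice <- Loney %>% filter(yday(datetime) %in% 170:177)

CO2ts <- ts(solstice$CO2flux, start = c(170, 1), frequency = 48)
plot(decompose(CO2ts))
```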

Figure 16.4: Loney CO 2 decomposition by day, 8-day period at summer solstice

Finally, we'll create a couple of ensemble average plots from all of the data, with sd error bars similar to what we did for Manaus, and with cowplot used again to compare the two ensemble plots:
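Here is one way such ensemble (mean diurnal cycle) plots could be built; the `datetime`, `CO2flux`, and `PAR` names are assumptions about the prepared data frame rather than the book's code.

```r
library(dplyr)
library(ggplot2)
library(lubridate)
library(cowplot)

# Ensemble average: mean and sd of a variable for each half-hour of the day
ensemble_plot <- function(df, var, ylab) {
  df %>%
    mutate(hr = hour(datetime) + minute(datetime) / 60) %>%
    group_by(hr) %>%
    summarize(m = mean({{ var }}, na.rm = TRUE),
              s = sd({{ var }}, na.rm = TRUE)) %>%
    ggplot(aes(hr, m)) +
    geom_errorbar(aes(ymin = m - s, ymax = m + s), color = "grey70") +
    geom_line() +
    labs(x = "hour of day", y = ylab)
}

plot_grid(ensemble_plot(Loney, CO2flux, "CO2 flux"),
          ensemble_plot(Loney, PAR, "PAR"),
          ncol = 1)
```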

Figure 16.5: Loney meadow CO 2 and PAR ensemble averages

We can explore the Loney Meadow data further, maybe comparing multiple ensemble averages or relating variables to one another (like the example here):

Figure 16.6: Loney CO2 flux vs Qnet

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • BMC Med Res Methodol

Logo of bmcmrm

A tutorial on the case time series design for small-area analysis

Antonio gasparrini.

1 Department of Public Health, Environments and Society, London School of Hygiene and Tropical Medicine (LSHTM), 15-17 Tavistock Place, London, WC1H 9SH UK

2 Centre for Statistical Methodology, London School of Hygiene & Tropical Medicine (LSHTM), Keppel Street, London, WC1E 7HT UK

Associated Data

The data, software and code for replicating the analysis and complete set of results are made fully available in a GitHub repository ( https://github.com/gasparrini/CTS-smallarea ). The original data, at the time of writing, were publicly available from online resources. Specifically, the number of daily deaths by MSOAs of London in the summers of 2006 and 2013 was published by ONS ( link ); the geographical boundaries of the MSOAs and the lookup table between LSOAs and MSOAs (for the 2011 census) were available at the Open Geography Portal of ONS ( link ) and the data Open Data portal of GOV.UK ( link ); the gridded daily temperature data temperature data in the HadUK-Grid database from the Met Office were extracted from the Centre for Environmental Data Analysis (CEDA) archive ( link ); the IMD scores by LSOAs (for the year 2015) were provided at GOV.UK ( link ). Additional information on the linkage procedure with the original resources to obtain the final data, as well as the use of the R scripts, are provided in the GitHub repository.

The increased availability of data on health outcomes and risk factors collected at fine geographical resolution is one of the main reasons for the rising popularity of epidemiological analyses conducted at small-area level. However, this rich data setting poses important methodological issues related to modelling complexities and computational demands, as well as the linkage and harmonisation of data collected at different geographical levels.

This tutorial illustrated the extension of the case time series design, originally proposed for individual-level analyses on short-term associations with time-varying exposures, for applications using data aggregated over small geographical areas. The case time series design embeds the longitudinal structure of time series data within the self-matched framework of case-only methods, offering a flexible and highly adaptable analytical tool. The methodology is well suited for modelling complex temporal relationships, and it provides an efficient computational scheme for large datasets including longitudinal measurements collected at a fine geographical level.

The application of the case time series for small-area analyses is demonstrated using a real-data case study to assess the mortality risks associated with high temperature in the summers of 2006 and 2013 in London, UK. The example makes use of information on individual deaths, temperature, and socio-economic characteristics collected at different geographical levels. The tutorial describes the various steps of the analysis, namely the definition of the case time series structure and the linkage of the data, as well as the estimation of the risk associations and the assessment of vulnerability differences. R code and data are made available to fully reproduce the results and the graphical descriptions.

Conclusions

The extension of the case time series for small-area analysis offers a valuable analytical tool that combines modelling flexibility and computational efficiency. The increasing availability of data collected at fine geographical scales provides opportunities for its application to address a wide range of epidemiological questions.

Introduction

The field of epidemiology has experienced profound changes in the last decade, with the fast development of data science methods and technologies. Modern monitoring devices, for instance remote sensing instruments or mobile wearables [ 1 ], provide real-time measurements of a variety of risk factors with unparalleled coverage, quantity, and precision. Similarly, advancements in linkage procedures [ 2 ], together with improved computational capabilities, storage, and accessibility [ 3 ], offer epidemiologists rich and high-quality data to investigate health risks.

The availability of data on health outcomes and exposures with increased resolution is the main driver of the rising popularity of epidemiological analyses at small-area level [ 4 ]. Originally developed in spatial analysis, small-area methods have been then extended for spatio-temporal data to analyse observations collected longitudinally [ 5 , 6 ]. Similarly to traditional studies based on aggregated data, these investigations often make use of administratively collected information, usually more available to researchers and less sensitive to confidentiality restrictions. Nonetheless, these studies provide a richer data framework, merging information gathered from various sources at multiple geographical levels. The aggregation of information at finer spatial scales makes small-area studies less prone to ecological fallacies affecting traditional investigations using large-scale aggregations, and the availability of more detailed data can inform about more complex epidemiological mechanisms. Still, this context poses non-trivial practical and methodological problems, for instance high computational requirements related to the size of the data, and modelling issues due to their complexity [ 7 ].

The case time series (CTS) design is a methodology recently proposed for epidemiological analyses of short-term risks associated with time-varying exposures [ 8 ]. The design combines the modelling flexibility of time series models with the self-matched structure of case-only methods [ 9 ], providing a suitable framework for complex longitudinal data. Originally illustrated in individual-level analyses, the CTS design can be easily adapted for studies using data aggregated over small areas. This extension makes available a flexible methodology applicable for a wide range of research topics.

In this contribution, we provide a tutorial on the application of the CTS design for the analysis of small-area data. The tutorial describes several steps, including data gathering and linkage, modelling of epidemiological associations, and definition of effect summaries and outputs. The associated with non-optimal temperature in London, United Kingdom. The example is fully reproducible, with data and code in the R software available in a GitHub repository.

The case time series data structure

The real-data example is based on a dataset published by the Office of National Statistics (ONS), reporting the deaths that occurred in London in the summer period (June to August) of two years, 2006 and 2013. The data are aggregated by day of occurrence across 983 middle layer super output areas (MSOAs), small census-based aggregations with approximately 7,200 residents each. The dataset includes the death counts for both the age group 0–74 and 75 and older, which are combined in total numbers of daily deaths for this analysis. The paragraph below describes how these data must be formatted in a CTS structure.

The CTS design is based on the definition of cases , representing observational units for which data are longitudinally collected. The design involves the definition of case-specific series of continuous sequential observations. In the applications of the original article presenting the methodology [ 8 ], cases were represented by subjects, but the design can be extended by defining the observational units as small geographical areas. In this example, the process implies the aggregation of the mortality data in MSOA-specific daily series of mortality counts, including days with no death. It is worth noting that the design is similarly applicable with different types of health outcomes, for instance continuous variables obtained by averaging measurements within each area.

The mortality series derived for five of the 983 MSOAs in the summer of 2006 are displayed in Fig.  1 (top panel). Each MSOA is characterised by no more than one or a few daily deaths, with most of the days totalling none. The data can be then aggregated further by summing across all MSOAs, thus defining a single daily mortality series for the whole area of London, shown in Fig.  1 (bottom panel). These fully aggregated data will be used later to compare the results of the CTS methodology with a traditional time series analysis.

An external file that holds a picture, illustration, etc.
Object name is 12874_2022_1612_Fig1_HTML.jpg

Daily series of deaths for all causes in the period June–August 2006 in five random MSOAs (top panel) and aggregated across all the 983 MSOAs of London (bottom panel)

The definition of the geographical units depends both on the research question and practical considerations. The areas should be representative of exposure and health risk processes, in addition to being consistent with the resolution of the available data. Choosing finely aggregated areas can better capture underlying associations in the presence of small-scale dependencies, but would pointlessly inflate the computational demand in the presence of low-resolution exposure data or risk mechanisms acting at wider spatial scales.

Linking high-resolution exposure data

In this setting, one of the important advantages of the CTS design is the use of exposure measurements assigned to small areas (each of them representing a case), rather than averaging their values across large regions. The same applies to potential co-exposures or time-varying factors acting as confounders, that can be collected at the same small-area scale. Researchers have nowadays access to a variety of resources to retrieve high-resolution measurements of a multitude of risk factors across large populations. These resources include clinical and health databases, census and administrative data, consumer and marketing company data, and measurement networks, among others [ 3 ].

Environmental studies, for instance, can now rely on climate re-analysis and atmospheric emission-dispersion models that offer full coverage and high-resolution measures for a number of environmental stressors. In this case study, we extracted temperature data from the HadUK-Grid product developed by the Met Office [ 10 ]. This database includes daily values of minimum and maximum temperature on a 1 × 1 km grid across the United Kingdom. These data were averaged to derive mean daily temperature values and linked with the mortality series.

The linkage process consists in spatially aligning the two sources of information, namely the polygons defining the 983 MSOAs and the intersecting grid cells with corresponding temperature data. Figure  2 displays the two spatial structures, with the average summer temperature in the two years in each of the grid cells overlayed by the MSOA boundaries. The maps show the spatial differences in temperature within the areas of London, with higher values in more densely urbanised zones.

An external file that holds a picture, illustration, etc.
Object name is 12874_2022_1612_Fig2_HTML.jpg

Average summer temperature (°C) in 2006 (left) and 2013 (right) in a 1 × 1 km grid of the London area, with superimposed the boundaries of the 983 MSOAs

The alignment procedure is carried out using GIS techniques to compute the area-weighted average of the cells intersecting each MSOA, with weights proportional to the intersection areas. This step creates MSOA-specific daily series of temperatures that can be linked with the mortality data. The results are illustrated in Fig.  3 , which show the temperature distribution in three consecutive days in July 2006, demonstrating the differential temporal changes of temperature across areas of the city. The same linkage process can be applied to other exposures or confounders, each potentially defined over different spatial boundaries.

An external file that holds a picture, illustration, etc.
Object name is 12874_2022_1612_Fig3_HTML.jpg

Mean temperature in three consecutive days (13–15 July 2006) across the 983 MSOAs of London

An important advantage of the CTS design is the possibility to use data disaggregated at smaller scales, thus capturing differential changes in exposure across space and time, compared to traditional analyses using a single aggregated series that rely entirely on temporal contrasts. Even in the absence of measurement errors in both disaggregated and aggregated analysis, the former is therefore expected to result in more precise estimates. In this specific example, though, the gain in precision can be limited, as Fig.  3 indicates that the temporal variation seems to dominate compared to spatial differences. The two components of variation can be quantified by the average between-day and between-MSOA standard deviations in temperature, respectively. Results confirm the visual impression, with a temporal deviation of 3.0 °C compared to 0.4 °C of the spatial one.

Main analysis

The CTS design allows the application of flexible modelling techniques developed for time series analysis, but without requiring the aggregation of the data in a single series. The modelling framework is based on regression models with the following general form:

The model in Eq. 1  has a classical time series form, with outcomes y it collected along time t modelled through multiple regression terms [ 11 ]. Specific functions can be used to define the association with the exposure of interest x , potentially including delayed effects through the inclusion of lagged values  x t - ℓ  along lag period  ℓ = 0 , ⋯ , L . Other terms can be represented by functions modelling the underlying temporal trends using multiple transformations of t , and potential time-varying predictors z . The main difference from traditional time series models is in the presence of multiple series for cases represented by the index i . In particular, cases define matched risk sets , with intercepts ξ i expressing baseline risks varying across observational units. The risk sets can be stratified further by defining different intercepts ξ i ( k ) for each time stratum k , thus modelling within-case variations in risk. The regression is efficiently performed using fixed-effects estimators available for different outcome families [ 12 , 13 ].

In our illustrative example, y it represents daily death counts for each of the i = 1 , ⋯ , 983 MSOAs. The risk association with temperature x is modelled through a distributed lag non-linear model (DLNM) with a cross-basis term [ 14 ]. This bi-dimensional parametrisation is obtained using natural cubic splines defining the exposure–response (two knots at the 50 th and 90 th temperature percentiles) and lag-response (one knot at lag 1 over lag period 0–3) relationships. The other terms are two functions of time t , specifically natural cubic splines of day of the year with 3 degrees of freedom and an interaction with year indicators to model differential seasonal effects in 2006 and 2013, plus indicators for day of the week. Risk sets are defined by MSOA/year/month strata indicators ξ i ( k ) , allowing within-MSOA variation in baseline risks in addition to common trends captured by the temporal terms in Eq. 1  above. The model is fitted using a fixed-effects regression model with a quasi-Poisson family to account for overdispersion.

Results are displayed in Fig.  4 , which shows the overall cumulative exposure–response curve (dark gold) expressing the temperature-mortality association. The curve indicates an increase in mortality risks above 16 ∘ C, the optimal value corresponding minimum mortality temperature (MMT). The left tail of the curve suggests an increased risk also for relatively cold temperatures experienced during the summer period.

An external file that holds a picture, illustration, etc.
Object name is 12874_2022_1612_Fig4_HTML.jpg

Exposure–response relationships representing the temperature-mortality risk cumulated within lag 0–3 estimated using the CTS model on data disaggregated by MSOAs (dark gold) and from the standard time series model with the aggregated data (green)

The CTS model can be compared to a standard time series analysis performed by aggregating the data in single mortality (Fig.  1 , bottom panel) and temperature series, the latter obtained by averaging the daily values across MSOAs. The model is specified using the same terms and parameterisation as above. The estimated relationship is added to Fig.  4 (green curve). The aggregated analysis reports the association over a narrower range, as local extreme temperatures are averaged out (see Fig.  3 ), and indicates slightly lower risks, in particular failing to capture the residual cold effects. As anticipated, there seems to be little gain in statistical precision from the CTS model, given that in this example the temperature variation is mainly driven by day-to-day variation more than by spatial differences.

Assessing differentials in vulnerability

The analysis can be extended by introducing additional terms in the model of Eq. 1 , for instance to control for confounders or investigate effect modifications. Associations with time-varying factors can be specified in the usual way through main and interaction terms included directly in the model. In contrast, the conditional framework of fixed-effects regression removes effects associated with time-invariant factors, which are absorbed in the intercepts ξ i ( k ) [ 12 ]. This ensures that potential confounding from such terms is controlled for by design, but has the drawback that their main effects cannot be estimated. Still, interactions with time-invariant terms can be specified to model differential health risks across small areas. In our case study, we apply this method to investigate vulnerability to extreme temperature depending on socio-economic status, represented by the index of multiple deprivation (IMD).

As mentioned above, small-area studies can rely on information collected at different geographical levels, but this requires all the variables to be re-aligned over the same spatial structure, as shown for mortality and temperature above. In this example. IMD scores (defined from 0 as the most deprived to 1 as the least deprived) were originally collected at the smallest census level, the lower super-output areas (LSOAs). Therefore, this information is first re-aligned by averaging the values by MSOA.

The model is then extended by specifying a linear interaction between the cross-basis of temperature and the IMD score. The results are shown in Fig.  5 , which displays the overall cumulative exposure–response curves predicted for low (in blue) and high (red) IMD scores, with values set at the inter-quartile range. The graph suggests little evidence of differential risks by deprivation, as confirmed by the likelihood ratio test (accounting for overdispersion) that returns a p -value of 0.73. It is worth noting, however, that this lack of evidence can be explained by the limited statistical power due to the short study period (two summers).

An external file that holds a picture, illustration, etc.
Object name is 12874_2022_1612_Fig5_HTML.jpg

Exposure–response relationships representing the temperature-mortality risk cumulated within lag 0–3 predicted for less (blue) and more (red) deprived areas, defined by the inter-quartile range of the IMD score

This contribution presents a tutorial on the extension of the CTS design for the analysis of small-area data. The tutorial illustrates the analytical steps using a real-data example, and it discusses practical issues, for instance linkage procedures and data analysis, as well as methodological aspects. The case study uses publicly available datasets with data and R code documented and made available in a GitHub repository. The example is therefore fully reproducible and can be easily adapted to other settings for epidemiological analyses using small-area data.

The main feature of the CTS design is the embedment of flexible time series methods within a self-matched framework based on multiple observational units. This setting offers strong control for both time-invariant and time-varying confounding as well as the possibility to model complex temporal relationships using finely disaggregated data. These aspects are demonstrated in the case study illustrated above. Specifically, the stratification of the baseline risk removes structural differences between MSOAs, while allowing control for area-specific temporal variations on top of common trends modelled through interactions between splines terms and year indicators. Likewise, the time series structure lends itself neatly to the application of distributed lag linear and non-linear models to define complex exposure-lag-response relationships. Finally, the design can improve the characterisation of the association of interest by providing both spatial and temporal contrasts. This is demonstrated in the case study example, where we show how the case time series framework can account for local exposure differences, for instance due to heat island effects, and allows investigating geographical variations in vulnerability.

The advantages of small-area studies, when compared to more traditional approaches based on largely aggregated data, are obvious. First, measurements of health outcomes and risk factors at a small scale are expected to represent more appropriately risk association mechanisms and to provide better control for confounding, thus reducing potential biases that affect ecological studies [ 7 ]. Even in the absence of classical measurement error, whereby the aggregated exposure value is a valid proxy of the true population average, small-area studies can reduce the Berkson-type error and therefore increase the statistical power [ 15 ]. As discussed in the example above, the gain in precision is proportional to the geographical differences in exposure across the study area relative to temporal variations.

The CTS design can be compared to other approaches previously used for epidemiological analyses using small-area data. Traditionally, spatial and spatio-temporal analyses are performed using Bayesian hierarchical models [ 6 ]. These methods provide a powerful framework that accounts for spatial correlations and allows geographically-varying risks, but they present high computational demands that pose limits in the analysis of large datasets and/or complex associations. In contrast, the CTS design offers a flexible and computationally efficient scheme to analyse temporal dependencies while removing entirely potential biases linked to between-area comparisons. As an alternative approach, other studies have replicated two-stage designs developed in multi-city investigations to small-area analyses [ 16 , 17 ]. However, this method encounters estimation issues in the presence of sparse information due to finely disaggregated data, and for instance it would be unfeasible for the analysis of MSOAs in the illustrative example (see Fig.  1 ). Conversely, the CTS design sets no limit to data disaggregation, being applicable with the same structure to individual-level analyses. This aspect is shared by the case-crossover design, a popular methodology previously proposed in small-area analysis [ 18 , 19 ]. In fact, the CTS methodology can replicate exactly the matching structure of the case-crossover scheme [ 20 ], while allowing a more flexible control for temporal trends and modelling of temporal relationships, as demonstrated in the illustrative case study.

Some limitations must be acknowledged. First, similarly to traditional time series methods, the CTS design is only applicable to study short-term risk associations with time-varying exposures, and cannot be used to assess long-term health effects. Likewise, its application in small-area studies is still based on aggregated data and it essentially retains an ecological nature. However, the extreme stratification can prevent some of the associated biases, and it is worth noting that the CTS methodology can be seamlessly applied to individual-level data, when these are available. Finally, its time series structure is ideal for modelling complex temporal dependencies and trends, but presents limitations in capturing spatially correlated and varying risks.

In conclusion, the CTS methodology represents a valuable analytical tool for the analysis of small-area data. The framework is highly adaptable to various data settings, and it offers flexible features for modelling complex temporal patterns while controlling for time-varying factors and trends. The availability of data collected at the small-area level provides opportunities for its application in a variety of epidemiological investigations of risk associations.

Acknowledgements

Not applicable.

Authors' contributions

AG is the sole author of this article. The author(s) read and approved the final manuscript.

Funding

This work was supported by the Medical Research Council-UK (Grant ID: MR/R013349/1).

Declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Exploring Time Series Analysis Techniques for Sales Forecasting

  • Conference paper
  • First Online: 03 November 2023

  • Murugan Arunkumar   ORCID: orcid.org/0009-0001-5587-2856 13 ,
  • Sambandam Palaniappan   ORCID: orcid.org/0000-0003-1423-9752 13 ,
  • R. Sujithra   ORCID: orcid.org/0009-0001-2110-9262 13 &
  • S. VijayPrakash   ORCID: orcid.org/0009-0008-2112-9248 13  

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 791))

Included in the following conference series:

  • International Conference on Data Science and Network Engineering


Sales forecasting is a decisive task for businesses, as it enables them to make important decisions about production, inventory, and marketing strategies. Time series analysis is a natural fit for sales forecasting, as it allows us to analyze and model data based on time-dependent patterns. In this paper, we explore different time series analysis techniques and their application to sales forecasting. We use a real-world sales dataset (retail) to demonstrate the use of various time series techniques such as decomposition, auto-correlation, and lag features. This report presents a solution for a case study in which we forecast the sales of retail stores. It supports strategic decisions on three levels: engineering features from the data, decomposing the data, and applying the models. We also discuss the significance of feature engineering in time series analysis and demonstrate time series features such as lag, date-time, and windowing (rolling means). Then, we compare the performance of different time series models, such as naive (persistence), Moving Average, ARIMA, and SARIMAX. We conclude that time series analysis techniques, when used correctly, are powerful tools that enable businesses to make accurate sales forecasts and informed decisions.
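As a concrete illustration of the feature engineering the abstract refers to, the sketch below derives date-time, lag, and rolling-mean (windowing) features from a daily sales series. The file retail_sales.csv, its date and sales columns, and the chosen lags and window size are assumptions for illustration, not details taken from the paper's dataset.

```python
# Illustrative lag, date-time, and rolling-window features for a daily sales series.
# File name, column names, lags, and window size are assumptions.
import pandas as pd

df = pd.read_csv("retail_sales.csv", parse_dates=["date"]).set_index("date")

# Date-time features
df["dayofweek"] = df.index.dayofweek
df["month"] = df.index.month

# Lag features: yesterday's and last week's sales
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_lag_7"] = df["sales"].shift(7)

# Windowing feature: 7-day rolling mean of past sales, shifted to avoid leakage
df["sales_roll_7"] = df["sales"].shift(1).rolling(window=7).mean()

df = df.dropna()  # drop the initial rows that lack a full history
print(df.head())
```

Features like these feed the naive, Moving Average, ARIMA, and SARIMAX comparisons described in the paper, with the persistence (naive) forecast amounting to using sales_lag_1 directly as the prediction.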



Author information

Authors and Affiliations

Department of Artificial Intelligence and Data Science, KCG College of Technology, Anna University, Chennai, India

Murugan Arunkumar, Sambandam Palaniappan, R. Sujithra & S. VijayPrakash


Corresponding author

Correspondence to Murugan Arunkumar .

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Agartala, Agartala, Tripura, India

Suyel Namasudra

Munesh Chandra Trivedi

School of Engineering and Technology, Universidad Internacional de La Rioja, Logroño, La Rioja, Spain

Ruben Gonzalez Crespo

University of Haute Alsace, Colmar, France

Pascal Lorenz


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Arunkumar, M., Palaniappan, S., Sujithra, R., VijayPrakash, S. (2024). Exploring Time Series Analysis Techniques for Sales Forecasting. In: Namasudra, S., Trivedi, M.C., Crespo, R.G., Lorenz, P. (eds) Data Science and Network Engineering. ICDSNE 2023. Lecture Notes in Networks and Systems, vol 791. Springer, Singapore. https://doi.org/10.1007/978-981-99-6755-1_4


DOI : https://doi.org/10.1007/978-981-99-6755-1_4

Published : 03 November 2023

Publisher Name : Springer, Singapore

Print ISBN : 978-981-99-6754-4

Online ISBN : 978-981-99-6755-1

eBook Packages : Engineering, Engineering (R0)



Forecasting & Time Series Analysis – Manufacturing Case Study Example (Part 1)

Today we are starting a new case study example series on YOU CANalytics involving forecasting and time series analysis. In this case study example, we will learn about time series analysis for a manufacturing operation. Time series analysis and modeling have many business and social applications. It is extensively used to forecast company sales, product demand, stock market trends, agricultural production etc. Before we learn more about forecasting let’s evaluate our own lives on a time scale:

Life is a Sine Wave

Time Series Analysis - Sine Curve

Time Series Analysis; Life’s Sine wave – by Roopam

I learnt a valuable lesson in life when I started my doctoral research in physics & nano-technology. I always loved physics, but during my doctoral studies, I was not enjoying the aspect of spending all my time in an isolated lab performing one experiment after another. Doing laboratory research could be extremely lonely. Additionally, I always enjoyed solving more applied and practical problems, which I believed was missing in my research work. After getting frustrated for some time, I decided to seek some career advice from a trusted physicist friend. Before you read further, I must warn you that physicists as a community are usually mathematical, and occasionally philosophical. Physicists prefer to create a simple mathematical model of a complicated situation. They slowly add complexity to this simple model to make it fit with reality. The following is the key point I discovered during that conversation with my friend.

A simple model for life is a sine wave – where we go through ups and downs of moods and circumstances. Like a sine wave, we don’t spend much of our time either on the peaks or the troughs but most of our time is spent climbing up or sliding down. Now keeping these moods and circumstances cycle in mind, a perfect choice of career is where one could enjoy both climbs and slides – as the up and down cycle is inevitable in life.

Keeping the above in mind I prepared a list of keywords that I associated with a job that I can truly love to absorb the up and down cycle of life. The following is my list of keywords:

This prompted me to change my career from laboratory research to data science and business consulting. I am lucky that my career in data science and business analytics for over a decade has allowed me to check mark all these keywords.

Interference of Other waves

Sine Waves

This is what is displayed in the adjacent chart, where the product of four harmonic sine waves produces an irregular shape at the bottom. Eventually, our actual lives’ function looks more like an irregular pattern produced through the interference of several sine waves.
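For readers who want to reproduce a chart like this, the short sketch below multiplies four sine waves of different frequencies and plots the resulting irregular pattern. The specific frequencies and amplitudes are arbitrary choices, not the ones behind the original figure.

```python
# Multiply four sine waves of different frequencies to get an irregular pattern.
# The frequencies are arbitrary illustrative choices.
import numpy as np
import matplotlib.pyplot as plt

freqs = (0.5, 1.3, 2.1, 3.7)
t = np.linspace(0, 10, 1000)
waves = [np.sin(2 * np.pi * f * t) for f in freqs]
combined = np.prod(waves, axis=0)  # interference: product of the four waves

fig, axes = plt.subplots(5, 1, figsize=(8, 8), sharex=True)
for ax, wave, f in zip(axes[:4], waves, freqs):
    ax.plot(t, wave)
    ax.set_ylabel(f"{f} Hz")
axes[4].plot(t, combined, color="black")
axes[4].set_ylabel("product")
axes[4].set_xlabel("time")
plt.tight_layout()
plt.show()
```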

Time Series Analysis – Decomposition

Time Series Analysis - Decomposition

Now, let me try to create a connection between what we discussed above with time series analysis and forecasting. The fundamental idea for time series analysis is to decompose the original time series (sales, stock market trends, etc.) into several independent components. Typically, business time series are divided into the following four components:

  • Trend –  overall direction of the series i.e. upwards, downwards etc.
  • Seasonality – monthly or quarterly patterns
  • Cycle  –  long-term business cycles
  • Irregular remainder – random noise left after extraction of all the components  

Interference of these components produces the final series.

Now the question is: why bother decomposing the original / actual time series into components? The answer: it is much easier to forecast the individual regular patterns produced through decomposition of a time series than the actual series. This is similar to reproducing and forecasting the individual sine waves (A, B, C, and D) instead of the final irregular pattern produced through their product.
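To see decomposition in action, the sketch below applies an additive seasonal decomposition to a monthly sales series using statsmodels. The file tractor_sales.csv, its column names, and the assumption of yearly seasonality in monthly data (period=12) are placeholders for illustration, not the case study's actual data.

```python
# Decompose a monthly sales series into trend, seasonal, and residual components.
# The CSV name, column names, and period are illustrative assumptions.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

sales = (
    pd.read_csv("tractor_sales.csv", parse_dates=["month"])
    .set_index("month")["sales"]
)

# period=12 assumes yearly seasonality in monthly data
result = seasonal_decompose(sales, model="additive", period=12)

result.plot()  # panels for observed, trend, seasonal, and residual components
print(result.seasonal.head(12))  # the repeating monthly pattern
```

Forecasting each of these components separately, and then recombining them, is exactly the "forecast the simple pieces instead of the messy whole" idea described above.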

Time Series Analysis – Manufacturing Case Study Example

Tractor

You will start your investigation of this problem in the next part of this series using the concepts discussed in this article. Eventually, you will develop an ARIMA model to forecast sales / demand for the next year. Additionally, you will also investigate the impact of a marketing program on sales by using an ARIMA model with an exogenous variable.
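As a preview of that kind of model, here is a hedged sketch of a seasonal ARIMA with a marketing-spend exogenous regressor fitted with statsmodels. The file name, the marketing_spend column, and the (p, d, q)(P, D, Q, s) orders are placeholders, not the values identified later in the case study.

```python
# Seasonal ARIMA with an exogenous regressor (marketing spend) - illustrative only.
# File name, column names, and model orders are placeholder assumptions.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("tractor_sales.csv", parse_dates=["month"]).set_index("month")

model = SARIMAX(
    df["sales"],
    exog=df[["marketing_spend"]],
    order=(1, 1, 1),               # non-seasonal (p, d, q)
    seasonal_order=(1, 1, 1, 12),  # seasonal (P, D, Q, s) for monthly data
)
fit = model.fit(disp=False)
print(fit.summary())

# Forecast the next 12 months, supplying an assumed future marketing budget
future_spend = df[["marketing_spend"]].iloc[-12:]
print(fit.forecast(steps=12, exog=future_spend))
```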

Sign (Sine) off Note

Whether you like it or not, life inevitably goes through up and down cycles. A perfect career or relationship doesn’t make the variability disappear from our lives but makes us appreciate the swings of life. These swings keep us going in the tough times. They make us realise that variability is beautiful!

26 thoughts on “ Forecasting & Time Series Analysis – Manufacturing Case Study Example (Part 1) ”

Hi Roopam, i’m a follower of your blog for quite a time now, i believe this is your crown jewel till now, i have been working on time series analysis for the last 8 years and this is the best explanation for trend and rationale for decomposing Time series, keep it up and looking forward for your next blog.

Thanks Khalid! It’s flattering to receive such an adulation from a long term professional of time series analysis. Hope you will enjoy the remaining parts as well.

Thanks roopam. You explain things better than any book or person I have encountered. You really really get your stuff and I (along with many others) really appreciate your knowledge sharing. Your articles are the holy grail for those who really want to understand these concepts. You should write a book on machine learning algo’s explained. I’d buy it!

Thanks Jason,

I do have plans to write a book with hands on application of all the case studies on YOU CANalytics along with data, exercises, and R / Python codes. The idea behind this book is to recreate actual thought process and effort of real data science and business analytics projects. I will need to take a few months off my schedule to draft this book, hopefully you will see something soon.

is the book out soon?

For now it is all on this blog. Will update about the book once I start to work on it. Thanks for checking though.

Very nicely written article.

Really one of the best explanation. I am a huge fan of yours.

Thanks. nice approach about the decomposing time series model.

Very nice article! you got me on the “I must warn you that physicists as a community are usually mathematical, and occasionally philosophical”! I’ll be glad to help HorsePower! Looking forward for part 2! and to “Don’t settle”!

BR. Antoine

Great explanation Roopam…

I follow your blog actively. I consider the time series to be the most complicated to understand and implement in a more practical way. While doing forecasting, is there any way to get into account external interference as well? In this case, can we factor in drought conditions and sudden war or even currency exchange? Hoping to clear all my apprehensions and putting the knowledge into practical use. I am hoping to buy your book soon 🙂

Thanks Bharath,

It is possible to incorporate external factors in a time series model; however, sudden war and drought are one-off events and hence you would rather not include them in the final model but study their overall effect on the variable of interest (say, pre- and post-event tractor sales). Other regular-interval variables like currency exchange rates make much more sense in the model. We will study one such variable, i.e. marketing expense, in later parts of this case.

Hi Roopam, Your blog is really great way to learn analytics. You connect the concepts behind various statistical techniques to life in a easy way. I often find books making these concepts difficult to understand and creating a disconnect between analytics professional and stakeholders (users) of analysis. Looking forward for your book.

Wow. I just found out your blog, and love it! So much to learn, and so much insights. Thank you for the great article.

I love the Blog!

Excellent article ….

Hello Roopam Sir… I definitely think you should make a YouTube channel on predictive analytics. I am sure people like me who aspire to be data scientists and predictive modelling experts will learn a lot from it.

Thanks, Vikrant. Will keep this in mind. You might see something soon.

Hie Roopam!!

It is a great pleasure to have found your resourceful blog on Time Series Forecasting using R. Your blog does so much to unbundle the complex issues of time series analysis.

I am working on some rainfall time series data (annual totals) for 58 years. I want to model it using ARIMA and have plotted the time series (original series), and it looks like the series is trend stationary. I further fitted a regression line and it shows a slight declining trend from the mean, but insignificant. The ACF and PACF plots both show negative significant spikes at lag 6.

I further tested for stationarity of the series using KPSS and ADF and both give p-values greater than 0.05. I also performed the Box test and obtained a higher p-value. The auto.arima suggested that the model is just white noise, ARIMA(0,0,0). Fitting the model using these parameters, I obtained just the intercept and s.e., and when I tried to get the summary of the fitted model, I obtained this output:

fit1 <- arima(annual_series2, order = c(0, 0, 0), method = "ML", include.mean = TRUE)
fit1
summary(fit1)

Call: arima(x = annual_series2, order = c(0, 0, 0), include.mean = TRUE, method = "ML")

Coefficients:
        intercept
         986.3241
s.e.      26.1365

sigma^2 estimated as 39621: log likelihood = -389.32, aic = 780.65

Training set error measures:
                ME   RMSE   MAE   MPE   MAPE
Training set   NaN    NaN   NaN   NaN    NaN

Warning message: In trainingaccuracy(f, test, d, D) : test elements must be within sample

What does this mean? Do I need to specify the number of observations to be used, I mean specify the maximum number of lags, when fitting the model?

And the ACF of residuals of the fitted model still indicates a negative significant spike at lag 6.

My questions are:

1. How do I use the ACF and PACF plots of the original series to choose the model parameters? Should I attempt differencing the series?
2. From which, ACF or PACF, are the possible model parameters identified, I mean AR(p, d, q)?
3. When forecasting, if the series is made stationary by whatever method, what series should I use to forecast? The original or the last order of making the series stationary?

Help me please

I would appreciate if you email me to [email protected]

Thanks in advance

Hi Roopam, very nicely written article. Is it possible to get the sales data used in the example in Excel (text) file format?

I am not conversant with R or SPSS, but familiar with Excel. I teach first year and second year MBA students the basics of forecasting.

As a teaching tool, I want to create a spreadsheet to show students how ACF and PACF are calculated. With the data, I can actually reconstruct the ACF and PACF diagrams for students and also let them play with the spreadsheet and get a feel of the whole process.

Thanks and Regards, Amol

excellent article on time series analysis.

Hello Roopam,

Just wanted to thank you for your excellent blog! I like the creative story you wrap around the methodology and listing of the mechanics (in R).

Hi Roopam, I am very new to data science field and trying to start my career in this field.Due to this i was going through so many articles online since almost last 60 days. But to my surprise the way you explained the concepts of breaking a bigger problem into smaller ones wrt data science is remarkable. I must say its so easy that anyone can understand it and in my opinion the person who uses the language which is easy and can be understood by many have the basic concepts very clear. Pls keep on sharing ur knowledge it can help many. Regards, Chiranjiv

Hi Roopam, You have a great blog going on here. Most of the stuff out there on the internet is basic and helps aspirants in Data Science domain. But your blog is very useful for working professionals and I am glad that you have put it up together because people often forget that the ones who are actually working also require help to revisit certain topics. Keep it up

I am a fan of your storytelling. Keep up the amazing work



The case time series design

gasparrini/CaseTimeSeries


A novel self-matched design for epidemiological investigations of transient health risks associated with time-varying exposures

This repository stores tutorials, updated R code, and data for case studies and simulations presented in the article:

Gasparrini A. The case time series design. Epidemiology . 2021;32(6)829-837. DOI: 10.1097/EDE.0000000000001410. PMID: 34432723 [ freely available here ]

The three folders refer to material originally presented as online appendices of the article. Specifically:

  • CTSclinepi provides a tutorial that illustrates the first case study presented in the article, with an application of the case time series design in clinical epidemiology.
  • CTSenvepi provides a tutorial that illustrates the second case study presented in the article, with an application of the case time series design in environmental epidemiology.
  • SimulationStudy includes the material for the simulation study, where the new study design is evaluated under different scenarios of increasingly complex data settings.

The folders contain the main documents together with the related R code to replicate the results of the two case studies and the simulation study presented in the article. The code of the case studies creates and uses simulated data to reproduce the features of the original datasets, which cannot be made publicly available, and the steps to reproduce the (approximate) results. The Rmd file to create the tutorial documents is also included. The code for the simulation study replicates the results exactly.

The Case Time Series Design

Affiliations

  • 1 Department of Public Health Environments and Society, London School of Hygiene & Tropical Medicine, London, United Kingdom.
  • 2 Centre for Statistical Methodology, London School of Hygiene & Tropical Medicine, London, United Kingdom.
  • PMID: 34432723
  • PMCID: PMC7611753
  • DOI: 10.1097/EDE.0000000000001410

Modern data linkage and technologies provide a way to reconstruct detailed longitudinal profiles of health outcomes and predictors at the individual or small-area level. Although these rich data resources offer the possibility to address epidemiologic questions that could not be feasibly examined using traditional studies, they require innovative analytical approaches. Here we present a new study design, called case time series, for epidemiologic investigations of transient health risks associated with time-varying exposures. This design combines a longitudinal structure and flexible control of time-varying confounders, typical of aggregated time series, with individual-level analysis and control-by-design of time-invariant between-subject differences, typical of self-matched methods such as case-crossover and self-controlled case series. The modeling framework is highly adaptable to various outcome and exposure definitions, and it is based on efficient estimation and computational methods that make it suitable for the analysis of highly informative longitudinal data resources. We assess the methodology in a simulation study that demonstrates its validity under defined assumptions in a wide range of data settings. We then illustrate the design in real-data examples: a first case study replicates an analysis on influenza infections and the risk of myocardial infarction using linked clinical datasets, while a second case study assesses the association between environmental exposures and respiratory symptoms using real-time measurements from a smartphone study. The case time series design represents a general and flexible tool, applicable in different epidemiologic areas for investigating transient associations with environmental factors, clinical conditions, or medications.

Copyright © 2021 The Author(s). Published by Wolters Kluwer Health, Inc.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Environmental Exposure* / analysis
  • Research Design*

Grants and funding

  • MR/R013349/1/MRC_/Medical Research Council/United Kingdom

  • Open access
  • Published: 15 April 2024

Endoscopically-assisted extraction of broken roots or fragments within the mandibular canal: a retrospective case series study

  • Junqi Jiang 1 ,
  • Kenan Chen 1 ,
  • Enbo Wang 1 ,
  • Denghui Duan 1 &
  • Xiangliang Xu 1  

BMC Oral Health volume  24 , Article number:  456 ( 2024 ) Cite this article


To assess the impact of endoscope-assisted extraction of fractured roots or fragments within the mandibular canal, along with quantitative sensory testing (QST) alterations in the inferior alveolar nerve (IAN).

Six patients with lower lip numbness following mandibular third molar extraction were selected. All patients had broken roots or fragments within the mandibular canal that were extracted under real-time endoscopic assistance. Follow-up assessments were conducted on postoperative days 1, 7, and 35, including a standardized QST of the lower lip skin.

The average surgical duration was 32.5 min, with the IAN exposed in all cases. Two of the patients exhibited complete recovery of lower lip numbness, three experienced symptom improvement, and one patient showed no improvement 35 days after the surgery. Preoperative QST results showed that the mechanical detection and pain thresholds on the affected side were significantly higher than those on the healthy side, but improved significantly by postoperative day 7 in five patients, and returned to baseline in two patients on day 35. There were no significant differences in the remaining QST parameters.

Conclusions

All endoscopic surgical procedures were successfully completed without any additional postoperative complications. There were no cases of deterioration of IAN injury, and lower lip numbness recovered in the majority of cases. Endoscopy allowed direct visualization and examination of the affected nerve, facilitating a comprehensive analysis of the IAN.


Introduction

The extraction of the mandibular third molar (M3M), a common procedure in oral and maxillofacial surgery, is associated with a high incidence of root fractures. The standard approach is to extract all fractured roots due to the potential risk of infection. However, specific circumstances may permit the conservation of broken roots, for example when they are small, when there are no surrounding lesions, or when removal would require more invasive surgical procedures. Factors contributing to root displacement include aggressive clinical procedures, inadequate radiographic examination, and limited visibility [ 1 ].

Researchers have proposed several surgical techniques for the extraction of fractured roots, including extended full-thickness flaps, computer-assisted navigation, and endoscopic-assisted approaches [ 2 , 3 , 4 ]. The intrusion of fractured roots or crown fragments into the inferior alveolar nerve (IAN) can result in persistent postoperative numbness. Although rare, this can cause significant discomfort in the form of dysesthesia, anesthesia, paresthesia, or hyperalgesia of the skin, mucous membrane, and teeth innervated by the IAN. The reported incidence of IAN injury (IANI) is 0.41-8.10% for temporary injuries and 0.01-3.60% for permanent ones [ 5 ]. Even slight damage has the potential to impact the patient’s physical and psychological health [ 6 ]. Consequently, dentists often opt to retain and observe fractured roots due to the difficulty in extracting them from the IAN under limited visibility, which could potentially aggravate the nerve damage. Conservative treatment poses certain risks, including persistent numbness, root infection, and root displacement, but these have not been investigated in previous studies.

Endoscopic techniques, serving as a magnifying optical tool, offer operative field magnification and digital recording [ 7 ]. Endoscopy has been widely used and reported in oral and maxillofacial surgery. In 2014, Engelke pioneered the use of endoscopy to remove impacted M3Ms without increasing the risk of IAN injury (IANI) [ 8 ]. Huang successfully removed residual M3M roots in the maxillofacial space using endoscopy, demonstrating the safety and efficiency of the procedure [ 9 ]. Our previous study demonstrated the feasibility of using endoscopy to extract impacted M3Ms adjacent to the IAN and intraoperatively observe IAN exposure [ 10 ]. Maxillofacial surgeons have emphasized the importance of adequate IAN visualization during surgical procedures for achieving predictable treatment outcomes and minimizing complications [ 11 ].

To evaluate thermal and mechanical somatosensory functions, the German Research Network on Neuropathic Pain has established a standardized Quantitative Sensory Testing (QST) protocol [ 12 ]. Yan et al. have validated the sensitivity of QST for detecting abnormalities in IAN function related to somatosensory aspects [ 13 ]. The complex situation involving fractured roots penetrating the IAN and the resulting lip numbness presents a challenging scenario with limited effective treatment options [ 14 ]. The present study explored the use of endoscopy to remove fractured roots or fragments from the IAN and utilized a standardized QST protocol to record somatosensory functional changes.

Patients and methods

Participants

This study was performed in line with the principles of the Declaration of Helsinki. The biomedical ethics committee of Peking University Hospital of Stomatology approved this study (PKUSSIRB-201949142). All patients requested the removal of fractured roots and signed the informed consent forms, and written informed consent was obtained from all participants. Between August 2020 and September 2023, the Department of Oral and Maxillofacial Surgery at Peking University School and Hospital of Stomatology, China, treated six patients with persistent lower lip numbness. The recruitment of patients was consecutive. All patients reported no improvement following the initial surgery, and had no other medical disorders that could affect the results. In five cases, fractured roots entered the IAN, while one case involved bone fragments compressing the IAN. Patient-related information, including age, sex, time since the initial operation, radiographic data, and outcomes, was collected. Cone-beam computed tomography (CBCT) images were acquired using 3D Accuitomo (J Morita Mfg. Corp., Kyoto, Japan), utilizing the following parameters: tube potential of 85–90 kVp, tube current of 5 mA, field of view of 6 cm × 6 cm, and a voxel size of 0.125 mm. The slice thickness and interval were both set at 0.2 mm. The CBCT images were used to determine the location of the residual root, and 3D reconstructions were performed using Mimics software (Mimics research 19.0, Materialise, Belgium) (Fig.  1 ).

Endoscopic system

The Storz Hopkins endoscope (Karl Storz, Tuttlingen, Germany; Cat. No. 20223020) and DELON endoscope (Beijing Fanxing Guangdian Medical Treatment Equipment Co., Beijing, China; Cat. No. UHD3840), with a 30° view angle and 4.0-mm diameter, were used in this study. A searching-unit medical endoscope with a cold light type camera was used to record the operation.

Surgical procedure

All operations were performed by a single surgeon at professor level. Surgery was performed under local anesthesia (4% articaine with 1:100,000 epinephrine + 2% lidocaine). An angular incision was made on the buccal and distal mucosa of the mandibular second molar and a flap was raised. The endoscope was placed on either the buccal or lingual side, away from the surgical field, to show the entire operative area. All procedures related to the removal of residual roots or fragments were performed endoscopically. Initially, the granulation tissue was scraped from the socket to identify the roots. Following endoscopic localization of the residual roots or fragments, piezosurgery was used to remove bony hindrances. The residual roots or fragments were removed using microinstruments guided by real-time endoscopic visualization. A final assessment of the IAN was performed and recorded under endoscopic visualization (Fig.  2 ).

Clinical variables

The clinical variables were assessed and recorded by one assessor who was aware of the surgery.

The patients indicated their pain levels using a 10-point visual analog scale (VAS) after surgery.

Mouth opening was measured as the distance between the mesio-incisal edges of the upper and lower right central incisors at maximal mouth opening.

Facial swelling was evaluated using horizontal and vertical guides with a flexible tape on four reference points: tragus, outer corner of the mouth, outer canthus of the eye, and the mandibular angle.

  • Quantitative sensory testing

QST of the IAN involved four assessments on the skin over the mental foramina, both on the operative and contralateral sides. These evaluations were performed 1 week before surgery and 1, 7, and 35 days after the surgery in a quiet room at 21–23 °C by one assessor who was not aware of the surgery. The QST protocol comprised seven subtests, including 13 thermal and mechanical parameters. The parameters included cold detection threshold (CDT), warm detection threshold (WDT), thermal sensory limen (TSL), paradoxical heat sensation (PHS), cold pain threshold (CPT), heat pain threshold (HPT), mechanical detection threshold (MDT), mechanical pain threshold (MPT), dynamic mechanical allodynia (DMA), mechanical pain sensitivity (MPS), wind-up ratio (MUR), vibration detection threshold (VDT), and pressure pain threshold (PPT). The approach described by Yan et al. [ 13 ] was utilized for the assessments.

Results

Table 1 presents the clinical data for the six patients, including four females and two males, with a mean age of 32 years (range: 25–47 years). The interval between the two surgeries ranged from 3 weeks to 1 year. The average second operation time was 32.5 min (range 20–44 min). Lower lip numbness recovered completely in two patients, partially improved in three patients, and remained unchanged in one patient. The IAN was exposed in all patients during surgery, and no other postoperative complications occurred. Table 2 outlines the clinical evaluation variables for the six cases. The variables returned to normal 35 days after the operation.

The QST data exhibited minimal disparity before surgery, except for the MDT and MPT. Figure  3 provides the variations in MDT and MPT data for the six cases, demonstrating a significant improvement in five patients by postoperative day 7. Additionally, the affected side of two patients returned to the baseline health status by postoperative day 35.

Discussion

The proximity of M3M roots to the IAN poses a risk of IANI, leading to lower lip numbness as a postoperative complication of tooth extraction. Even minor sensory changes resulting from IANI can affect the physical and psychological health of the patient [ 15 ]. Once fractured roots enter the IAN and the patient experiences persistent lip numbness, the impact of root removal on paresthesia is uncertain. Our study demonstrated that endoscopic removal of fractured M3M roots within the IAN can reduce lip numbness to varying degrees.

Operative visibility in the M3M region is typically poor due to oral cavity restrictions, and the field of view is further reduced when roots fracture. Retrieval surgery therefore becomes more challenging because of the increased difficulty and risk. Aznar-Arasa et al. suggested that removal might be unnecessary for small root fragments (< 5 mm) or in cases with a high risk of lingual nerve injury and IANI [ 16 ]. Anand et al. proposed leaving root fragments in place in the absence of associated symptoms or complications [ 17 ]. While observation and follow-up are often recommended, the rate of recovery of lower lip numbness remains unknown, and until now there has been no research on removing fractured roots from the IAN to relieve lower lip numbness. Current studies have mainly focused on the retrieval of fractured roots displaced into the maxillofacial region, relying on CT scans, computer-assisted navigation systems, or surgeon experience for root localization [ 2 , 3 , 4 , 18 ]. Patient positional changes and poor vision can increase operative difficulty. Compared with these conventional techniques, endoscopy provides a clear, real-time view of the operative field and a brighter light source during surgery. Endoscopy, widely applied in maxillofacial surgery [ 5 , 6 , 7 , 8 ], was used in our previous study to accurately record IAN exposure and reduce the incidence of IANI after extraction [ 10 ]. In the present study, endoscopy was used to enhance real-time visualization of the surgical field, facilitating easier identification of the roots and the IAN. Real-time endoscopic assistance, coupled with microscopic instruments, enabled effective extraction of broken roots within the IAN. As a magnifying optical tool, the endoscope offered more adequate and direct insight into complex cases with difficult access than common magnifying glasses, and its fine lens tips and different viewing angles facilitated improved visualization of deep tissue. In addition, endoscopy can record real-time images during surgery, which cannot be achieved with common magnifying glasses and loupes.

Accidental displacement of fractured roots is a rare but serious complication. The timing of removal of such roots remains a subject of debate [ 18 ]. Some studies have suggested early removal to reduce complications [ 1 , 9 , 18 ], while others have proposed delayed removal to promote fibrosis and root stabilization [ 17 ]. Our results indicate that when fractured roots enter the IAN, earlier removal correlates with better alleviation of lower lip numbness. The patient with the least favorable recovery underwent surgery after a 1-year interval, while those with complete recovery underwent surgery after 3 weeks. None of the patients experienced an increase in pain, swelling, or trismus postoperatively, indicating that endoscopy-assisted techniques did not increase surgical trauma. This finding highlights the minimal invasiveness of endoscopy-assisted techniques without compromising patient comfort.

All patients in our study presented with lower lip numbness following tooth extraction, raising suspicion of IAN compression by broken roots or fragments. A few studies have recommended surgical decompression as a valuable option for nerve injuries caused by endodontic material leaks within the mandibular canal [ 19 , 20 ]. Liang et al. reported recovery of IAN function after decompression of large mandibular cystic lesions [ 21 ], while endoscopic optic nerve decompression has shown benefits for traumatic optic neuropathy [ 22 ]. Thus, nerve decompression is a feasible approach for the management of nerve impairment. Our findings also suggest that extracting fractured roots or fragments effectively alleviates IAN compression, promoting the restoration of IAN function. Direct endoscopic observation of the IAN during surgery holds potential for the administration of neurotrophic medications and fostering further research.

QST is a highly sensitive approach for identifying somatosensory abnormalities, including lower lip numbness [ 23 ]. An advantage of QST is its ability to assess lower lip numbness using objective data. Consequently, this protocol has been extensively utilized in the oral-facial region. Despite a limited sample size, our study found low sensitivity for QST parameters, except for MDT and MPT. Thermal detection thresholds remained generally unchanged, indicating normal functioning of A fibers and C fibers [ 24 ]. Consistent with a study by Porporatti et al. [ 25 ], our results suggested that MDT and MPT were the most sensitive QST parameters for IANI. The MDT test was used to determine the minimal force required for subjects to perceive a gentle, non-painful touch [ 26 ]. Notably, one patient who was unable to perceive the largest touch before surgery demonstrated complete recovery during follow-up. Data from the affected side showed significant improvement, approaching baseline data of the healthy side in five patients. Improvements in MDT data indicated that extracting broken roots within the mandibular canal effectively reduces lip numbness. The MPT test, similar to MDT, assessed painful sensations [ 26 ]. MPT on the affected side was initially higher than that on the healthy side, but decreased after 35 days postoperatively. This indirectly indicates the minimal invasiveness of our surgical procedure without raising the pain threshold on the affected side. Thus, extracting broken roots may contribute to the recovery of pain thresholds. MDT and MPT could serve as viable methods for assessing neurological recovery, with results from larger samples to be presented in a future report.

The primary constraint of our study was the limited sample size, a consequence of the low incidence of this complication. However, this study is the first to introduce a novel approach to extracting fractured roots or fragments within the mandibular canal. Improvement of lower lip numbness was observed during follow-up. We plan to collect more cases in the future and enhance the clinical application of endoscopy in the treatment of IANI.

For patients with lower lip numbness following M3M extraction, an endoscope may be used for the extraction of residual broken roots or fragments within the mandibular canal. There were no instances of deterioration of IANI, and the majority of cases exhibited recovery of lower lip numbness. Endoscopy allowed direct visualization and assessment of the affected nerve, thereby facilitating a thorough investigation and analysis of the IAN.

figure 1

CBCT showing the positional relationship between fractured roots and the mandibular canal (red outline). Mimics software was used to reconstruct the 3D relationship between fractured roots (red) and the IAN (blue)

figure 2

Protocol for fractured root extraction through endoscopic surgery. ( a ) Preoperative images; ( b ) Flap elevation and exposure; ( c ) Scraping of the granulation tissue; ( d ) Removal of the distal root; ( e ) Removal of the mesial root; ( f ) IAN exposure was recorded

figure 3

Variations in MDT and MPT data for the six cases. MDT mechanical detection threshold, MPT mechanical pain threshold

Data availability

All data generated or analysed during this study are included in this published article.

Di Nardo D, Mazzucchi G, Lollobrigida M, Passariello C, Guarnieri R, Galli M, De Biase A, Testarelli L. Immediate or delayed retrieval of the displaced third molar: a review. J Clin Exp Dent. 2019;11(1):e55–61.


Kato T, Watanabe T, Nakao K. An experience of displaced third molar roots removed using computer-assisted navigation system. J Stomatol Oral Maxillofac Surg. 2023;124(6):101442.


Huang Z, Huang Z, Zhang D, Hu H, Liang Q, Chen W. Endoscopically-assisted operations in the treatment of odontogenic peripheral osteomyelitis of the posterior mandible. Br J Oral Maxillofac Surg. 2016;54(5):542–6.

Varvara G, Murmura G, Cardelli P, De Angelis D, Caputi S, Sinjari B, Piattelli M. Mandibular third molar displaced in the sublingual space: clinical management and medicolegal considerations. J Biol Regul Homeost Agents. 2016;30:609–13.


Moreno-Vicente J, Schiavone-Mussano R, Clemente-Salas E, Marí-Roig A, Jané-Salas E, López-López J. Coronectomy versus surgical removal of the lower third molars with a high risk of injury to the inferior alveolar nerve. A bibliographical review. Med Oral Patol Oral Cir Bucal. 2015;20(4):508–17.


Carvalho RWF, do Egito Vasconcelos BC. Assessment of factors associated with surgical difficulty during removal of impacted lower third molars. J Oral Maxillofac Surg. 2011;69(10):2714–21.

Engelke W, Fuentes R, Beltrán V. Endoscopically assisted removal of a lingually displaced third molar adjacent to the inferior alveolar nerve. J Craniofac Surg. 2013;24(6):2171–4.

Engelke W, Beltrán V, Cantín M, Choi EJ, Navarro P, Fuentes R. Removal of impacted mandibular third molars using an inward fragmentation technique (IFT)-method and first results. J Craniomaxillofac Surg. 2014;42(3):213–9.

Huang ZQ, Huang ZX, Wang YY, Hu WJ, Fan S, Zhang DM, Chen WL. Removal of the residual roots of mandibular wisdom teeth in the lingual space of the mandible via endoscopy. Int J Oral Max Surg. 2015;44(3):400–3.


Jiang JQ, Kang YF, Chen KN, Cui NH, Yan ZY, Guo CB, Wang EB, Xu XL. Endoscopic visualization of the inferior alveolar nerve associated with somatosensory changes after impacted mandibular third molar extraction. Odontology. 2023;111(4):982–92.


Weckx A, Agbaje JO, Yi S, Jacobs R, Politis C. Visualization techniques of the inferior alveolar nerve (IAN): a narrative review. Surg Radiol Anat. 2015;38(1):55–63.

Rolke R, Magerl W, Campbell KA, Schalber C, Caspari S, Birklein F, Treede RD. Quantitative sensory testing: a comprehensive protocol for clinical trials. Eur J Pain. 2006;10(1):77–88.


Yan ZY, Yan XY, Guo CB, Xie QF, Yang GJ, Cui NH. Somatosensory changes in Chinese patients after coronectomy vs. total extraction of mandibular third molar: a prospective study. Clin Oral Invest. 2020;24(9):3017–28.

Wang Y, Mo X, Zhang J, Fan Y, Wang K, Peter S. Quantitative sensory testing (QST) in the orofacial region of healthy Chinese: influence of site, gender and age. Acta Odontol Scand. 2018;76(1):58–63.

Carvalho RWF, do Egito Vasconcelos BC. Assessment of factors associated with surgical difficulty during removal of impacted lower third molars. J Oral Maxillofac Surg. 2011;69(10):2714–21.

Aznar-Arasa L, Figueiredo R, Gay-Escoda C. Iatrogenic displacement of lower third molar roots into the sublingual space: report of 6 cases. J Oral Maxillofac Surg. 2012;70:e107–15.

Anand R, Patil PM. Accidental displacement of third molars; report of three cases, review of literature and treatment recommendations. Oral Surg. 2013;6(1):2–8.

Huang IY, Wu CW, Worthington P. The displaced lower third molar: a literature review and suggestions for management. J Oral Maxillofac Surg. 2007;65(6):1186–90.

Bianchi B, Ferri A, Varazzani A, Bergonzani M, Sesenna E. Microsurgical decompression of inferior alveolar nerve after Endodontic Treatment complications. J Craniofac Surg. 2017;28(5):1365–8.

Bastien A-V, Adnot, et al. Secondary surgical decompression of the inferior alveolar nerve after overfilling of endodontic sealer into the mandibular canal: case report and literature review. J Stomatol Oral Maxillofac Surg. 2017;118(6):389–92.

Liang YJ, He WJ, Zheng PB, Liao GQ. Inferior alveolar nerve function recovers after decompression of large mandibular cystic lesions. Oral Dis. 2015;21(5):674–8.

Sun J, Cai X, Zou W, Zhang J. Outcome of Endoscopic Optic nerve decompression for traumatic Optic Neuropathy. Ann Otol Rhinol Laryngol. 2021;130(1):56–9.

Juhl GI, Jensen TS, Norholt SE, Svensson P. Central sensitization phenomena after third molar surgery: a quantitative sensory testing study. Eur J Pain. 2008;12(1):116–27.

Juhl GI, Svensson P, Norholt SE, Jensen TS. Long-lasting mechanical sensitization following third molar surgery. J Orofac Pain. 2006;20(1):59–73.


Porporatti AL, Bonjardim LR, Stuginski-Barbosa J, Bonfante EA, Costa YM, Rodrigues Conti PC. Pain from Dental Implant Placement, Inflammatory Pulpitis Pain, and Neuropathic Pain Present different somatosensory profiles. J Oral Facial Pain Headache. 2017;31(1):19–29.

Levin S, Pearsall G, Ruderman RJ. Von Frey’s method of measuring pressure sensibility in the hand: an engineering analysis of the Weinstein-Semmes pressure aesthesiometer. J Hand Surg Am. 1978;3(3):211–6.


Acknowledgements

We appreciated the patience and cooperation of all the participants involved in this study. The authors gratefully acknowledge doctors from the Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, for their efforts on participant recruitment.

Funding

This work was supported by grants from the Clinical Research Foundation of Peking University School and Hospital of Stomatology (PKUSS-2023CRF205) and the Beijing Municipal Science & Technology Commission (grant number Z201100005520055).

Author information

Authors and Affiliations

Department of Oral and Maxillofacial Surgery, National Center of Stomatology and National Clinical Research Center for Oral Diseases and National Engineering Research Center of Oral Biomaterials and Digital Medical Devices and Beijing Key Laboratory of Digital Stomatology and Research Center of Engineering and Technology for Computerized Dentistry Ministry of Health and NMPA Key Laboratory for Dental Materials, Peking University School and Hospital of Stomatology, No. 22 Zhongguancun South Avenue, Haidian District, Beijing, 100081, People’s Republic of China

Junqi Jiang, Kenan Chen, Enbo Wang, Denghui Duan & Xiangliang Xu


Contributions

Methodology, Xiangliang Xu, Enbo Wang; formal analysis, Junqi Jiang, Kenan Chen; data curation, Junqi Jiang, Kenan Chen, Denghui Duan; writing—original draft preparation, Junqi Jiang; writing—review and editing, Xiangliang Xu; visualization, Junqi Jiang; project administration, Xiangliang Xu. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Xiangliang Xu .

Ethics declarations

Ethics approval and consent to participate

The biomedical ethics committee of Peking University Hospital of Stomatology approved this study (PKUSSIRB-201949142). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Written informed consent was obtained from all participants.

Consent for publication

Informed consent was obtained from the participants for the publication of the identity revealing information.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Jiang, J., Chen, K., Wang, E. et al. Endoscopically-assisted extraction of broken roots or fragments within the mandibular canal: a retrospective case series study. BMC Oral Health 24 , 456 (2024). https://doi.org/10.1186/s12903-024-04216-7

Download citation

Received : 22 February 2024

Accepted : 01 April 2024

Published : 15 April 2024

DOI : https://doi.org/10.1186/s12903-024-04216-7

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Fractured roots or fragments
  • Inferior alveolar nerve

BMC Oral Health

ISSN: 1472-6831


Further reading: case studies and resources on time series

The entries below collect external guides, tutorials, and research articles on time series analysis and forecasting, grouped by theme. Descriptions are condensed from the sources' own summaries.

General guides and use cases

  1. Time Series Forecasting: Use Cases and Examples. Time series forecasting is hardly a new problem; the first instances of time series analysis and forecasting trace back to the early 1920s. While an analyst can still work with time series in Excel, the growth of computing power and data tools allows for far larger and more automated forecasting workflows.

  2. Understanding Time Series Analysis and Its Components. A primer on what defines a time series and the components that shape it.

  3. Chapter 3: Time Series / Case-Crossover Studies. Explores common characteristics of time series data in environmental epidemiology, focusing on studies that combine large-scale vital statistics collected at a regular time scale (e.g., daily) with large-scale measurements of a climate-related exposure.

  4. The Case Time Series Design (Epidemiology, 2021;32(6):829-837; doi: 10.1097/EDE.0000000000001410). Presents a self-matched study design, called case time series, for epidemiologic investigations of transient health risks associated with time-varying exposures. The design combines the longitudinal structure and flexible control of time-varying confounders typical of aggregated time series with individual-level analysis.

  5. The Ultimate Guide to Time-Series Analysis. Describes time series analysis as a statistical technique for identifying patterns, trends, seasonality, and irregularities in data observed over different time periods and for understanding the underlying structure of the data (a minimal decomposition sketch in Python follows this group).

  6. Time Series Analysis: Definition, Types & Techniques. Explains that time series analysis targets non-stationary data that fluctuate over time, which is why finance, retail, and economics rely on it so heavily, and points to books covering theory, examples, case studies, and practice material.

  7. Time Series Forecasting and Its Practical Use Cases. Highlights epidemiological studies, where time series analysis tracks disease progression over time, and staffing and scheduling, where forecasting patient flow helps hospitals and clinics keep adequate healthcare provision at all times.
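Several of the guides in this group revolve around separating a series into trend, seasonal, and irregular components. As a minimal illustration, not taken from any of the linked resources, the classical decomposition in statsmodels can be applied to a monthly series; the synthetic data and the 12-month period below are assumptions made purely for demonstration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: upward trend + yearly seasonality + noise
# (illustrative data only, not from any of the linked case studies).
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
values = (
    np.linspace(100, 160, 60)                                 # trend
    + 10 * np.sin(2 * np.pi * np.asarray(idx.month) / 12)     # seasonal component
    + np.random.default_rng(42).normal(0, 3, 60)              # irregular component
)
series = pd.Series(values, index=idx)

# Classical additive decomposition with a 12-month seasonal period.
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head(12))
```

The trend and seasonal outputs correspond directly to the components the guides describe; in practice the period comes from domain knowledge or inspection of the data rather than being hard-coded.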

Methodology and case-study collections

  8. A Tutorial on the Case Time Series Design for Small-Area Analysis. Demonstrates the design with a real-data case study assessing the mortality risks associated with high temperature in the summers of 2006 and 2013 in London, UK, drawing on information about individual deaths, temperature, and socio-economic characteristics.

  9. 16 Time Series Case Studies. Works through Loney flux-tower data, for example building a CO2 series for an 8-day period over the summer solstice from a start time and frequency, and decomposing it by day.

  10. Case Studies in Time Series Analysis (ISBN 978-981-4583-65-7). A monograph collecting the author's applied time series projects from roughly 15 years of work across fields such as engineering and labour protection.

  11. A Methodological Review of Time Series Forecasting with Deep Learning Models: A Case Study on Electricity Load. Observes that deep learning methods become more useful for complex multivariate series, such as electricity load and price combined with covariates like weather or stock data.

  12. Time Series Forecasting (introductory article). A short primer: a time series is a series of data points ordered in time, with time as the independent variable and future values as the usual prediction target.

  13. Parallel Time Series Modeling. Presents a parallel design for time series analysis and an ARIMA implementation in MADlib, an open-source library for scalable in-database analytics, noting that fitting time series models is intrinsically sequential because the calculation at any time step depends on earlier observations (a hedged ARIMA sketch follows this group).
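The MADlib entry above points out that ARIMA fitting is inherently sequential: each step depends on earlier observations. For readers who want to experiment with the model family itself, here is a small, hedged sketch using the statsmodels ARIMA implementation; the synthetic series and the (1, 1, 1) order are arbitrary choices for illustration, not recommendations drawn from any of the cited studies.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series with a mild upward drift (illustrative only).
rng = np.random.default_rng(7)
idx = pd.date_range("2019-01-01", periods=48, freq="MS")
y = pd.Series(50 + 0.8 * np.arange(48) + rng.normal(0, 2, 48), index=idx)

# Fit a simple ARIMA(1, 1, 1); in real work the order is usually chosen
# from ACF/PACF plots or information criteria rather than hard-coded.
model = ARIMA(y, order=(1, 1, 1))
fitted = model.fit()

# Forecast the next six months.
print(fitted.forecast(steps=6))
```

On real data the fit would also be validated on a held-out portion of the series before the forecasts are trusted.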

Applied forecasting case studies

  14. Exploring Time Series Analysis Techniques for Sales Forecasting. A study of retail sales forecasting on a real dataset, built on decomposition, auto-correlation, and lag features (see the lag-feature sketch after this group).

  15. Forecasting & Time Series Analysis: Manufacturing Case Study. Uses PowerHorse, a tractor and farm equipment manufacturer established a few years after World War II that has shown consistent growth in tractor sales revenue, as the setting for a forecasting exercise.

  16. Case Study 3: Time Series (PDF). An R-based walkthrough in which a script collects daily US Treasury yield data from FRED, the Federal Reserve Economic Database, stores it in an R workspace, and then examines the presence and nature of missing values.

  17. Time Series Forecasting: A Case Study on Telecom Revenue. Walks through revenue forecasting for a telecommunications company using time series analysis and an ARIMA (Autoregressive Integrated Moving Average) model.

  18. Time Series Forecasting Case Study (PDF). Describes a forecast simulator built on an ensemble of LSTM and autoregressive time series models that forecasts 1, 2, 3, 6, 9, and 12 months ahead while correcting for serial correlation, the correlation over time of the impact of unobserved variables on the variable being predicted (in this case, demand).

  19. Space-Time Series Clustering: Algorithms, Taxonomy, and Case Study. Analyses both standard time series databases and a real case study on urban traffic intelligent transportation, an area that has attracted growing attention as GPS and IT devices have become ubiquitous.
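The sales-forecasting study in this group builds on auto-correlation and lag features. The sketch below shows one common way to construct such features with pandas; the synthetic daily sales series, the column names, and the lag choices are illustrative assumptions, not details taken from the study.

```python
import numpy as np
import pandas as pd

# Synthetic daily "sales" series with a weekly pattern (illustrative only).
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=180, freq="D")
sales = pd.Series(
    200 + 20 * np.sin(2 * np.pi * np.asarray(idx.dayofweek) / 7) + rng.normal(0, 5, 180),
    index=idx,
    name="sales",
)

# Lag features: yesterday's sales and sales one week ago.
features = pd.DataFrame({
    "sales": sales,
    "lag_1": sales.shift(1),
    "lag_7": sales.shift(7),
}).dropna()

# Autocorrelation at those lags hints at which features carry signal.
print("lag-1 autocorrelation:", round(sales.autocorr(lag=1), 3))
print("lag-7 autocorrelation:", round(sales.autocorr(lag=7), 3))
print(features.head())
```

In this toy series the strong lag-7 autocorrelation simply mirrors the weekly pattern that was injected; on real sales data the informative lags have to be discovered rather than assumed.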

Domain-specific applications and code

  20. The Case Time Series Design (GitHub repository). Stores tutorials, updated R code, and data for the case studies and simulations presented in Gasparrini's article on the case time series design.

  21. Is Agricultural Production Responsible for Environmental Degradation in India? Investigates the impact of agricultural production on environmental degradation in India using time series data from 1990 to 2020, with methane (CH4), nitrous oxide (N2O), and carbon dioxide (CO2) emissions as indicators of degradation and an autoregressive distributed lag approach (a rough ARDL sketch follows this group).

  22. Spaceborne Radars for Mapping Surface and Subsurface Salt Pan Configuration (Remote Sensing). Analyses a 5.5-year time series of microwave backscatter from Synthetic Aperture Radar (SAR) images of the Pozuelos Salt Flat in northern Argentina to study the dynamics of salt crusts.
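The agricultural study above relies on an autoregressive distributed lag (ARDL) model. The following is a rough sketch of that model family using the statsmodels ARDL class on synthetic stand-in data; the variables, lag orders, and coefficients are invented for illustration and have nothing to do with the study's actual dataset.

```python
import numpy as np
from statsmodels.tsa.ardl import ARDL

# Synthetic yearly stand-ins for a production driver and an emissions
# indicator (invented data, not the study's dataset).
rng = np.random.default_rng(1)
production = 100 + 1.5 * np.arange(31) + rng.normal(0, 2, 31)
emissions = 50 + 0.4 * production + rng.normal(0, 1, 31)

# ARDL with two lags of the dependent variable and lags 0-1 of the regressor.
model = ARDL(emissions, lags=2, exog=production.reshape(-1, 1), order=1)
res = model.fit()
print(res.params)
```

The fitted coefficients describe how current emissions relate to their own recent history and to current and lagged production in this toy setup; a real analysis would add stationarity checks and bounds testing before interpreting them.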
