
Time series analysis – but correct!

How the choice of training data affects the practicality of models

by DI Dr. Alexander Maletzky

Time series data, for example machine data in industry or vital signs in medicine, are nowadays an important data source for the analysis of complex systems. Modern analysis systems are mostly based on machine learning methods, i.e., learned prediction models, and draw on these data sources. However, choosing the right training data for such models is a challenging task, and it strongly affects how well they work in practice.

Table of contents

  • The problem: The right length of the sequences
  • A concrete example from intensive care
  • Conclusion
  • Author
  • Further information
  • Sources

The problem: The right length of the sequences

Time series data are usually recorded automatically by sensors at regular intervals and can be visualized as a line graph, as shown in Figure 1. As explained in the technical paper Exploratory Data Analysis with Time Series (p. 18), visual inspection of time series data is an important step in the data analysis workflow, and one that poses some difficulties. Even more challenging, however, is automatic time series analysis, where an AI model independently classifies time series, detects anomalies, or predicts the future course of a series. Such models are nowadays mostly based on machine learning methods, i.e., they "learn" to make the right decisions from training data.

Besides selecting a suitable model class and its parameters, one of the developers' main tasks is selecting the training data. Time series are usually available as long sequences of measured values that extend over long periods of time. Depending on the field of application, however, models should be able to make valid decisions based on comparatively short excerpts, and must therefore also be trained on such excerpts. How these excerpts are selected has a strong influence on the practical suitability of the resulting models. On the one hand, the selection must not introduce a so-called sampling bias, i.e., the samples must adequately reflect the different aspects of the time series (curve morphology, periodicity, trends, etc.). On the other hand, the trained models should perform correctly on exactly those events that are of particular interest to the user. If these events occur only rarely, they must be disproportionately represented during model generation, which in turn can itself lead to a sampling bias.


Figure 1: Mean arterial blood pressure (MAP) of an intensive care patient over 30 minutes, with one reading per second. The green line indicates the value above which the blood pressure is considered normal; the orange line marks the value that represents a critical drop in blood pressure.
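The excerpt-based training described above can be pictured as cutting a long recording such as the one in Figure 1 into fixed-length windows, from which training samples are then selected. A minimal sketch (the function name, window length, and stride are illustrative, not from the project):

```python
import numpy as np

def extract_windows(series: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Cut a long time series into fixed-length excerpts.

    Each row of the result is one candidate training sample; how these
    rows are then *selected* (randomly, regularly, or event-driven)
    determines whether the training set suffers from sampling bias.
    """
    n = (len(series) - window) // stride + 1
    return np.stack([series[i * stride : i * stride + window] for i in range(n)])

# Example: a 30-minute series at 1 Hz (1800 values), 5-minute windows,
# advanced in 1-minute steps -> 26 overlapping excerpts of 300 values each.
series = np.random.default_rng(0).normal(80.0, 5.0, size=1800)
windows = extract_windows(series, window=300, stride=60)
print(windows.shape)  # (26, 300)
```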

A concrete example from intensive care

In intensive care medicine, the condition of patients is continuously monitored to enable rapid intervention by nursing staff if necessary. Particular attention is paid to acute hypotensive episodes (AHEs), i.e., critical drops in blood pressure that can lead to irreparable damage. Predicting future AHEs with an early warning system, so that countermeasures can be taken before they occur, is currently a much-studied research topic in the field of artificial intelligence [1, 2].

Researchers from the Department of Medical Informatics at RISC Software GmbH are currently working on this issue in the MC³ project together with research partners from MedCampus III at Kepler University Hospital and the Institute for Machine Learning at JKU Linz.

Model development: sample selection for high classification accuracy

One possible strategy for selecting training samples is based on the time series of mean arterial blood pressure (MAP; see Figure 1): whenever the MAP falls below the critical value of 65 mmHg, a positively labeled sample is selected, i.e., a short observation window based on which the model should later be able to predict the upcoming drop and trigger an alarm. If instead the MAP remains above 75 mmHg for a longer period of time, a negatively labeled sample is selected from that period, i.e., here the model should not trigger an alarm. Figure 2 shows the sample selection schematically. In both cases, various time series of the patient in the observation window, e.g., MAP, heart rate, and oxygen saturation, serve as input data for the model.


Figure 2: Schematic representation of sample selection as a function of mean arterial blood pressure (MAP).
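The selection rule described above can be sketched in Python. The 65 mmHg and 75 mmHg thresholds come from the text; the window length, the lead time before the drop, and the minimum length of a "normal" stretch are illustrative assumptions:

```python
import numpy as np

CRITICAL = 65.0   # mmHg: below this value, an AHE is imminent (positive label)
NORMAL = 75.0     # mmHg: sustained values above this count as negative

def label_samples(map_series, window=300, horizon=300, min_normal=900):
    """Event-driven sample selection, one (window, label) pair per event.

    - Positive: an observation window ending `horizon` steps before the
      first point at which the MAP drops below CRITICAL.
    - Negative: a window taken from a stretch of at least `min_normal`
      steps during which the MAP stays above NORMAL throughout.
    """
    samples = []
    below = np.flatnonzero(map_series < CRITICAL)
    if below.size:
        t = int(below[0])                 # first critical drop
        start = t - horizon - window
        if start >= 0:
            samples.append((map_series[start : start + window], 1))
    run = 0                               # scan for a long stretch above NORMAL
    for i, ok in enumerate(map_series > NORMAL):
        run = run + 1 if ok else 0
        if run >= min_normal:
            start = i - run + 1
            samples.append((map_series[start : start + window], 0))
            break
    return samples

# Demo on a synthetic series: 20 minutes of normal MAP, then a slow drop.
s = np.concatenate([np.full(1200, 85.0), np.linspace(85.0, 60.0, 100)])
print([(w.shape, label) for w, label in label_samples(s)])
```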

Classification models trained on these training samples achieve high classification accuracy on an independent test set (generated according to the same scheme as the training set). Thus, nothing seems to stand in the way of practical application.

Use in practice: Where is the error?

Of course, the developed model was not deployed in the hospital immediately; its practical application was first simulated in a test environment. This simulation showed that the model triggers alarms almost continuously, even when no AHE is anywhere in sight. Although the classification accuracy on the test set is very good, the model does not work in practice.

Error analysis: selection of training samples

What is the reason for the model's unsuitability for practice? As it turned out, the training samples contain only "extreme examples" that are easy to classify but cover only a small part of the spectrum of situations that occur in reality.

This is because the MAP usually changes only slowly: at the end of the observation window, it is typically significantly lower in positively labeled samples than in negatively labeled ones. The model therefore ignores all temporal information and all other time series and pays attention only to the last available MAP value: if this value is rather high, no alarm is triggered; otherwise, an alarm is raised. In practice this does not work, because the MAP then often lies in a "gray zone" that does not occur in the training samples.
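This shortcut is easy to reproduce on synthetic data: a "model" that looks only at the last MAP value separates such extreme examples almost perfectly, without learning anything about temporal dynamics. All data and thresholds below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "extreme" samples: 5-minute MAP windows where the positive
# class ends well below the negative class (as in the biased training set).
pos = rng.normal(68.0, 2.0, size=(500, 300))   # windows before an AHE
neg = rng.normal(85.0, 2.0, size=(500, 300))   # windows of normal MAP
X = np.vstack([pos, neg])
y = np.array([1] * 500 + [0] * 500)

# A "model" that ignores everything except the last MAP value.
last_value_alarm = X[:, -1] < 76.0
accuracy = (last_value_alarm == y).mean()
print(f"accuracy on extreme examples: {accuracy:.3f}")  # close to 1.0
```

On samples from the real-world "gray zone" between the two clusters, the same rule would fail, which is exactly the behavior observed in the simulation.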

How to do it better?

Sampling bias can be avoided by selecting training samples either randomly or at regular intervals (e.g., every 10 minutes), regardless of the MAP. However, such an approach brings other problems: classifying a sample as "positive" (the MAP will fall below the critical value) or "negative" (the MAP will remain normal) is no longer so simple. What should be done, for example, if the MAP stays above 65 mmHg, but only just? It therefore makes more sense to train not a classification model but a regression model, which, for example, predicts the exact MAP value 15 minutes later.
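A regular, MAP-independent sampling scheme with such a regression target might look as follows. The 15-minute horizon is taken from the text; the window length and sampling step are illustrative assumptions:

```python
import numpy as np

def regular_regression_samples(map_series, window=300, horizon=900, step=600):
    """Unbiased sample selection: one sample every `step` seconds regardless
    of the MAP, with the MAP value `horizon` seconds after the end of the
    observation window as the regression target."""
    X, y = [], []
    t = 0
    while t + window + horizon < len(map_series):
        X.append(map_series[t : t + window])          # observation window
        y.append(map_series[t + window + horizon])    # MAP 15 minutes later
        t += step
    return np.array(X), np.array(y)

# Example: 2 hours at 1 Hz, 5-minute windows, one sample every 10 minutes.
series = np.random.default_rng(1).normal(80.0, 5.0, size=7200)
X, y = regular_regression_samples(series)
print(X.shape, y.shape)  # (10, 300) (10,)
```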

Another problem is the phenomenon discovered during the error analysis: the MAP usually drops only slowly. From a medical point of view, it is precisely the (rare) cases in which the MAP drops rapidly that are interesting, because an early warning system only adds value in such cases. To "sensitize" the model to such situations, they can be given higher importance during training. Researchers of the MC³ project are currently training prediction models for acute hypotensive episodes based on this new approach. If the models prove practical, they could support nursing staff in intensive care units in the near future.
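Giving rapid drops higher importance during training can be done with sample weights, which most scikit-learn estimators accept via the `sample_weight` argument of `fit`. The data and the weighting scheme below are purely illustrative, not those used in the project:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
X = rng.normal(80.0, 5.0, size=(200, 300))     # synthetic MAP windows
y_future = rng.normal(78.0, 8.0, size=200)     # synthetic MAP 15 minutes later

# Weight each sample by how far the MAP falls between the end of the
# observation window and the target: rapid drops (the medically
# interesting cases) get higher importance during training.
drop = np.maximum(X[:, -1] - y_future, 0.0)    # mmHg lost, 0 if MAP rises
weights = 1.0 + drop / 5.0                     # illustrative weighting scheme

model = Ridge().fit(X, y_future, sample_weight=weights)
```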


Conclusion

As always in machine learning, a comprehensive understanding of the data and the use case is essential for developing well-functioning, practical models, and not only in the medical domain. Often, both domain knowledge and exploratory data analysis (see Exploratory Data Analysis with Time Series) are necessary to extract the required information, identify potential problems early on, and adjust model development accordingly. As explained above, this applies in particular to the training samples, whose correct selection plays a central role especially for time series data.

Further information

Technology stack: Python 3, with the relevant data analysis packages (pandas, scikit-learn, etc.).

MC³: Medical Cognitive Computing Center, a joint research project of Kepler University Hospital Linz / MedCampus III, Johannes Kepler University Linz / Institute for Machine Learning, and RISC Software GmbH / Department of Medical Informatics.



Author

DI Dr. Alexander Maletzky

    Researcher & Developer


Sources

[1] F. Hatib et al. Machine-learning Algorithm to Predict Hypotension Based on High-fidelity Arterial Pressure Waveform Analysis. Anesthesiology 129(4): 663-674, 2018. DOI: 10.1097/ALN.0000000000002300

[2] S. Hyland et al. Early prediction of circulatory failure in the intensive care unit using machine learning. Nature Medicine 26(3): 364-373, 2020. DOI: 10.1038/s41591-020-0789-4