Using the E4 to measure stress intensity through Machine Learning methods

This article has been guest-authored by Pekka Siirtola, an Adjunct Professor at the Biomimetics and Intelligent Systems Group, University of Oulu, Finland. His research interests are related to artificial intelligence, machine learning, and how these can be used to find answers to health and wellbeing-related research questions. This piece is based on his study “Comparison of Regression and Classification Models for User-Independent and Personal Stress Detection”, published in August 2020.

Well-being at work and work efficiency are closely connected. This is because when employees have good well-being at work, they are more engaged, more motivated, and, most importantly, have fewer sick days. One of the main factors leading to reduced well-being and efficiency at work is stress. Therefore, to increase productivity, it is important to study what causes work-related stress. To investigate this efficiently, we need to first establish methods to measure stress.

Recently, a great deal of research has focused on identifying stress using wearable sensors. In many of these studies, stress has been detected with very high accuracy by applying machine learning and artificial intelligence methods to sensor data collected with wrist-worn wearable devices. The limitation of these studies is that they mostly recognize only whether the person is stressed or not; in other words, they are based on classifying discrete target variables. However, like other emotions, stress is not a discrete phenomenon: a person's stress level can be high, low, or anything in between. It should therefore be analyzed using continuous target values. This means that instead of classification methods, which are used when the data has discrete target variables, the data should be analyzed using regression methods, which are designed for continuous target variables.

A clear reason why most studies use discrete rather than continuous target variables is that defining continuous target values for the training data is much more difficult than defining discrete ones. The decision to simplify the phenomenon by treating it as discrete is therefore usually made as early as the data-gathering phase, because continuous target values are so hard to collect.

Would stress detection models then benefit from continuous target variables? Or is collecting them a waste of time? We studied this in our article Siirtola & Röning (2020), published in Sensors (Basel), where regression and classification models for stress detection were compared to test whether regression methods really outperform classification methods.


This study is based on a publicly available dataset called AffectiveROAD. The data was originally used in Haouij et al. (2018) and was subsequently made publicly available. This high-quality dataset contains data from nine participants, measured using the Empatica E4 while they were driving a car. Participants wore a device on each wrist. As you may know, the Empatica E4 includes accelerometers (ACC) as well as sensors to measure skin temperature (ST), electrodermal activity (EDA), blood volume pulse (BVP), heart rate (HR), and heart rate variability (HRV); therefore, it contains more sensors than any other wrist-worn device on the market. Most importantly, the raw data provided by Empatica is of high quality. For these reasons, the Empatica E4 has been commonly used in studies related to stress and affective state recognition, and our study is no exception.

In the data-gathering session, participants were asked to wear E4 devices on both wrists and drive a car in normal traffic among other drivers. The session started with a rest period during which the participant sat in the car with eyes closed and the engine running. The actual task consisted of driving on two types of roads and in two types of traffic conditions: at low speed in the city center, and at a much higher speed on the highway. The reason for driving on two types of roads was that city driving is assumed to be stressful, as it involves traffic lights, many vehicles, pedestrians, and cyclists. The highway, on the other hand, is a smooth road, and driving there is assumed to be less stressful. This introduced variation into the dataset, so that the data does not contain only one type and level of stress. This variation is important when training machine learning models that should be able to detect any type of stress.

What makes this dataset unique is that it includes continuous subjective stress estimates, that is, an estimate of the driver's stress level at each point of the drive. These estimates were collected by an experimenter sitting in the rear seat while the participant was driving, and the driver validated them after the session. The scale ran from 0 (no stress) to 1 (maximum stress). This dataset therefore does not simply assume that stress was high while driving in the city center and low on the highway; as a result, regression models that predict the level of stress can be trained on it.


To compare regression and classification models, a set of features, such as statistical features, was extracted from the E4 data. These were used to train a binary Random Forest classifier and a bagged tree-based ensemble regression model. Both were trained using the leave-one-subject-out method, meaning that, in turn, one person's data was used for testing and the others’ data was used for training. The classification model requires discrete target values, so for it the targets were set to 0 and 1: data from the baseline was labeled 0 and data from driving was labeled 1. The regression model, in contrast, was trained using the continuous target values. Because the outputs of the regression model are also continuous, they were converted to discrete values so that they could be compared with the results of the Random Forest classifier. This was done by finding, for each person, the threshold that divides the outputs into stress and non-stress so as to maximize accuracy. This comparison shows that the regression model outperforms the classification model. For instance, when the prediction was based on BVP and ST features, the average balanced accuracy was 74.1% with the classification model and 82.3% with the regression model. Similarly, sensitivity and specificity were higher when the regression model was used instead of the classification model.
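The evaluation steps described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the study: the function names are hypothetical, the model training itself is left out, and the `scores` list simply stands in for the continuous outputs of a regression model for one held-out person. The sketch also assumes that the per-person threshold is chosen to maximize balanced accuracy, whereas the text above only says "accuracy".

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out one subject at a time."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        yield held_out, train_idx, test_idx


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = stress)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(scores, y_true):
    """Find the cut-off on continuous outputs that maximizes balanced accuracy.

    Assumption: every observed score is tried as a candidate cut-off, and a
    window counts as "stress" when its score is at or above the cut-off.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        y_pred = [1 if s >= t else 0 for s in scores]
        acc = balanced_accuracy(y_true, y_pred)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # Made-up subject IDs, one entry per data window.
    subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
    for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
        print(held_out, train_idx, test_idx)

    # Made-up regression outputs and binary labels for one held-out person.
    scores = [0.1, 0.2, 0.6, 0.8]
    labels = [0, 0, 1, 1]
    print(best_threshold(scores, labels))
```

Note that picking the threshold on the same person's data that is then scored gives the regression model some help in this comparison, so the resulting figures are best read as a comparison of the two approaches rather than as deployable accuracy.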

These results suggest that stress detection benefits from continuous target values. This was expected, as the level of the stress of a person can be high or low or anything in between, and because of this, the studied problem is a regression problem instead of a classification problem.

While regression models outperform classification models in stress detection, the real benefit of regression models is that, unlike classification models, they can be used to estimate the level of stress. However, based on our experiments, this prediction is not yet very accurate: it worked quite well for some study subjects but not for all.
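One simple way to see such differences between subjects is to compute the prediction error separately for each person. The sketch below is only an illustration with made-up numbers: the function name is hypothetical, and mean absolute error is an assumed metric, since this article does not state which error measure was used.

```python
from collections import defaultdict


def per_subject_mae(subject_ids, y_true, y_pred):
    """Return the mean absolute error of the stress-level estimate per subject."""
    errors = defaultdict(list)
    for sid, target, estimate in zip(subject_ids, y_true, y_pred):
        errors[sid].append(abs(target - estimate))
    return {sid: sum(errs) / len(errs) for sid, errs in errors.items()}


if __name__ == "__main__":
    # Made-up stress levels on the 0 (no stress) to 1 (maximum stress) scale.
    subjects = ["s1", "s1", "s2", "s2"]
    true_levels = [0.2, 0.4, 0.5, 0.9]
    predicted_levels = [0.2, 0.5, 0.1, 0.5]
    for sid, mae in per_subject_mae(subjects, true_levels, predicted_levels).items():
        print(sid, round(mae, 3))
```

A large spread between subjects in a table like this would correspond to the situation described above, where the estimate is usable for some people but not for others.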

Our main goal for future work, therefore, is to study how the quality of this prediction could be improved.


The study shows that if continuous target values are available, regression models outperform classification models in user-independent stress detection. In addition, the study shows how capable the Empatica E4 is for stress detection. Unfortunately, continuous target variables are rarely available, as defining them makes the data-gathering process much more difficult than using discrete target values. As a result, stress detection based on classification models remains an important topic for future study.

Nevertheless, the next steps in this research field should concentrate more on recognizing the amount of stress, not just on detecting stressful moments.

Finally, this study shows the importance of publicly available datasets. They can speed up progress in this field, and in any other field as well, because with the help of open datasets experiments take less time, which leads to new breakthroughs sooner. Open datasets have been especially helpful during the pandemic, as it has been impossible to organize one's own data-gathering sessions.

If you are interested in joining the growing number of researchers using the E4 wristband to collect real-time physiological data safely and continuously, you can reach out to us. To learn more, simply visit our dedicated webpage.


The scientific paper “Comparison of Regression and Classification Models for User-Independent and Personal Stress Detection” was authored by Pekka Siirtola and Juha Röning.


Siirtola, P.; Röning, J. Comparison of Regression and Classification Models for User-Independent and Personal Stress Detection. Sensors 2020, 20, 4402.
Haouij, N.E.; Poggi, J.M.; Sevestre-Ghalila, S.; Ghozi, R.; Jaïdane, M. AffectiveROAD System and Database to Assess Driver’s Attention. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, Pau, France, 9–13 April 2018; pp. 800–803.