E4 Accuracy against gold standards

This blog article has been guest-authored by Luca Menghini, a Ph.D. candidate at the Psychophysiology Lab of the University of Padua, Italy. Luca’s research interests focus on the validation and use of wearable technology for measuring biosignals associated with stress—particularly workplace stress, sleep, and health.

Wearable technology has developed extensively in recent years, opening an unprecedented window of opportunity for the long-term, unobtrusive assessment of psychophysiological processes in daily life. In stress research in particular, it can be used to investigate both short-term reactivity to daily stressors (e.g., job demands) and the prolonged activation that may follow exposure to stressors (e.g., during the evening), even at night (autonomic functioning during sleep). Besides moving physiological measurement outside the laboratory, wearables can also substantially shorten the time required for laboratory assessment while improving its ecological validity.
Unfortunately, most wearable devices are consumer-oriented, use undisclosed algorithms, and provide only summaries (e.g., average heart rate per minute) or proprietary metrics (e.g., “recovery score”) instead of raw data, which leaves too much uncertainty for use in scientific research. That is why we focused on the E4 wristband, a research-grade multisensor device that provides access to the raw data continuously recorded from the participant’s wrist. Specifically, in our study, recently published in Psychophysiology, we evaluated the accuracy of E4-derived measures of heart rate, heart rate variability (HRV), and skin conductance against gold-standard measurements.

Validation protocol

To evaluate the E4 accuracy, we asked 40 healthy volunteers to perform a set of tasks that are among the most widely used in psychophysiological reactivity research: seated rest, paced breathing, the orthostatic test (standing up and remaining upright without moving), the Stroop test, and the speech test. The latter task is the standard for artificially eliciting stress responses in the lab while maintaining a high degree of ecological validity. It consists of asking participants to (1) prepare a short speech to be (2) delivered to a panel of evaluators, followed by (3) a recovery period. To evaluate the E4 sensitivity to motion artifacts, we also included two further ecological conditions: slow walking and keyboard typing. Each task took three minutes, and the whole procedure took approximately 40 minutes. Before the first task, participants were fitted with the E4 on their non-dominant wrist and with the gold-standard sensors, namely electrocardiography (ECG) and finger skin conductance. The E4 recording was manually synchronized with the gold-standard recordings.


Average heart rate was the most accurately measured metric in each condition, with the E4 showing strong correlations with ECG even in the conditions involving more movement (slow walking, keyboard typing, and public speech). Systematic differences between E4 and ECG heart rate ranged from -0.01 ± 0.1 bpm (seated rest) to -2.5 ± 12.5 bpm.
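As a rough illustration of how a systematic difference of this kind can be computed, the sketch below takes paired heart rate values from the device and from ECG and returns the mean difference (bias), its standard deviation, and the 95% limits of agreement, in the style of a Bland-Altman analysis. It assumes that the "±" values above are standard deviations of the differences, which the article does not state explicitly. The function name and the numbers are hypothetical and are not data from the study.

```python
import numpy as np

def bias_and_limits(device_hr, reference_hr):
    """Return (bias, sd, (lower, upper)) for paired measurements.

    bias  : mean of device - reference
    sd    : sample standard deviation of the differences
    limits: bias +/- 1.96 * sd (95% limits of agreement)
    """
    diff = np.asarray(device_hr, dtype=float) - np.asarray(reference_hr, dtype=float)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical per-participant mean heart rates in bpm (illustrative only)
e4_hr = [62.1, 70.4, 58.9, 75.0, 66.3]
ecg_hr = [62.0, 70.9, 59.0, 75.2, 66.1]

bias, sd, (lower, upper) = bias_and_limits(e4_hr, ecg_hr)
print(f"bias = {bias:.2f} bpm, SD = {sd:.2f} bpm, LoA = [{lower:.2f}, {upper:.2f}] bpm")
```

A negative bias means that the device reads lower than the reference on average. The width of the limits of agreement shows how far individual measurements can deviate.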
In contrast, the accuracy of HRV measurements was more variable across tasks and was reduced by both physical stress (body movements) and mental stress (e.g., performing the Stroop test or preparing the speech). We computed several widely used HRV metrics, including the standard deviation of NN intervals (SDNN), the root mean square of successive differences between NN intervals (RMSSD), and the spectral power in the low-frequency (LF, 0.04-0.15 Hz) and high-frequency (HF, 0.15-0.4 Hz) ranges. With only slight differences between HRV metrics, we found acceptable accuracy only in the first seated rest and paced breathing conditions, whereas accuracy was strongly reduced during the public speech and the Stroop test, and especially during the orthostatic test, slow walking, and keyboard typing. Conditions involving motion caused a substantial loss of data (inter-beat intervals), resulting in non-comparable signals between E4 and ECG, whereas exposure to, and recovery from, cognitive stress (Stroop) and emotional stress (public speech) resulted in comparable but quite inaccurate measurements.
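For readers less familiar with the time-domain metrics mentioned above, here is a minimal sketch of SDNN and RMSSD computed from a series of inter-beat intervals in milliseconds. It follows the textbook definitions and is not the processing pipeline used in the study. It assumes an artifact-free, gap-free series, so it does not handle the missing intervals that motion produces. The function names and example values are hypothetical.

```python
import numpy as np

def sdnn(ibi_ms):
    """Sample standard deviation of the inter-beat (NN) intervals, in ms."""
    return float(np.std(np.asarray(ibi_ms, dtype=float), ddof=1))

def rmssd(ibi_ms):
    """Root mean square of successive differences between NN intervals, in ms."""
    successive_diffs = np.diff(np.asarray(ibi_ms, dtype=float))
    return float(np.sqrt(np.mean(successive_diffs ** 2)))

# Hypothetical inter-beat intervals in ms (illustrative only)
ibi = [800, 810, 790, 805, 795]
print(f"SDNN = {sdnn(ibi):.2f} ms, RMSSD = {rmssd(ibi):.2f} ms")
```

Because RMSSD is built on differences between consecutive beats, a single missing or misplaced beat affects two successive differences. This is one reason why data loss during movement can be so damaging for HRV estimates.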
Confirming previous results from the literature, the study indicated that the correlation between electrodermal activity (EDA) signals recorded at different locations can show substantial inter-subject differences and also varies with the type of task, which can be ascribed to the different anatomical locations of the electrodes.

Finally, we explored potential predictors of E4 accuracy by regressing the percent error between E4 and gold standards on a number of factors, including body mass index, skin tone, and wrist circumference. Only wrist circumference, in addition to wrist acceleration, was found to predict larger errors in HRV measurements.
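The general idea of regressing percent error on participant characteristics can be sketched as follows. This example uses ordinary least squares as a simple stand-in, since the article does not specify the regression model used in the study. The helper names, the signed definition of percent error, and all numbers are assumptions made for illustration.

```python
import numpy as np

def percent_error(device, reference):
    """Signed percent error of the device relative to the reference."""
    device = np.asarray(device, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * (device - reference) / reference

def fit_linear(predictors, outcome):
    """Ordinary least squares; returns [intercept, slope_1, slope_2, ...]."""
    outcome = np.asarray(outcome, dtype=float)
    design = np.column_stack([np.ones(len(outcome)), np.asarray(predictors, dtype=float)])
    coefficients, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefficients

# Hypothetical data: RMSSD from device and ECG (ms), wrist circumference (cm)
e4_rmssd = [41.0, 36.5, 52.0, 30.0, 47.5]
ecg_rmssd = [40.0, 35.0, 48.0, 27.0, 45.0]
wrist_cm = [15.0, 15.5, 17.5, 18.0, 16.0]

errors = percent_error(e4_rmssd, ecg_rmssd)
intercept, slope = fit_linear(wrist_cm, errors)
print(f"percent error changes by {slope:.2f} points per cm of wrist circumference")
```

A positive slope here would suggest larger errors for larger wrists. With real data, repeated measurements from the same participant would call for a model that accounts for that dependency.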

Conclusions and recommendations

Our study provided evidence supporting the accuracy of the E4 wristband in measuring heart rate and HRV under several conditions commonly used in stress research. The device provides acceptably accurate heart rate measures in almost all tasks, whereas HRV is accurately measured only under motion-free rest or paced breathing conditions that are not preceded by acute physical or mental stress. In fact, the only difference between the first (seated rest) and the last task (speech recovery) was the stressful situation (public speech) preceding the latter. This might be due to the pulse transit time (the time required for the pulse wave to reach the body periphery from the heart), which can be influenced by stress-related processes resulting in larger discrepancies between ECG- and photoplethysmography-derived inter-beat intervals.
Future studies using the E4 to measure HRV should restrict the analysis to rest periods, when the E4 showed its highest accuracy, which is acceptable for most applications. Participants should also be instructed to minimize their movements during the recordings. An adaptation period of 2-3 minutes (to be excluded from the analyses) can help to minimize the influence of preceding physical activity and mental stress. To fully exploit the E4’s access to raw data, a mix of manual procedures based on visual inspection and automatic artifact processing (e.g., using the freely available ARTiiFACT software) is also recommended. Special caution should be taken when recording HRV from participants with a large wrist circumference (and a high body mass index, due to its strong correlation with wrist circumference).
As expected from previous work in the literature, we were not able to reach a similar level of accuracy when comparing the EDA signals coming from the wrist and from the finger. Physiological differences between the two sites in the distribution and innervation of the eccrine glands were probably among the causes of the different behavior of the two recordings. A promising way to improve the E4 accuracy relative to finger measurements is the recently introduced lead wire extension, which allows skin conductance recordings to be moved from the wrist to the fingers or the palmar surface of the hand, thus eliminating any potential site difference.
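Excluding an adaptation period from the analysis is straightforward once each inter-beat interval has a timestamp. The sketch below drops every interval that falls within the first few minutes of a recording. It assumes timestamps in seconds and a default of 180 seconds, and the function name and values are hypothetical.

```python
import numpy as np

def drop_adaptation(times_s, ibi_ms, adaptation_s=180.0, start_s=None):
    """Keep only the intervals recorded at least adaptation_s after the start.

    times_s : timestamp of each inter-beat interval, in seconds
    ibi_ms  : inter-beat intervals, in milliseconds
    start_s : recording start; defaults to the first timestamp
    """
    times_s = np.asarray(times_s, dtype=float)
    ibi_ms = np.asarray(ibi_ms, dtype=float)
    if times_s.size == 0:
        return times_s, ibi_ms
    if start_s is None:
        start_s = times_s[0]
    keep = (times_s - start_s) >= adaptation_s
    return times_s[keep], ibi_ms[keep]

# Hypothetical timestamps (s) and intervals (ms), illustrative only
times = [0.0, 60.0, 179.0, 180.0, 240.0]
ibi = [820.0, 815.0, 810.0, 805.0, 800.0]
kept_times, kept_ibi = drop_adaptation(times, ibi)
print(kept_times, kept_ibi)
```

HRV metrics would then be computed only on the retained intervals, after artifact correction.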

If you are interested in joining the growing number of researchers using the E4 wristband to collect real-time physiological data safely and continuously, you can reach out to us. To learn more, visit our dedicated webpage.


The scientific paper “Stressing the accuracy: Wrist‐worn wearable sensor validation over different conditions” was authored by Luca Menghini, Evelyn Gianfranchi, Nicola Cellini, Elisabetta Patron, Mariaelena Tagliabue, and Michela Sarlo.