Longitudinal Investigation of Work Stressors Using Human Voice Features

doi:10.18267/j.aip.208

Acta Informatica Pragensia 2023, 12(1), 104-122 | DOI: 10.18267/j.aip.208

Indhumathi Natarajan¹, Maheswaran Shanmugam¹, Samiappan Dhanalakshmi², Santhosh Easwaramoorthy¹, Sethuraja Kuppusamy¹, Saravanan Balu¹

¹ Department of Electronics and Communication Engineering, Kongu Engineering College, Erode-638060, India
² SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, India

Stress is a part of everyone's life. Any event or thought that makes a person upset, furious or anxious can trigger it. Stress affects mental and physical health and has a negative impact on the nervous and immune systems. The human voice carries a great deal of information about the speaker and can help to determine the speaker's current state. As the need for communication between humans and intelligent systems grows, automatic stress detection is becoming an intriguing research topic. The level of the hormone cortisol can also be used to assess the body's stress state, but for most people this is not a viable option. Because speech features are particularly affected by stress, voice data could serve as an easy-to-capture measure of everyday stress levels and hence as an early warning signal of stress-related health problems. In the proposed method, stress is detected using a deep learning model. Mel filter bank spectral coefficients are extracted from the pre-processed voice input, and from these the Mel frequency cepstral coefficients (MFCCs) are derived. The MFCC features are fed to feed-forward networks and a long short-term memory (LSTM) network, which predict the stress status as a binary decision: unstressed or stressed. The Mel spectrum and spectrogram outputs show how the voice features differ between stressed and unstressed speech. The results indicate that the proposed method performs better than an existing model. The model was deployed as a web application that workers can use to check their stress state at any time.
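The abstract compresses a standard speech pipeline into a few sentences: pre-processing, FFT, Mel filter bank, MFCCs, then a recurrent classifier with a binary output. The following is a minimal NumPy sketch of that data flow, written under our own assumptions and not taken from the authors' code. The parameter values (16 kHz audio, 25 ms frames with a 10 ms hop, a 512-point FFT, 26 Mel filters, 13 coefficients, pre-emphasis of 0.97), the HTK-style Mel formula, the per-utterance normalisation, the single-layer LSTM with a sigmoid output and the 0.5 decision threshold are common defaults chosen for illustration; the abstract does not specify them. The function names `mfcc`, `mel_filter_bank` and `lstm_stress_score` are hypothetical. Because the LSTM weights are random rather than trained, the printed label carries no meaning; a usable detector would learn the weights from labelled stressed and unstressed recordings.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel mapping (assumed; the abstract does not name a variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank


def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512,
         n_filters=26, n_ceps=13, pre_emphasis=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix; assumes n_fft >= frame length."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: y[n] = x[n] - a * x[n - 1] boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # 2. Overlapping frames (zero-padded if the clip is shorter than one frame), Hamming window.
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    pad = max(0, (n_frames - 1) * hop + frame_len - len(emphasized))
    padded = np.append(emphasized, np.zeros(pad))
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame via the real FFT (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filter-bank energies, log-compressed (the floor avoids log(0)).
    energies = power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. Orthonormal DCT-II decorrelates the log energies; keep the first n_ceps.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return log_energies @ dct.T


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_stress_score(features, hidden=32, seed=0):
    """Run one LSTM layer over MFCC frames and map the last hidden state to P(stressed).

    The weights are random (untrained), so the score only illustrates the data flow.
    """
    # Per-utterance mean/variance normalisation of each coefficient (assumed pre-processing).
    x_seq = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.1, (4 * hidden, x_seq.shape[1] + hidden))  # stacked gate weights
    b = np.zeros(4 * hidden)
    w_out = rng.normal(0.0, 0.1, hidden)
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in x_seq:
        i, f, g, o = np.split(W @ np.concatenate([x, h]) + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget old memory, write new content
        h = sigmoid(o) * np.tanh(c)  # expose part of the cell state
    return float(sigmoid(w_out @ h))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    clip = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic tone standing in for a voice clip
    feats = mfcc(clip, sr)
    print(feats.shape)  # (98, 13)
    p = lstm_stress_score(feats)
    print("stressed" if p >= 0.5 else "unstressed", round(p, 3))
```

The LSTM reads the MFCC matrix one frame at a time, so its recurrent state can follow how the spectral envelope changes over an utterance. In practice, a deep learning framework's built-in LSTM layer and training loop would replace this hand-written forward pass.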

Keywords: Stress; MFCC; Mel filter bank; FFT; Mel scale; Spectrogram; LSTM.

Received: October 29, 2022; Revised: February 4, 2023; Accepted: February 8, 2023; Prepublished online: March 1, 2023; Published: April 19, 2023

Natarajan, I., Shanmugam, M., Dhanalakshmi, S., Easwaramoorthy, S., Kuppusamy, S., & Balu, S. (2023). Longitudinal Investigation of Work Stressors Using Human Voice Features. Acta Informatica Pragensia, 12(1), 104-122. doi: 10.18267/j.aip.208

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.
