INTERNATIONAL JOURNAL OF NOVEL RESEARCH AND DEVELOPMENT International Peer Reviewed & Refereed Journals, Open Access Journal ISSN Approved Journal No: 2456-4184 | Impact factor: 8.76 | ESTD Year: 2016
Speech emotion recognition (SER) is a
demanding task in modern applications. It
is an important research topic that can
improve public health and contribute to the
ongoing progress of healthcare technology.
There is a growing need for applications
that perform specific tasks in response to
voice commands, such as Alexa, Google
Assistant, Cortana, and Siri. However, these
applications do not recognize human
emotions or respond to them. One of the
difficult tasks in speech emotion
recognition is extracting emotion features
effectively from the user’s voice.
There has been much research in the field
of SER, including the use of acoustic,
temporal, and deep learning models. Much of
this research has relied on traditional
machine learning algorithms such as
Support Vector Machine (SVM) [1] and
K-Nearest Neighbor (KNN) [2], as well as
neural models such as Convolutional Neural
Network (CNN) [3] and Graph Neural
Networks (GNN) [4]. An SER system
identifies the speaker’s emotional state by
extracting and classifying the prominent
features from a preprocessed speech signal.
Some primary human emotions are anger,
neutral, happiness, and sadness; they
define a person’s emotional state at a
particular time and can be classified using
a trained intelligent system.
To improve emotion recognition accuracy,
we use features of the user’s voice such as
pitch, speech intensity, and Mel-frequency
cepstral coefficients (MFCC) [5]. Over the
past ten years, determining the emotion
carried by speech signals has been a
primary research focus, but recognition
performance still needs to improve, given
how little is understood about the
fundamental temporal relationships inherent
in the speech waveform. To make full use of
the change in emotional content over time,
a new method for recognizing emotion in
speech is proposed that integrates
structural audio features with Long
Short-Term Memory (LSTM) [6] networks. The
temporal aspects of the time series are
augmented with structural speech features
extracted from the waveform, which preserve
the intrinsic temporal relationships within
the original speech.
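As an illustration of extracting features from the waveform while keeping their time order, the following is a minimal, dependency-free sketch. It covers only two of the features named above (intensity as RMS amplitude, pitch via a basic autocorrelation peak); MFCC extraction is omitted because it would normally come from a signal-processing library. The function names, frame length, hop size, and pitch search range are all assumptions made for this example and are not taken from the paper.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames, kept in time order."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms_intensity(frame):
    """Root-mean-square amplitude, a simple proxy for speech intensity."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz from the autocorrelation peak (0.0 if none found)."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def frame_level_features(samples, sample_rate, frame_len=400, hop=160):
    """Return one (intensity, pitch) pair per frame, preserving frame order."""
    return [(rms_intensity(f), autocorr_pitch(f, sample_rate))
            for f in frame_signal(samples, frame_len, hop)]


# Usage: a synthetic 200 Hz tone sampled at 8 kHz stands in for real speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
for intensity, pitch in frame_level_features(tone, sample_rate):
    print(round(intensity, 3), pitch)
```

The resulting list of per-frame feature pairs is the kind of ordered sequence an LSTM can consume one step at a time, in contrast to statistics computed once over the whole utterance.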
Two LSTM-based optimizations are proposed
to determine the emotional intensity across
different segments of speech. First, the
approach reduces computational cost by
modifying the traditional forget gate.
Second, instead of relying only on the
output of the last iteration, as the
conventional method does, an attention
mechanism is applied to both the time and
feature dimensions of the LSTM’s final
output. SER has broad potential in
human-computer interaction, in healthcare
for tracking the emotional state of
patients, and in providing a better user
experience through intelligent call centers
and the banking sector.
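The idea of applying attention over the time and feature dimensions of the LSTM output, rather than keeping only the last step, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the hidden states are plain Python lists, the scoring vectors `w_time` and `w_feat` are hypothetical stand-ins for learned parameters, and the dot-product and element-wise scoring functions are choices made for this example only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def time_attention(hidden, w_time):
    """Pool all time steps with attention weights instead of using hidden[-1].

    hidden: list of T hidden-state vectors, each of length D.
    w_time: hypothetical scoring vector of length D (assumed learned).
    """
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, w_time)) for h in hidden]
    alpha = softmax(scores)
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return context, alpha


def feature_attention(vector, w_feat):
    """Re-weight each feature of a pooled vector with a softmax gate.

    w_feat: hypothetical per-feature scoring weights (assumed learned).
    """
    beta = softmax([v * w for v, w in zip(vector, w_feat)])
    return [b * v for b, v in zip(beta, vector)], beta


# Usage: three time steps with two features each (toy values).
hidden = [[0.0, 1.0], [2.0, 0.0], [1.0, 2.0]]
context, alpha = time_attention(hidden, w_time=[1.0, 0.0])
pooled, beta = feature_attention(context, w_feat=[1.0, 1.0])
print("time weights:", alpha)
print("feature weights:", beta)
print("conventional last-step output:", hidden[-1])
print("attention-pooled output:", pooled)
```

With this kind of pooling, frames that score highly contribute more to the utterance-level representation, whereas the conventional last-step output depends on whatever the recurrent state retained at the end of the sequence.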
Keywords:
Cite Article:
"Speech Emotion Recognition Using LSTM", International Journal of Novel Research and Development (www.ijnrd.org), ISSN:2456-4184, Vol.9, Issue 4, page no.a781-a786, April-2024, Available :http://www.ijnrd.org/papers/IJNRD2404094.pdf