Sign Language Detection and Recognition using Image Processing for Improved Communication
Nishtha Bhagyawant1, Gauri Tamondkar2, Sneha Yadav3, Shwethashree Kenche4, Sunny Sall5

1Nishtha Bhagyawant, Department of Computer Engineering, St. John College of Engineering and Management, Palghar (Maharashtra), India.

2Gauri Tamondkar, Department of Computer Engineering, St. John College of Engineering and Management, Palghar (Maharashtra), India.

3Sneha Yadav, Department of Computer Engineering, St. John College of Engineering and Management, Palghar (Maharashtra), India.

4Shwethashree Kenche, Department of Computer Engineering, St. John College of Engineering and Management, Palghar (Maharashtra), India.

5Sunny Sall, Department of Computer Engineering, St. John College of Engineering and Management, Palghar (Maharashtra), India. 

Manuscript Received on 23 April 2025 | First Revised Manuscript Received on 27 April 2025 | Second Revised Manuscript Received on 04 May 2025 | Manuscript Accepted on 15 May 2025 | Manuscript published on 30 May 2025 | PP: 16-23 | Volume-15 Issue-2, May 2025 | Retrieval Number: 100.1/ijsce.B366815020525 | DOI: 10.35940/ijsce.B3668.15020525

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: This study presents an advanced deep learning framework for the real-time recognition and translation of Indian Sign Language (ISL). Our approach integrates Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to effectively capture both the spatial and temporal features of ISL gestures. The CNN component extracts rich visual features from the input sign language videos, while the LSTM component models the dynamic temporal patterns inherent in the gesture sequences. We evaluated our system using a comprehensive ISL dataset consisting of 700 fully annotated videos representing 100 spoken language sentences. To assess the effectiveness of our approach, we compared two different model architectures: CNN-LSTM and SVM-LSTM. The CNN-LSTM model achieved a training accuracy of 84%, demonstrating superior performance in capturing both visual and sequential information. In contrast, the SVM-LSTM model achieved a training accuracy of 66%, indicating comparatively lower effectiveness in this context. One of the key challenges faced during the development of the system was overfitting, primarily due to computational constraints and the limited size of the dataset. Nevertheless, through careful tuning of hyperparameters and the use of various optimization strategies, the model exhibited promising results, suggesting its potential for real-world applications. This paper also discusses the data preprocessing techniques employed, including video frame extraction, normalization, and data augmentation, which played a critical role in enhancing model performance. By addressing the complexities of sign language recognition, our work contributes to advancing communication accessibility for individuals relying on ISL, promoting greater inclusivity through technology.
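To illustrate the CNN-LSTM pipeline summarized above, the following is a minimal sketch of a video-classification model in Keras, in which a per-frame CNN extracts spatial features and an LSTM models the temporal gesture dynamics. This is not the authors' implementation; the frame count, frame resolution, layer widths, and dropout rate are illustrative assumptions, with only the 100 sentence classes taken from the dataset description.

```python
# Minimal sketch of a CNN-LSTM video classifier (TensorFlow/Keras).
# Hyperparameters below are assumptions for illustration, not the paper's settings.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 30        # frames sampled per ISL video (assumed)
FRAME_SIZE = (64, 64)  # resized frame resolution (assumed)
NUM_CLASSES = 100      # 100 spoken-language sentences, per the dataset description

# Per-frame CNN feature extractor, applied to each frame via TimeDistributed.
frame_cnn = models.Sequential([
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
])

model = models.Sequential([
    layers.TimeDistributed(frame_cnn, input_shape=(NUM_FRAMES, *FRAME_SIZE, 3)),
    layers.LSTM(128),                              # models temporal gesture dynamics
    layers.Dropout(0.5),                           # helps curb overfitting on a small dataset
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

In practice, each training sample would be a tensor of shape (NUM_FRAMES, 64, 64, 3) produced by the frame extraction and normalization steps mentioned in the abstract, with data augmentation applied before batching.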

Keywords: Sign Language (SL), OpenCV, CNN, LSTM, Hand Gesture, Real-Time, Deep Learning (DL)
Scope of the Article: Image Processing and Recognition