Sign language is a complex and essential form of communication for people who are deaf or hard of hearing. It uses hand movements, facial expressions, and body language to express detailed meanings. American Sign Language (ASL) is a prime example of this, with its unique grammar and syntax.
Sign language isn’t the same everywhere; different countries and regions have their own versions, each with its own rules and vocabulary, showcasing the rich diversity of sign languages around the world.
Efforts are underway to create systems that can translate sign language hand gestures into text or spoken language instantly. Such systems are crucial for improving communication accessibility for the deaf or hard-of-hearing community, enabling more inclusive interactions.
Researchers from Florida Atlantic University’s College of Engineering and Computer Science have undertaken a groundbreaking study to recognize ASL alphabet gestures using computer vision. They developed a unique dataset of 29,820 static images of ASL hand gestures. Each image was annotated with 21 key points on the hand using MediaPipe, providing detailed information about the hand’s structure and position.
These annotations significantly enhanced the accuracy of YOLOv8, a deep learning model, by helping it detect subtle differences in hand gestures more effectively.
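As a rough illustration of the kind of landmark annotation described above, the sketch below uses MediaPipe’s Hands solution to extract the 21 normalized hand keypoints from a single static image. The file name and parameter values are illustrative placeholders, not details taken from the study.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# static_image_mode=True treats each frame independently, which suits a
# dataset of still images rather than a live video stream.
with mp_hands.Hands(static_image_mode=True,
                    max_num_hands=1,
                    min_detection_confidence=0.5) as hands:
    image = cv2.imread("asl_letter_a.jpg")           # placeholder file name
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.multi_hand_landmarks:
        # 21 landmarks per detected hand, each with normalized x/y coordinates.
        landmarks = results.multi_hand_landmarks[0].landmark
        points = [(lm.x, lm.y) for lm in landmarks]
        print(len(points))  # 21
```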
The study’s findings, published in the Elsevier journal Franklin Open, show that using detailed hand pose information improved the model’s ability to detect ASL gestures accurately. By combining MediaPipe for tracking hand movements with YOLOv8 for training, the researchers created a powerful system for recognizing ASL alphabet gestures with high precision.
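The article does not publish the training configuration, but fine-tuning YOLOv8 on a custom dataset with the Ultralytics API typically looks like the sketch below; the dataset YAML name, model size, and hyperparameter values are assumptions made for illustration.

```python
from ultralytics import YOLO

# Start from a pretrained checkpoint (transfer learning) and fine-tune on a
# custom ASL alphabet dataset described by a dataset YAML (image paths + class names).
model = YOLO("yolov8n.pt")  # model size chosen for illustration
model.train(data="asl_alphabet.yaml", epochs=100, imgsz=640, batch=16)

# Evaluate on the validation split; reports precision, recall, mAP50, and mAP50-95.
metrics = model.val()
```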
“The combination of MediaPipe and YOLOv8, along with careful tuning of hyperparameters, represents a groundbreaking and innovative approach,” said Bader Alsharif, the study’s first author and a Ph.D. candidate at FAU. “This method hasn’t been explored in previous research, making it a promising new direction for future advancements.”
The model achieved an impressive 98% accuracy, with a recall rate of 98% and an overall performance score (F1 score) of 99%. It also attained a mean Average Precision (mAP) of 98% and a more stringent mAP50-95 score of 93%, demonstrating its reliability and precision in recognizing ASL gestures.
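For reference, these detection metrics have standard definitions that are not specific to this study: with TP, FP, and FN denoting true positives, false positives, and false negatives,

```latex
\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP+FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}, \qquad
\mathrm{mAP}_{50\text{-}95} = \frac{1}{10}\sum_{t\in\{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{mAP}_{t}
```

where mAP at a given threshold t averages the per-class average precision for detections whose boxes overlap the ground truth by at least that IoU. The mAP50-95 score is the harder of the two because it also rewards tightly localized bounding boxes.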
“Our research shows that our model can accurately detect and classify ASL gestures with very few mistakes,” Alsharif noted. “The findings emphasize the system’s robustness and its potential use in practical, real-time applications, allowing for more intuitive human-computer interaction.”
The integration of landmark annotations from MediaPipe into the YOLOv8 training process significantly enhanced both the accuracy of bounding boxes and gesture classification, enabling the model to capture subtle variations in hand poses. This two-step approach of landmark tracking and object detection was essential for ensuring the system’s high accuracy and efficiency in real-world scenarios. The model’s ability to maintain high recognition rates even with varying hand positions and gestures highlights its strength and adaptability.
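The press release does not spell out exactly how the landmark annotations feed into YOLOv8’s bounding-box labels, but one plausible bridge between the two steps is to derive each image’s box from the extremes of the 21 MediaPipe keypoints. The helper below is a hypothetical sketch of that conversion into YOLO’s normalized “class x_center y_center width height” label format; the padding value is an arbitrary choice.

```python
def landmarks_to_yolo_box(points, class_id, pad=0.05):
    """Convert 21 normalized (x, y) hand landmarks into a YOLO-format
    annotation line: 'class x_center y_center width height' (all in [0, 1])."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    # Tight box around the keypoints, expanded by a small margin and clamped.
    x_min, x_max = max(min(xs) - pad, 0.0), min(max(xs) + pad, 1.0)
    y_min, y_max = max(min(ys) - pad, 0.0), min(max(ys) + pad, 1.0)

    x_c = (x_min + x_max) / 2
    y_c = (y_min + y_max) / 2
    w = x_max - x_min
    h = y_max - y_min
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```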
“Our research illustrates the potential of combining advanced object detection algorithms with landmark tracking for real-time gesture recognition, providing a reliable solution for ASL interpretation,” said Mohammad Ilyas, Ph.D., a co-author and professor at FAU. “The success of this model is largely due to the careful integration of transfer learning, meticulous dataset creation, and precise tuning of hyperparameters, leading to a highly accurate and reliable system for recognizing ASL gestures, marking a significant milestone in assistive technology.”
Future work will focus on expanding the dataset to include more hand shapes and gestures to enhance the model’s ability to differentiate between visually similar gestures, further improving recognition accuracy. Additionally, optimizing the model for use on edge devices will be a priority, ensuring it maintains real-time performance in resource-limited environments.
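For the edge-deployment direction mentioned above, a common route with the Ultralytics tooling is to export trained weights to lighter-weight runtimes. The checkpoint path below is a placeholder, and the choice of ONNX and TensorFlow Lite targets is an assumption rather than something stated in the study.

```python
from ultralytics import YOLO

# Load the trained checkpoint (placeholder path) and export it to formats
# commonly used on resource-limited edge devices.
model = YOLO("runs/detect/train/weights/best.pt")
model.export(format="onnx", imgsz=640)    # ONNX for general-purpose runtimes
model.export(format="tflite", imgsz=640)  # TensorFlow Lite for mobile/embedded
```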
“By advancing ASL recognition, this work contributes to developing tools that enhance communication for the deaf and hard-of-hearing community,” said Stella Batalama, Ph.D., dean of FAU’s College of Engineering and Computer Science. “The model’s ability to interpret gestures accurately paves the way for more inclusive solutions that support accessibility, making daily interactions in education, healthcare, or social settings more seamless and effective for those who rely on sign language. This progress holds great promise for fostering a more inclusive society by reducing communication barriers.”
The study’s co-author is Easa Alalwany, Ph.D., a recent graduate of FAU’s College of Engineering and Computer Science and an assistant professor at Taibah University in Saudi Arabia.