
AI, Manual Dexterity & Sign Language Captioning


Status Update: 20 December 2025

Introduction

AI has transformed accessibility, particularly through audio and video captioning. Live captions are now common across meetings, broadcasts, and digital platforms. As a result, an important question is increasingly being asked:

If AI can caption speech, can it also recognise sign language and turn BSL into English captions?


As of 20 December 2025, progress exists — but the answer depends on how the technology is designed, what it is intended to do, and whether it accounts for a core feature of sign language: manual dexterity.


Why audio captions work so well

Audio captioning is considered a mature technology because spoken language is:

  • Linear and sequential

  • Based on a single data stream (sound)

  • Supported by very large, well-labelled datasets

AI systems can reliably:

  • Detect speech sounds

  • Match sound patterns to words

  • Output readable text

This is why speech-to-text captions are now widely trusted.
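
As a rough illustration of how compact this pipeline can be, the sketch below uses Whisper, a widely used open-source speech-recognition model, to turn an audio file into caption text (assuming the openai-whisper package and ffmpeg are installed). The file name and model size are placeholders chosen for the example, not a recommendation of any particular setup.

    # Minimal speech-to-text sketch using the open-source Whisper model (Python).
    # "meeting.wav" and the "base" model size are placeholder choices.
    import whisper

    model = whisper.load_model("base")         # pre-trained speech model
    result = model.transcribe("meeting.wav")   # detect speech and map sound patterns to words

    print(result["text"])                      # readable caption text

The single audio stream and the very large training datasets behind such models are what make an example this short possible.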


Why sign language captions are different

Sign languages such as BSL are:

  • Visual and three-dimensional

  • Spatial (meaning is built in space, not along a single line)

  • Grammatically expressed through hands, face, body, and timing

  • Structurally different from English

Capturing movement alone is not enough. Meaning comes from how movement is produced.
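
To make that distinction concrete, the sketch below uses MediaPipe's hand-tracking solution to pull hand landmarks out of a signing video. Off-the-shelf tools can capture where the hands are, frame by frame, but the output is only coordinates: none of the spatial grammar described above is represented. The video file name is a placeholder.

    # Hand-landmark extraction sketch (Python, MediaPipe + OpenCV).
    # It captures movement (21 points per hand, per frame) but no linguistic meaning.
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture("signing_clip.mp4")        # placeholder video file
    hands = mp.solutions.hands.Hands(max_num_hands=2)

    ok, frame = cap.read()
    if ok:
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand in results.multi_hand_landmarks or []:
            # Each landmark is just an (x, y, z) coordinate: movement, not meaning.
            print([(lm.x, lm.y, lm.z) for lm in hand.landmark])

    cap.release()
    hands.close()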


Manual dexterity: the critical factor

Manual dexterity refers to the precise, continuous control of:

  • Handshapes

  • Finger articulation

  • Wrist and arm movement

  • Speed and rhythm

  • Smooth transitions between signs

In BSL, even small changes in dexterity can:

  • Change meaning

  • Disrupt grammar

  • Reduce clarity

For AI, modelling this level of dexterity — continuously and naturally — remains extremely challenging.


Two AI approaches (often confused)

Approach A: Accessibility delivery systems

These systems focus on access, not translation. They:

  • Present text, audio, and sign video together

  • Improve accessibility through parallel formats

  • Do not need to linguistically understand sign language

They are:

✔ Actively used

✔ Suitable for public information

✔ Low risk when used transparently


Some implementations may include generated sign visuals. Where this happens, movement may sometimes appear jerky, segmented, or less fluid, reflecting current technical limits in modelling manual dexterity — not intent.
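
As a loose illustration of the "parallel formats" idea described above, the sketch below bundles one announcement as written English, an audio recording, and a pre-recorded BSL video, delivered side by side. The field names, wording, and URLs are invented for the example; the point is that nothing in this approach requires the system to understand sign language.

    # Sketch of a parallel-format accessibility bundle (illustrative names and URLs).
    from dataclasses import dataclass

    @dataclass
    class AccessibleMessage:
        text: str            # written English
        audio_url: str       # spoken version
        bsl_video_url: str   # human-produced BSL version, presented alongside

    announcement = AccessibleMessage(
        text="The 10:15 service is delayed by 20 minutes.",
        audio_url="https://example.org/announcements/1015-delay.mp3",
        bsl_video_url="https://example.org/announcements/1015-delay-bsl.mp4",
    )

    # The system stores and presents the three formats together;
    # it never interprets or translates the BSL content itself.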


Approach B: Sign language recognition & translation systems

These systems aim to:

  • Watch a person signing

  • Detect sign language features

  • Convert sign language into text or speech


They are:

⚠ In research or pilot stages

⚠ Often limited in vocabulary

⚠ Sensitive to signing style, speed, and context

⚠ Highly dependent on high-quality sign language data

As of late 2025, they are not yet universal or fully reliable without human validation.
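
For readers curious about the shape of these systems, the skeleton below shows the stages most research pipelines share: extract visual features from video, run a trained sequence model over them, then render English text. Every function here is a deliberately empty placeholder; a working system needs trained models, large high-quality BSL datasets, and, as noted above, human validation of the output.

    # Skeleton of a sign-recognition and translation pipeline (all stubs, not a working system).
    def extract_visual_features(video_path: str) -> list:
        """Placeholder: would return per-frame hand, face and body features."""
        return []

    def recognise_sign_sequence(features: list) -> list:
        """Placeholder: would run a trained sequence model and return candidate signs."""
        return []

    def render_english(candidate_signs: list) -> str:
        """Placeholder: would produce fluent English, ideally reviewed by a human."""
        return ""

    def translate(video_path: str) -> str:
        features = extract_visual_features(video_path)
        candidates = recognise_sign_sequence(features)
        return render_english(candidates)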


What works best today

The most reliable approach remains human-in-the-loop:

  • Humans define meaning

  • AI supports formatting, timing, and distribution (see the sketch below)

  • Accuracy is prioritised over automation

This approach respects language, culture, and trust.
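
One concrete example of that supportive role: once people have approved the English wording, software can handle the mechanical step of timing and formatting the captions. The sketch below writes human-approved segments to a WebVTT caption file; the wording and timings are invented for the example.

    # Format human-approved caption segments as a WebVTT file (illustrative content).
    # People decide the wording; the code only handles timing and layout.
    segments = [
        ("00:00:01.000", "00:00:04.000", "Welcome to today's update."),
        ("00:00:04.500", "00:00:08.000", "These captions were reviewed before publication."),
    ]

    with open("captions.vtt", "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n")
        for start, end, text in segments:
            f.write(f"{start} --> {end}\n{text}\n\n")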



Conclusion

AI is already improving accessibility in meaningful ways. However, audio captioning and sign language translation are fundamentally different challenges.

As of 20 December 2025, responsible use means:

  • Using AI to support access

  • Preserving the nuance of manual dexterity through human expertise

  • Being transparent about technical limits

  • Keeping humans accountable for meaning


The future of AI in sign language accessibility works best when the technology is ethical, Deaf-informed, and carefully deployed.
