Gesture Recognition



Humans communicate with a mix of speech and gestures. Gestures are made with the hands, with the head (nodding 'yes'), or with any other part of the body (some people point with their lower lip). What matters is that a gesture is a movement made to communicate: the person is not performing a practical action, nor just fidgeting or moving from one place to another.

When humans interact with computers (HCI), it is typically via typing and mouse pointing and clicking. Those actions could be regarded as hand gestures, but a more natural classification is as practical actions, in the sense of tool use: I press the keys and the letters appear, I move the mouse to control the pointer, and I click a button to execute a command. This is why a window-based GUI is often referred to as 'direct manipulation': there are objects that can be acted upon. The same is true for touchscreens, tablets, joysticks, and game consoles. Thinking of this interaction as 'communicating with the computer through gestures' is a fairly far-fetched abstraction.

So why develop gesture technology? Why develop speech technology? Why search for a 'more natural way to communicate with computers'? There are perhaps as many answers as there are people involved in these developments. Yet, here are some of the most common answers:

  • To enable the transformation of the computer from a tool into an assistant, or a robot, be it Mr. Data or Marvin the Paranoid Android,
  • To enable hands-free or device-free interaction with a computer, expanding the range of possibilities for computer usage,
  • Because under special circumstances it would be very handy, think VR,
  • To get off our lazy butts and start being physical again, think EyeToy,
  • To allow those who can only use their voice, or eyes, or mouth to use computers, think RSI or Stephen Hawking,
  • Because of a personal dislike of keyboards and mice,
  • Because we can.

These answers aren't mutually exclusive, nor is the list exhaustive. But if you work on speech or gesture recognition, several of the answers above will probably resonate with you.

The Project

In our project we developed a gesture recognition system for a learning environment for deaf children (a sign language tutor). The user of this system is asked to perform a certain gesture (from a database of 120 words), which is recorded by a stereo camera. Skin blobs corresponding to the head and the left and right hands are tracked over time, and the trajectory of both hands relative to the head is reconstructed in 3D (see the figures below). Features of this trajectory are extracted and used to recognize whether the correct gesture was made, and feedback is given to the user.
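The 3D reconstruction step can be illustrated with a minimal sketch. Assuming a calibrated, rectified stereo pair (the actual system's calibration is not shown here, and the focal length and baseline below are illustrative values), the depth of a tracked skin blob follows directly from the disparity between its positions in the left and right images:

```python
# Minimal sketch (not the project's actual code) of recovering a 3D point
# from a rectified stereo pair. f is the focal length in pixels, b the
# camera baseline in metres; both are assumed example values.

def triangulate(xl, yl, xr, f=700.0, b=0.12):
    """3D position of a point seen at (xl, yl) in the left image and
    (xr, yl) in the right image (rectified: same image row)."""
    d = xl - xr                # disparity in pixels
    if d <= 0:
        raise ValueError("point must be in front of the cameras")
    z = f * b / d              # depth along the optical axis
    return xl * z / f, yl * z / f, z

# a hand blob with 35 px of disparity lies at depth 700 * 0.12 / 35 = 2.4 m
x, y, z = triangulate(100.0, 40.0, 65.0)
```

Applying this to the head blob and both hand blobs in every frame yields the hand-relative-to-head trajectories described above.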

[Figure: reconstructed 3D hand trajectory]


There are several applications where gestured HCI could provide a solution or improve 'usability'. Below are some examples:

·         Virtual reality
In virtual reality, a user needs freedom of movement while retaining full control over the computer. Speech recognition could be helpful here, but can never provide the necessary direct manipulative control of the environment. Hand gestures can convey such manipulative instructions as well as symbolic meaning, and can hence provide a much richer HCI.

·         Multimedia presentations
Also during (PowerPoint) presentations, computer control is necessary while keeping the freedom of movement needed to support your message to the audience. A good presenter will use all human communication modalities, including hand gestures and other body language. In this situation speech recognition would disturb the presentation. A remote control device can help, but provides limited control and takes away the freedom of at least one hand.
With gestured HCI, the presenter could have more control over his/her presentation without the movement constraints imposed by handheld devices. The user could, for instance, highlight locations on a slide by pointing a finger, or zoom in on important parts of the slide. When these gestures are also natural for humans to associate with the respective functions, they are not only easy for the presenter to learn, but could perhaps also help the audience understand what the presenter wants to show them, which is far less obvious when he/she is just pushing buttons or sliding a mouse around.

·         Sign language tutor
Automatic gesture recognition can also be used to provide feedback in Electronic Learning environments to practice sign language. The computer can show the signs and give feedback when it detects (nearly) correct signs performed by the child. We are currently developing one such tutor we call ELo.

[Figure: children signing]

Research problems

Although a lot of research has been conducted on gesture recognition over the last few years, there is still no gesture recognition system that provides the robustness and control needed for the applications mentioned above. It is a very challenging problem, spanning several research issues in image processing, computer vision, and machine intelligence, as well as several usability issues.

·         Person detection and localization
How to find the gesturing person in a complicated, dynamic scene in which other people may also be present.
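As a toy illustration of this localization step (not the project's actual detector), skin-coloured regions can be found by thresholding normalised chromaticity; the bounds below are rough assumptions that a real system would learn from training data:

```python
import numpy as np

# Hypothetical sketch: classify pixels as skin by thresholding normalised
# red/green chromaticity, a common first step for locating head and hands.
# The (r, g) bounds are illustrative, not trained values.

def skin_mask(rgb):
    rgb = rgb.astype(float)
    s = rgb.sum(axis=2) + 1e-6                # avoid division by zero
    r, g = rgb[..., 0] / s, rgb[..., 1] / s   # normalised chromaticity
    return (r > 0.36) & (r < 0.47) & (g > 0.28) & (g < 0.36)

# synthetic test image: a roughly skin-coloured square on a blue background
img = np.zeros((60, 60, 3), np.uint8)
img[...] = (10, 20, 200)             # background
img[20:40, 20:40] = (190, 140, 120)  # skin-coloured patch
mask = skin_mask(img)
ys, xs = np.nonzero(mask)
cx, cy = xs.mean(), ys.mean()        # centroid of the detected blob
```

Grouping such pixels into connected blobs and tracking their centroids gives the head and hand candidates the later stages work with.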

·         Body pose estimation
How to estimate the subject's (3D) body pose from one or more 2D images and/or sparse 3D data. Only the upper body (head, hands, arms, torso) is important here. Self-occlusions in particular (an arm in front of the body, a hand in front of the face) make this problem very challenging.

·         Motion tracking
The time aspect of the observed body pose should be exploited by using models of human movement to simplify body pose estimation after it has been properly initialized. This is difficult because body pose can change very quickly, and the pose in subsequent video frames can vary significantly. A good compromise between a high frame rate and smart tracking methods must be found.

·         Feature extraction
Although a hand gesture can be seen as a trajectory of body pose over time, a gesture does not have one unique motion. Instead, the body motion of a gesture will vary because of human 'sloppiness' and differences between persons. There are also periodic gestures for which the number of repetitions and the starting position are not defined. Features must be extracted from the measured body pose data and hand appearance that are robust to these motion, lighting, and background variabilities, yet convey meaning about the underlying gesture. Such features could be regarded as analogous to phonemes, the building blocks of spoken language.

·         Recognition
A person never stops having a body pose, and also moves around for reasons other than making a gesture to the computer. Therefore, a continuous stream of features must be processed in an intelligent way to extract only the correct, intended gestures.
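One classic way to spot intended gestures in such a continuous stream (shown here as a toy sketch, not the project's recognizer) is to slide a window over the incoming features and compare it against stored templates with dynamic time warping (DTW), firing only when the distance drops below a threshold:

```python
import numpy as np

# Toy gesture spotting: template, stream and threshold are all assumed
# example values, and real features would be multi-dimensional.

def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def spot(stream, template, threshold=1.0):
    """Window start indices where the stream matches the template."""
    w = len(template)
    return [i for i in range(len(stream) - w + 1)
            if dtw(stream[i:i + w], template) < threshold]

template = np.sin(np.linspace(0, np.pi, 10))           # one up-down stroke
stream = np.concatenate([np.zeros(15), template, np.zeros(15)])
hits = spot(stream, template)                          # fires around index 15
```

Idle segments of the stream stay far above the threshold, so only the embedded, intended stroke is reported.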

·         Real-time
A recognition system is useless for HCI if it has a slow reaction time: a user requires immediate response. This makes the technological challenge even greater. Solutions must be found in simple/fast/low-resolution image processing techniques capable of extracting the relevant information from the video stream in real time (25+ frames per second). A (partial) hardware implementation can also help to achieve this goal.
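The 25+ fps requirement translates into a hard per-frame budget of 40 ms. A toy sketch of instrumenting a pipeline against that budget (the function names are illustrative):

```python
import time

FRAME_BUDGET = 1.0 / 25.0   # 40 ms per frame at 25 fps

def count_late_frames(frames, process):
    """Count frames whose processing exceeds the real-time budget;
    a real system would skip input frames to catch up rather than lag."""
    late = 0
    for frame in frames:
        start = time.perf_counter()
        process(frame)
        if time.perf_counter() - start > FRAME_BUDGET:
            late += 1
    return late

# a no-op 'pipeline' stays comfortably within budget on any machine
late = count_late_frames(range(100), lambda frame: None)
```

Measuring this per stage shows which part of the chain (detection, pose estimation, tracking, recognition) has to be simplified or moved to hardware.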

People involved:

Marcel Reinders

Emile Hendriks

Huib de Ridder

Jeroen Arendsen

Gineke ten Holt

Jeroen Lichtenauer


