GATE project (Game Research for Training and Entertainment)

WP 3.2: Detecting, interpreting and affecting user behavior


Problem Statement 

There are many domains where detecting, tracking, and interpreting human motion is important. Vision-based analysis of human motion plays an important role in virtual reality (interactive virtual worlds, games, virtual studios, character animation, teleconferencing), smart environment systems (access control, public area security, traffic safety, smart homes), advanced user interfaces (gesture-driven input, sign language), sports (content-based sports video indexing, personalized training), arts (interactive music composition, choreography analysis and training), and health care (orthopedic treatments, ergonomics). Decades of research in computer vision have produced many techniques and great improvements, yet many issues remain to be solved.

Gesture and motion recognition is starting to find its way into gaming and office applications. However, these are still controlled settings: a single person, in a known position, in front of a single camera. The range of applications of motion interpretation can be broadened significantly if it works reliably for groups of people.

Robustness is critical for gesture recognition technology. Many systems do not read motions accurately, or otherwise function suboptimally, when factors such as the background or lighting change. In addition, they do not always properly recognize motions made against noisy backgrounds or in cluttered scenes.

The essential ingredient for an effective human-system interaction experience is that the system clearly indicates its level of understanding to the user. It should be made clear to the user which gestures or facial expressions lead to which outcome. The question is which novel interaction methods will make this desired performance possible. To initiate research in this direction, we need to develop tools for observation and experiment with interaction designs.

Description of the research

The objectives of the project are to develop fast and robust algorithms that can accurately detect, track, and model individual persons in the real 3D world; to recognize gestures and motion of individuals; to identify interactions between persons, such as looking or gesturing at each other; and to design interaction between humans and computer systems. For all these techniques we will first choose application domains, in order to identify the needs and to evaluate the methods against the requirements.

The expected results are techniques for tracking people and fitting body models, for gesture tracking and recognition, and for face and eye tracking and expression recognition. Building upon these methods, we will show proof of concept with a spatial game, a gesture-driven presentation environment, and a communication pattern observation tool.

The general goal of this project is to analyze, interpret, and respond to the motion of groups of persons. For this, the following two main objectives must be achieved.

  1. The first objective is to develop fast and robust algorithms that can accurately detect, track, and model individual persons in the real 3D world.
  2. The second objective is to recognize gestures and motion of individuals, to identify interactions between persons, such as looking or gesturing at each other, and to design interaction between humans and computer systems.


The application domains all consider a small number of persons in a room. The domain for whole-body tracking and fitting could be a spatial game. The envisioned domain for gesture tracking and recognition is a gesture-driven presentation environment. The domain for face tracking and expression recognition is the detection of inter-person interaction in a social setting. Each successive application also exploits techniques from the previous stage.

An example is a pose-driven spatial game. Players can get rid of controllers and play games using intuitive body movements and poses. The key element of such a game is reading body movements correctly.




Research directions

  1. Detect and track individuals in a group

When there is more than one person in the scene, we can track each person with a separate particle filter of identical design. The open problem is whether tracking individuals in a group still works well when persons occlude each other. Occlusion is always difficult to deal with because it takes so many forms. For instance, persons may change direction after an occlusion or keep moving in the same direction. In addition, the appearance of people may change, for example from a front view to a lateral view. Moreover, when people are very close to each other, they easily cast shadows on one another's bodies. All of this makes coping with occlusion between people a challenge for a tracking system.
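The per-person tracking idea can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a 2D ground-plane position as state, a random-walk motion model, Gaussian observation noise, and that each observation is already associated with the right person. The names `PersonTracker` and `track_group` and all parameter values are hypothetical. During an occlusion no observation is available, so the filter only predicts and its uncertainty grows.

```python
import math
import random


class PersonTracker:
    """Bootstrap particle filter over one person's (x, y) ground-plane position."""

    def __init__(self, start_xy, n_particles=300, motion_std=0.2, obs_std=0.5, seed=None):
        self.rng = random.Random(seed)
        self.motion_std = motion_std  # assumed random-walk step noise
        self.obs_std = obs_std        # assumed measurement noise
        self.particles = [
            (start_xy[0] + self.rng.gauss(0, obs_std),
             start_xy[1] + self.rng.gauss(0, obs_std))
            for _ in range(n_particles)
        ]

    def step(self, observation):
        """Advance one frame; pass observation=None while the person is occluded."""
        # Predict: diffuse every particle with the random-walk motion model.
        self.particles = [
            (x + self.rng.gauss(0, self.motion_std),
             y + self.rng.gauss(0, self.motion_std))
            for x, y in self.particles
        ]
        if observation is None:
            return self.estimate()  # no measurement: keep the diffused particles
        # Update: weight particles by a Gaussian likelihood of the observation.
        ox, oy = observation
        weights = [
            math.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2 * self.obs_std ** 2))
            for x, y in self.particles
        ]
        if sum(weights) > 0.0:
            # Resample in proportion to the weights (multinomial resampling).
            self.particles = self.rng.choices(
                self.particles, weights=weights, k=len(self.particles))
        return self.estimate()

    def estimate(self):
        """Return the mean particle position as the current estimate."""
        n = len(self.particles)
        return (sum(x for x, _ in self.particles) / n,
                sum(y for _, y in self.particles) / n)


def track_group(start_positions, frames):
    """Run one identically configured filter per person.

    frames is a list of per-frame lists holding an (x, y) observation or None
    for each person, in the same order as start_positions.
    """
    trackers = [PersonTracker(p, seed=i) for i, p in enumerate(start_positions)]
    estimates = []
    for observations in frames:
        estimates.append([t.step(o) for t, o in zip(trackers, observations)])
    return estimates
```

The sketch deliberately leaves out the hard parts named above: it does not decide which observation belongs to which person after an occlusion, and it has no appearance model that could cope with view or shadow changes.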

  2. Recognize pose and gesture of individuals in a group

For human pose and gesture recognition, the challenge is how to use the extracted visual features to build the feature space of the classifier. This requires combining computer vision with pattern recognition: for example, how to extract more relevant features from images in order to build a pose classifier. Another research issue is how to build a pose detector that detects the defined poses and also rejects non-poses.
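One simple way to picture a detector that both recognizes defined poses and rejects non-poses is a nearest-centroid classifier with a distance threshold. The sketch below is only an illustration under that assumption, not the method chosen in the project. The class name `PoseDetector`, the two-dimensional feature vectors, and the threshold value are hypothetical; real features would come from the image analysis stage.

```python
import math


class PoseDetector:
    """Nearest-centroid pose classifier that rejects inputs far from every pose."""

    def __init__(self, reject_distance):
        # Inputs farther than this from all centroids count as non-poses.
        self.reject_distance = reject_distance
        self.centroids = {}

    def fit(self, examples):
        """examples maps a pose label to a list of equal-length feature vectors."""
        for label, vectors in examples.items():
            n = len(vectors)
            dim = len(vectors[0])
            # The centroid is the per-dimension mean of the training vectors.
            self.centroids[label] = [
                sum(v[i] for v in vectors) / n for i in range(dim)
            ]

    def predict(self, features):
        """Return the closest pose label, or None if the input is rejected."""
        label, distance = min(
            ((name, math.dist(features, c)) for name, c in self.centroids.items()),
            key=lambda pair: pair[1],
        )
        return label if distance <= self.reject_distance else None


# Hypothetical training data: two poses described by two features each.
detector = PoseDetector(reject_distance=0.5)
detector.fit({
    "arms_up": [[1.0, 1.0], [0.9, 1.1]],
    "t_pose": [[0.0, 0.0], [0.1, -0.1]],
})
```

The rejection threshold trades missed poses against false detections, so in practice it would have to be tuned on recorded data that also contains non-pose movements.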

  3. Understand individual behaviors and interactions in a group

Understanding individual behaviors and interactions requires characterizing motion at different levels, namely global dynamics and local dynamics. Here, global dynamics refers to the motion of the whole-body blob (motion tracking), and local dynamics refers to the motion of individual body parts (limb motion, gaze direction). The research challenge is how to combine global and local dynamics in order to understand motions correctly.
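As a small illustration of combining the two levels, the sketch below tests whether two persons are looking at each other. It uses each person's tracked position as the global cue and the gaze direction as the local cue. It assumes both are already available in a common world frame; the names `PersonState`, `is_looking_at`, `mutual_gaze` and the 15 degree tolerance are hypothetical choices, not values from the project.

```python
import math
from dataclasses import dataclass


@dataclass
class PersonState:
    position: tuple    # global dynamics: (x, y) of the body blob on the ground plane
    gaze_angle: float  # local dynamics: gaze direction in radians, world frame


def is_looking_at(a, b, tolerance=math.radians(15)):
    """True if a's gaze points at b's position within the angular tolerance."""
    dx = b.position[0] - a.position[0]
    dy = b.position[1] - a.position[1]
    bearing = math.atan2(dy, dx)  # direction from a to b
    # Wrap the angular difference into [-pi, pi) before comparing.
    diff = (a.gaze_angle - bearing + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= tolerance


def mutual_gaze(a, b, tolerance=math.radians(15)):
    """True if both persons look at each other."""
    return is_looking_at(a, b, tolerance) and is_looking_at(b, a, tolerance)
```

Neither cue suffices alone: the positions say where the other person is, and the gaze direction says where attention is directed. Only their combination yields the interaction.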

People involved:

Marcel Reinders

Emile Hendriks

Feifei Huo