Social Signal Processing (SSPNet)

Social signal processing

Social intelligence is the part of our cognitive abilities that allows us to effectively navigate and negotiate complex social relationships and environments. It plays an important role in all parts of our lives, both at work and in private.

An important limitation of today's artificially intelligent systems is that they lack social intelligence: they are unable to accurately interpret social situations. As a result, such systems cannot detect potential conflicts, automatically infer the preferences of individuals, or understand attitudes towards objects or people.

The aim of our project is to develop techniques that facilitate the construction of socially intelligent systems. In particular, our work focuses on interpreting social signals that take the form of complex constellations of non-verbal behavioral cues. For instance, we are developing systems for facial expression analysis and for the detection of agreement and disagreement.

Facial expression analysis

Facial expressions are an important non-verbal behavioral cue that provides insight into a person's emotions. Some of these expressions, so-called micro-expressions, are involuntary and cannot be faked. As a result, facial expression analysis may prove to be a more reliable way to detect emotions than the analysis of verbal content.

Facial expressions can be effectively described using the Facial Action Coding System (FACS). This system distinguishes 46 action units (many of which correspond to a single facial muscle) that can be active in the face. From a FACS annotation, a wide range of emotions can be recognized.
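To illustrate how emotions can be read off a FACS annotation, the sketch below maps a few well-known action-unit combinations to prototypical emotions. The mapping is a simplified illustration loosely based on common EMFACS-style interpretations, not the rule set used in our system, which also accounts for many more combinations and for action-unit intensities.

```python
# Illustrative mapping from FACS action-unit combinations to prototypical
# emotions (simplified; a real system scores far more combinations).
PROTOTYPES = {
    frozenset({6, 12}): "happiness",       # cheek raiser + lip corner puller
    frozenset({1, 4, 15}): "sadness",      # inner brow raiser + brow lowerer + lip corner depressor
    frozenset({1, 2, 5, 26}): "surprise",  # brow raisers + upper lid raiser + jaw drop
    frozenset({4, 5, 7, 23}): "anger",     # brow lowerer + lid tighteners + lip tightener
}

def infer_emotion(active_units):
    """Return the prototypical emotion whose action units are all active, if any."""
    active = set(active_units)
    for units, emotion in PROTOTYPES.items():
        if units <= active:
            return emotion
    return "neutral/unknown"

print(infer_emotion({6, 12}))  # happiness
```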

Our system for automatic FACS annotation combines feature extraction using active appearance models (AAMs) and image region descriptors (such as SIFT and HOG) with classification techniques based on chain-structured conditional random fields (CRFs). In particular, the system uses a new AAM variant that better captures appearance variation during feature extraction. Classification is performed using a newly developed chain-structured CRF that models latent structure in the data using hidden units, thereby substantially reducing the classifier's variance.
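The chain structure of the CRF means that the most likely label sequence for a video can be recovered exactly with Viterbi decoding over per-frame emission scores and label-transition scores. The sketch below shows this decoding step in isolation, with made-up scores; the actual model additionally contains hidden units, which the sketch omits.

```python
def viterbi(emission_scores, transition_scores):
    """Most likely label sequence for a chain-structured model.

    emission_scores: list over frames of {label: score}
    transition_scores: {(prev_label, label): score}
    """
    labels = list(emission_scores[0])
    # best[l] = (score of best path ending in label l, that path)
    best = {l: (emission_scores[0][l], [l]) for l in labels}
    for frame in emission_scores[1:]:
        new_best = {}
        for l in labels:
            prev, (score, path) = max(
                ((p, best[p]) for p in labels),
                key=lambda item: item[1][0] + transition_scores[(item[0], l)],
            )
            new_best[l] = (score + transition_scores[(prev, l)] + frame[l], path + [l])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]

# Toy example: per-frame scores for an action unit being inactive (0) or
# active (1), with transitions that favour temporally smooth labelings.
emissions = [{0: 2.0, 1: 0.1}, {0: 0.9, 1: 1.0}, {0: 0.2, 1: 2.5}, {0: 0.3, 1: 2.0}]
transitions = {(0, 0): 1.0, (0, 1): -0.5, (1, 0): -0.5, (1, 1): 1.0}
print(viterbi(emissions, transitions))  # [0, 1, 1, 1]
```

Note how the smoothing transitions pull the ambiguous second frame towards the active label because its neighbours are active.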

To date, our system achieves a mean agreement of approximately 90% with human FACS labelers on the main action units.


Detecting (dis)agreement

The detection of agreement and disagreement between people is essential for understanding social interactions and for detecting potential conflicts. One of the key non-verbal behavioral cues for (dis)agreement is the head nod. We are therefore developing an automatic nod detector.

Initial variants of our nod detector are window-based detectors that use features computed from optical-flow images. Classification is performed using a new time-series classification technique that combines ideas from Fisher kernels with ideas from metric learning.
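To give an idea of the window-based approach, the sketch below slides a fixed-length window over a per-frame vertical head-motion signal (which, in the real detector, would be derived from optical flow) and flags windows with enough oscillating up/down motion. The simple energy-and-reversal test stands in for the learned time-series classifier, and the window length and thresholds are illustrative only.

```python
def detect_nods(vertical_flow, window=8, min_energy=1.5, min_reversals=2):
    """Flag windows whose vertical motion oscillates enough to look like a nod.

    vertical_flow: per-frame mean vertical motion (downward positive).
    Returns the start frames of windows classified as containing a nod.
    NOTE: hand-set thresholds stand in for a learned classifier here.
    """
    hits = []
    for start in range(len(vertical_flow) - window + 1):
        w = vertical_flow[start:start + window]
        energy = sum(abs(v) for v in w)
        # Count sign reversals: nodding produces alternating up/down motion.
        reversals = sum(1 for a, b in zip(w, w[1:]) if a * b < 0)
        if energy >= min_energy and reversals >= min_reversals:
            hits.append(start)
    return hits

still = [0.01] * 20
nod = still[:6] + [0.5, 0.6, -0.5, -0.6, 0.5, -0.5] + still[:6]
print(detect_nods(still))  # []
print(detect_nods(nod))
```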

Research team

This research is performed in the context of the SSPNet project, by Laurens van der Maaten and Emile Hendriks in collaboration with, among others, Maja Pantic and Konstantinos Bousmalis (Imperial College London) and Marc Mehu (University of Geneva).

References

- L.J.P. van der Maaten and E.A. Hendriks. Capturing Appearance Variation in Active Appearance Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 34-41, 2010.