Notice

AISB miscellaneous Bulletin Item

Announcement: JMUI Special issue: Real-time affect analysis and interpretation in virtual agents and robots

http://www.springerlink.com/content/q2080w072713/?p=86740fad4fe94dcab29705938d923c39&pi=0

JMUI Special issue on real-time affect analysis and interpretation in virtual agents and robots 
is now PUBLISHED

*****************************************************************************

JOURNAL ON MULTIMODAL USER INTERFACES

Special issue: Real-time affect analysis and interpretation: closing the affective loop in virtual agents and robots

Guest Editors: Ginevra Castellano, Kostas Karpouzis, Christopher Peters and Jean-Claude Martin
Volume 3, Issues 1-2, pages 1-153, March 2010
http://www.springerlink.com/content/q2080w072713/?p=86740fad4fe94dcab29705938d923c39&pi=0


EDITORIAL

"Special issue on real-time affect analysis and interpretation: closing the affective loop in virtual agents and robots"

Ginevra Castellano, Kostas Karpouzis, Christopher Peters and Jean-Claude Martin
Pages 1-3


ARTICLES

       
"On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues"

Florian Eyben, Martin Wöllmer, Alex Graves, Björn Schuller, Ellen Douglas-Cowie and Roddy Cowie
Pages 7-19

Abstract
For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is speaking. This requires reliable on-line recognition of the user's affect. However, most emotion recognition systems are based on turnwise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition. No statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary and regression outputs can be produced in real-time for every low-level input frame. We also investigate the benefits of including linguistic features on the signal frame level obtained by a keyword spotter.


       
"Student mental state inference from unintentional body gestures using dynamic Bayesian networks"

Abdul Rehman Abbasi, Matthew N. Dailey, Nitin V. Afzulpurkar and Takeaki Uno
Pages 21-31

Abstract
Applications that interact with humans would benefit from knowing the intentions or mental states of their users. However, mental state prediction is not only uncertain but also context dependent. In this paper, we present a dynamic Bayesian network model of the temporal evolution of students' mental states and causal associations between mental states and body gestures in context. Our approach is to convert sensory descriptions of student gestures into semantic descriptions of their mental states in a classroom lecture situation. At model learning time, we use expectation maximization (EM) to estimate model parameters from partly labeled training data, and at run time, we use the junction tree algorithm to infer mental states from body gesture evidence. A maximum a posteriori classifier evaluated with leave-one-out cross validation on labeled data from 11 students obtains a generalization accuracy of 97.4% over cases where the student reported a definite mental state, and 83.2% when we include cases where the student reported no mental state. Experimental results demonstrate the validity of our approach. Future work will explore utilization of the model in real-time intelligent tutoring systems.


       
"Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis"

Loic Kessous, Ginevra Castellano and George Caridakis
Pages 33-48

Abstract
In this paper a study on multimodal automatic emotion recognition during a speech-based interaction is presented. A database was constructed consisting of people pronouncing a sentence in a scenario where they interacted with an agent using speech. Ten people pronounced a sentence corresponding to a command while making 8 different emotional expressions. Gender was equally represented, with speakers of several different native languages including French, German, Greek and Italian. Facial expression, gesture and acoustic analysis of speech were used to extract features relevant to emotion. For the automatic classification of unimodal data, bimodal data and multimodal data, a system based on a Bayesian classifier was used. After performing an automatic classification of each modality, the different modalities were combined using a multimodal approach. Fusion of the modalities at the feature level (before running the classifier) and at the results level (combining the classifier results from each modality) were compared. Fusing the multimodal data resulted in a large increase in the recognition rates in comparison to the unimodal systems: the multimodal approach increased the recognition rate by more than 10% when compared to the most successful unimodal system. Bimodal emotion recognition based on all combinations of the modalities (i.e., face-gesture, face-speech and gesture-speech) was also investigated. The results show that the best pairing is gesture-speech. Using all three modalities resulted in a 3.3% classification improvement over the best bimodal results.
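
The difference between the two fusion points named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy feature vectors and the product rule used to combine per-modality posteriors are all assumptions made for illustration, since the abstract does not say how classifier results were combined.

```python
from math import prod


def feature_level_fusion(face, gesture, speech):
    """Feature-level fusion: concatenate the per-modality feature
    vectors into one vector, which a single classifier would then see."""
    return list(face) + list(gesture) + list(speech)


def decision_level_fusion(posteriors):
    """Results-level fusion: combine class posteriors produced by one
    classifier per modality. The normalized product rule used here is
    an assumption, not something stated in the abstract."""
    classes = posteriors[0].keys()
    scores = {c: prod(p[c] for p in posteriors) for c in classes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical posteriors from face, gesture and speech classifiers.
fused = decision_level_fusion([
    {"joy": 0.6, "anger": 0.4},
    {"joy": 0.5, "anger": 0.5},
    {"joy": 0.3, "anger": 0.7},
])
predicted = max(fused, key=fused.get)
```

With feature-level fusion a single classifier is trained on the concatenated vector, whereas results-level fusion keeps one classifier per modality and merges only their outputs.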


       
"Multimodal user's affective state analysis in naturalistic interaction"

George Caridakis, Kostas Karpouzis, Manolis Wallace, Loic Kessous and Noam Amir
Pages 49-66

Abstract
Affective and human-centered computing have attracted an abundance of attention during the past years, mainly due to the abundance of environments and applications able to exploit and adapt to multimodal input from the users. The combination of facial expressions with prosody information allows us to capture the user's emotional state in an unintrusive manner, relying on the best performing modality in cases where one modality suffers from noise or bad sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detect emotion in naturalistic video sequences, where input is taken from nearly real world situations, contrary to controlled recording conditions of audiovisual material. Recognition is performed via a recurrent neural network, whose short term memory and approximation capabilities cater for modeling dynamic events in facial and prosodic expressivity. This approach also differs from existing work in that it models user expressivity using a dimensional representation, instead of detecting discrete universal emotions, which are scarce in everyday human-machine interaction. The algorithm is deployed on an audiovisual database which was recorded simulating human-human discourse and, therefore, contains less extreme expressivity and subtle variations of a number of emotion labels. Results show that in turns lasting more than a few frames, recognition rates rise to 98%.

       

"From expressive gesture to sound - The development of an embodied mapping trajectory inside a musical interface"

Pieter-Jan Maes, Marc Leman, Micheline Lesaffre, Michiel Demey and Dirk Moelants
Pages 67-78

Abstract
This paper contributes to the development of a multimodal, musical tool that extends the natural action range of the human body to communicate expressiveness into the virtual music domain. The core of this musical tool consists of a low cost, highly functional computational model developed upon the Max/MSP platform that (1) captures real-time movement of the human body into a 3D coordinate system on the basis of the orientation output of any type of inertial sensor system that is OSC-compatible, (2) extracts low-level movement features that specify the amount of contraction/expansion as a measure of how a subject uses the surrounding space, (3) recognizes these movement features as being expressive gestures, and (4) creates a mapping trajectory between these expressive gestures and the sound synthesis process of adding harmonic related voices on an in origin monophonic voice. The concern for a user-oriented and intuitive mapping strategy was thereby of central importance. This was achieved by conducting an empirical experiment based on theoretical concepts from the embodied music cognition paradigm. Based on empirical evidence, this paper proposes a mapping trajectory that facilitates the interaction between a musician and his instrument, the artistic collaboration between (multimedia) artists and the communication of expressiveness in a social, musical context.



"The mental ingredients of bitterness"

Isabella Poggi and Francesca D'Errico
Pages 79-86

Abstract
In view of multimodal interfaces capable of a detailed representation of the User's possible emotions, the paper analyses bitterness in terms of its mental ingredients, the beliefs and goals represented in the mind of a person when feeling an emotion. Bitterness is a negative emotion in between anger and sadness: like anger, it is caused by a sense of injustice, but also entails a sense of impotence which makes it similar to sadness. Often caused by betrayal, it comes from the disappointment of an expectation from oneself or another with whom one is affectively involved, or from a disproportion between commitment and actual results. The ingredients found in a pilot study were tested through qualitative analysis of a further questionnaire, which confirmed the ingredients hypothesized, further revealing the different nature of bitterness across ages and across types of work.


       
"Affect recognition for interactive companions: Challenges and design in real world scenarios"

Ginevra Castellano, Iolanda Leite, André Pereira, Carlos Martinho, Ana Paiva and Peter W. McOwan
Pages 89-98

Abstract
Affect sensitivity is an important requirement for artificial companions to be capable of engaging in social interaction with human users. This paper provides a general overview of some of the issues arising from the design of an affect recognition framework for artificial companions. Limitations and challenges are discussed with respect to other capabilities of companions and a real world scenario where an iCat robot plays chess with children is presented. In this scenario, affective states that a robot companion should be able to recognise are identified and the non-verbal behaviours that are affected by the occurrence of these states in the children are investigated. The experimental results aim to provide the foundation for the design of an affect recognition system for a game companion: in this interaction scenario children tend to look at the iCat and smile more when they experience a positive feeling and they are engaged with the iCat.



"When my robot smiles at me - Enabling human-robot rapport via real-time head gesture mimicry"

Laurel D. Riek, Philip C. Paul and Peter Robinson
Pages 99-108

Abstract
People use imitation to encourage each other during conversation. We have conducted an experiment to investigate how imitation by a robot affects people's perceptions of their conversation with it. The robot operated in one of three ways: full head gesture mimicking, partial head gesture mimicking (nodding), and non-mimicking (blinking). Participants rated how satisfied they were with the interaction. We hypothesized that participants in the full head gesture condition would rate their interaction the most positively, followed by the partial and non-mimicking conditions. We also performed gesture analysis to see if any differences existed between groups, and did find that men made significantly more gestures than women while interacting with the robot. Finally, we interviewed participants to try to ascertain additional insight into their feelings of rapport with the robot, which revealed a number of valuable insights.



"Communication of musical expression by means of mobile robot gestures"

Birgitta Burger and Roberto Bresin
Pages 109-118

Abstract
We developed a robotic system that can behave in an emotional way. A 3-wheeled simple robot with limited degrees of freedom was designed. Our goal was to make the robot display emotions in music performance by performing expressive movements. These movements have been compiled and programmed based on literature about emotion in music, musicians' movements in expressive performances, and object shapes that convey different emotional intentions. The emotions happiness, anger, and sadness have been implemented in this way. General results from behavioral experiments show that emotional intentions can be synthesized, displayed and communicated by an artificial creature, also in constrained circumstances.


       
"Investigating shared attention with a virtual agent using a gaze-based interface"

Christopher Peters, Stylianos Asteriadis and Kostas Karpouzis
Pages 119-130

Abstract
This paper investigates the use of a gaze-based interface for testing simple shared attention behaviours during an interaction scenario with a virtual agent. The interface is non-intrusive, operating in real-time using a standard web-camera for input, monitoring users' head directions and processing them in real-time for resolution to screen coordinates. We use the interface to investigate user perception of the agent's behaviour during a shared attention scenario. Our aim is to elaborate important factors to be considered when constructing engagement models that must account not only for behaviour in isolation, but also for the context of the interaction, as is the case during shared attention situations.



"HMM modeling of user engagement in advice-giving dialogues"

Nicole Novielli
Pages 131-140

Abstract
This research aims at defining a real-time probabilistic model of users' engagement in advice-giving dialogues. We propose an approach based on Hidden Markov Models (HMMs) to describe the differences in the dialogue pattern due to the different level of engagement experienced by the users. We train our HMM models on a corpus of natural dialogues with an Embodied Conversational Agent (ECA) in the domain of healthy-eating. The dialogues are coded in terms of Dialogue Acts associated to each system or user move. Results are quite encouraging: HMMs are a powerful formalism for describing the differences in the dialogue patterns due to the different level of engagement of users, and they can be successfully employed in real-time user engagement detection. However, the HMM learning process shows a lack of robustness when using low-dimensional and skewed corpora. Therefore we plan a further validation of our approach with larger corpora in the near future.
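
One common way to use HMMs for this kind of detection is to train one model per engagement level and label a dialogue by the model that assigns it the higher likelihood. The Python sketch below shows that idea with the forward algorithm. It is only an illustration under stated assumptions: the two-symbol dialogue-act alphabet, the `HIGH` and `LOW` parameter values and the `classify` helper are hypothetical and are not taken from the paper, which does not describe its models at this level of detail.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    start[s]       initial probability of hidden state s
    trans[p][s]    probability of moving from state p to state s
    emit[s][o]     probability that state s emits symbol o
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    return sum(alpha)


# Hypothetical dialogue-act symbols: 0 = elaborate move, 1 = minimal reply.
# All parameter values below are made up for illustration.
HIGH = dict(start=[0.5, 0.5],
            trans=[[0.7, 0.3], [0.3, 0.7]],
            emit=[[0.9, 0.1], [0.7, 0.3]])
LOW = dict(start=[0.5, 0.5],
           trans=[[0.7, 0.3], [0.3, 0.7]],
           emit=[[0.1, 0.9], [0.3, 0.7]])


def classify(obs):
    """Label a dialogue-act sequence by the more likely engagement model."""
    high = forward_likelihood(obs, **HIGH)
    low = forward_likelihood(obs, **LOW)
    return "high" if high > low else "low"
```

Because the forward recursion only needs the observations seen so far, the same comparison can be repeated after every new dialogue move, which is what makes the approach usable in real time.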



"Natural interaction with a virtual guide in a virtual environment - A multimodal dialogue system"

Dennis Hofs, Marit Theune and Rieks op den Akker
Pages 141-153

Abstract
This paper describes the Virtual Guide, a multimodal dialogue system represented by an embodied conversational agent that can help users to find their way in a virtual environment, while adapting its affective linguistic style to that of the user. We discuss the modular architecture of the system, and describe the entire loop from multimodal input analysis to multimodal output generation. We also describe how the Virtual Guide detects the level of politeness of the user's utterances in real-time during the dialogue and aligns its own language to that of the user, using different politeness strategies. Finally we report on our first user tests, and discuss some potential extensions to improve the system.