How do we combine faces and voices?
Human social interactions are shaped by our ability to recognise people. Faces and voices are among the key features that enable us to identify individual people, and they are rich in information, such as gender, age, and body size, that contributes to a person's unique identity. A large body of neuropsychological and neuroimaging research has already determined the various brain regions responsible for face recognition and voice recognition separately, but exactly how our brain combines the two different types of information (visual and auditory) is still unknown. Now a new study, published in the March 2011 issue of Elsevier's Cortex (http://www.sciencedirect.com/science/journal/00109452), has revealed the brain networks involved in this "cross-modal" person recognition.
A team of researchers in Belgium used functional magnetic resonance imaging (fMRI) to measure brain activity in 14 participants while they performed a task in which they recognised previously learned faces, voices, and voice-face associations. Dr Frédéric Joassin, Dr Salvatore Campanella, and colleagues compared the brain areas activated when recognising people using information from only their faces (visual areas), or only their voices (auditory areas), to those activated when using the combined information. They found that voice-face recognition activated specific "cross-modal" regions of the brain, located in areas known as the left angular gyrus and the right hippocampus. Further analysis also confirmed that the right hippocampus was connected to the separate visual and auditory areas of the brain.
Recognising a person from the combined information of their face and voice therefore relies not only on the same brain networks involved in using only visual or only auditory information, but also on brain regions associated with attention (left angular gyrus) and memory (hippocampus). According to the authors, the findings support a dynamic vision of cross-modal interactions in which the areas involved in processing both face and voice information are not simply the final stage of a hierarchical model; rather, they may work in parallel and influence each other.