The Ear That Dreams: Eye Tracking Sound in the Moving Image


Creator's Statement

Scholars and practitioners of visual culture have often suggested that we live in an ocular age organized around the primacy and potency of looking and seeing (Berger, 1972; Evans and Hall, 1999). So pervasive is vision in everyday life, and in the practices and processes of visual making, that we forge identities out of shared visual signs and get drunk on the consumption of images.

In film studies the centrality of vision to the way spectators engage with the text has been very well documented (Mulvey, 1975, 1989; Stacey, 1994; Mayne, 2002), although its primacy has always been challenged through a recognition of the synaesthetic and embodied qualities of the moving image (Sobchack, 1992, 2004; Marks, 2000). Film is a replete visual canvas, made of light, colour and movement, but every frame, every shot, is accompanied by the poetics of impressionable sound, even, or especially, when the frame falls silent.


In this respect, film studies scholars have acknowledged that one only has to turn off the sound while watching a film to understand how its absence affects the viewing experience (Doane, 1980, 1994; Whittington, 2009). With sound on and our ears attuned, we come to realize that film sound is bi-sensorial, both “a sonorous figure in the ears, and a vibration felt in the skin and the bones” (Chion, 1994: 221). Sound not only localizes and animates the moving image but also registers at an emotional and embodied level on the viewer. We feel sound.

Vivian Sobchack suggests that film sound has particularly immersive and co-synesthetic affects, and can lead viewers to create “shapes” out of their hearing and feeling alone (2005). Indeed, following this focus on the affordances of sound, Elsaesser and Hagener have argued that film viewers can “hear around corners and through walls, in complete darkness and blinding brightness, even when we cannot see anything…The spectator is…a bodily being enmeshed acoustically, spatially and affectively in the filmic texture” (2010: 131-132).

While there exists a body of work that empirically engages with the way spectators respond to a film’s mise-en-scène, narrative, characterization, and ideological content, there is little research on what the eyes actually attend to, or on how sound, voice, dialogue, and music affect the way a text is perceived and made meaningful by audiences. What do we hear when we see? What do we feel when we listen?


The empirical research that exists on the qualities and effects of sound comes largely from music studies, social psychology, and cognitive science (see Juslin and Laukka, 2004). More generally, much audience research within film studies has historically either rested on an imagined viewer or involved qualitative research based on memory work, interviews, and focus groups. The rise of the video essay as a mode of research and a method of analysis has tended to continue this dominant concern with the primacy of the image.

There exists, then, an empirical research gap around the specifics of viewing and hearing film, and around the complex relationship between the two, that this audiovisual essay takes a step towards filling. The eye tracking technology employed here affords the opportunity to generate new empirical data about what viewers actually gaze at, for how long, and with what levels of intensity. It has also allowed us to map quantitatively the relationship between sound and vision, hearing and seeing, and to determine how and where a relationship emerges between what spectators heard and what they gazed at. There are, nonetheless, limitations in the use of eye tracking technology, which this audiovisual essay also explores.

Drawing on cinematic theories of sound, and neuroscientific understandings of attention, comprehension, and the gaze, this video essay employs eye tracking technology in a sound on/off comparative analysis of the first five minutes of the Omaha Beach landing scene from Saving Private Ryan (Spielberg, 1998). The film was chosen as a case study because it involves complex sound design, moments of perceptual shock, internal diegetic sound, spatial and temporal shifts in sound, and heightened sonic agency.  

Six viewers were eye tracked at the Eye Tracking Lab at La Trobe University, Melbourne, and the data were analyzed through a combination of close textual analysis and the statistical interpretation of aggregate gaze patterns. The viewers were shown the sequence twice: once with its full soundtrack playing, and once with the sound removed.

In this video essay I interpret this data to answer the following questions:

To what extent do viewers’ eyes follow narrative-based sound cues?

How does the soundtrack affect viewer engagement and attention to detail?

Is there an element of prediction and predictability in the way a viewer sees and hears?

Do viewers’ eyes ‘wander’ when there is no sound to guide them where to look?

Ultimately, I ask how important sound is to the cinematic experience of vision: does the ear dream?

[1] An extended written version of this video essay has been published as: Redmond, S., Pink, S., Stadler, J., Robinson, J., Verhagen, D., & Rassell, A. (2016). Seeing, Sensing Sound: Eye Tracking Soundscapes in Saving Private Ryan and Monsters, Inc. In C. Reinhard & C. Olson (Eds.), Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship (pp. 139-164). New York, NY: Bloomsbury.

Response to Reviewers' Comments

I would like to thank [in]Transition's peer reviewers, who reviewed the work upon submission, as well as the colleagues who offered helpful feedback before submission. The final published piece is indebted to all of them.

This final version responds to all the technical and delivery questions raised and is stronger for it. However, the substantive question raised by one reviewer with regard to the essay's overall structure and its use of the term ‘dreaming’ has not been addressed. Drawing on both my own critical autonomy and the highly positive view of the other peer reviewer, I have chosen to hold true to the ambitions of the work. Dreaming is not invoked in this video essay to resurrect Metz, for example, or to draw on psychoanalytic film theory. It is used poetically, as a creative tool for making inferences about the way we hear and see, and for complicating the reading of the empirical data generated by the eye tracking results.


Sean Redmond does a fantastic job here of bringing the uninitiated viewer up to speed on the subject matter and how it intersects with sound design. His piece begins provocatively, with a simple attention-getter, but it also exhibits Redmond’s knowledge that videographic criticism is not an academic article simply read over a clip reel. It is its own unique form of rhetoric, one that too often falls prey to the flat delivery of a conference paper. Redmond’s script is incredibly dynamic in how it quickly and effectively prompts the viewer to attend to film in a way to which she might not be accustomed: by listening to details embedded in a scene from Road to Perdition (2002).

After the attention-getter, Redmond provides a quick and concise literature review that does not rely on familiarity with the sub-discipline and its jargon, and that is effectively substantiated through clips ranging from The Apu Trilogy to Interstellar. This is an inspired juxtaposition, given the use of grass to link the two films, though it results in a rather jarring cut from a lower-resolution black-and-white 4:3 image to a higher-resolution widescreen color image. When he segues into tying sound studies to eye tracking, the clip from The Conversation does an effective job of visualizing the idea he is driving at when he explains how sound encourages the construction of visual hypotheses. In other words, where Redmond’s piece succeeds most impressively is in finding dynamic ways to present his argument in audiovisual form. At no point does the voice-over feel untethered from the video track; the two lean on and support one another poetically, economically, and clearly.

This is an interesting and thought-provoking piece on film sound. The introductory section uses a nice range of examples, and the case study on the sequence from Saving Private Ryan is illuminating, with some suggestive ideas about the ways in which sound and image work together to achieve ‘attentional synchrony.’ SR is to be congratulated on taking on two significant challenges at once: the interdisciplinary challenge of integrating eye tracking with more humanistic methods; and then conveying the resulting ideas through a video essay rather than a traditional written essay. Set against these achievements, there are a number of problems that compromise the video, which I discuss below. Nonetheless, especially when paired with Darrin Verhagen’s ‘Materialization, Emotion and Attention – Tracking Sound’s Perceptual Effects in Film,’ ‘The Ear That Dreams: Eye Tracking Sound in the Moving Image’ certainly makes a contribution to our understanding of the dynamic interaction of sound and image in film.

Conceptual problems

In a range of ways, beginning with the title, the film commits itself to the analogy between film spectatorship and dreaming – in this context, ‘the ear that dreams.’ Whatever credit this analogy may once have possessed, it is now bankrupt – what in German would be called a ‘conceptual corpse.’ Whatever grounds there might be for the metaphor in relation to other aspects of film viewing, it seems entirely ill-conceived here. (Plus, it’s hard not to interpret the use of the metaphor as an invocation of psychoanalytic film theory – given that the references to vision and film, in the written accompaniment to the video, are all in that tradition. It’s also pretty odd to conjoin such ideas with eye tracking while making no mention of cognitive theories of film perception – such as work by Joe and Barb Anderson, David Bordwell, and Tim Smith – given that eye tracking is a tool of cognitive science. Smith has been pioneering the use of eye tracking in film research for more than a decade, so one might have expected to see at least a nod in his direction here.)

What is the dream analogy supposed to make salient? Presumably the idea is that an ear that ‘dreams’ is one that does not merely perceive sounds but hears sounds that do not slavishly reinforce whatever is represented visually. At certain points the voice-over commentary plays up the expressive power of film sound, as it works on/through/with the image; at others (e.g. 16.15), the focus is on the capacity of moving visual imagery to prompt imagined sounds. What is really at stake here, then, is the imaginative power of film sound: its capacity, alone and especially in conjunction with the image, to enrich – expressively and emotionally – our experience of the action. Thus we might well say that the sound design in Saving Private Ryan helps to create a ‘nightmarish’ vision of the beach landing, but that would be a critical description of this particular scene, and one that doesn’t license a general analogy between the audition of film sound and the experience of dreaming.

Less damagingly, there is also a conceptual problem with the three-part analysis of film theory on sound in the introductory section, carving it up into work on the aesthetics of film sound, the role of the score (and other components of the soundtrack, such as the voice), and sound design in specific genres. This serves well enough to give an exposition of some important existing scholarship. On a theoretical level it doesn’t hold up, though, as all three domains of study are concerned with aspects of the aesthetics of film sound (for surely the film score, for example, plays a critical role in our aesthetic experience of a film).

Methodology: rigour and exposition

The exposition of the experimental design, and more generally the methodology of eye tracking, could be clearer at several points. With regard to the gaze plot summaries, for example: what do the various colours represent? (Different viewers, I guess.) What do the numbers represent? (The order of fixation points, presumably.) These dimensions of the data visualization are never explained.

At 11.25, the voice-over announces: ‘what you’re about to now watch is one person’s visual eye tracking data for this scene.’ But are we watching the fixations of a viewer with sound on, or sound off? Presumably sound off, since the sequence runs silently here. But it would be nice to know for sure.

The specific conclusion drawn at 17.31 – concerning the longer fixation duration of gazes with sound on than with sound off – is perhaps the most striking finding from the eye tracking study here. But what is the baseline for fixation duration in everyday contexts – that is, for perception in real environments? Doubtless it varies, but the import and force of this provisional finding might be sharpened with more context about fixation duration in general.

The final moments of the video are problematic. The force of the edit at 21.30 is either misleading or obscure. If the three children that we see in the final shot were the ‘ears that dreamed,’ as the juxtaposition with the voice-over implies, does that mean that they were among the experimental subjects? That seems unlikely. But if that is not being suggested, what is?

There is a puzzle here regarding the fit between the film’s overarching question (‘does the ear dream?’) and the method of eye tracking. At the beginning of part 2 (around 9.17), eye tracking is advanced as a tool that will allow us to answer this question; by the end of part 2 (20.50), we are being told (plausibly) that it cannot reveal much about the emotions and memories elicited in viewers. Relatedly and more generally, the video seems to be pulled in two directions in its ambition to be at once a ‘poetic’ evocation of the power of film sound – as announced in the introduction – and a controlled, quantitative investigation of some specific aspects of film perception.

Some of the shortcomings in the design, exposition, and analysis of the eye tracking experiment perhaps arise from the fact that almost half the video is devoted to an introductory section on film sound in general. Compare this with the proportions we would find in a traditional piece of published research: the introduction would normally occupy no more than say 15% of the article. The extended introduction seems to be something of a generic feature of scholarly video essays; certainly it’s not unique to this piece. But it presents a particular challenge when the research being presented is truly interdisciplinary, making it more difficult to give due time and space to the methods underlying the research.