I have a rather ambivalent relationship with sound. While I enjoy listening to music and playing the drums, both are hobbies for me. And when I’m listening to a record or playing a set, my intent is to be as uncritical as possible, to assume and even embrace all of the stereotypes common to sonic culture: immersion in an affective experience, “feeling it” over “thinking it,” and improvising instead of outlining. I know all of these stereotypes are bunk. Music is at once an intellectual and affective endeavor, and for many academics it is an object of inquiry. Still, when I write about sound, I often want these stereotypes to hold true. Among other things, I struggle with translating the complexities of what I hear into writing: “There is simply too much going on here to describe.” “I’m sucking all of the energy from it.” “Words don’t do it justice.” Sure, I’ve felt the same way when writing about novels or poetry. But the thing is, I can at least quote, scan, or photocopy a print text. I can make parts of it co-present with my criticism. In the history of humanities scholarship, doing the same with audio has been difficult at best. Not only are print monographs and journal articles still par for the course; they are also some of the very mechanisms through which scholarly practices are naturalized and measured. My fingers are crossed that such standards and mechanisms are changing.
In the meantime, I’ve been experimenting with a platform architected specifically for multimodal scholarly communication. Called Scalar, it’s a project of the Alliance for Networking Visual Culture (which is working to put the platform in the wild). Scholars including Alex Juhasz and Virginia Kuhn have already published with it, and I’m using it to compose a cultural history of magnetic recording. At first, I was excited that Scalar would easily allow me to publish scholarship complete with audio on the web. Audiences could finally hear what I was writing about. However, as I continued to compose with Scalar, I quickly learned that it sparks a curious set of demands, namely concerns about how to write with sound instead of merely about it. Below is an example of what I mean, and it includes screengrabs of my Scalar project. Since this cluster of The New Everyday is titled Rough Cuts, I found it only fitting to provide snippets of my work on William S. Burroughs’s cut-up experiments.
Here is how I normally write about audio and use it as a form of evidence:
This approach is ideal for print, and it lends itself nicely to sequential argumentation.
Initially, adding an audio clip (e.g., an excerpt of Burroughs’s “Origin and Theory of the Tape Cut-Ups”) to this approach did not dramatically alter my writing. Rather, I put the file to the right of the text, ideally next to or near the sounds I was referencing. In the grab below, I am specifically quoting Burroughs saying: “When you cut into the present, the future leaks out.” With the multimodal interface, audiences can hear that sentence in the context of his April 1976 lecture at the Naropa Institute.
I then started to realize that, when the audio is co-present with my writing, I create some redundancies, which could be cut from my paragraphs. For instance, audiences do not necessarily need to read, “Here, he is referring to when he cut an article by J. Paul Getty.” After all, they can hear Burroughs saying this explicitly in the audio file.
I also began experimenting with where else the audio could be placed. Should it follow the linear logic of the paragraph? Should I insert it into the paragraph itself, rather than in the margins? With these questions in mind, here is one arrangement that emerged:
Using the “history editor” in Scalar, I can see the effects of my revisions. Version 10 (time-stamped 09:19, 07 October 2011) is 201 characters longer than Version 12 (time-stamped 09:33, 07 October 2011). At least at this juncture, it seems that writing with sound means less writing. Or at least less description.
However, my earlier point about reducing redundancies in the text forced me to consider how annotations linked to the audio itself could be an effective way of providing some historical context for Burroughs’s talk. For instance, during the talk he refers to “one of Getty’s sons.” Yet he never names the son. As shown below, I can include an annotation explaining which of Getty’s sons did in fact sue him, not to mention when, where, and for how much.
One issue with this approach (and I’m not sure it’s necessarily a problem) is that the annotations tend to pop up and cover my primary text. That is, the addition of an annotation layer in the interface too easily distracts audiences from the argument I am making. And so I began porting the argument itself more directly into the listening experience. Consider these three examples:
As audiences listen to Burroughs’s talk, they can now follow my argument about it. Here, not only does the temporality of my argument morph (e.g., audiences lose some agency over the pace and place of their reading); the space and form of interpretation morph, too. Paragraphs shift to snippets, which pop up and disappear depending on how I’ve tagged and annotated the audio file.
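To make the syncing concrete, here is a minimal Python sketch of snippets keyed to spans of an audio clip. It is only an illustration under stated assumptions and not Scalar's actual data model: the `Annotation` class, the `visible_at` function, and every timing below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A snippet of commentary attached to a span of the audio clip."""
    start: float  # seconds into the clip (hypothetical timing)
    end: float    # seconds into the clip (hypothetical timing)
    text: str


def visible_at(annotations, t):
    """Return the snippets whose span covers playback time t, in order of appearance."""
    ordered = sorted(annotations, key=lambda a: a.start)
    return [a.text for a in ordered if a.start <= t < a.end]


# Illustrative snippets only; the timings are invented for this sketch.
annotations = [
    Annotation(12.0, 20.0, "Context: one of Getty's sons"),
    Annotation(0.0, 8.0, "Lecture context: Naropa Institute, April 1976"),
    Annotation(18.0, 30.0, "When you cut into the present, the future leaks out."),
]

# Two spans overlap at 19 seconds, so two snippets would be on screen together.
print(visible_at(annotations, 19.0))
```

In a sketch like this, the listener's position in the clip, rather than the reader's position on the page, decides which text is on screen. That is the loss of agency over pace described above.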
And once my argument is synced with an audio file in this fashion, I can entirely rethink my initial, paragraph-based approach to writing about sound. The argument now looks like this:
Visualized this way, audiences can quickly see the date and location of the talk, as well as the track (“Origin and Theory of the Tape Cut-Ups”) and album (Break Through in Grey Room) from which the excerpt is drawn. This version, Version 14 (time-stamped 10:21, 07 October 2011), is 1923 characters shorter than Version 10 (pictured as Figure 1 above).
I can also view the media file in Scalar and see my annotations listed in order of appearance. The image below only shows a few annotations, but it gives a sense of how the paragraph can be distributed and spatialized through the temporal logic of the audio file.
However, neither this image nor the difference in characters (i.e., 96 in Version 14 and 2019 in Version 10) means that electronic text becomes less important (even after redundancies are cut) in a multimodal, networked environment. Instead, the writing is granulated and exists in linked, horizontal relationships with all other media in the project. Consequently, each annotation becomes a unique instance—a snippet of text with its own metadata, URI, and relations. Here is an example of one snippet in isolation:
Such granularity becomes useful when visualizing the content of an entire project and the relationships within it. For instance, in the image below, all of the media files (i.e., audio, images, and video) in my project are visualized in Scalar.
The green dots represent media files; the purple dots, annotations; and the orange dots, pages. I can see that, at least for now, four portions of my excerpt of “Origin and Theory of the Tape Cut-Ups” have been annotated, and two pages include or reference the excerpt. This visualization raises the question of what a page or an annotation implies, especially when everything in Scalar is essentially a unique instance that can relate to anything else. In my case, I am inclined to say an annotation is critical commentary on a specific slice of media (e.g., audio), and a page is a composite of (or a folder for) those annotations. Within Scalar, I can then visualize the relations between these snippets with respect to the balance of the project:
This radial visualization is comparable to an index. With it, I can see that “we had no explanation for this at the time” is only mentioned once during the project, as an annotation of “Origin and Theory of the Tape Cut-Ups,” which is referenced in the page, “Cutting into the Present.” I can also see that “Origin and Theory” is tagged as an audio file; however, it is one among other audio files (seven, to be exact) in the project. As the project progresses and becomes more robust, this kind of emergent index will feed back into my writing, giving me an opportunity to express a network of individuated instances through processes arguably impossible without computation.
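One way to picture this kind of emergent index is as a set of relations that can be followed in either direction. The Python sketch below is a rough illustration, not how Scalar stores or queries a project: the identifiers, the relation names, and the `build_index` and `pages_for` functions are all hypothetical stand-ins.

```python
from collections import defaultdict

# (source, relation, target) triples. The identifiers are hypothetical
# stand-ins for the unique URIs that each instance would carry.
relations = [
    ("annotation:no-explanation", "annotates", "media:origin-and-theory"),
    ("page:cutting-into-the-present", "references", "media:origin-and-theory"),
    ("media:origin-and-theory", "tagged", "tag:audio"),
]


def build_index(triples):
    """Index every relation from both ends so any instance can be looked up."""
    index = defaultdict(set)
    for source, relation, target in triples:
        index[source].add((relation, target))
        index[target].add((relation + " (inverse)", source))
    return index


def pages_for(annotation, triples):
    """Find the pages that reference any media file the annotation comments on."""
    media = {t for s, r, t in triples if s == annotation and r == "annotates"}
    return sorted(s for s, r, t in triples if r == "references" and t in media)


index = build_index(relations)
print(sorted(index["media:origin-and-theory"]))
print(pages_for("annotation:no-explanation", relations))
```

Because every relation is indexed from both ends, the same structure can answer "where is this snippet used?" and "what does this page draw on?" without either being written into the text itself.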
All of this work is of course in progress, and I’m still not sure whether writing with the temporality of audio is the approach I prefer. I have also realized that, when granulating their projects, writers need to follow clear workflows and document their middle states in order to test an array of options. Argumentation is not all improvisation.
So, am I positive that a radial view of my project affords audiences any new information about the history of magnetic audio? Have I spelled out how, exactly, scholarly communications might benefit from networking electronic text, audio, and other media through many-to-many relations? Or have I explored ways that my multimodal project can be more accessible to multiple audiences, including those who may not be able to hear or navigate the sounds I’m writing about? No, and not yet. But I hope to have responses soon.
For now, what I am learning from writing with sound is that, when composing with platforms like Scalar, scholars are practically provoked into speculation, into conjecturing with the multimodal forms of the data-driven web. This is somewhat unfamiliar territory for the humanities, and we might call this moment an ambivalent blend of knowing and doing, inscription and expression, thinking and feeling (including feelings of bewilderment, frustration, surprise, serendipity, confusion, and curiosity). Right now, that’s certainly not a bad middle state to be in.
Special thanks to the Alliance for Networking Visual Culture as well as the Scalar Core Development Team, including Tara McPherson, Erik Loyer, Craig Dietrich, and Steve Anderson, for supporting my project.