Visualizing Library-Scale Media Collections


One way we can better use data and/or research visualization in the humanities is by taking advantage of the advanced computation and visualization resources developed for science and industry, bringing these tools to bear on humanistic research questions. Many digital humanists don't realize that high performance computing and advanced visualization resources are available to them through the nation's Extreme Science and Engineering Discovery Environment (XSEDE) network.

Our 1000 Words project at the Texas Advanced Computing Center (TACC) seeks to improve access to these resources by developing the software tools, skills, and knowledge base that allow humanities researchers to use visualization, specifically on high-resolution displays powered by supercomputers, to perform novel research.

Now you might be thinking, "If we humanists are just beginning to dip our toes into the waters of visualization, do we really need advanced visualization resources? What can we possibly do with supercomputers and ultra-high-resolution displays?"

Surely, in many cases, a laptop will be more than enough to achieve the goals of a visualization project. But ask yourself: How many millions of documents can you store on your hard drive? How long will it take to download them? How long to search through them? How many photographs, documents, or images of cultural artifacts can fit on a single laptop screen? How much information can you overlay on a map or a timeline without running out of space?

Dr. Jason Baldridge interacting with a visualization of Civil War archives in his office and at the TACC Vislab

These are the kinds of "big data" issues that are beginning to confront the humanities, thanks to our society's exponentially increasing production of and access to digital texts, images, music and video. As the humanities liaison for a major U.S. supercomputing center and visualization lab that serves the nation's science community, I am uniquely situated to help with these problems.

Scholars like Lev Manovich, whose "cultural analytics" research explores massive cultural data sets of visual material, and Franco Moretti, whose Literary Lab at Stanford is developing new techniques of "distant reading" to study the history of the novel, are pioneers in the field.

The first fruit of our 1000 Words project at TACC is the Massive Pixel Environment (MPE), whose initial development was funded by the National Endowment for the Humanities. MPE makes it possible for us to quickly create interactive visualizations for ultra-high-resolution display walls in collaboration with humanities scholars.

While creating visualizations using MPE does require a bit of coding, it leverages the Processing programming language -- a language designed at the MIT Media Lab for teaching visual designers how to code. Coding visualizations in Processing is something that can be learned in a semester, rather than four years.

This brings me to the second point I'd like to make: to make better use of visualization we need to create more institutional opportunities for students and researchers in the humanities to gain experience working across disciplines. Good visualization requires not only scholarly domain expertise, but also technical/coding expertise and information design expertise (itself a multidisciplinary field). Our siloed departments and traditional publication-based incentive structures limit the potential of students and scholars to gain the kinds of experience they need to creatively apply visualization techniques in their explorations.

Please let me know your take on these issues in the comments below, and if you are a humanist working on a project that requires more CPU cycles, more memory, or more pixels than you have at your disposal, I'd love to hear from you!


Photo Credits:

Figure 2. Dr. Tanya Clement, using ProseVis in her office and at the TACC Vislab

Figure 3. Dr. Jason Baldridge interacting with a visualization of Civil War archives in his office and at the TACC Vislab


I love your idea for students to work together across disciplines to fully grasp not only their personal research topics, but also the coding requirements necessary to visually express their research. I feel that a lot of humanities research is lost to those that consider themselves "visual learners" because of the lack of data visualization in the field, but that these tools could help bring humanities research to light in a way that eliminates some of this segregation of departments. From my own limited research alone, I have run into many issues with the organization of my data and how long it takes to search through everything, etc., and I think many early scholars like myself could really benefit from these kinds of tools. I wonder, would it be difficult to train instructors in the Processing programming language so that they can share this knowledge with their students? I think that could also open up new teaching opportunities and be an exciting class that many students would love to take, if it could be offered as such. 

Hi Claire -- Great question! Training instructors in the Processing programming language is an excellent idea, and that is something we are thinking about as part of our outreach efforts at the TACC Vislab. I also agree with your point that "visual learners" have a lot to offer the humanities, and bringing together these communities is one of the motivations of my work.

As I alluded to in my post, a major barrier for students in the humanities who want to learn to code is that there are few incentives for their professors to investigate computational tools that might inform their work and teaching. While many might like to invest hours in learning to code, current standards for promotion and tenure make this kind of professional exploration a risky use of their time. The lack of basic literacy instruction in computer programming at primary and pre-secondary levels also means that there is a long learning curve for them to become creators (and not just users) of computational tools for humanities scholarship.

On the other hand, there are many fantastic resources available for those who are motivated to learn to code on their own time. Processing comes with a library of examples and tutorials that don't assume any prior knowledge of the language. For those interested in visualization, O'Reilly's Visualizing Data: Exploring and Explaining Data with the Processing Environment is an excellent introduction.

Also, since Processing was born as a teaching language, there is a lively community of educators using Processing in their curriculum. Khan Academy's exciting new computer programming curriculum teaches a modified version of Processing in their online courses, which allow students to write and run code directly in their web browser.

One final point that may not be clear from my post: Processing is a great programming environment for creating dynamic visualizations, and our Massive Pixel Environment library makes it easy to create massive visualizations with millions of pixels. However, it is not designed for searching through large archives or mining "big data". With future funding, we hope to create tools to simplify this part of the data visualization process as well.
