Data visualization is one of my primary areas of work. People enjoy looking at data that has been visualized, and they are even interested in developing new academic programs related to data visualization. Before we develop those programs, we need to have some sense of what we are teaching and why. Data visualization can be a cross-disciplinary meeting point where art and communication bring shades of different epistemologies and arts together. There is also a risk that, instead of being taught as a translational method, data visualization could be seen as something akin to a print shop run out of a College of Liberal Arts.
What is data visualization?
Most recent draft definition: Data visualization refers to a static image produced by automated means that communicates a situated argument by mapping features of a dataset to the visual properties of said image.
Breaking that down:
a. Static images — if we are talking about sounds then it is sonification. Different topic for a different day, or perhaps a special topics class. Haptic feedback processes are in this same bucket. Dynamic visualizations are almost always automated, but have different affordances and map underlying warranting patterns differently.
This is an animated gif. It is dynamic, and so also not a data visualization. Apparently, Medium didn’t like the original embed code. Click if you want; it’s a nice gif of John Ritter from Three’s Company. Enjoy.
b. Must be automated: the folks on the subreddit DataIsBeautiful are right here. The core of the image needs to have been produced using an automated method. No hand-drawings. Tufte’s confections really don’t apply here, as they are so meta that they don’t include much data any more. This is the sine qua non of visualization if you follow his trajectory. Art is great, and I am a big fan of graphic design as well. Both definitely have a central place in our approach to teaching. They should not replace the teaching of social science, statistics, rhetoric, or computer science.
This is some sort of cool picture, not a data visualization.
c. Communicate a situated argument: this is where scientific visualization differs from data visualization, and from information visualization in particular. Data visualizations are intended to interface with an argument. They provide an indexical warrant, showing one of the primary warrant patterns that make an argument make sense. These include: effect-cause, cause-effect, sign, generalization, analogy, and so on. Why do I say warrant? Because our colleagues in rhetorical studies (argumentation in particular) have done a fine job of developing the means by which we can understand how people argue. No need to reinvent the wheel. We have sophisticated, rigorous ways to talk about arguments. It is possible that an NMR image of a sugar molecule could be a part of an argument, but if it is simply produced for the purpose of describing a physical or chemical phenomenon, it is not data visualization. It is scientific visualization.
Here is a chart that makes an argument that Alabama is the most important team in the networked relationship of college football teams:
d. Mapping features of a dataset to the visual properties of an image: this could be as simple as using software to draw a graph, or as complex as building a map. Mapping is a term from graphic design. In short, tie the image to the data in a consistent and predictable way. So, not a collage. Perhaps a montage. Most of the time, just a really high production value graph.
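One way to picture a "consistent and predictable" mapping is a scale function: the same rule turns every row of the data into a position on the image. The following is only a minimal Python sketch of that idea, not the method behind any chart in this post. The dataset, the column names (`wins`, `points`), and the pixel ranges are all invented for illustration.

```python
def linear_scale(domain, output_range):
    """Return a function that maps values in `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span more than one value")

    def scale(value):
        t = (value - d0) / (d1 - d0)   # position of the value within the data, 0..1
        return r0 + t * (r1 - r0)      # same position within the visual range

    return scale


# Hypothetical dataset: one row per team (made-up numbers).
teams = [
    {"team": "A", "wins": 2, "points": 300},
    {"team": "B", "wins": 7, "points": 450},
    {"team": "C", "wins": 12, "points": 600},
]


def to_marks(rows, width=400, height=200):
    """Map each row to the centre of a dot on a width x height canvas."""
    wins = [r["wins"] for r in rows]
    points = [r["points"] for r in rows]
    x = linear_scale((min(wins), max(wins)), (0, width))
    # Screen coordinates grow downward, so the vertical range is flipped.
    y = linear_scale((min(points), max(points)), (height, 0))
    return [{"label": r["team"], "cx": x(r["wins"]), "cy": y(r["points"])} for r in rows]


marks = to_marks(teams)
for mark in marks:
    print(mark)
```

Because every dot is placed by the same two scale functions, a reader can work backward from the image to the data. That is the property a collage lacks.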
The information in the map needs to connect with the information about the world. A fail here is when USA Today mis-mapped a map.
And here is a fail of mine, an ill-fated attempt to build a network visual of the history of the NBA understood through the industries of the owners.
I did a really bad job in that last link. What are the colors? Why this placement? And why did I treat the dynamic relation underlying the data (the timeline) as nodes rather than metadata? When I try to explain myself I start talking really quickly. That is a bad sign. Next argument.
What kinds of static visualizations should we teach?
There are four major types of static data visualizations: plots, networks, trees, and maps. Angela Zoss at Duke University Library emphasizes the dimensionality of graphics, flowing from this taxonomy. For clarity, multidimensional and mono-dimensional plots should be considered one category: plots are plots. Within plots there are important sub-categories. The overarching distinction between static and dynamic eliminates the need to focus on the relative dimensionality of the image. This four-part taxonomy does flatten timelines into plots, which could be troubling, as timelines tend to be heavily emphasized in our research practice.
Plots. Here is a generic plot of candidate support with different dimensions.
Here is a line graph, another kind of plot.
This is a network of basketball teams trading with each other.
Lines are players traded, node size is betweenness, placement is eigenvector centrality (double circle layout).
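For readers who have not met betweenness before, here is a rough Python sketch of how a score like that could be computed and then mapped to node size. It uses Brandes' algorithm on an unweighted, undirected graph. The team labels and trades are placeholders rather than real NBA data, the radius range is an arbitrary choice, and tools such as Gephi do all of this for you. The eigenvector-based placement is not covered here.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of (team, team) trades into a neighbour lookup."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def betweenness(adjacency):
    """Raw betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Every undirected pair was counted once from each endpoint.
    return {node: score / 2 for node, score in scores.items()}


def node_radii(scores, min_radius=4.0, max_radius=20.0):
    """Map betweenness scores onto a radius range (the visual property)."""
    top = max(scores.values()) or 1.0
    return {n: min_radius + (max_radius - min_radius) * s / top for n, s in scores.items()}


# Placeholder trades between hypothetical teams.
trades = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]
scores = betweenness(build_adjacency(trades))
radii = node_radii(scores)
for team in sorted(scores):
    print(team, scores[team], round(radii[team], 1))
```

In this toy graph, team C sits on the most shortest paths between other teams, so it gets the largest dot. Teams at the edge of the network get the minimum size.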
Each time the tree splits, the next branches are sub-topics contained within the main topic in the dataset. It isn’t particularly pretty. Neither is text mining.
Trees also would contain flow charts.
Here is a panel of maps I made to help explain why Trump was going to win the nomination.
The dots are states, the rows are methods of allocation of delegates, columns are months of the election cycle. When winner-take-most and take-all are lumped, it is clear that the Republican nomination process really wasn’t designed for someone to come roaring back in April.
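The structure behind a panel like this can be sketched as a simple grouping step: each contest lands in one cell, keyed by its row (allocation method) and column (month). The Python below is only an illustration of that idea. The contests are placeholders, not the real 2016 calendar, and the lumped row label is my own invention for the example.

```python
from collections import defaultdict

# Placeholder contests: (state, month, allocation method). Not real primary data.
contests = [
    ("State 1", "Feb", "proportional"),
    ("State 2", "Mar", "proportional"),
    ("State 3", "Mar", "winner-take-most"),
    ("State 4", "Mar", "winner-take-all"),
    ("State 5", "Apr", "winner-take-all"),
]

# Lump the two winner-favouring methods into a single row of the panel.
LUMPED = {
    "winner-take-most": "winner-take-most/all",
    "winner-take-all": "winner-take-most/all",
}


def build_panel(rows):
    """Group states into cells keyed by (row label, column label)."""
    panel = defaultdict(list)
    for state, month, method in rows:
        panel[(LUMPED.get(method, method), month)].append(state)
    return dict(panel)


panel = build_panel(contests)
months = ["Feb", "Mar", "Apr"]
methods = ["proportional", "winner-take-most/all"]

# Print the number of dots that would appear in each cell of the panel.
print(" " * 22, " ".join(months))
for method in methods:
    counts = [f"{len(panel.get((method, month), [])):>3}" for month in months]
    print(f"{method:22}", " ".join(counts))
```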
How to teach them…
Traditional social science methods and argumentation theory. Start there. Visual methods such as these are very constraining, especially when automated. Adapting research questions to fit the structure that the computer can understand can be quite helpful for students. This does limit the sort of abstract expressionism or cartooning that will be submitted to you, but I am not sure this is a bad thing.
Breaking down social science further: familiarity with sociology or communication studies would also be helpful. Students will also need to have a sense of data structure and cleaning. This is more a mid-level math skill.
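To give a sense of what "data structure and cleaning" means at this level, here is a small Python sketch. It assumes a made-up spreadsheet export where team names are typed inconsistently and some numeric cells are empty or filled with text. The column names and the rule of setting bad rows aside are choices made for the example, not a recommended standard.

```python
# Hypothetical raw rows, as they might come out of a hand-built spreadsheet.
raw_rows = [
    {"team": " Team A ", "season": "2015", "wins": "41"},
    {"team": "team a", "season": "2016", "wins": "48"},
    {"team": "Team B", "season": "2016", "wins": ""},      # missing value
    {"team": "Team B", "season": "2015", "wins": "n/a"},   # not a number
]


def clean(rows):
    """Return (cleaned, rejected): consistent names and integer columns, or set aside."""
    cleaned, rejected = [], []
    for row in rows:
        # Collapse stray whitespace and use one capitalisation for every name.
        team = " ".join(row["team"].split()).title()
        try:
            record = {
                "team": team,
                "season": int(row["season"]),
                "wins": int(row["wins"]),
            }
        except ValueError:
            # Keep the bad row so a person can look at it later.
            rejected.append(row)
            continue
        cleaned.append(record)
    return cleaned, rejected


cleaned, rejected = clean(raw_rows)
print(cleaned)
print(len(rejected), "rows set aside for review")
```

Setting the rejected rows aside rather than silently dropping them keeps the cleaning step honest: students can see what the machine could not read.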
Some graphic design and computer skills. This doesn’t mean they need multiple courses in drawing or to be proficient in C++. To do this kind of work they need to be comfortable enough with a computer to use research ware like Gephi, Tableau, and possibly R. In terms of critical vocabulary, the students need to be able to speak in the vernacular of graphic design and art, something like: harmony, movement, rhythm, proportion, unity, and so on. Even if they aren’t the best artists, they need to have taste, or at least a sense of visual style.
Students might come to your class with computer science, art, or communication backgrounds. Coming at each from an intermediate level is your best bet for reaching everyone. In general, your students will likely feel weak in one of these areas; you can use their strength in another to build them up while they learn to be stronger in the others. Further, since this is a theory of static automated image processing, skills in cartooning and painting really shouldn’t be that much of an advantage. It is entirely possible that, with your new platforms for image creation (Gephi, ImageJ, Tableau, R), students who were bad at art or bad at math or bad at argument might become proficient.
What really counts for me is that we appreciate the ways that the representation of data is tied to research practice. Considering the output and the limits on the kinds of graphics that exist really forces clear thinking and data structure. Sometimes, we will visualize qualitative information. The act of translating your insights into something that can drive a machine is really useful.
Dynamic data visualization is something different. That will need to wait for another day.