This paper argues for the importance of developing a realistic theoretical framework for the rise of mass digitization, encompassing: the role of the diffusion of innovations in shaping public opinion on technology; hyperbole and the myth of the ‘digital sublime’ (Mosco 2004); and the role of different media in shaping human understanding. It proposes that mass digitization can only be viewed as a significant deviation from the print medium once it emerges from the shadow of the Gutenberg Press and establishes a similarly powerful paradigm of its own, and that as a result there is a disconnect between the reality of the medium and the discourse that surrounds it.
In recent years companies such as Google have generated huge interest in their attempts to digitize the world’s books. Described by Crane (2006) as ‘vast libraries of digital books’, these collections offer a powerful vision of a future in which books exist within a digital version of the mythical universal library. Yet we know very little about what to do with these books once they are digitized, or even how they are being used by researchers and the wider public. The contemporary debate is defined by hyperbole and the belief that digital texts will inevitably destroy the print paradigm, a discourse that reflects the rhetorical direction of many historical reactions to technological change.
At the same time, a new research method that Moretti labels ‘distant reading’ (2007) allows researchers to undertake quantitative analysis of these massive literary corpora using tools such as the Google Ngrams Viewer. Important work on these corpora has been carried out within the Digital Humanities community, such as projects by Jockers (2011) and Cohen (2006), and further research is being made possible by Digging into Data funding from the NSF, NEH, SSHRC and JISC. However, as we shall argue below, it is vital that we understand the theoretical background of such research, and its potential effects on scholarly methodology in both DH and traditional literary analysis.
Rogers (2003) tells us that, far from being a deterministic inevitability, the success of a new technology relies in no small part upon external human factors. The role of opinion leaders is essential to this, and in the case of mass digitization the literature is filled with examples of the fetishization of both print and digital technologies. Gurus and promoters tell us that digitization will improve the world and render obsolete what has gone before; others express concern about the cataclysmic effects that new technologies may have on humanity (Mosco 2004: 24). The result is a powerful narrative of ruptures and dramatic change, a myth that is historically common yet rarely reflected in reality.
This narrative has been shaped by a debate that has spanned decades (Benjamin 1936; McLuhan 1962; Barthes 1977; Baudrillard 1994; Lanier 2011), and so it is essential to consider the influence that theorists have exerted on the contemporary debate, and the misconceptions that have arisen as a result. For example, McLuhan’s conception of a global village, a ‘single constricted space resonant with tribal drums’ (1962), is at odds with our experience of the contemporary World Wide Web. The village operates in a semi-closed system with clear boundaries, whereas the Web is unprecedented in its scale and openness. It more accurately resembles the loose structure of a modern city, and its users therefore mirror the behavioural changes that have been noted in residents of large cities. With increased access to information comes an increased opportunity cost: the sense that whatever a person does will necessarily involve missing out on something else potentially useful (Deutsch 1961: 102). Accordingly, we can see that both city dwellers and web users exhibit similar behaviour in filtering and sorting the mass of information to which they are exposed. As a result of this promiscuous, diverse reading style (Nicholas et al. 2004), the authorial voice has been side-lined, and the growth of a cultural movement that hyperbolizes quantitative analysis of literary corpora threatens to hasten this process.
Many of the quantitative methods we have already seen hold great potential for research (Cohen 2006; Jockers 2011; Michel & Shen 2010; Moretti 2007), but a wider social discourse appears to have taken quantitative analysis as a sign that traditional methods have, like print itself, become an unnecessary distraction. This culture has redefined the content of texts as information, in an overly literal interpretation of the death of the author that undermines both authority and context. Information, viewed in this manner, is an abstract entity analogous to computerized data and therefore open to the same methods of interrogation (Nunberg 2010). There is no doubt that quantitative analysis holds great potential when combined with close reading, but we have witnessed a rhetoric that argues that close reading becomes unnecessary once sufficiently large datasets become available (Anderson 2008). What appears to be a revolutionary faith in mass digitization, though, disguises a deeply conservative technological determinism that is reflected in the manner in which digitization still so closely remediates the print medium.
Instead of acting as a disruptive force, mass digitization borrows greatly from the print medium that it remediates, a feature common to technologies in the early stages of their development (Bolter & Grusin 1996: 356-357). Even quantitative analysis can operate as a confirmatory method; rather than creating new knowledge, it often acts only to confirm what humanities scholars already knew. The digital form, then, still operates ‘not as a radical break but as a process of reformulating, recycling, returning and even remembering other media’ (Garde-Hansen et al. 2009: 14). The reality is that quantitative methods are most effective when used alongside the close textual reading that allows us to contextualize the current glut of information. Yet the high-speed processing of huge numbers of texts brings with it a potential negative impact upon qualitative forms of research, with digitization projects optimized for speed rather than quality, and many existing resources neglected in the race to digitize ever-increasing numbers of texts.
We must therefore learn more about the potential, and limitations, of large-scale digitization in order to ensure that a focus on quantity rather than quality does not override the research needs of the wider community. If accepted, this paper will present some of the results of Gooding’s on-going doctoral research into the use and impact of mass digitization projects, including case studies from institutions such as the British Library. It will argue that we must look beyond the noise that characterizes the theoretical framework proposed above, and learn more about the true impact of mass digitization.
Anderson, C. (2008). The end of theory: the data deluge that makes the scientific method obsolete. Wired. Available at: http://www.wired.com/science/discoveries/magazine/16-07/pb_theory [Accessed May 31, 2011].
Barthes, R. (1977). The death of the author. In Image-Music-Text. London: Fontana Press.
Baudrillard, J. (1994). Simulacra and simulation. Ann Arbor: University of Michigan Press.
Benjamin, W. (1936). The work of art in the age of mechanical reproduction. In Illuminations. New York: Schocken Books.
Bolter, J. D., and R. A. Grusin (1996). Remediation. Configurations 4(3): 311-358.
Cohen, D. (2006). From Babel to knowledge: data mining large digital collections. D-Lib Magazine 12(3). Available at: http://www.dlib.org/dlib/march06/cohen/03cohen.html [Accessed February 7, 2011].
Crane, G. (2006). What do you do with a million books? D-Lib Magazine 12(3). Available at: http://www.dlib.org/dlib/march06/crane/03crane.html [Accessed January 7, 2011].
Deutsch, K. W. (1961). On social communication and the metropolis. Daedalus 90(1): 99-110.
Garde-Hansen, J., A. Hoskins, and A. Reading, eds. (2009). Save as… digital memories. Basingstoke: Palgrave Macmillan.
Jockers, M. (2011). Detecting and characterizing national style in the 19th Century novel. In Digital Humanities 2011. Stanford University, Palo Alto.
Lanier, J. (2011). You are not a gadget. London: Penguin.
McLuhan, M. (1962). The Gutenberg galaxy: the making of typographic man. Toronto: University of Toronto Press.
Michel, J.-B., and Y. K. Shen (2010). Quantitative analysis of culture using millions of digitized books. Science.
Moretti, F. (2007). Graphs, maps, trees: abstract models for literary history. London and New York: Verso.
Mosco, V. (2004). The digital sublime: myth, power, and cyberspace. Cambridge, MA: MIT Press.
Nicholas, D., et al. (2004). Re-appraising information seeking behaviour in a digital environment: bouncers, checkers, returnees and the like. Journal of Documentation 60(1): 24-39.
Nunberg, G. (2010). Counting on Google Books. The Chronicle of Higher Education. Available at: http://chronicle.com/article/Counting-on-Google-Books/125735 [Accessed September 15, 2011].
Rogers, E. M. (2003). Diffusion of innovations. Fifth edition. New York: Free Press.