English literature, electronic text and computer analysis: An impossible combination?

Claire Warwick
Department of Information Studies
University of Sheffield
Regent Court
211 Portobello Street
Sheffield, South Yorkshire S10 2TN
UK
c.warwick@sheffield.ac.uk

In 1991 Corns discovered that, despite the potential usefulness of computational text analysis techniques in the study of English literature, very little published work in the field showed any evidence of their use. He hoped that this was due to a lack of knowledge on the part of more traditional literary professionals. Knowledge is now more widespread, and electronic texts and analysis tools are easier to find and use. However, applying the same method of quantitative analysis to the research output of selected journals suggests that computational analysis of English literary texts is no more common now than it was eight years ago. This paper will suggest reasons for this, and argue that the discontinuity between the way that machines and humans read prevents the more widespread use of electronic texts by literary scholars.

Electronic text is still basically defined in terms of its content (Renear). Thus the tools at our disposal for analysing electronic literary text work in terms of information extraction (e.g. how many times does a word occur, and in what collocation?). Even if the text is encoded, the searches we can perform are more complex versions of a content model (e.g. how many times does Hamlet, as speaker, utter the word 'Ophelia', as opposed to the reverse?). Computational and corpus linguists have produced a great deal of valuable work based on this sort of data, yet to date very little has emerged from applying computer analysis of electronic text in the field of English literature.
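The kinds of content-based query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text is invented (it is not a quotation from Hamlet), the simplified TEI-like element names `sp`, `who` and `l` are assumed for the example, and the helper names `tokens`, `word_count`, `collocates` and `mentions_by_speaker` are hypothetical rather than drawn from any existing analysis tool.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-made sample in a simplified TEI-like encoding.
# The lines are invented for illustration; they are not quotations.
SAMPLE = """
<scene>
  <sp who="Hamlet"><l>Ophelia, walk with me. Fair Ophelia, speak.</l></sp>
  <sp who="Ophelia"><l>My lord Hamlet, I will speak.</l></sp>
  <sp who="Hamlet"><l>Speak then, Ophelia.</l></sp>
</scene>
"""

def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_count(words, target):
    """How many times does a word occur?"""
    return Counter(words)[target.lower()]

def collocates(words, target, window=2):
    """In what collocation? Count words within `window` tokens of the target."""
    target = target.lower()
    found = Counter()
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - window)
            found.update(words[start:i] + words[i + 1:i + 1 + window])
    return found

def mentions_by_speaker(root, speaker, target):
    """Use the encoding: count the target only inside one speaker's speeches."""
    total = 0
    for speech in root.iter("sp"):
        if speech.get("who") == speaker:
            total += word_count(tokens("".join(speech.itertext())), target)
    return total

root = ET.fromstring(SAMPLE)
all_words = tokens("".join(root.itertext()))

print(word_count(all_words, "Ophelia"))
print(collocates(all_words, "Ophelia", window=1).most_common())
print(mentions_by_speaker(root, "Hamlet", "Ophelia"))
print(mentions_by_speaker(root, "Ophelia", "Hamlet"))
```

Even the speaker-aware query is still a count of content, made more selective by the markup. Nothing in the sketch addresses why the pattern it finds might matter to a reader.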

Researchers who are interested in tracking cultural or historical patterns in large amounts of data, or in charting textual variants, may find computational techniques of great use. However, most scholars still believe that the core activity of the literary critic, in whatever language, is critical analysis and close reading. Although we have not fully understood what we do when we read a literary text, we know that we do not simply collect quantitative data. Reading conflates three activities: information retrieval (how many times does x occur?); text analysis, in which we examine the significance of the data (having found out how many times a word occurs in a given writer, does this differ from the usage of any of his contemporaries, and if so, does it matter to me?); and the identification of emotional effects (I notice that a character tends to be presented in a certain way, and this determines how I as reader perceive that character and the action in which they are involved). Therefore, while critics may use quantitative data to support further analysis, 'close reading' itself is much harder to define. What we do know is that it involves intangible concepts such as sensibility, originality and creativity, and is predicated upon things that are nuanced and unprovable. These characteristics can be comprehended by humans, but they are much more difficult to adapt to the right-or-wrong, on-or-off world of logical hierarchies that is ideal for computer analysis. Furthermore, unlike linguists, literary scholars often do not need large quantities of information in order to come to their judgements, which they accept need not be absolute or objective. Humanists do not necessarily expect that a problem can be solved once and for all, nor that their findings must be incontestable (Watisboone).

To make any profitable use of computer techniques of analysis, humans must also be able to define exactly the problem under investigation, the nature of the data, and why the results are significant. This is something that many English literature scholars find difficult, and this may reflect the nature of the subject as much as the competence of the researcher. Text encoders might suggest that the text under analysis is insufficiently well analysed and marked up for the user's purpose. Perhaps, therefore, scholars should spend some time marking up their texts. But what should they mark up? Even if they could define the sort of literary nuances that they are looking for, and translate them into an encoding system, would this really be a good use of time? The text would have to be so heavily marked up that the critic might as well just read it anyway.

Should humanities computing professionals therefore decry the traditional methods of literary scholars as insufficiently objective and rigorous, and try to encourage them to use computational methods to improve their standard of proof? (Smith) We could, but it seems likely to have very little effect on a long-standing and still popular academic discipline. Perhaps computers are not ideally suited to the complex analysis of literary text, and perhaps we should accept this. Should we therefore resign ourselves to the terminal disjunction of literary studies and humanities computing? I think not, but we may need to accept that computational methods may leave some areas of the discipline changed and others untouched.

For example, the most successful use of computers in English literature has been in the fields of text creation and editing. This activity is better suited to the use of electronic text, as it relies much more on objectivity: manuscript A either does or does not include a word that manuscript B lacks. There is no debate about this; it is a matter of proof. Such work may return textual editing to its place as a central part of English studies. This is, in a sense, a move back in the evolution of English literary studies, since the study, display and analysis of textual variants was perceived as the essence of the discipline by its founders. Perhaps the use of computers will redress the balance and re-establish the importance of textual editing. This means that the main use of computers in English literature is presentational rather than analytical. Just as nineteenth-century scholars presented critical editions in print, so a new generation presents them electronically, and it is now for others to decide what to do with them. This may be 'simply' to read them.
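The claim that a variant is a matter of proof can be illustrated with a minimal collation sketch. It assumes two invented witnesses, `MS_A` and `MS_B`, which are not readings from any real manuscript, and a hypothetical helper named `collate`. It uses `difflib.SequenceMatcher` from the Python standard library to align the two texts word by word, which is only one simple approach and not the method of any particular editing project.

```python
from difflib import SequenceMatcher

def collate(witness_a, witness_b):
    """Return the places where two witnesses differ, word by word.

    Each result is a tuple (kind, reading_in_a, reading_in_b), where kind is
    'replace', 'delete' (present only in A) or 'insert' (present only in B).
    """
    a_words = witness_a.split()
    b_words = witness_b.split()
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    variants = []
    for kind, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if kind != "equal":
            variants.append((
                kind,
                " ".join(a_words[a_start:a_end]),
                " ".join(b_words[b_start:b_end]),
            ))
    return variants

# Invented example witnesses, used only to show the shape of the output.
MS_A = "the quick brown fox jumps over the lazy dog"
MS_B = "the quick red fox jumps over the dog"

for kind, reading_a, reading_b in collate(MS_A, MS_B):
    print(f"{kind}: A has {reading_a!r}, B has {reading_b!r}")
```

The output records only what each witness contains. Deciding which reading to prefer, and why, remains an editorial judgement.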

References

Corns, T.N. (1991) "Computers in the humanities: Methods and Applications in the study of English Literature" Literary and Linguistic Computing 6 (2): 127-131

Renear, A. (1997) "Out of Praxis: Three (Meta) Theories of Textuality" in Kathryn Sutherland (Ed) Electronic Text: Investigations in Method and Theory (Oxford, Clarendon Press), pp. 107-124

Smith, J.B. (1989) "Computer Criticism" in Literary Computing and Literary Criticism: Theoretical and Practical Essays on Theme and Rhetoric (Philadelphia, University of Pennsylvania Press), pp. 36-49, at p. 39

Watisboone, R. (1994) "The Information Needs and Habits of Humanities Scholars." Reference Quarterly 34(2): 203-216.