Perl Success Story: Networked Qualitative Data Analysis with Perl and XML


Patrick Carmichael, a lecturer at the University of Reading in the UK, sent me this story. In addition to teaching teachers to use new technologies, he carries out research into how low-cost network technologies can be used to support civil-society projects, effect social change, and aid in conflict resolution around the world.

Networked Qualitative Data Analysis with Perl and XML

My work at the University of Reading involves research and development in two main areas. One is the evaluation and development of network technologies for use in education projects in the UK; the other is in a field broadly described as "development education". It was to talk about the latter that I attended YAPC 2000 at Carnegie Mellon University, where I discussed how I was trying to apply ideas from development theory to the deployment of hardware and software in Southern Africa. In particular, I talked about how software developers could learn from ideas like "appropriate technology" and how open source software was a vital element in any strategy designed to build sustainable user and developer communities around the world.

Since then, I've continued to use Perl in a range of projects - most recently developing a knowledge management system for a non-governmental organization in Egypt which provides training for other workers in the not-for-profit sector. In all the projects I've developed, budgets have been tight and the network infrastructure shaky, but there is enormous enthusiasm on the part of those with whom I work.

A marvelous phrase - and associated concept - to which I was introduced when working with journalists and academics who have lived, worked and networked through the wars of Yugoslav disintegration was "tactical media". This involves using what resources you have to the greatest effect, looking for innovative solutions to problems, and being prepared to move at high speed to respond to changing circumstances. When the network landscape - and users' access to it and to each other - is subject to change, then flexible, freely-available tools like Perl really come into their own. So in many parts of the world, you will find web mail, newsgroups, listservs, email auto-responders, groupware applications and custom-built server and client applications being patched together to form ad-hoc but functional networks with liberal applications of Perl glue.

One area that my academic colleagues identified as being lacking in their "toolkit" was a lightweight and, above all, low-cost, computer-aided qualitative data analysis software (CAQDAS) tool. Researchers characteristically use these to analyze free-form texts such as interview transcripts. While some analysis can be carried out using the advanced features of proprietary "office" applications, dedicated CAQDAS software allows the attachment of "codes" and "memos" to text fragments - these codes and memos can then be retrieved, sorted and even coded and "memoed" themselves. Many dedicated CAQDAS software packages are expensive and are designed for installation on single computers; it is commonplace to find such software on the networks of universities in the UK, but many of my potential "client group" had only intermittent access to the network via shared machines. In many cases, I knew their only access was in public spaces such as libraries, "telecenters" and cybercafes.

In the university, we also had students - often teachers studying part-time for Master's and Doctoral degrees - who wanted to be able to analyze data they collected in classrooms and other research sites without having to come to the campus. They were already supported via web mail and "managed learning environment" software such as Blackboard, but they lacked access to data analysis tools.

A networked CAQDAS application, in which researchers, data and analysis (the latter being securely stored server-side) could be distributed across the network, seemed to be the answer. I was inspired by Jon Udell's "Practical Internet Groupware", in which he describes a "reviewable document base", and by the development of a number of Perl modules for parsing and querying XML (I used XML::Parser and XML::Twig). The first stage in the development of the application was to build a skeleton text-retrieval system based on regular expressions. Regular expression support already exists in a range of both generic and dedicated QDA software, but that offered by Perl, particularly when combined with other built-in functions and modules, is especially rich, and it proved relatively easy to develop my initial text-retrieval system into a "code-and-retrieve" one in which users could retrieve data on the basis of text matching, attached codes or combinations of the two.
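To make the "code-and-retrieve" idea concrete, here is a minimal sketch in plain Perl. It is not the Codex source: the in-memory `@fragments` list, the sample texts and the `retrieve` function are all hypothetical, and a real system would read the fragments and their codes from the XML files rather than from a hard-coded array.

```perl
use strict;
use warnings;

# Hypothetical in-memory data: each fragment has an id, its text, and
# the list of codes a researcher has attached to it.
my @fragments = (
    { id => 1, text => 'We had no books in the classroom that year.',
      codes => ['resources', 'school'] },
    { id => 2, text => 'My brother walked to the library every week.',
      codes => ['family', 'resources'] },
    { id => 3, text => 'The teacher asked us to share books.',
      codes => ['school'] },
);

# Return the ids of fragments that satisfy every criterion supplied.
#   pattern: a compiled regular expression matched against the text
#   code:    a code name that must be attached to the fragment
sub retrieve {
    my (%want) = @_;
    my @hits;
    for my $f (@fragments) {
        next if defined($want{pattern}) && $f->{text} !~ $want{pattern};
        next if defined($want{code})
            && !grep { $_ eq $want{code} } @{ $f->{codes} };
        push @hits, $f->{id};
    }
    return @hits;
}

my @by_text = retrieve(pattern => qr/\bbooks?\b/i);                      # text only
my @by_code = retrieve(code    => 'resources');                          # code only
my @by_both = retrieve(pattern => qr/\bbooks?\b/i, code => 'resources'); # both

print "text: @by_text\ncode: @by_code\nboth: @by_both\n";
```

Treating each criterion as an optional filter means that text matching, code matching and the combination of the two all go through the same function.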

While some qualitative data make reference to specific dates and periods, others, particularly children's accounts, are vaguer and refer to sequences of events without providing specific dates. When coding accounts, dates were converted wherever possible using Date::Manip. Thus "the sixth of April, 1994" could be coded as: <date strictdate="1994-04-06">the sixth of April, 1994</date> allowing searches across a database for references to specific dates or timeframes. Other Perl modules and interfaces used included String::Approx, to allow approximate matching of text strings, and Lingua::Wordnet, which provided an interface to the WordNet lexical database in which nouns, verbs, adjectives and adverbs are organised into synonym sets, each representing one underlying lexical concept. This suite of "middle-tier" features allowed the development of an effective data retrieval system operating through a web browser, with XML data files being parsed, and tags containing codes converted into hypertext links to further information or "pop-up" labels. A search facility allowed users to retrieve documents or document fragments, with different regular expressions allowing the user to set the target or targets of their searches; their context; the amount of that context to be displayed; and the mode of presentation.
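The value of the strictdate attribute is easiest to see in a search. The sketch below is an illustration under stated assumptions, not the application's own code: the sample transcript, the `<p n="...">` paragraph markup and the `dates_between` function are hypothetical, and a regular expression stands in for the XML parsing that the real application did with XML::Parser and XML::Twig.

```perl
use strict;
use warnings;

# Hypothetical coded transcript using the <date strictdate="..."> markup
# described above. The <p n="..."> wrapper is an assumption.
my $xml = <<'XML';
<p n="1">We moved on <date strictdate="1994-04-06">the sixth of April, 1994</date>.</p>
<p n="2">School started <date strictdate="1994-09-05">that September</date>.</p>
<p n="3">He left <date strictdate="1995-01-20">the following winter</date>.</p>
XML

# Return [strictdate, original wording] pairs that fall within a range.
# Dates written as YYYY-MM-DD sort correctly as plain strings, so the
# comparison needs no date arithmetic.
sub dates_between {
    my ($text, $from, $to) = @_;
    my @found;
    while ($text =~ m{<date\s+strictdate="(\d{4}-\d{2}-\d{2})">(.*?)</date>}gs) {
        my ($iso, $wording) = ($1, $2);
        push @found, [$iso, $wording] if $iso ge $from && $iso le $to;
    }
    return @found;
}

my @hits_1994 = dates_between($xml, '1994-01-01', '1994-12-31');
print "$_->[0]: $_->[1]\n" for @hits_1994;
```

Because the normalised date lives in an attribute, the speaker's original wording is left untouched in the transcript while still being searchable by timeframe.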

At this stage, the application allowed users to convert qualitative data into hypertexts and to "explore" them. What was still lacking, however, was what QDA users refer to as a "pencil-level richness" - the capability to select a text block or fragment and attach either one of a set of predefined codes or a more substantial "memo" through the user interface of the application - in this case, the web browser. Automatic numbering of paragraphs within the XML of source documents allowed the generation of "add a memo" links in the HTML, and a trivial CGI script allowed these to be associated with the source document. Memoing user-selected text fragments seemed to pose more of a problem until I investigated the DHTML object model for web pages - accessing the "selection" and "textrange" objects in JavaScript or VBScript made it possible to submit the originating text, its position in the entire document, and the memo or codes via CGI. Once submitted, the memo itself is stored as an XML data fragment, which can be viewed alongside the source data, or independently of it, in a number of formats.
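As a sketch of the final step, the following shows one way the submitted values might be turned into an XML memo fragment. Everything here is an assumption made for illustration: the element and attribute names (memo, doc, para, offset, selection, note), the file name and the `memo_fragment` function are hypothetical and do not describe Codex's actual schema. In a CGI script the values would come from the submitted form rather than being hard-coded.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build a memo fragment recording the source document, the paragraph
# number, the character offset of the selection, the selected text and
# the researcher's note. All names are hypothetical.
sub memo_fragment {
    my (%m) = @_;
    return sprintf(
        qq{<memo doc="%s" para="%d" offset="%d">\n}
      . qq{  <selection>%s</selection>\n}
      . qq{  <note>%s</note>\n}
      . qq{</memo>\n},
        xml_escape($m{doc}), $m{para}, $m{offset},
        xml_escape($m{selection}), xml_escape($m{note}),
    );
}

my $fragment = memo_fragment(
    doc       => 'interview-07.xml',
    para      => 12,
    offset    => 48,
    selection => 'we shared books & pencils',
    note      => 'Compare with the "resources" code in paragraph 3.',
);
print $fragment;
```

Storing the paragraph number and offset alongside the selected text is what lets a memo be shown next to its source passage or listed on its own.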

We are currently using the application - provisionally named "Codex" (code + XML = a body of knowledge; annals; a literary or scriptural corpus) - in a number of areas. In addition to being made available as one of a set of lightweight network tools for development education work, it is likely to form part of a suite of network tools offered to teacher-researchers taking part in curriculum development projects in the UK. As CAQDAS applications go, it is basic; but then, it aims to provide the functions that its users need most while making minimal demands on their hardware, software and budgets. And, as their needs and expectations develop, so too might Codex. When my Inbox contains messages which read along the lines of: "It's good, but what would be really useful would be if it could ...", I know not only that someone is using it, but also that we are engaged in a different - and characteristically "Perlish" - kind of participatory learning in which distinctions - between teachers and students, and between developers and users - are blurred.

A more complete description of the prototype application - together with a discussion of its relation to other types of CAQDAS - is available online at: CAQDAS

--Patrick Carmichael, University of Reading

To learn how large and small companies are using Perl to meet their goals, check out Perl Success Stories.

If you have a Perl success story of your own that you'd like to share, please let me know. You can reach me at: betsy@oreilly.com

Read More Entries by Betsy Waliszewski.
