Friday, June 3, 2016

A DH Project Lifecycle

On Thursday, we kicked it up a notch (on the abstraction scale) in the #lafdh Lafayette College Digital Humanities Workshop and worked through a DH project lifecycle. The goal Ryan & I had for the participants was a day of DH "cosplay" - a day in the life of a DH scholar - to get a feel for what it is like to do the work, make the decisions, fail forward, and, yes, see a project actually yield some interesting results.

We started in the morning by proposing the following four steps for the DH project lifecycle:

1. Conjecture & Gather Resources
2. Prepare Data
3. Run Analysis and/or Synthesis
4. Interpret Results

This is a very simplified sequence, of course, and it serves more as a heuristic than anything else. But it is still pretty accurate if we also add the requisite "rinse, repeat" as a fifth step. Many, if not most, projects proceed this way, with a few notable (and perfectly reasonable) variations: in #1, resources (e.g. an archive of original texts or images) can precede conjecture, and in #3, analysis and synthesis can happen in either order.

For our participants, we added some detail in order to condense the experience into a single working day. Not unlike a cooking show on TV, we wanted to teach the "recipe," but we also had to compress some of the steps that take a long time so they would fit into our scheduled programming time. So we had a version already done and ready to come out of the oven golden brown & delicious, as it were.

Our version of the four steps, then, included a bit more detail:

1. Conjecture & Gather Resources: Ask a question or make a guess based on theory and/or your available evidence

2. Prepare Data: Learn text processing basics; build a working corpus (see the sketch just after this list)

3. Run Analysis in a Custom-Built Python Environment: Select analytic tools & configure the I/O pipeline using Python

4. Interpret Results: Do we see what we expect? Are we surprised by anything? Can we get to a useful, reliable result?
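
To make step 2 concrete, here is a minimal sketch of what "build a working corpus" can look like in Python with NLTK. The `essays/*.txt` path and the `load_corpus` helper are my assumptions for illustration, not a fixed part of the workshop's setup:

```python
import glob

from nltk import sent_tokenize  # requires nltk; run nltk.download("punkt") once

def load_corpus(pattern="essays/*.txt"):
    """Read each essay file and split it into sentences."""
    corpus = {}
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            corpus[path] = sent_tokenize(f.read())
    return corpus
```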

We didn't make *all* the decisions for the group, but we did decide that we'd be doing a sentiment analysis using a naive Bayes classifier trained on the Cornell movie review corpus. Two of our seminar participants provided our analytic target: a set of student reflective essays known as "literacy narratives." This is a genre often used in writing courses: we ask students to reflect on their history as readers and writers, to consider their experiences in formal and informal settings, and to set some goals for themselves as writers.
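
NLTK happens to ship the Cornell (Pang & Lee) movie review corpus, so a sketch of the training step might look like the following. The bag-of-words feature extractor and the train/test split are my assumptions; the post doesn't specify those details:

```python
import random

import nltk
from nltk.corpus import movie_reviews
from nltk.classify import NaiveBayesClassifier

# nltk.download("movie_reviews")  # fetch the Cornell corpus on first run

def bag_of_words(words):
    """Bag-of-words features: each lowercase token maps to True."""
    return {w.lower(): True for w in words}

# Build (features, label) pairs from the 'pos' and 'neg' review files.
labeled = [(bag_of_words(movie_reviews.words(fid)), category)
           for category in movie_reviews.categories()
           for fid in movie_reviews.fileids(category)]
random.shuffle(labeled)

train_set, test_set = labeled[:1600], labeled[1600:]
classifier = NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))  # rough sanity check
```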

We decided, as a group, that it would be interesting to see whether students generally had negative or positive feelings about their literacy experiences. To find out, we ran the trained classifier over these essays: wherever a student mentioned a particular word related to writing, we classified the sentence containing that word as having positive or negative valence.
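
That per-sentence step might look like the sketch below. The list of target words and the `classify_writing_sentences` helper are hypothetical, and `bag_of_words` carries over from the training sketch above:

```python
from nltk import sent_tokenize, word_tokenize

# Assumed word list; the actual workshop terms may have differed.
TARGETS = {"write", "writes", "writing", "writer", "wrote", "written"}

def classify_writing_sentences(essay_text, classifier):
    """Return (sentence, 'pos'/'neg') pairs for sentences that mention writing."""
    hits = []
    for sentence in sent_tokenize(essay_text):
        tokens = word_tokenize(sentence)
        if TARGETS & {t.lower() for t in tokens}:
            hits.append((sentence, classifier.classify(bag_of_words(tokens))))
    return hits
```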


We worked through all four steps in the list above. As with most projects of this type, quite a bit of text processing was required to make sure the classifier was picking up the "signal" we wanted to home in on. After working through that, near the end of the afternoon, we had a result! We had built a simple machine that reads a student essay, returns the sentences that include references to "writing," and then assesses whether the student wrote about those experiences with generally negative or positive feelings. After a spot check, we saw that it not only worked pretty well, but it also helped us formulate some new thoughts - a new conjecture - to start the process all over again.
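
For the interpretation step, the overall impression came from looking across the per-sentence labels; one minimal way to do that, reusing the hypothetical helpers sketched above, is a simple tally:

```python
from collections import Counter

from nltk import word_tokenize

def corpus_sentiment(corpus, classifier):
    """Tally pos/neg labels across every essay in the working corpus."""
    tally = Counter()
    for path, sentences in corpus.items():
        for sentence in sentences:
            tokens = word_tokenize(sentence)
            if TARGETS & {t.lower() for t in tokens}:
                tally[classifier.classify(bag_of_words(tokens))] += 1
    return tally  # e.g. Counter({'pos': ..., 'neg': ...})
```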

Of course, we wouldn't have to start at the beginning this time. We could press ahead and make more ambitious goals for ourselves on the final day of #lafdh.
