Sunday, June 5, 2016

A planning heuristic for DH projects

Our time in Easton, PA on the beautiful Lafayette College grounds wrapped up on Friday with a final, half-day session of the #lafdh workshop. We spent the time reflecting on what we learned together - and that included Ryan and me - and then I introduced one last thinking tool that we left our participants with. It was a great way to end a week of fun, hard work, and great ideas.
Reflecting: Learning Together
I asked folks to respond to five questions corresponding to our key learning goals for the workshop. What did we learn about DH? About coding and programming languages? About computers and what we can make them do? About the limitations of computational methods? And about one another? That last one tends to take folks by surprise, but I always include it in my list of learning outcomes when I am teaching, leading workshops, or doing any kind of similar activity.

Why else spend so much time in a room together if we don't see some value in learning about the others around us? In hands-on workshops like #lafdh, we often find that our colleagues are our richest resource for learning. The relationships we start and/or strengthen in such experiences can also persist long after the memory of a few PowerPoint slides or a Python demonstration fades. We can keep learning from one another once we have come to know and trust our colleagues. Ryan & I probably had the most to learn about all the others in the workshop because we were the newcomers.

We had a great time coming to understand each person's areas of specialty, hearing about their work and their experience, and seeing how they were most comfortable reasoning their way through new challenges. We had a nice blend of approaches in that last regard. Some preferred a top-down approach, reasoning from principles to integrate details. Others preferred a more inductive way, diving into the details to make order from the patterns they found.  

One thing I learned...
I had a few really valuable takeaways from this workshop that will illuminate my own scholarly work over the next few months. These came from the multiple opportunities I've had, along with Ryan, to explain what we have been doing to new audiences over the last few weeks. At #lafdh, we encountered an interdisciplinary group with varying levels of experience with DH. Half were library folks, and the other half were Humanities scholars and teachers, most from various areas in English Studies.

In response to the "what can we get computers to do" question, I realized that the work Ryan & I have done on "topoi mapping" - something that you can see working in both the Hedge-o-Matic and the Faciloscope - combines location-based and semantic analytic approaches in roughly equal measure to reliably locate "rhetorical moves." Combining the two gives us the flexibility to correctly classify a language structure that we cannot reliably pin down to a certain number or type of words. Rhetorical moves are n-grams - chunks of text of undetermined length - that may exhibit some regularity of form in terms of their lexical components, but are highly variable nonetheless. You can do the same move with vastly different words, in other words (heh).
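
To make that concrete, here is a minimal sketch of the general idea - emphatically not the Hedge-o-Matic's actual internals - pairing a location feature with lexical features in an NLTK-style classifier. The feature names, labels, and toy training data are all illustrative assumptions.

    # Sketch only: combine location-based and semantic features to
    # classify a candidate span as a rhetorical move.
    import nltk

    def move_features(sentence, index, total):
        feats = {}
        # Semantic side: which words appear in the candidate span
        for word in sentence.lower().split():
            feats['contains({0})'.format(word)] = True
        # Location side: where the span falls in the document (0.0-1.0)
        feats['relative_position'] = round(index / float(total), 1)
        return feats

    # Toy labeled data: (sentence, sentence index, total sentences, label)
    labeled = [
        ("these results suggest a possible link", 8, 10, "hedge"),
        ("we measured each sample at 20 degrees", 3, 10, "not_hedge"),
    ]
    train_set = [(move_features(s, i, n), lab) for (s, i, n, lab) in labeled]
    classifier = nltk.NaiveBayesClassifier.train(train_set)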

I'd never had a moment to distill our analytic approach into such a tight (if still a bit dense) form as that above. Nor had I tried to theorize from top to bottom - taking into account the specific transformations we perform on the texts we analyze - precisely what combination of steps we take to find something like a "hedge" in scientific writing. It came over the course of a day or so as one of those rare moments of clarity for me! I explained it out loud to the group and "tuned in" to what was coming out of my mouth with a sense of wonder. Ah ha! That's what we've been doing! So... look for more about that in an upcoming publication, I am sure.

DH Planning Grid w/ "Master Trope" Questions
Planning a DH Project
I really hope our workshop participants found the final activity we did useful. Ryan & I walked them through a planning process that we think represents a good way to plan DH projects. Here's a peek at it.

On the X axis, we list actions that also correspond to common DH team roles: research(er), develop(er), and the folks who think about the user experience the team aims to facilitate. More than one of these roles can coincide in a single person, of course, but they represent what are often distinct areas of interest, expertise, and work in any given interval of time on a project.

On the Y axis, we have the DH lifecycle I wrote about before. We'd spent the previous day going through that with the participants in a hands-on way in an attempt to understand how DH work proceeds. Finally, below the grid, there is a prompt to fill in the boxes in three passes: the first to generate questions, the second to generate to-do list items, and the third to plan desired outcomes.

In the grid above, I'm showing the guiding questions or "master tropes" for each of the DH activity roles. The researcher(s) ask "why" - why do the project? why do it this way? with these texts? etc. The developers ask "what" - what are we doing? with what? in what ways? And the user experience folks ask "who?" - who's looking at this? who needs to access it? who needs to understand it? All three share the all-important question: "how?" How shall we proceed? The researchers might ask how *should* we do it? The developers converge on how *will* we? while the user experience folks continually raise the question of how *can* we...?
Planning grid with Outcomes for HoM highlighted

I like to use this planning process with students as well as with project teams. Planning questions, activities, and outcomes is a good way to help all the team members feel some ownership of the project. Coming back to these decisions as the project progresses is also a good idea, because things change.

One thing folks are often pleasantly surprised by is the way each phase of the lifecycle produces valuable outcomes. The example grid here shows some for the conjecture/gather (i.e. theoretical) stage as well as the interpret results stage for the Hedge-o-Matic. That Ryan & I think of our work building applications as *primarily* theoretical in nature can come as a surprise to some. That it can also result in useful resources for folks who may or may not be interested in our theoretical work is a nice bonus!


Friday, June 3, 2016

A DH Project Lifecycle

On Thursday, we kicked it up a notch (on the abstraction scale) in the #lafdh Lafayette College Digital Humanities Workshop and worked through a DH project lifecycle. The goal Ryan & I had for the participants was a day of DH "cosplay" - a day in the life of a DH scholar - to get a feel for what it is like to do the work, make the decisions, fail forward, and, yes, see your project actually yield some interesting results.

We started in the morning by proposing the following four steps for the DH project lifecycle:

1. Conjecture & Gather Resources
2. Prepare Data
3. Run Analysis and/or Synthesis
4. Interpret Results

This is a very simplified sequence, of course, and it serves more as a heuristic than anything else. But it is still pretty accurate if we also add the requisite "rinse, repeat" as a fifth step. Many, if not most, projects proceed this way, with a few notable (and perfectly reasonable) variations: in #1, resources (e.g. an archive of original texts or images) can precede conjecture, and in #3, analysis and synthesis can appear in either order.

For our participants, we added some detail designed to condense the experience into a single working day. Not unlike a cooking show on TV, we wanted to teach the "recipe," but we also had to compress some of the steps that take a long time so they would fit into our scheduled programming time. So we had a version already done and ready to come out of the oven golden brown & delicious, as it were.

Our version of the four steps above, then, included a bit more detail:

1. Conjecture & Gather Resources: Ask a question or make a guess based on theory and/or your available evidence

2. Prepare Data: Learn text processing basics; build a working corpus

3. Run Analysis in Custom-Built Python Environment: Select analytic tools & configure an I/O pipeline using Python (see the sketch after this list)

4. Interpret Results: Do we see what we expect? Are we surprised by anything? Can we get to a useful, reliable result?
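
To give a flavor of what steps 2 and 3 look like in practice, here is a minimal sketch of the corpus-loading end of an I/O pipeline. The folder name is a placeholder, not our actual workshop setup.

    import os

    def load_corpus(folder):
        # Read every plain-text file in a folder into a list of strings
        corpus = []
        for filename in sorted(os.listdir(folder)):
            if filename.endswith('.txt'):
                with open(os.path.join(folder, filename)) as f:
                    corpus.append(f.read())
        return corpus

    essays = load_corpus('literacy_narratives')  # hypothetical directory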

We didn't make *all* the decisions for the group, but we did decide that we'd be doing a sentiment analysis using a naive Bayes classifier trained on the Cornell movie review corpus. Two of our seminar participants provided our analytic target - a set of student reflective essays known as "literacy narratives." This is a genre often used in writing courses: we ask students to reflect on their history as readers and writers, to consider their experiences in formal and informal settings, and to set some goals for themselves as writers.
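
For readers who want to try this at home, here is a minimal sketch of that training step, assuming NLTK's packaged copy of the Cornell movie review corpus (fetched with nltk.download('movie_reviews')). The bare-bones word-presence features are an illustrative assumption, not necessarily what we configured in the room.

    import random
    import nltk
    from nltk.corpus import movie_reviews

    def bag_of_words(words):
        # Simplest possible features: word presence
        return {word: True for word in words}

    # Pair each review's features with its 'pos' or 'neg' label
    documents = [(bag_of_words(movie_reviews.words(fileid)), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]
    random.shuffle(documents)

    train_set, test_set = documents[:1600], documents[1600:]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    print(nltk.classify.accuracy(classifier, test_set))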

We decided, as a group, that it would be interesting to see if students generally had negative or positive feelings about their literacy experiences. So to find out, we trained a classifier to read these essays and, wherever students mentioned a particular word related to writing, to classify the sentence containing that word as having positive or negative valence.
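
The per-essay pass then looks roughly like the sketch below, reusing bag_of_words and classifier from the sketch above. The keyword list and the period-based sentence splitting are deliberate simplifications.

    WRITING_WORDS = {'write', 'writing', 'writer', 'wrote'}  # illustrative

    def classify_writing_sentences(essay_text, classifier):
        results = []
        for sentence in essay_text.split('.'):
            words = sentence.lower().split()
            if WRITING_WORDS & set(words):
                # Tag the sentence 'pos' or 'neg' with the trained classifier
                results.append((sentence.strip(),
                                classifier.classify(bag_of_words(words))))
        return results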


We worked through all four steps in the list above. As with most projects of this type, quite a bit of text processing was required to make sure the classifier picked up the "signal" we were looking to home in on. After working through that, near the end of the afternoon, we had a result! We had built a simple machine that reads a student essay, returns sentences that include references to "writing," and then makes an assessment about whether the students were writing with generally negative or positive feelings about their experience. After a spot check, we saw that it not only worked pretty well, but it helped us to formulate some new thoughts - a new conjecture - to start our process all over again.

Of course, we wouldn't have to start at the beginning this time. We could press ahead and make more ambitious goals for ourselves on the final day of #lafdh.

Wednesday, June 1, 2016

Writing/Code

Today in the #lafdh workshop we had folks working together to write original functions in Python. Ryan introduced six basic built-in functions and string methods that are very useful for text processing:
  • print - print a text string or the value of a variable
  • lower - return a copy of a string in all lower case
  • len - get the length of a string (or list)
  • split - turn a string into a list, broken (by default) on white space
  • join - turn a list of strings back into a single string, glued with a separator
  • type - get the data type of the selected object (usually the value of a variable)
and with these, plus what we learned about variables yesterday, folks wrote functions to do cool things like
  • find out how many sentences are in the novel Moby Dick (or any other text)
  • turn a plain text passage into a list of comma separated values (.csv) so it can be opened in Excel
  • randomly switch words in a passage around, making "word salad"
The goal was to build on our work from yesterday, to gain confidence and experience working with a programming language, and to use computational thinking to understand how we Humanists and computers, though we see texts differently, might do some cool things together.
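
For the curious, here are minimal sketches of the kinds of functions folks wrote, using only the basics above plus a little standard-library help. (The period-splitting sentence counter is a deliberate simplification.)

    import random

    def count_sentences(text):
        # Rough proxy: split on periods and count the pieces
        return len(text.split('.'))

    def to_csv_line(text):
        # Turn whitespace-separated words into one comma-separated line
        return ','.join(text.split())

    def word_salad(text):
        # Randomly reorder the words in a passage
        words = text.split()
        random.shuffle(words)
        return ' '.join(words)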

There were high-fives and cheers before lunch as code snippets executed without errors. And after lunch, there was a triumphant show-and-tell in which our merry band became real programmers by showing off their code and hearing from other programmers how they would have done it differently. :)

We also did some work in the afternoon building a topic model using gensim. Using libraries, moving into an IDE (from IPython notebooks), and working out an I/O workflow were the real objectives of that introduction, but we did get to see and discuss the results of an LDA topic model. Real DH stuff after just two days!
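
For a flavor of what that looked like, here is a minimal gensim sketch with placeholder documents and an arbitrary topic count; our actual corpus and settings differed.

    from gensim import corpora, models

    # Tiny placeholder corpus: each document becomes a list of tokens
    texts = [doc.lower().split() for doc in [
        "whales and the sea",
        "ships sail the sea",
        "writing about whales and ships",
    ]]
    dictionary = corpora.Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(text) for text in texts]

    lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary)
    for topic in lda.print_topics():
        print(topic)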

A DH learning pattern

Our #lafdh #squadgoals
Yesterday, we met a fantastic group of folks taking our DH workshop at Lafayette College. As must be done with all such events, we made a hashtag: #lafdh

We spent yesterday doing a few activities focused on computational thinking. As workshop leaders, our goals for the day were to help folks build confidence to push beyond their current comfort levels, and to try out some new ways of working. We also offered folks a learning pattern - kind of like a pedagogical approach, but more learner-centric - for the DH-style mantra "more hack, less yack."

Think of it as working at a variety of scales, anywhere you are trying to overcome anxiety, build confidence, build skill, acquire knowledge, and (re)consider the value of something new - whether that's a piece of software or hardware, or a new process or practice.

The Learn-by-Doing learning pattern goes like this:
  • Use the thing
  • "Read" the source code and/or (cultural) logic of the thing
  • Put what you have read into your own words
  • Make some changes to the thing and see what happens
  • Make something new based on the thing by modifying or building on it
  • Reflect on what you learned
As with many such patterns, the whole thing is best understood as a cycle or recursive process. In fact, the first three steps form a kind of inner loop, too. When I'm learning to code something new and trying to figure things out, I usually repeat those top three in rapid iterations for some time before moving on to number four.

Today, we'll try to move this pattern into the realm of "habit" for our participants. On deck: processing text with Python!