Friday, January 12, 2007

Statistics and Computing for the Humanities

When reading Natalie Bennett’s blog Philobiblion today I learned about something called the Digital Companion to the Humanities. Sounded intriguing so I checked it out.

As far as I can tell the DCH is basically an overview of the field of humanities computing. Ok, so what does that mean? Well, they give a history of the field and provide some information on principles, previous applications, and new directions for computing in fields such as music, art history, classics, lexicography, literary studies, archaeology (not something I typically think of as a humanities discipline, but blame my social sciences background for that one), linguistics, multimedia and performing arts.

Also on something called Robotic Poetics. I haven’t had a chance to look into this yet and the truth is that it’s unlikely that I ever will.

Veeeeeery interesting.

I started randomly clicking around and found the damned coolest thing. Look below. The figure shows you some of the output from a simple principal components analysis (PCA) of plays. The author referred to this type of study as a stylistic analysis. Yes. OMG. What fun!

For the uninitiated, PCA is a linear statistical technique that can help you simplify your data or, with a closely related technique, identify latent variables within your data. Say you take an online survey to rate your own sexiness. (Come on, you know you’ve done it. It probably happened around the same phase of your life in which you posted your picture to Hot Or Not.) The survey includes 100 questions (which, due to your total unsexiness, you had nothing better to do than answer) and takes about 20 minutes to complete.

If the survey writers had used PCA, they might not have had to include 100 questions. If they were smart and wanted to increase the number of sexy people who would have the time to take the survey, they could use PCA! It would show them which questions largely overlap, which would allow them to use fewer questions but still get a representative measure of sexiness, assuming they had one to begin with.
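For the curious, here is a rough sketch of that idea in Python with NumPy. Everything in it is made up for illustration: the 500 respondents, the three hidden "traits" driving the answers, and the noise level are all assumptions, not data from any real survey. It just shows that when 100 questions are really driven by a handful of underlying traits, the first few principal components soak up most of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: 500 people, 100 questions, 3 hidden traits (all assumed).
n_people, n_questions, n_traits = 500, 100, 3
traits = rng.normal(size=(n_people, n_traits))
weights = rng.normal(size=(n_traits, n_questions))
# Each answer is a mix of the hidden traits plus some random noise.
answers = traits @ weights + 0.5 * rng.normal(size=(n_people, n_questions))

# PCA via singular value decomposition of the mean-centered data.
centered = answers - answers.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance carried by each principal component.
explained = singular_values**2 / np.sum(singular_values**2)

top3 = explained[:3].sum()
print(f"Share of variance in the first 3 components: {top3:.2f}")
```

If the first few components carry most of the variance, as they should in this toy setup, that is the hint that a much shorter survey could capture nearly the same information.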

I do want to mention, however, that I disagree with the author’s use of PCA in this instance. There are several extraction methods used in exploratory factor analysis, of which PCA is one and principal axis factoring (PAF) is another. Ideally, PAF should have been used to determine the underlying factors of “sexiness.” There are important underlying statistical differences, so the ideal applications of the two methods are actually distinct. It’s very common, though, to see these statistics misused, even in fields in which they have been well established for decades. I would have used PAF for the example the author provides. If anyone cares to ask me why, I will be happy to explain it to you.
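One common way to describe the difference: PCA analyzes all of the variance in each variable, while PAF analyzes only the variance the variables share, by replacing the 1s on the diagonal of the correlation matrix with estimated communalities. The sketch below is a toy illustration under assumptions of my own (six made-up items, one latent factor, a true loading of 0.6 on every item, and a bare-bones iterated PAF). It is not the analysis from the DCH chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 6 items driven by one latent factor, true loading 0.6 each.
n_people = 5000
true_loadings = np.full(6, 0.6)
factor = rng.normal(size=(n_people, 1))
noise = rng.normal(size=(n_people, 6)) * np.sqrt(1 - true_loadings**2)
items = factor * true_loadings + noise

R = np.corrcoef(items, rowvar=False)

def first_loadings(matrix):
    """Loadings on the largest component of a symmetric matrix."""
    values, vectors = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return np.abs(vectors[:, -1]) * np.sqrt(values[-1])

# PCA: decompose the full correlation matrix (1s on the diagonal).
pca_loadings = first_loadings(R)

# PAF: start from squared multiple correlations, then iterate,
# putting the current communalities on the diagonal each time.
communalities = 1 - 1 / np.diag(np.linalg.inv(R))
for _ in range(100):
    reduced = R.copy()
    np.fill_diagonal(reduced, communalities)
    paf_loadings = first_loadings(reduced)
    communalities = paf_loadings**2

print("PCA loadings:", np.round(pca_loadings, 2))
print("PAF loadings:", np.round(paf_loadings, 2))
```

In a setup like this, the PCA loadings tend to come out inflated relative to the true value because each item's unique variance gets folded into the component, while the PAF loadings should land closer to 0.6. That is the gist of why PAF is often preferred when the goal is to find latent factors rather than just to compress data.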

Given that, is there anyone out there who is looking for people who can apply PCA to fun things like art, music, literature…? Seriously, I’m available. I have a hunch that it’s a pretty wide-open field, although I could be wrong.

