Language, in the language of Khanh Le: Forensic linguistic analysis

Novelist and essayist Christopher Morley once said, "There are no precedents: You are the first You that ever was." That, apparently, is true language-wise as well, and law-enforcement departments around the world have been using this fact to help identify the authors of criminal documents. Dr. Ernst Kotze--Head of the Department of Applied Language Studies at a university in South Africa--is confident that this method of identification, called forensic linguistic analysis, is "a more reliable means of identifying the author of a document" than handwriting, fingerprint, and DNA analysis. (Link: Language errors used to establish identity)

Forensic linguistic analysis studies a document's language usage, word patterns, stylistic errors, and literary techniques to pinpoint the author, typically through a method called stylometric analysis. Stylometry grew out of earlier techniques developed to verify the authenticity and authorship of artworks and plays, which emphasized "the rarest or most striking element" of a work. In modern times, with the help of computers, stylometry can reveal identifying patterns in even seemingly common speech. Some common methods of stylometry are:

- Writer invariant: the computer scans an unverified text to find its 50 most common words, breaks the same text into 5000-word chunks, and scans each chunk to count how often each of these 50 words occurs in it. The counts are plotted on the same plane as the counts from a verified text, and from this analysts can judge whether the two texts are by the same author.
- Neural network: a network is trained on texts by known authors so that it can recognize other texts written by the same authors.
- Rare pairs: analysts study how often particular sequences of words occur. People habitually pair certain words with certain other words, so rare pairs can be a powerful tool in identifying authorship.
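To make the writer-invariant idea in the list above a bit more concrete, here is a minimal Python sketch of the counting step only. The function names (`tokenize`, `most_common_words`, `chunk_profiles`) are my own, the tokenizer is deliberately crude, and dropping a short final chunk is an assumption I made to keep the counts comparable. It is not how any particular forensic tool works.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_words(tokens, n=50):
    """Return the n most frequent words in the token list."""
    return [word for word, _ in Counter(tokens).most_common(n)]


def chunk_profiles(tokens, vocabulary, chunk_size=5000):
    """Count each vocabulary word inside fixed-size chunks of the text.

    Returns one list of counts per chunk, in the same order as `vocabulary`.
    Assumption: a trailing chunk shorter than chunk_size is dropped.
    """
    profiles = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        counts = Counter(tokens[start:start + chunk_size])
        profiles.append([counts[word] for word in vocabulary])
    return profiles
```

In use, the vocabulary would come from the unverified text, and `chunk_profiles` would then be run with that same vocabulary on both the unverified and the verified text, so that the two sets of numbers can be plotted and compared side by side.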

(If you would like more information on stylometry and the methods, here is the wiki link: Stylometry)
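The rare-pairs method from the list above can also be sketched in a few lines of Python. This is only an illustration under my own assumptions: a "pair" here means two adjacent words, and a pair counts as "rare" if it shows up at most `max_background_count` times in some background sample of ordinary writing. Both `shared_rare_pairs` and the background sample are hypothetical, not part of any real forensic package.

```python
import re
from collections import Counter


def word_pairs(text):
    """Count adjacent word pairs (bigrams) in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def shared_rare_pairs(text_a, text_b, background, max_background_count=0):
    """Return word pairs used in both texts that are rare in the background.

    Assumption: "rare" means the pair occurs at most max_background_count
    times in the background text.
    """
    pairs_a = word_pairs(text_a)
    pairs_b = word_pairs(text_b)
    common_usage = word_pairs(background)  # a missing pair counts as 0
    return sorted(
        pair
        for pair in pairs_a.keys() & pairs_b.keys()
        if common_usage[pair] <= max_background_count
    )
```

A long list of unusual pairs shared by two documents would only be a hint worth a closer look, not proof of common authorship by itself.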

Even when there is not enough information to identify an author, forensic linguistic analysis can point out astounding connections and similarities between texts, which helps in estimating things like the reliability of testimony. Language Log's Roger Shuy, for example, described one such case in his September 27th post titled "Treason in Georgia." Shuy was a language analyst in a trial accusing Maia Topuria--a Georgian opposition-party leader--of plotting to overthrow the Georgian government. Using topic sequencing and phrase comparison, Shuy showed that all eleven supposedly "independently produced" witness statements were composed by the same person, in this case possibly the main-party police: the statements all introduce different topics in an almost identical order, and all contain phrases such as "elucidation by television," "pretext of protection," "to liquidate the ministers," etc., worded in exactly the same way. I won't go into more detail, as all of you are probably familiar with this case already. (Link: Language Log-Treason in Georgia)
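I don't know the exact procedure Shuy used, but the phrase-comparison idea can be illustrated with a small Python sketch that looks for word sequences repeated verbatim in every statement. The function names and the choice of three-word phrases are my own assumptions.

```python
import re


def ngrams(text, n=3):
    """Return the set of n-word phrases that occur in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def phrases_in_every_statement(statements, n=3):
    """Return the n-word phrases that appear verbatim in all statements."""
    shared = None
    for statement in statements:
        grams = ngrams(statement, n)
        # Keep only the phrases that every statement seen so far contains.
        shared = grams if shared is None else shared & grams
    return sorted(shared or [])
```

Independent witnesses would be expected to share very few long, unusual phrases, so a non-empty result for larger values of `n` is the kind of overlap that would invite a closer look.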

To me, it is extremely interesting that language can act both to unify us and to identify us. It connects us with the world, and yet at the same time allows us to add our own personalities and styles to the world through it. Languages and cultures are constantly evolving and transforming, probably in part because of this unique feature of language itself.

A side note: I decided to bring up this article because of what Professor Boroditsky said in class the other day: "We all speak English, but you probably speak your own English, and I speak my own English, and chances are our English are very different." (Or something along those lines. I'm not sure of the exact quote, but I couldn't agree with it more.)

1 comment:

Steve said... (October 8, 2007 at 11:04 PM): Great post! The field of forensic linguistics certainly supports the idea that we all have our own unique brand of language. As I commented on another blog, this field might reveal something about the organization of the human mind. That is, why and how do people come by their individual patterns of linguistic manner and error? Does this suggest that speech is mostly automatically and unconsciously generated rather than consciously constructed for each speech act in each new context? What else do these findings tell us and how could we test these hypotheses?

Wednesday, October 3, 2007
