"Well, you deliver a lot of speeches and a lot of them contain similar phrases, and they vary very little from one to the next."
-- Don Gonyea, April 13, 2004
People repeat themselves when they talk. There are words and phrases that particular individuals use over, and over, and over. This is even more true of politicians on the campaign trail in an era of sound-bite policy.
That may be one of the reasons that a fashion of this election season is to use counts of repeated words as an index of message and strategy.
The New York Times, for instance, summarized the 2004 party conventions by examining the frequency of key word use. This methodology has also been used to examine the debates (you can conduct your own analyses at Debate Spotter).
What does this all mean? If presidential candidates are actually repeating themselves, it might be possible to represent their arguments (or sound bites) with a Markov chain or finite state automaton.
In effect, this would involve building a random sentence generator that knows something about the lexicon and word use patterns for the respective candidates (the linked generator is similar to the one described below that I rolled for this project).
So that's what I did. Three finite state automata were trained on President Bush, Senator Kerry, and the multitude of questioners respectively, using the transcripts of the first two debates. After training, these automata can be used to produce sentences that resemble (or in some cases, are identical to) those produced by the original speaker.
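For the curious, here is a minimal sketch of the general idea in Python. It is not the code behind these automata; it assumes a simple first-order word chain (each word remembers which words followed it in training), and the names `train`, `generate`, `START`, and `END` are made up for illustration. The sample lines are invented stand-ins, not quotes from the transcripts.

```python
import random
from collections import defaultdict

# Hypothetical sentence-boundary markers (an assumption of this sketch).
START, END = "<s>", "</s>"

def train(sentences):
    """Map each word to the list of words that followed it in training.

    Repeats are kept in the list, so a follower that was seen more
    often is proportionally more likely to be picked later.
    """
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng=random, max_words=40):
    """Walk the chain from START until END (or max_words) is reached."""
    word, output = START, []
    while len(output) < max_words:
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)

if __name__ == "__main__":
    # Invented toy lines standing in for a debate transcript.
    toy_transcript = [
        "the plan is working",
        "the plan is a good plan",
        "a good plan is hard work",
    ]
    print(generate(train(toy_transcript)))
```

With only three training lines the output will often be one of the originals verbatim, which is the same effect described above: the less variety in the input, the more the generator simply recycles it.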
The Bush automaton (which I named ELEPHANTRON) seemed to be the best trained of the three. This may be due to the President's preference for short, repetitive sentences (see Gonyea's comment from a press conference above). There are fewer possible branches and fewer clauses in the President's sentences before they reach a terminus. In effect, the ELEPHANTRON just recycles many of President Bush's sound bites.
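One rough way to put a number on "fewer possible branches" would be to count, for each word, how many distinct words ever follow it, and then average. The sketch below is only an illustration of that idea, not a measurement I ran on the transcripts; `average_branching` and the `</s>` end marker are hypothetical names.

```python
from collections import defaultdict

def average_branching(sentences):
    """Average number of distinct next-words per word (lower = more repetitive)."""
    successors = defaultdict(set)
    for sentence in sentences:
        # "</s>" is an assumed end-of-sentence marker for this sketch.
        words = sentence.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            successors[current].add(following)
    if not successors:
        return 0.0
    return sum(len(s) for s in successors.values()) / len(successors)
```

A speaker who says exactly the same sentence every time would score 1.0; the more ways a sentence can continue after a given word, the higher the score climbs.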
The automaton trained on Senator Kerry (named DONKEYBOT) does a little less well, mostly because it gets lost navigating some of Kerry's subclauses. Kerry also didn't repeat phrases to the degree that Bush did.
The AUTO-MODERATOR is the worst of the lot, mostly because it contains a variety of text, from the preambles at the beginning of the debates, to the questions, to procedural comments. It also contains text from over ten speakers.
You can access the automata here at Growlers, and view randomly generated sample output at The Bot Debate.
[UPDATE 10.14.2004] I've included last night's debates in the training set, so now DONKEYBOT has more statistics to quote, and ELEPHANTRON can work education into every answer.
[UPDATE 05.25.2007] The Bot Debate is currently offline due to some server misconfiguration that I can't understand right now. Oh well.
Monday, October 11th, 2004