Twitter Bots

This post is part of a collection on Bots.

I've already posted about this too much on Twitter, but over the last couple of weeks I've turned out a few bots: @ArkhamSingles, @LesserKnownCase, and @CyberSkyMall. The technique is basically the same as the (now mostly broken) Poe-based character generator I wrote a few years ago. (update 2016-06-16: the Poe generator went offline a few years ago.)

The way it works is simple: first, using a part-of-speech tagger, pull all the nouns and adjectives out of a source document. Then, using WordNet, sort the nouns into broad categories, in this case events, locations, items, and words that describe people. Create a fill-in-the-blank sentence with some adjectives in front of nouns of the right category and you've got MadLibs.
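As a rough sketch of those steps (not the code on GitHub), the Python below sorts already-tagged tokens into word lists and fills a template. Several things here are assumptions for illustration: the function names, the Penn Treebank style tags (the kind NLTK's `pos_tag` produces), the mapping from WordNet lexicographer names such as `noun.person` to the four categories, and the hand-written `TOY_LEXNAMES` table, which stands in for a real WordNet lookup so the example runs without any data downloads.

```python
import random
import re

# Assumed mapping from WordNet lexicographer file names to the four
# categories in the post; the real code may group nouns differently.
CATEGORY_BY_LEXNAME = {
    "noun.event": "event",
    "noun.location": "location",
    "noun.artifact": "item",
    "noun.person": "person",
}

def build_word_lists(tagged, lexname_of):
    """Sort (word, tag) pairs into adjectives and categorized nouns."""
    names = ("adjective", "event", "location", "item", "person")
    lists = {name: [] for name in names}
    for word, tag in tagged:
        word = word.lower()
        if tag.startswith("JJ"):        # adjective tags: JJ, JJR, JJS
            key = "adjective"
        elif tag in ("NN", "NNS"):      # common nouns only
            key = CATEGORY_BY_LEXNAME.get(lexname_of(word))
        else:
            key = None
        # Skip words with no usable category and avoid duplicates.
        if key is not None and word not in lists[key]:
            lists[key].append(word)
    return lists

def fill_template(template, lists, rng=random):
    """Replace every {slot} with a random word from the matching list."""
    return re.sub(r"\{(\w+)\}",
                  lambda m: rng.choice(lists[m.group(1)]),
                  template)

# Toy input standing in for tagger output on a source document.
TAGGED = [
    ("The", "DT"), ("ancient", "JJ"), ("priest", "NN"), ("carried", "VBD"),
    ("a", "DT"), ("strange", "JJ"), ("lamp", "NN"), ("to", "TO"),
    ("the", "DT"), ("nameless", "JJ"), ("city", "NN"), ("before", "IN"),
    ("the", "DT"), ("festival", "NN"),
]

# Hypothetical lookup table; hand-written, not checked against WordNet.
TOY_LEXNAMES = {
    "priest": "noun.person",
    "lamp": "noun.artifact",
    "city": "noun.location",
    "festival": "noun.event",
}

word_lists = build_word_lists(TAGGED, TOY_LEXNAMES.get)
print(fill_template(
    "The {adjective} {person} seeks the {adjective} {item} "
    "at the {event} in the {location}.",
    word_lists,
))
```

With NLTK installed, the tagged tokens could come from `nltk.pos_tag(nltk.word_tokenize(text))`, and the lookup could take the lexicographer name of a word's first noun synset; picking the first synset is itself a simplification.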

The fun thing about the technique is that the output is as distinctive to each author as you'd expect; on top of that, since the method just uses lists of words, it's easy to mix things together so you can get, say, half Indiana Jones and half Lovecraft.
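One way the mixing could work, sketched in Python: for each blank, first pick which source to draw from, then pick a word from that source's list. The function name `mixed_fill`, the weights, and both word lists are made up for illustration; the bots may well blend their lists some other way.

```python
import random
import re

def mixed_fill(template, sources, weights, rng=random):
    """Fill each {slot}, choosing a source per slot according to weights."""
    def pick(match):
        # Choose the source first so a longer list doesn't dominate the mix.
        source = rng.choices(sources, weights=weights, k=1)[0]
        return rng.choice(source[match.group(1)])
    return re.sub(r"\{(\w+)\}", pick, template)

# Made-up word lists standing in for lists extracted from real texts.
LOVECRAFT = {
    "adjective": ["eldritch", "cyclopean"],
    "item": ["tome", "idol"],
}
ADVENTURE = {
    "adjective": ["golden", "booby-trapped"],
    "item": ["whip", "fedora"],
}

# Equal weights give roughly half of the words from each source over many runs.
print(mixed_fill("the {adjective} {item}", [LOVECRAFT, ADVENTURE], [0.5, 0.5]))
```

Changing the weights to something like `[0.8, 0.2]` would tilt the output toward the first source without touching the lists themselves.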

I put an old version of the code on GitHub a while back; I'm cleaning it up to be a little easier to use. Python's NLTK helped make the tricky parts easy (though the POS tagging and sentence segmentation are really slow), but unfortunately there's nothing quite as mature in JavaScript. While natural has the WordNet part taken care of, pos-js uses a single pre-trained tagger, so I need to test it to check the quality of the segmenter and tagger (both already problems in NLTK). Especially given that pos-js is semi-unmaintained, it might make sense to just implement something from scratch. Of course, even if all those problems are worked out, WordNet and a pre-trained tagger are a lot of data to load into the browser for a one-off job.

A much easier proposition is putting the pre-compiled list files online so they can be used to make generators; Abulafia makes this easy, and I've gone ahead and added generators for Lovecraft, Leiber, and Dunsany. If there's any other author I should toss up, do tell me.

Besides that, the next thing to do is make a Japanese version; since MeCab and the Japanese WordNet are both relatively easy to use, that'll actually be easier than a web port. Ψ

2014-03-10T23:55:55+0900