The Laser Syriacum
In early September I released a new bot, @lasersyriacum. It's designed to develop a personality in response to feedback from followers, and in the short time it's been alive it has been surprising, frustrating, and educational. And it's only just getting started. Since it's been running smoothly for a while, I thought it was time to write up what exactly I'm trying to do and what's happened so far.
How it Started
The inspiration for the Laser Syriacum comes from a bot called Archillect. Archillect is a bot by Murat Pak that finds images fitting a certain aesthetic (cyberpunk, industrial photography, glistening monochrome futures) and reposts them. I thought it would be fun to make something similar, but with more emphasis on crumbling castles, mossy forests, and dusty libraries. So far I'd call it a qualified success.
How it Works
I first thought of making something like Archillect in January, but got hung up on some of the details and put it aside. In early September I remembered the project and realized that with a few simplifications I could knock something out in an afternoon, and thus the Laser Syriacum was born.
The basic algorithm has evolved over time, but in broad strokes it works like this (at present all images come from either Flickr or Tumblr):
- There's a list of potential sources for posts ("aspects"), such as authors or tags, each with an associated score.
- A few aspects are selected by weighted random choice, and candidate posts are gathered from them.
- Candidates are pruned based on likes and other information from their sources.
- Remaining candidates are scored using aspect data, not just for the aspects that selected them, but also for any happy accidental overlap they may have. Unknown tags are treated with suspicion.
- After scoring, the candidate with the most points is posted.
- Responses to posts (RTs and favorites) are monitored for a while and add to the scores of aspects associated with the post.
To get the bot started, I seeded it with a list of tags that it treated as already having a good score, like #history and #moss, and let it go from there.
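The loop above, including the seeding step, can be sketched in a few functions. This is a minimal Python illustration under stated assumptions, not the bot's actual code: the `Candidate` type, the constants, the retweet weighting, and the single likes threshold used for pruning are all hypothetical, and fetching candidates from Flickr or Tumblr is left out.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical model: an "aspect" is a (kind, value) pair such as
# ("tag", "moss") or ("author", "some-blog"), each with a running score.
Aspect = tuple[str, str]

SEED_SCORE = 10.0       # assumed starting score for hand-picked seed tags
UNKNOWN_PENALTY = -1.0  # assumed cost of an aspect the bot has never seen
MIN_NOTES = 5           # assumed pruning threshold on likes/reblogs


@dataclass
class Candidate:
    url: str
    aspects: set[Aspect]  # every tag/author attached to the post
    notes: int            # likes/reblogs reported by the source site


def seed(tags):
    """Start the score table with tags treated as already scoring well."""
    scores = defaultdict(float)
    for tag in tags:
        scores[("tag", tag)] = SEED_SCORE
    return scores


def pick_aspects(scores, k=3, rng=random):
    """Weighted random choice (with replacement) among positively scored aspects."""
    pool = [a for a, s in scores.items() if s > 0]
    return rng.choices(pool, weights=[scores[a] for a in pool], k=k)


def score_candidate(candidate, scores):
    """Credit every known aspect on the post, including accidental overlaps,
    and dock points for aspects the bot has never seen."""
    return sum(scores[a] if a in scores else UNKNOWN_PENALTY
               for a in candidate.aspects)


def choose_post(candidates, scores):
    """Prune weakly received posts, then return the top scorer (or None)."""
    survivors = [c for c in candidates if c.notes >= MIN_NOTES]
    if not survivors:
        return None
    return max(survivors, key=lambda c: score_candidate(c, scores))


def record_feedback(post, scores, retweets, favorites, rt_weight=2.0):
    """Add responses to the score of every aspect associated with the post."""
    gain = rt_weight * retweets + favorites
    for a in post.aspects:
        scores[a] += gain
```

For example, after `seed(["history", "moss"])`, a post tagged moss and history outscores one tagged moss and snails, because the unknown "snails" tag costs a point. Once a moss-and-snails post gets retweeted, though, "snails" enters the table with a positive score and can start pulling in candidates of its own.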
I have had to prune things occasionally. I keep a graylist of tags that tend to confuse the bot, which I need to update every so often, and I've had to delete outright pornography (just once), borderline pornography (just twice), and, on rare occasions, other inappropriate images.
An example of a tag I've kept despite its confusing the bot is "snails": people enjoy pictures of the slimy creatures, but the tag causes the bot to post My Little Pony fan art of a minor character every so often, which so far has gone completely ignored. "Beast" I had to graylist quickly: while it's used for some mythical creatures, it's used far more often for a Korean boy band. "Mythical Beasts", incidentally, also had to be graylisted, since it's mainly used by fans of a particular pair of YouTubers. One of the worst cases came very early in the bot's life, when this GIF got retweeted:
If you thought that was tagged with "cat" or "cash" or even "funny", you'd be wrong: it ended up causing the bot to repost GIFs tagged "taurus" or "astrology", as well as other posts from that blog. From a distance this was amusing wackiness, but to followers who couldn't see the inner workings it seemed to come from nowhere, and even the people who liked the cat weren't enjoying it.
The problem with these tags and posts wasn't that the content they generated was inappropriate, just that the bot couldn't respond appropriately since it can't differentiate the uses of the tags. Surprises are welcome, but not if they just bore everyone.
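The post doesn't spell out what graylisting does internally. As a minimal sketch, assuming a graylisted tag is simply ignored (it can't pull in candidates and neither gains nor loses score from feedback, but posts carrying it aren't rejected outright), the filter might look like this. `GRAYLIST`, `normalize_tag`, `usable_tags`, and `credit_feedback` are hypothetical names, and the entries are the examples mentioned above.

```python
# Hypothetical graylist; entries are examples from the post.
GRAYLIST = {"beast", "mythical beasts"}


def normalize_tag(tag):
    """Lowercase and collapse whitespace so "Mythical  Beasts" matches."""
    return " ".join(tag.lower().split())


def usable_tags(tags, graylist=GRAYLIST):
    """Return the post's tags minus anything graylisted.

    Assumed semantics: graylisted tags are ignored (they can't select
    posts or move scores), but a post carrying one isn't rejected.
    """
    gray = {normalize_tag(t) for t in graylist}
    return {normalize_tag(t) for t in tags} - gray


def credit_feedback(tags, scores, gain, graylist=GRAYLIST):
    """Add a post's feedback only to its non-graylisted tags."""
    for tag in usable_tags(tags, graylist):
        scores[tag] = scores.get(tag, 0.0) + gain
```

With this in place, a retweet of a post tagged "beast" and "castle" would credit only "castle", so one popular post can't drag the bot toward a tag's unrelated meanings.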
How it's Grown
While it's mostly gotten better over time, it hasn't always been smooth sailing. For several weeks each post included the tag the bot had used to find the picture. I expected this to attract attention, but I didn't expect it to work only for a certain subset of tags. During this period a benign invasion of freemasons left the bot with a lasting fondness for the occult in general.
Even after it stopped including tags, it caught on in surprising places. A repost of an Inktober drawing by someone known as Moon Rising, found via the "Sega" tag, somehow took off among Jet Set Radio fans and was ultimately retweeted by the game's composer. To date this is still the bot's most popular post, and while very few (if any) of the Jet Set Radio fans stuck around, the bot still posts JSR fan art occasionally.
Since then a few posts have managed to rack up multiple retweets, but apart from a few JSR tweets shortly after the first, nothing has compared to that burst of popularity. Besides a fondness for JSR and its sequels, the episode left the bot with an affection for all kinds of retro games that adds a bit of color to the timeline.
Two things I wanted to get right with the bot were always providing credit and avoiding reposting art by people who weren't open to it. This is one reason I started with Tumblr as the primary post source: while it's not always the case, reblogging is a core part of the platform, so people there are more likely to accept re-sharing in general. Since its focus on historical subjects means it often posts public domain art, I haven't had to worry about reposting too much, though when it posts illustrations I try to check the source blog to make sure there's no request not to repost things. I haven't had to delete anything on those grounds yet, and I've been surprised by how many blogs welcome reposting with credit.
With Flickr posts the situation is even clearer than with Tumblr - Flickr has a rights field, so I only repost Creative Commons or Public Domain images.
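A license check on Flickr results could look like the sketch below. It assumes photos arrive as dicts from the Flickr API's `flickr.photos.search` called with `extras=license,owner_name`, which adds `license` and `ownername` fields. The ID sets reflect my understanding of the list returned by `flickr.photos.licenses.getInfo` and should be checked against the live list. Whether to also allow ID 7 ("No known copyright restrictions") or 8 ("United States Government Work") is a judgment call, and they are left out here. `repostable` and `credit_line` are hypothetical helpers.

```python
# Flickr license IDs (assumed from flickr.photos.licenses.getInfo; re-check,
# since Flickr has added newer Creative Commons versions over time).
CREATIVE_COMMONS = {"1", "2", "3", "4", "5", "6"}  # the Attribution (BY) family
PUBLIC_DOMAIN = {"9", "10"}                        # CC0 and Public Domain Mark
ALLOWED_LICENSES = CREATIVE_COMMONS | PUBLIC_DOMAIN


def repostable(photos):
    """Keep only photos whose license field is CC or public domain."""
    return [p for p in photos if str(p.get("license")) in ALLOWED_LICENSES]


def credit_line(photo):
    """Hypothetical attribution text linking back to the photo page."""
    title = photo.get("title") or "Untitled"
    return (f"{title} by {photo['ownername']} "
            f"https://www.flickr.com/photos/{photo['owner']}/{photo['id']}")
```

Filtering on the license field before scoring means an all-rights-reserved photo can never win, however well its tags score.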
It's true that where the bot finds an image isn't always the image's original source. Until something like Mediachain catches on, I don't think there's anything I can do but deal with problems as they come up. I'm not completely happy with this, but I think it's no worse than if I were reblogging things manually.
I also considered not reposting images directly, but just posting links and letting Twitter's embed feature do the rest. This would be great in many ways (for one, changes to the original art, including deletions, would be reflected), but unfortunately, as the tweet below demonstrates, Twitter crops images in undesirable ways. Besides that, third-party client support is mixed, and GIFs linked this way simply don't animate. I'd like to revisit this idea, but it will need some work.
If the bot has posted something of yours and you want it taken down, please contact me on Twitter any time.
One unexpected benefit of leaving in the source links is that every post has a story behind it. In its short life the bot has taught me about volvelles, the mother of thousands succulent, 3D Tic-Tac-Toe, a floating sculpture with a photography contest every year, an old telephone tower in Sweden, the atomic clock that determined Japan Standard Time, and a dog-powered sewing machine. It's a continuous source of wonder, delight, and the strange.
Where it's Going
My ultimate goal is to make the bot open source in a way that's useful for other people. The basic principles should work for almost any kind of tag used on Tumblr and Flickr; while there's a kind of psychic landscape that makes some tags connect to everything, with a sufficiently unique starting point it's not hard to build something with character.
As an example of the tag landscape, there are many tags that identify black and white photos: "black and white", "black & white", "bw", "b/w", "b&w", "monochrome", and more. Unless care is taken to merge them, black and white photos easily end up over-represented, which at least partially explains why Archillect has such an easy time avoiding color.
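Merging synonyms before scoring keeps a photo tagged with several of these from being credited several times over. A minimal sketch, assuming a hand-maintained synonym table; `TAG_SYNONYMS` and `canonical_tags` are hypothetical, and folding "monochrome" into black and white is itself a judgment call.

```python
# Hypothetical hand-maintained table mapping variant spellings to one key.
TAG_SYNONYMS = {
    "black and white": "bw",
    "black & white": "bw",
    "b/w": "bw",
    "b&w": "bw",
    "monochrome": "bw",
}


def canonical_tags(tags):
    """Lowercase, collapse whitespace, and map synonyms to one canonical tag.

    Returning a set means a post tagged "bw", "b&w" and "monochrome"
    contributes the black-and-white aspect once, not three times.
    """
    result = set()
    for tag in tags:
        key = " ".join(tag.lower().split())
        result.add(TAG_SYNONYMS.get(key, key))
    return result
```

Here `canonical_tags(["Black and White", "b&w", "Monochrome", "moss"])` yields the set `{"bw", "moss"}`, so scoring over canonical tags counts the black-and-white aspect once.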
I know that waiting "until the code is clean" often means never releasing anything, but the main thing I'm working on now is a way to give the bot instructions (tags to ignore, posts to seed it with, and so on) without having to fiddle with the code. Since this is a hobby project I don't have a release date, but optimistically I should have something to test in a few weeks.
If you find this interesting, do give the bot a look on Twitter and teach it a little about what you think looks good. I guarantee it appreciates the instruction.
Thanks, Thanks, and More Thanks
This bot obviously wouldn't be anything without all the people putting wonderful things on the Internet. While a few blogs I already knew have turned up repeatedly, such as Design is Fine and Dansk Jävlarna, it has also introduced me to blogs from libraries like the Linda Hall Library, the Bodleian Libraries, Othmeralia, the National Library of Poland, and Missouri University, independent public-domain searchers like the amazing Nemfrog, and surprising blogs like Illuminati Zeitgeist. There's a little bit of everything.
What's in a Name?
The name "laser syriacum" refers to silphium, a kind of fennel that hasn't been definitively identified but supposedly had all kinds of powers, as well as a possible historical link to the heart symbol (❤). I learned about it while doing research in college on "document dating", the problem of having a computer guess when a book was written by reading it. Looking at examples where my program had failed particularly badly, I found one with the word "laser" in The Adventures of Peregrine Pickle, a novel from the 1750s. I thought this was an OCR error, but it turned out to be the obsolete scientific name for this mysterious plant. While the beam-of-light laser is an acronym, the genus Laserpitium takes its name from silphium and, origins aside, remains in good standing. Ψ