Book Recommendation Engines are HARD

I'm a computer geek who works at a library, and - by formal education - a librarian. So I end up thinking about how to recommend books to the people who pay my salary. This isn't something I do on a regular basis, but several of my co-workers do. As a result, I've thought about how I'd do it myself, and also about how a piece of software or a website might do it.

If someone you like says "oh, you should hear this really cool song," chances are you'll get on that and listen to it the next day. And if you don't like it, it's not a big deal: you tried, and now you have another musician filed away in your head under "probably not good." On the other hand, if you like it, toss it onto your latest playlist and enjoy.

But when someone says "I saw this awesome movie last night" and tells you to see it, you're probably going to think about it a bit longer - maybe check the genre (because your friend loves horror and you don't), or look it up on Rotten Tomatoes to check the reviews and find out what it's about. If you see it and you aren't happy ... "eh, I wasted an evening." Whether you liked it or not, at least the ever-present spoilers in every media stream you pay attention to cease to matter - you're up to date, you've seen the latest.

But when someone recommends a book - that's when we really pause and think. "What have they recommended before? Do they like my favourite genres? If they do, how accurate have their past recommendations been?" The difference is all about scale. A song takes maybe a minute to locate and 3-5 minutes to listen to, and for that little time, even ruling out a song or artist feels almost like a win. But sitting down to watch a movie takes roughly an hour and a half out of your life that you won't be getting back: unless you're a film school major, a bad one feels like time wasted. Now consider a book: depending on a lot of factors, a book will usually take you one to four weeks to finish. We could measure it in actual hours spent reading, but hardly anybody does: what we notice is the time from start to finish. I think that's fair, because that portion of your life is gone once you've read the book. If someone recommends an investment that turns out to be bad, you're going to be unhappy with them in proportion to the time or money invested. Books are a really big investment.

This time investment is also related to replayability. With songs, we're happy to listen to our favourites tens or even hundreds of times. With movies, we may rewatch our favourites two or three times. But books? Not many people re-read them. Some do, but only rarely, and only their true favourites.

Because of the time each of these demands, we consume them in inverse proportion to their length. I've listened to tens of thousands of songs in my life. Being a particularly avid movie watcher, I've watched 3000 or more movies. But I suspect my book count is under 1000 - despite having grown up reading voraciously and becoming a librarian.

This fall-off in consumption also feeds into the recommendation-engine problem: any statistician will tell you that the smaller the sample size, the less reliable the estimate. So if you feed a list of your favourite movies into a movie recommendation engine, it's going to get a fairly good picture of your favourite genres, actors, visual styles, and anything else it may be collating. Music recommendation engines have even more fodder to work with (if you take the time to feed them that enormous brain-dump). But books? If I were to tell a book recommendation engine "my favourite books are Dune and The Ghost Map," what the hell is it supposed to do with that? About all it can conclude is: he reads science fiction (and it doesn't help that Dune is one of the best-known SF novels on the planet - everyone likes it), and he reads scientifically oriented non-fiction, so recommend a random science book. Obviously, sample size is a problem and I need to give it more to go on - but even as an avid reader, I don't have a huge list of favourite books in any particular genre, and my lists are often thoroughly eclectic. It's a bit easier to recommend for someone who's read all the Dan Brown novels and five of the Jack Reachers: send them off to read the rest of the Reachers, or give them more action-mystery (with a touch of conspiracy). But that's still a generalization (and not a very kind one, though maybe they're an undemanding reader).
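
For the more technically minded, here's a minimal sketch in Python of why two favourites aren't much to go on. It assumes a deliberately naive engine that builds a "genre profile" by counting tags across your favourite books; the tags, the helper names (`genre_profile`, `standard_error`) and the list sizes are all hypothetical, chosen only to mirror the Dune and The Ghost Map example above. It uses the textbook standard error of a proportion, sqrt(p(1-p)/n), as a stand-in for "accuracy." This isn't how any real recommendation site works, just an illustration of the sample-size problem.

```python
import math
from collections import Counter

# Hypothetical genre tags for illustration only; a real engine would
# draw on far richer metadata (authors, subjects, ratings, and so on).
FAVOURITE_BOOKS = {
    "Dune": ["science fiction"],
    "The Ghost Map": ["non-fiction", "science"],
}


def genre_profile(favourites: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of the reader's favourites that carry each genre tag."""
    counts = Counter(tag for tags in favourites.values() for tag in tags)
    total = len(favourites)
    return {tag: count / total for tag, count in counts.items()}


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n favourites."""
    return math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    print(genre_profile(FAVOURITE_BOOKS))
    # Hypothetical list sizes, not real data: a couple of favourite books,
    # a few dozen favourite movies, a few hundred favourite songs.
    for medium, n in [("books", 2), ("movies", 30), ("songs", 500)]:
        print(f"{medium:>6}: n={n:<4} standard error ~ {standard_error(0.5, n):.3f}")
```

With only two favourites, every tag lands at the same 0.5, so the profile can't tell whether science fiction or non-fiction matters more to me. And at p = 0.5, the standard error for two books is roughly four times the figure for thirty movies and about sixteen times the figure for five hundred songs.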

This is why book recommendation sites will always be less effective than the equivalent sites for music and movies. And it's why we'll have to keep making our own assessments based on what we know about a given author and book, and what we can glean from the reviews we find. It's also why improvements in recommendation engines will always work best on the sites where you can enter the most data.