September 8, 2014

Books That Have Stayed With Us

By: Lada Adamic, Pinkesh Patel

Favorite books are something friends like to share and discuss. A Facebook meme facilitates this very interaction. You may have seen one of your friends post something like “List 10 books that have stayed with you in some way. Don’t take more than a few minutes, and don’t think too hard. They do not have to be the ‘right’ books or great works of literature, just ones that have affected you in some way.” If not great works of literature, what are the books that have stayed with us?

The following analysis was conducted on anonymized, aggregate data.

To answer this question we gathered a de-identified sample of over 130,000 status updates matching “10 books” or “ten books” posted in the last two weeks of August 2014 (although the meme has been circulating for at least a year). The demographics of those posting were as follows: 63.7% were in the US, followed by 9.3% in India and 6.3% in the UK. Women outnumbered men 3.1:1, and the average age was 37. We therefore expect the books chosen to reflect this subset of the population rather than the population as a whole.

We programmatically segmented the posts into lists and found the most frequently occurring substrings, which corresponded to different books, e.g. “Anna Karenina by Leo Tolstoy”. However, the same book could appear as different substrings, e.g. just “Anna Karenina” or “Anna Karenina – Leo Tolstoy”. We clustered similar variants programmatically, hand-tuning the clusters where the algorithm had failed to merge two popular variants. We then used the clusters to automatically match each book list against the common variants of the 500 most popular books.
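
The post does not publish its segmentation or clustering code, so the following is only an illustrative sketch under stated assumptions: posts are split into one item per line, and variants are grouped by a crude normalization rule (drop an author after “by” or a dash, drop punctuation and a leading “the”). The names `segment_post`, `normalize` and `cluster_variants` are hypothetical, and this heuristic is not the authors' clustering algorithm.

```python
import re
from collections import Counter, defaultdict

# Hypothetical heuristic: text after " by ", " – " or " - " is treated as the author.
AUTHOR_SEP = re.compile(r"\s+(?:by|[–-])\s+", re.IGNORECASE)
# Leading list numbering such as "1." or "2)".
NUMBERING = re.compile(r"^\s*\d+\s*[.)]\s*")


def segment_post(text: str) -> list[str]:
    """Split a status update into candidate list items, one per non-empty line."""
    items = []
    for line in text.splitlines():
        item = NUMBERING.sub("", line).strip()
        if item:
            items.append(item)
    return items


def normalize(variant: str) -> str:
    """Map a title variant to a crude canonical key."""
    title = AUTHOR_SEP.split(variant.strip(), maxsplit=1)[0]  # keep the title part
    title = re.sub(r"[^\w\s]", "", title.lower())  # drop punctuation
    title = re.sub(r"^the\s+", "", title)          # ignore a leading "the"
    return " ".join(title.split())                 # collapse whitespace


def cluster_variants(variants: list[str]) -> dict[str, str]:
    """Group variants by key; label each cluster with its most frequent variant."""
    groups: dict[str, Counter] = defaultdict(Counter)
    for v in variants:
        groups[normalize(v)][v.strip()] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
```

A rule like this maps all three Anna Karenina variants above to the same key, but it would mis-handle a title that itself contains “ by ”, which is the kind of edge case where manual review of the clusters helps.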

Here are the top 20 books, along with the percentage of all lists (among those containing at least one of the top 500 books) that included each one.

  1. 21.08% Harry Potter series – J.K. Rowling
  2. 14.48% To Kill a Mockingbird – Harper Lee
  3. 13.86% The Lord of the Rings – J.R.R. Tolkien
  4. 7.48% The Hobbit – J.R.R. Tolkien
  5. 7.28% Pride and Prejudice – Jane Austen
  6. 7.21% The Holy Bible
  7. 5.97% The Hitchhiker’s Guide to the Galaxy – Douglas Adams
  8. 5.82% The Hunger Games Trilogy – Suzanne Collins
  9. 5.70% The Catcher in the Rye – J.D. Salinger
  10. 5.63% The Chronicles of Narnia – C.S. Lewis
  11. 5.61% The Great Gatsby – F. Scott Fitzgerald
  12. 5.37% 1984 – George Orwell
  13. 5.26% Little Women – Louisa May Alcott
  14. 5.23% Jane Eyre – Charlotte Bronte
  15. 5.11% The Stand – Stephen King
  16. 4.95% Gone with the Wind – Margaret Mitchell
  17. 4.38% A Wrinkle in Time – Madeleine L’Engle
  18. 4.27% The Handmaid’s Tale – Margaret Atwood
  19. 4.05% The Lion, the Witch, and the Wardrobe – C.S. Lewis
  20. 4.01% The Alchemist – Paulo Coelho
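
The percentages above use as their denominator only the lists that matched at least one of the top 500 books. A minimal sketch of that computation, assuming each list has already been reduced to a set of matched book labels (the function name `book_percentages` is hypothetical):

```python
from collections import Counter


def book_percentages(matched_lists: list[set[str]]) -> dict[str, float]:
    """Percentage of lists containing each book, counting only lists that
    matched at least one known book; results are ordered most-listed first."""
    nonempty = [books for books in matched_lists if books]
    counts = Counter(book for books in nonempty for book in books)
    n = len(nonempty)
    return {book: 100.0 * c / n for book, c in counts.most_common()}
```

Because one list can name many books, these percentages sum to well over 100; the top 11 entries above already exceed it.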

While there are many great ‘serious’ books on the list, The Hitchhiker’s Guide to the Galaxy makes an appearance at #7, and Harry Potter reigns supreme (although it enjoys an advantage: it was most often referred to as a series, so our clustering algorithm lumped all the Harry Potter books into a single cluster). Stephen King’s dark novels have stayed with their readers as well (The Stand at #15 and The Dark Tower series at #63). The complete list of the top 100, included at the end of this post, also contains a number of children’s books. Although these may not normally be considered great works of literature, they tend to stay with us through the decades. In particular, two of Shel Silverstein’s books (The Giving Tree and Where the Sidewalk Ends) make it into the top 100, as does The Little Prince.

One can also look at connections between the books, e.g. ‘people who listed X also listed Y’, using pointwise mutual information (PMI), which compares how often two books appear on the same list with how often they would co-occur by chance. In the network visualization, each node represents a book, sized by the frequency with which it was mentioned, and an edge connects two books that co-occur in lists unusually often.

Each book is linked to the books it co-occurs with more often than expected. The color represents whether the book was more often mentioned by women (red) or men (blue).
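
Pointwise mutual information compares observed co-occurrence with what independence would predict: PMI(a, b) = log(P(a, b) / (P(a) P(b))), where each probability is estimated here as the fraction of lists containing the book(s). The sketch below keeps a pair as an edge when its PMI exceeds `min_pmi` and it co-occurs at least `min_count` times; both thresholds, and the name `pmi_edges`, are assumptions for illustration, not the settings used for the visualization.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(lists: list[set[str]], min_count: int = 2,
              min_pmi: float = 0.0) -> dict[tuple[str, str], float]:
    """Return {(a, b): PMI} for book pairs co-listed more often than chance.

    PMI(a, b) = log(P(a, b) / (P(a) * P(b))), with probabilities estimated as
    the fraction of the n lists containing the book(s). Thresholds are
    hypothetical: min_count filters rare pairs, min_pmi sets "more than chance".
    """
    n = len(lists)
    single = Counter(book for books in lists for book in set(books))
    pair = Counter(p for books in lists
                   for p in combinations(sorted(set(books)), 2))
    edges = {}
    for (a, b), c in pair.items():
        if c < min_count:
            continue
        # (c/n) / ((single[a]/n) * (single[b]/n)) simplifies to c*n / (single[a]*single[b])
        pmi = math.log(c * n / (single[a] * single[b]))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return edges
```

Positive PMI on its own tends to favor rare pairs (two books that each appear exactly once, on the same list, get the highest possible score), which is why a minimum co-occurrence count is commonly imposed.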

The meme also forms another kind of network. While some people shared the meme without tagging anyone, calling on all their friends to make their own posts, others tagged specific friends whose favorite books they wanted to know. Even a small fragment of the cascade shows long, tangled tagging chains through which the meme diffused.

Tagging links posts about favorite books.
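
One way to quantify how long such tagging chains get is to record, for each post, the earlier post in which its author was tagged, and then compute each post's depth in the cascade. The sketch below assumes exactly that hypothetical data model (a `parent` mapping with `None` for posts that started a chain); `cascade_depths` is an illustrative name, not how the cascade in the figure was built.

```python
from typing import Optional


def cascade_depths(parent: dict[str, Optional[str]]) -> dict[str, int]:
    """Depth of every post in a tagging cascade.

    parent[post] is the (hypothetical) id of the post in which this post's
    author was tagged, or None if the post started a chain. Roots have depth 0.
    """
    depth: dict[str, int] = {}
    for start in parent:
        # Walk up until reaching a post with a known depth or the top of the chain.
        path, seen, post = [], set(), start
        while post is not None and post not in depth:
            if post in seen:
                raise ValueError(f"cycle in tagging links at {post!r}")
            seen.add(post)
            path.append(post)
            post = parent.get(post)
        d = -1 if post is None else depth[post]
        # Assign depths back down the walked path.
        for p in reversed(path):
            d += 1
            depth[p] = d
    return depth
```

The longest chain is then `max(depths.values())`. A post whose author was tagged by several friends would need a rule for picking a single parent under this tree-shaped model.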

Do friends tend to like the same books? We computed the number of books shared between lists linked via tags, which was a mere 0.4 books on average! This was 4 times greater than the overlap of 0.1 books between two randomly chosen lists. It is also an underestimate, since our automated matching identifies only 5.3 books per list on average (rather than the full 10), because it matches against just the 500 most commonly mentioned titles. Nevertheless, the low overlap underlines that even in a world with relatively few highly successful bestsellers, lists of favorites tend to be rather different, even between friends.
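
The comparison above amounts to averaging the size of the intersection of two lists over two sets of pairs: pairs connected by a tag, and randomly drawn pairs as a baseline. A minimal sketch, assuming each list is stored as a set of matched book labels keyed by a hypothetical list id (`mean_overlap` and `random_pair_overlap` are illustrative names):

```python
import random


def mean_overlap(pairs: list[tuple[str, str]], lists: dict[str, set[str]]) -> float:
    """Average number of matched books shared by each pair of list ids."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(len(lists[a] & lists[b]) for a, b in pairs) / len(pairs)


def random_pair_overlap(lists: dict[str, set[str]], n_pairs: int, seed: int = 0) -> float:
    """Baseline: mean overlap between n_pairs uniformly random pairs of distinct lists."""
    rng = random.Random(seed)
    ids = list(lists)
    pairs = [tuple(rng.sample(ids, 2)) for _ in range(n_pairs)]
    return mean_overlap(pairs, lists)
```

Dividing the tagged-pair average by the random baseline gives the ratio quoted above (0.4 / 0.1 = 4).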

Finally, the remaining top 100 books were:

  21. 3.95% Anne of Green Gables – L.M. Montgomery
  22. 3.88% The Giver – Lois Lowry
  23. 3.67% The Kite Runner – Khaled Hosseini
  24. 3.53% Ender’s Game – Orson Scott Card
  25. 3.39% The Poisonwood Bible – Barbara Kingsolver
  26. 3.38% Lord of the Flies – William Golding
  27. 3.38% The Eye of the World – Robert Jordan
  28. 3.32% The Book Thief – Markus Zusak
  29. 3.26% Wuthering Heights – Emily Bronte
  30. 3.22% Hamlet – William Shakespeare
  31. 3.21% The Little Prince – Antoine de Saint-Exupery
  32. 3.15% Sherlock Holmes – Sir Arthur Conan Doyle
  33. 3.15% Fahrenheit 451 – Ray Bradbury
  34. 3.12% Animal Farm – George Orwell
  35. 3.08% The Book of Mormon
  36. 3.05% The Diary of Anne Frank – Anne Frank
  37. 3.02% Dune – Frank Herbert
  38. 2.98% One Hundred Years of Solitude – Gabriel Garcia Marquez
  39. 2.83% The Autobiography of Malcolm X
  40. 2.78% Of Mice and Men – John Steinbeck
  41. 2.72% The Giving Tree – Shel Silverstein
  42. 2.68% The Fault in Our Stars – John Green
  43. 2.68% On the Road – Jack Kerouac
  44. 2.58% Lamb – Christopher Moore
  45. 2.54% Slaughterhouse-Five – Kurt Vonnegut
  46. 2.53% A Prayer for Owen Meany – John Irving
  47. 2.52% Good Omens – Neil Gaiman and Terry Pratchett
  48. 2.45% The Help – Kathryn Stockett
  49. 2.44% The Outsiders – S.E. Hinton
  50. 2.42% American Gods – Neil Gaiman
  51. 2.41% Where the Red Fern Grows – Wilson Rawls
  52. 2.39% Stranger in a Strange Land – Robert Heinlein
  53. 2.38% The Secret Garden – Frances Hodgson Burnett
  54. 2.35% Little House on the Prairie – Laura Ingalls Wilder
  55. 2.31% The Count of Monte Cristo – Alexandre Dumas
  56. 2.31% Pillars of the Earth – Ken Follett
  57. 2.29% The Da Vinci Code – Dan Brown
  58. 2.24% Brave New World – Aldous Huxley
  59. 2.21% A Tale of Two Cities – Charles Dickens
  60. 2.21% Les Miserables – Victor Hugo
  61. 2.16% Great Expectations – Charles Dickens
  62. 2.12% Night – Elie Wiesel
  63. 2.12% The Dark Tower Series – Stephen King
  64. 2.07% Outlander – Diana Gabaldon
  65. 1.92% The Color Purple – Alice Walker
  66. 1.89% A Thousand Splendid Suns – Khaled Hosseini
  67. 1.88% The Art of War – Sun Tzu
  68. 1.85% Catch-22 – Joseph Heller
  69. 1.85% The Bell Jar – Sylvia Plath
  70. 1.83% The Perks of Being a Wallflower – Stephen Chbosky
  71. 1.78% The Old Man and the Sea – Ernest Hemingway
  72. 1.76% Memoirs of a Geisha – Arthur Golden
  73. 1.75% Tuesdays with Morrie – Mitch Albom
  74. 1.73% The Road – Cormac McCarthy
  75. 1.72% Watership Down – Richard Adams
  76. 1.72% A Tree Grows in Brooklyn – Betty Smith
  77. 1.68% Where the Sidewalk Ends – Shel Silverstein
  78. 1.65% The Girl with the Dragon Tattoo – Stieg Larsson
  79. 1.65% A Song of Ice and Fire – George R. R. Martin
  80. 1.65% Are You There God? It’s Me, Margaret – Judy Blume
  81. 1.64% Charlotte’s Web – E.B. White
  82. 1.63% The Time Traveler’s Wife – Audrey Niffenegger
  83. 1.62% Anna Karenina – Leo Tolstoy
  84. 1.62% Crime and Punishment – Fyodor Dostoyevsky
  85. 1.61% The Adventures of Huckleberry Finn – Mark Twain
  86. 1.58% The Shack – William P. Young
  87. 1.56% Watchmen – Alan Moore
  88. 1.55% Interview with the Vampire – Anne Rice
  89. 1.54% The Odyssey – Homer
  90. 1.54% The House of the Spirits – Isabel Allende
  91. 1.53% The Stranger – Albert Camus
  92. 1.52% Call of the Wild – Jack London
  93. 1.51% The Five People You Meet in Heaven – Mitch Albom
  94. 1.51% Siddhartha – Hermann Hesse
  95. 1.50% East of Eden – John Steinbeck
  96. 1.50% Matilda – Roald Dahl
  97. 1.49% The Picture of Dorian Gray – Oscar Wilde
  98. 1.47% Zen and the Art of Motorcycle Maintenance – Robert Pirsig
  99. 1.45% Love in the Time of Cholera – Gabriel Garcia Marquez
  100. 1.45% Where the Wild Things Are – Maurice Sendak

[An earlier version of this post had two separate clusters representing the Chronicles of Narnia series. When these were merged, the series rose to #10.]