
scem0

macrumors 604
Original poster
Jul 16, 2002
7,028
1
back in NYC!
First off, hello to everyone on MR who remembers me (and those who don't, as well). I used to be a prolific poster here (read: major MR addict) :).

I'm writing a program that takes the text of a book and then compiles interesting information about that book. For example, it might take Harry Potter and the Sorcerer's Stone and tell you that the top six-word phrase is "he-who-must-not-be-named" and that the phrase is used 54 times. It might tell you that the average word length is 4.3254 characters. It might tell you that the most used letter in the book is 'e' and that it's used 66,385 times. You get the idea - calculated statistics about books. I'm figuring out what those statistics will be right now.
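To make the three examples above concrete, here is a rough Python sketch of how they could be computed. It is only an illustration, not the actual program: the function names are made up, and the tokenizer is an assumption (it lowercases everything, keeps apostrophes inside words, and splits on hyphens so that "he-who-must-not-be-named" counts as six words).

```python
import re
from collections import Counter

def words(text):
    # Assumed tokenizer: lowercase, keep internal apostrophes, split on hyphens.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def top_letter(text):
    # Most frequent alphabetic character and its count.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    return counts.most_common(1)[0]

def average_word_length(text):
    # Mean characters per token (apostrophes inside a word are counted).
    ws = words(text)
    return sum(len(w) for w in ws) / len(ws)

def top_phrase(text, n):
    # Most frequent run of n consecutive words, ignoring sentence boundaries.
    ws = words(text)
    grams = Counter(tuple(ws[i:i + n]) for i in range(len(ws) - n + 1))
    phrase, count = grams.most_common(1)[0]
    return " ".join(phrase), count
```

All three assume the text contains at least one word (and at least n words for `top_phrase`); an empty book would need a guard.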

So, the question is: What statistics are you interested in knowing about your favorite books?

Keep in mind, these must be calculable figures. That is to say, I can't calculate whether or not a book uses more happy or sad words. I can't calculate the number of protagonists in a book. Most qualitative stuff is out, unfortunately. :(

Thanks for the input!

– Emerson
 
Most used word.
Most used adjective (most used word compared against a list of adjectives, maybe? Is that too qualitative?)
Most used first word of a paragraph
Most used sentence (e.g., does the author like a certain expression/phrase?)
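
For what it's worth, all four of these look calculable. A minimal Python sketch, with the assumptions spelled out: paragraphs are separated by blank lines, sentences end at `.`, `!` or `?` followed by whitespace, and the adjective list is something you supply yourself (the `only` parameter is a made-up name for it).

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")

def most_used_word(text, only=None):
    # Pass a set of adjectives as `only` to restrict the count to that list.
    counts = Counter(
        w for w in WORD.findall(text.lower()) if only is None or w in only
    )
    return counts.most_common(1)[0] if counts else None

def most_used_paragraph_opener(text):
    # Assumes paragraphs are separated by one or more blank lines.
    openers = Counter()
    for para in re.split(r"\n\s*\n", text):
        ws = WORD.findall(para.lower())
        if ws:
            openers[ws[0]] += 1
    return openers.most_common(1)[0] if openers else None

def most_used_sentence(text):
    # Naive split after ., ! or ?; abbreviations like "Mr." will fool it.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    counts = Counter(" ".join(s.lower().split()) for s in sentences if s.strip())
    return counts.most_common(1)[0] if counts else None
```

In a real novel the most used word will almost certainly be "the", so you would probably want to filter out common words before reporting it.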

Leekohler, I love to read, but unfortunately I find myself only reading about a book a month on average, as they are all 500-page textbooks!

If you're into fantasy, a great series starts with "The Name of the Wind" by Patrick Rothfuss; the second book should be coming out in a few months. In the same genre, I also recommend "Sabriel" by Garth Nix... I recently found out that it's the favorite book of a few of my friends, and they all discovered it independently!
 
I'm not really into fiction at all. I like biographies and such.
 
If the books you are analyzing are fiction, then names are important. If you assume that any word that is both capitalized and not the first word in a sentence is a name (place name, character name, etc.), then it shouldn't be too hard to calculate a bunch of stuff about names.

  • How often a name appears.
  • How often it appears near other names. (within 10 words, or 15, 20 etc)
  • How often two or more names appear in the same paragraph (which is not the same as the item above)
  • How many paragraphs a name appears in...
  • etc
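
Roughly, the heuristic and the counts above could be sketched like this in Python. Treat it as a sketch under assumptions: the function names are invented, paragraphs are taken to be separated by blank lines, "near" is only checked within a paragraph, and the capitalization rule is applied as-is, so the pronoun "I" and capitalized words after an opening quote will show up as false positives.

```python
import re
from collections import Counter
from itertools import combinations

TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*|[.!?]")

def tag_names(paragraph):
    # Returns (word, is_name) pairs; a name is a capitalized word that
    # does not start a sentence.
    tagged, sentence_start = [], True
    for tok in TOKEN.findall(paragraph):
        if tok in {".", "!", "?"}:
            sentence_start = True
            continue
        tagged.append((tok, tok[0].isupper() and not sentence_start))
        sentence_start = False
    return tagged

def name_stats(text, window=10):
    appearances = Counter()      # how often a name appears
    paragraphs_with = Counter()  # how many paragraphs a name appears in
    near_pairs = Counter()       # two names within `window` words
    paragraph_pairs = Counter()  # two names in the same paragraph
    for para in re.split(r"\n\s*\n", text):
        tagged = tag_names(para)
        positions = [(i, w) for i, (w, is_name) in enumerate(tagged) if is_name]
        appearances.update(w for _, w in positions)
        present = sorted({w for _, w in positions})
        paragraphs_with.update(present)
        paragraph_pairs.update(combinations(present, 2))
        for (i, a), (j, b) in combinations(positions, 2):
            if a != b and j - i <= window:
                near_pairs[tuple(sorted((a, b)))] += 1
    return appearances, paragraphs_with, near_pairs, paragraph_pairs
```

Pairs are stored in alphabetical order, so ("Harry", "Ron") and ("Ron", "Harry") land in the same bucket.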

Could be a cool project. Good Luck.
 
You know what else could be interesting? Find out how many books change the spellings of common names. For example, spelling "Jennifer" as "Genifer" or using "Mikal" for "Michael". Could be interesting.
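
One possible way to approximate this, sketched in Python with the standard `difflib` module: compare each name found in the book against a list of common names and flag the close-but-not-identical ones. The list of common names and the 0.6 similarity cutoff are assumptions you would have to supply and tune; the function name is made up.

```python
import difflib

def spelling_variants(names_in_book, common_names, cutoff=0.6):
    # Map each unusual spelling to the common name it most resembles.
    canonical = {n.lower(): n for n in common_names}
    variants = {}
    for name in set(names_in_book):
        key = name.lower()
        if key in canonical:
            continue  # already the standard spelling
        match = difflib.get_close_matches(key, list(canonical), n=1, cutoff=cutoff)
        if match:
            variants[name] = canonical[match[0]]
    return variants
```

With a real list of common names a loose cutoff will produce false hits (a "Harry" would match "Larry", for instance), so the results would need a manual look.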
 
Those are really, really awesome suggestions, guys! I especially love most used first word of a paragraph - hadn't thought of that one yet and it's so simple!

Thanks,
Emerson
 