Yesterday and the day before, I spent hours cleaning up A Study In Magic. I remember removing a lot of adverbs. Oh boy, so many adverbs. The Chamber of Secrets plotline was the worst offender: literally a hundred adverbs per chapter. I’m pleased to note my later writing doesn’t suffer from a deluge of adverbs. Overuse of certain phrases, on the other hand… I’m not sure if it’s an actual problem, or if I’m noticing them because they stick out. I’m tempted to write a program that analyzes a document’s word and phrase use: how it is distributed, how scattered it is, and whether it is sparse.

“Sparse? What are you talking about?”

It’s my IT background speaking. 😛 Sparse means if a combination of values does not exist in real life, you don’t store that combination in the database. The opposite is Dense. What does this concept have to do with writing? Well…

Scrivener breaks down word use frequency, per project and per chapter. This is good information for editing. If you notice you use “Like”, “That”, and “Look” too much, then you can change them. But it doesn’t pick out phrase frequency. It doesn’t tell you the distance between repeated words, either. Here is an example of what I mean by distance:

“So there’s no time to dilly-dally,” Wesley said grimly.

Inigo felt as grim as Wesley sounded. Grim odds, grim castle, grim situation. No, their prospects were not good, he thought.

The rest of the document may run 5,000 words and never use “grim” again after these two paragraphs, but every occurrence is clustered here. That’s too concentrated for my taste. It’s not invalid, mind you, but it edges toward overuse.
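To make “distance” concrete, here is a rough Python sketch of how I might measure it. The function names (word_positions, gaps) are my own inventions for illustration, and it only matches exact words, so “grimly” would not count as “grim”. Treat it as a starting point, not a finished tool.

```python
import re

def word_positions(text, target):
    """Return the word indexes at which `target` occurs (case-insensitive)."""
    # Crude tokenizer: lowercase runs of letters and straight apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return [i for i, w in enumerate(words) if w == target.lower()]

def gaps(positions):
    """Distances, counted in words, between consecutive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]

sample = (
    "Inigo felt as grim as Wesley sounded. "
    "Grim odds, grim castle, grim situation."
)
positions = word_positions(sample, "grim")
print(positions)        # [3, 7, 9, 11]
print(gaps(positions))  # [4, 2, 2]
```

Small gaps mean the word is bunched up. In a 5,000-word chapter, I would hope a word like “grim” shows gaps in the hundreds rather than single digits.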

If words such as “a”, “the”, “is”, “was”, “look”, etc. are achromatic, then “grim”, “behold”, and “concentrated” are sparks of color. I prefer a healthy diversity of words, with plenty of space between repeats of the colorful ones. The question is how to detect when they crowd together. My first instinct is “write a program”. Only… natural language is one of AI’s biggest challenges. How do I do this?
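One possible first pass needs no AI at all: skip the achromatic words and flag any remaining word that shows up several times within a short stretch. The sketch below assumes a tiny hand-made list of plain words (PLAIN_WORDS) and made-up thresholds (window, min_hits); all of these are placeholders I would expect to tune, and the approach knows nothing about grammar or meaning.

```python
import re
from collections import defaultdict

# Tiny, hypothetical list of "achromatic" words; a real one would be far longer.
PLAIN_WORDS = {"a", "the", "is", "was", "look", "as", "and", "of", "to",
               "in", "he", "not", "no", "their", "were"}

def clustered_words(text, window=50, min_hits=3):
    """Map each colorful word to the word index where its first cluster starts.

    A cluster is min_hits occurrences that all fall within `window` words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positions = defaultdict(list)
    for i, w in enumerate(words):
        if w not in PLAIN_WORDS:
            positions[w].append(i)

    flagged = {}
    for w, pos in positions.items():
        # Check every run of min_hits consecutive occurrences.
        for start in range(len(pos) - min_hits + 1):
            if pos[start + min_hits - 1] - pos[start] < window:
                flagged[w] = pos[start]
                break
    return flagged

sample = (
    "Inigo felt as grim as Wesley sounded. "
    "Grim odds, grim castle, grim situation. "
    "No, their prospects were not good, he thought."
)
print(clustered_words(sample, window=10))  # {'grim': 3}
```

This only catches exact repeats of single words. Phrases, and variants such as “grimly”, would need more work, which is where the natural language problem starts to bite.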

Interesting food for thought.