Perplexed, you begin to investigate. As you examine the text more closely, you realize that it’s riddled with formatting quirks, inconsistent spellings, and even some stray HTML tags. It becomes clear that these anomalies are affecting the readability score.
This experience highlights a critical aspect of readability analysis: cleaning or “sanitizing” the text before applying readability formulas. The cleaning process ensures the formulas accurately score your text’s readability and output correct word stats.
Here’s how to clean your text and achieve an accurate score:
TIP #1: Abbreviations: Embedded punctuation may confuse a readability program when it counts sentences. Most programs find the end of a sentence by looking for a punctuation mark. Sometimes this punctuation falls within a sentence rather than at the end, and the program cannot tell the difference. Since it interprets any period as a sentence stop, remove embedded punctuation, such as the periods in abbreviations, Roman numerals, and decimal numbers.
Explanation: Some programs might mistake an abbreviation such as a.m. for the end of a sentence because of its periods.
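As a rough illustration of this tip, the Python sketch below removes the periods from a few common abbreviations and spells out decimal points. The abbreviation table and the function name strip_embedded_periods are assumptions made for this example, not part of any readability tool, and replacing a decimal point with the word "point" is only one possible choice.

```python
import re

# Hypothetical abbreviation table for illustration; extend it for your own text.
ABBREVIATIONS = {
    "a.m.": "am", "p.m.": "pm", "e.g.": "eg", "i.e.": "ie",
    "Dr.": "Dr", "Mr.": "Mr", "Mrs.": "Mrs",
}

# Match any listed abbreviation that does not start in the middle of a word.
_ABBR_RE = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")"
)

def strip_embedded_periods(text):
    """Remove periods that do not mark the end of a sentence."""
    text = _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
    # A period between two digits is a decimal point, not a sentence stop.
    text = re.sub(r"(?<=\d)\.(?=\d)", " point ", text)
    return text

print(strip_embedded_periods(
    "The meeting starts at 9 a.m. on Monday and lasts 1.5 hours."
))
```

One limitation to keep in mind: if an abbreviation ends a sentence, this sketch also removes the period that closes the sentence, so review those cases by hand.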
TIP #2: Titles/Headings: A readability program cannot distinguish ordinary sentences from titles, headings, and bulleted lists because these elements usually have no ending punctuation. If the app keeps searching for punctuation (.?!), it will include the heading text as part of the first sentence that follows, and it will miscalculate the sentence length.
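One possible way to handle headings is to give each unpunctuated line its own sentence stop. The sketch below assumes that every heading and every paragraph sits on its own line (no hard-wrapped paragraphs); the function name terminate_headings is made up for this example.

```python
def terminate_headings(text):
    """Add a period to any non-empty line that lacks ending punctuation."""
    fixed = []
    for line in text.splitlines():
        stripped = line.rstrip()
        # Headings and list items usually end without . ? or !
        if stripped and stripped[-1] not in ".?!":
            stripped += "."
        fixed.append(stripped)
    return "\n".join(fixed)

sample = "How to Clean Your Text\n\nStart by removing stray tags."
print(terminate_headings(sample))
```

If you would rather exclude headings from the score, you could drop those lines instead of punctuating them.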
TIP #3: Normalize Numbers: You can represent numbers in different ways, such as numerals (3) or words (three). Decide on a standard format for numbers and stick to it throughout the text, as inconsistency might skew word and syllable counts.
TIP #4: Hyphenated Words: Sometimes hyphenated words can be counted as one word or multiple words. To maintain consistency, decide whether to keep or split hyphenated words and apply your decision uniformly across the text. Example: “The well-known author gave a heart-warming speech at the state-of-the-art theatre.”
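To see how much the decision matters, the short Python sketch below counts the words in the example sentence both ways. The function count_words and its word pattern are assumptions for illustration; real readability programs may tokenize differently.

```python
import re

# A word is a run of letters/digits, optionally joined by hyphens or apostrophes.
WORD_RE = re.compile(r"\w+(?:[-'’]\w+)*")

def count_words(text, split_hyphens):
    """Count words, treating hyphenated compounds as one word or several."""
    if split_hyphens:
        text = text.replace("-", " ")
    return len(WORD_RE.findall(text))

sentence = ("The well-known author gave a heart-warming speech "
            "at the state-of-the-art theatre.")
print(count_words(sentence, split_hyphens=False))  # 11
print(count_words(sentence, split_hyphens=True))   # 16
```

The same sentence has 11 words if hyphenated words stay whole and 16 if they are split, which is enough to shift an average sentence length.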
TIP #5: Compound Sentences: If your text has compound sentences connected by conjunctions (and, or, but), consider breaking them into simpler sentences. This will average out sentence lengths more accurately.
TIP #6: Possessives/Contractions: Words like can’t or John’s might be misread by some programs. If possible, expand contractions to their full form (cannot, John is) to aid in accurate word and syllable counts. A possessive such as John’s has no expanded form, so treat possessives consistently or rephrase them where practical.
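A small lookup table is one simple way to expand contractions. The sketch below is a minimal example with a hypothetical table (CONTRACTIONS) that you would need to extend. It leaves words like John’s untouched, because code alone cannot tell whether that means "John is" or a possessive.

```python
import re

# Hypothetical, deliberately short table; add the contractions your text uses.
CONTRACTIONS = {
    "can't": "cannot", "won't": "will not", "don't": "do not",
    "isn't": "is not", "it's": "it is",
}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text):
    """Expand known contractions, keeping a leading capital letter."""
    # Side effect: curly apostrophes become straight ones so the table matches.
    text = text.replace("\u2019", "'")

    def _expand(match):
        word = match.group(1)
        expanded = CONTRACTIONS[word.lower()]
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _CONTRACTION_RE.sub(_expand, text)

print(expand_contractions("I can’t go, and it isn’t fair."))
```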
TIP #7: Quotations: Quotations can disrupt sentence flow and structure. Clearly differentiate quotations from the main text. Analyze them separately if they form a significant part of the document.
Example:
Original: As Mark Twain once said, “Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn’t.”
Revised: Mark Twain made a notable point about truth and fiction. He stated: “Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn’t.” This statement reflects…
Reason: In the revised version, the quotation is separated from the main text with a full sentence introducing it. This can help readability programs better understand the structure of the text and not confuse the quoted material with the author’s own sentences.
TIP #8: Special Characters/Symbols: Write out characters like ampersands (&), percent signs (%), and currency symbols ($) as words (and, percent, dollars) so the program interprets them correctly.
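The replacements in this tip can be sketched with a few regular expressions. The function name spell_out_symbols and the exact patterns are assumptions for this example; amounts such as "$1 billion" or names such as "AT&T" would still need manual attention.

```python
import re

def spell_out_symbols(text):
    """Write out dollar amounts, percent signs, and ampersands as words."""
    # $400 or $1,250.50 -> "400 dollars" / "1,250.50 dollars"
    text = re.sub(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)", r"\1 dollars", text)
    # 12% -> "12 percent"
    text = re.sub(r"(\d)\s*%", r"\1 percent", text)
    # "Sales & marketing" -> "Sales and marketing"
    text = re.sub(r"\s*&\s*", " and ", text)
    return text

print(spell_out_symbols("Sales & marketing grew 12% to $400."))
```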
TIP #9: Inline Lists: If you use lists within sentences (e.g., “The colors are red, green, blue, and yellow.”), you can represent them in bullet form or as separate sentences to avoid inflating word counts within a sentence.
TIP #10: Footnotes/Endnotes: If your text has footnotes or endnotes, remove them from the main text body. These often contain punctuation and formatting that could confuse the readability analysis.
TIP #11: Ellipses/Dashes: Use ellipses (…) and dashes (—) uniformly and replace any informal use with standardized punctuation marks, as these can affect sentence and word counts.
- Example: I don’t know . . . I might come to the party – if I finish work early.
- Revised: I don’t know… I might come to the party—if I finish work early.
- Reason: The proper ellipsis indicates a pause or trailing off thought, and the em dash correctly indicates a break in the sentence or an interjection.
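The revision above can be approximated in code. The sketch below is one way to do it, with an assumed function name (normalize_ellipses_and_dashes); it uses Unicode escapes for the ellipsis and dash characters so the code stays unambiguous. Note that it also converts a spaced hyphen (" - ") into an em dash, which may not suit text that contains arithmetic.

```python
import re

ELLIPSIS = "\u2026"  # single-character ellipsis
EN_DASH = "\u2013"
EM_DASH = "\u2014"

def normalize_ellipses_and_dashes(text):
    """Standardize informal ellipses and dashes."""
    # ". . ." or "..." (with optional spaces) -> one ellipsis character
    text = re.sub(r"\s*\.(?:\s?\.){2,}", ELLIPSIS, text)
    # A spaced en dash, hyphen, or double hyphen -> unspaced em dash
    text = re.sub(r"\s+(?:" + EN_DASH + r"|--?)\s+", EM_DASH, text)
    # Remove stray spaces around existing em dashes
    text = re.sub(r"\s*" + EM_DASH + r"\s*", EM_DASH, text)
    return text

original = ("I don\u2019t know . . . I might come to the party "
            "\u2013 if I finish work early.")
print(normalize_ellipses_and_dashes(original))
```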
TIP #12: Quotation Marks: Use straight quotes (" ") instead of curly quotes (“ ”), as some programs might misread the latter. Also, make sure every opening quote has a matching closing quote so the program can detect sentence boundaries.
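Converting curly quotes to straight ones is a simple character substitution. A minimal Python sketch, with the assumed name straighten_quotes, could look like this:

```python
# Map curly double and single quotes to their straight equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
})

def straighten_quotes(text):
    """Replace curly quotes and apostrophes with straight ones."""
    return text.translate(QUOTE_MAP)

print(straighten_quotes("\u201cTruth isn\u2019t.\u201d"))
```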
TIP #13: Emojis/Emoticons: Replace emojis and emoticons with their descriptive phrase or remove them entirely since they are not standard text characters and most readability formulas won’t process them correctly.
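If you choose to remove emojis and emoticons, a pattern-based filter is one option. The sketch below is an approximation: the Unicode ranges and the emoticon pattern are assumptions that cover many common cases but not every emoji (flags, for example, are not included).

```python
import re

# Common emoji blocks plus the variation selector and zero-width joiner.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")
# Simple standalone emoticons such as :) ;-) :D =P
EMOTICON_RE = re.compile(r"(?<!\S)[:;=]-?[)(DPp](?!\S)")

def remove_emojis(text):
    """Strip emojis and simple emoticons, then tidy leftover spaces."""
    text = EMOJI_RE.sub("", text)
    text = EMOTICON_RE.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()

print(remove_emojis("Great job \U0001F600 see you soon :)"))
```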
TIP #14: Paragraph Spacing: Make sure paragraph breaks are consistent. Extra line breaks or line spaces can be interpreted as sentence breaks, affecting sentence counts.
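Consistent paragraph breaks are easy to enforce with a few substitutions. The following sketch, using the assumed name normalize_paragraph_breaks, converts all line endings to one style and collapses runs of blank lines into a single blank line.

```python
import re

def normalize_paragraph_breaks(text):
    """Use one blank line between paragraphs and no trailing spaces."""
    # Treat Windows and old Mac line endings the same as Unix ones.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove spaces and tabs at the end of lines (including blank lines).
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Three or more newlines in a row become exactly two.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

messy = "First paragraph.\r\n\r\n\r\n  \nSecond paragraph.\n"
print(normalize_paragraph_breaks(messy))
```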
TIP #15: Bullet Points/Numbered Lists: Transform bullet points and numbered list items into full sentences if possible, or remove them if they are not crucial, as these can disrupt the flow and structure of connected text.
TIP #16: Complex Vocabulary: Replace unnecessarily complex or highly technical terms with simpler synonyms to prevent the program from overestimating the reading difficulty based on syllable count.
TIP #17: American/British Spelling: Choose either American or British English spelling conventions and apply them throughout to avoid miscounts of words.
TIP #18: Non-Textual Elements: Remove non-textual elements like images, graphs, or tables since readability formulas cannot process them.
TIP #19: Acronyms: Expand acronyms on their first use (e.g., “NASA, the National Aeronautics and Space Administration”) to ensure the program does not misinterpret the periods or count them as individual words.
TIP #20: Dialogue: If you use dialogue, rewrite it in narrative form, as the varied punctuation in dialogue (like dashes, ellipses, and interrupted sentences) can confuse a readability program.
Example: “Where are you going?” John asked.
“To the store,” Mary replied, “I need to buy some milk.”
Revised: John inquired where Mary was headed. She explained that she was going to the store to purchase some milk.
Reason: The revised version omits the punctuation marks by converting the dialogue into indirect speech, thus presenting it in narrative form. This can make it easier for the readability program to analyze the text.
TIP #21: HTML: Remove any HTML, hyperlinks, and URLs, as they contain syntax that is non-standard in regular prose and can confuse readability measures.
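Python's standard library can do most of this work. The sketch below uses html.parser to keep only the visible text and a regular expression to drop URLs. The class and function names are made up for this example, and the code does not skip the contents of script or style elements, so treat it as a starting point rather than a complete cleaner.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect only the text that appears between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html_and_urls(text):
    """Remove HTML tags and bare URLs, then collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    plain = "".join(parser.parts)
    # Drop anything that looks like a web address. A period directly
    # after a URL is removed with it, so check sentence endings afterwards.
    plain = re.sub(r"(?:https?://|www\.)\S+", "", plain)
    return re.sub(r"\s+", " ", plain).strip()

print(strip_html_and_urls("<p>Clean &amp; simple</p>"))
```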
TIP #22: Punctuation Marks: Double-check for any misplaced or unneeded punctuation marks such as commas, colons, and semicolons.
TIP #23: Foreign Languages: If your text has foreign phrases or passages, remove or translate them to maintain consistency.
Example: The spirit of joie de vivre is what makes Paris so special.
Revised: The spirit of joyful living is what makes Paris so special.
Reason: A readability program might find the original sentence problematic because it cannot process foreign-language words.
Example: The menu included items such as piña colada, jalapeño peppers, and crème brûlée.
Revised: The menu included items such as pina colada, jalapeno peppers, and creme brulee.
Reason: In the original sentence, words like “piña” and “jalapeño” contain the Spanish letter ñ, which is a special character not found in the English alphabet. Additionally, “crème brûlée” has French accented characters. In the revised sentence, these special characters are replaced with their closest English alphabet equivalents.
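Replacing accented letters with their closest plain equivalents can be automated with Python's unicodedata module. This is a minimal sketch, with the assumed name strip_accents; it splits each accented letter into a base letter plus an accent mark and then drops the marks.

```python
import unicodedata

def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFD separates a letter such as ñ into "n" plus a combining tilde.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("piña colada, jalapeño peppers, and crème brûlée"))
```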
TIP #24: Capitalization: Maintain uniform capitalization rules for headings, titles, and general text to avoid confusing the software, which may treat capitalized words as proper nouns or the start of new sentences.
TIP #25: Special Characters: Characters like ©, ®, or ™ may be incorrectly counted as words or break sentence structure. Remove or replace them with words like “copyright,” “registered,” or “trademark.”
Original: “This software is © 2024 CompanyName.”
Revised: “This software is copyright 2024 CompanyName.”
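The same replacement can be scripted. The sketch below uses a small, assumed mapping (SYMBOL_WORDS) and tidies the spacing left behind after each substitution.

```python
import re

# Assumed mapping for illustration; add other symbols as needed.
SYMBOL_WORDS = {
    "\u00a9": "copyright",   # ©
    "\u00ae": "registered",  # ®
    "\u2122": "trademark",   # ™
}

def spell_out_legal_symbols(text):
    """Replace copyright, registered, and trademark symbols with words."""
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, f" {word} ")
    # Collapse doubled spaces and remove a space left before punctuation.
    text = re.sub(r"[ \t]{2,}", " ", text)
    text = re.sub(r" ([.,;:!?])", r"\1", text)
    return text.strip()

print(spell_out_legal_symbols("This software is \u00a9 2024 CompanyName."))
```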
TIP #26: Superscripts/Subscripts: Superscript and subscript formatting (e.g., “x²” or “H₂O”) may not be interpreted properly and can be miscounted or ignored by readability software. Convert these to plain text.
Original: “The formula for water is H₂O.”
Revised: “The formula for water is H2O.”
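Unicode records which characters are superscript or subscript forms, so this conversion can be done without a hand-built table. The sketch below, with the assumed name flatten_scripts, converts only those characters. Be aware that the trademark sign is also classed as a superscript and would become "TM".

```python
import unicodedata

def flatten_scripts(text):
    """Convert superscript and subscript characters to plain ones."""
    result = []
    for ch in text:
        # decomposition() returns e.g. "<sub> 0032" for the subscript two.
        if unicodedata.decomposition(ch).startswith(("<super>", "<sub>")):
            result.append(unicodedata.normalize("NFKC", ch))
        else:
            result.append(ch)
    return "".join(result)

print(flatten_scripts("The formula for water is H₂O."))
```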
TIP #27: Non-breaking Spaces: Non-breaking space characters (written as &nbsp; in HTML) may not be recognized as word separators, which can distort word counts. Replace these characters with regular spaces to ensure consistency.
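A minimal Python sketch of this replacement follows. The function name replace_nbsp is an assumption, and the code handles the HTML entity, the non-breaking space character (U+00A0), and the narrow non-breaking space (U+202F).

```python
def replace_nbsp(text):
    """Swap non-breaking spaces for ordinary spaces."""
    return (
        text.replace("&nbsp;", " ")   # HTML entity left in the text
            .replace("\u00a0", " ")   # non-breaking space
            .replace("\u202f", " ")   # narrow non-breaking space
    )

print(replace_nbsp("10\u00a0km per&nbsp;hour"))
```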
TIP #28: Inline Code: Text that has inline code or programming syntax (<code>, function(), etc.) can confuse readability scoring because the program treats such syntax as part of a sentence.
Original: “You can call myFunction() to execute the code.”
Revised: “You can call the function ‘myFunction’ to execute the code.”
TIP #29: Inline References/Citations: Inline numerical references in square brackets (e.g., [2], [3], [4]) can cause the app to process them as words or fail to detect the ending punctuation.
Revised: “The company experienced rapid growth during its early years, reaching a valuation of over $1 billion in 2015. By 2020, it had expanded to several new markets across Europe and Asia. Analysts projected its revenue to reach $5 billion by 2024.”
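Bracketed reference numbers follow a regular pattern, so a regular expression can strip them. The sketch below uses an assumed function name and a made-up sample sentence. It removes every bracketed number or number list, so it would also delete legitimate bracketed figures if your text has any.

```python
import re

# Matches [2], [3, 4], or [5-7], along with any space just before the bracket.
CITATION_RE = re.compile(r"\s*\[\d+(?:\s*[,\-]\s*\d+)*\]")

def remove_bracket_citations(text):
    """Delete inline numeric citations such as [2] or [3, 4]."""
    return CITATION_RE.sub("", text)

sample = "The company grew quickly.[2] By 2020, it had expanded [3, 4] into Asia."
print(remove_bracket_citations(sample))
```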
Scott, Brian. “How to Prepare Your Text for Readability Formulas – Clean, Sanitize and Normalize.” ReadabilityFormulas.com, 24 Nov. 2024, https://readabilityformulas.com/how-to-prepare-your-text-for-readability-formulas/.