Word Count Analyzer
Have you ever had different tools give you conflicting word counts? Although counting words seems like a mundane task, it is sometimes the source of a lot of unnecessary stress in client-translator relationships. Your client's tool reports one word count, and your tool reports another. What is causing the difference? This tool will tell you. TM-Town's Word Count Analyzer searches your text for areas that are known to cause word count discrepancies across different tools and reports them to you.
Word Count Analyzer is an open-source tool built by TM-Town. It currently supports English.
Common word count gray areas include:
- Ellipses
- Hyperlinks
- Contractions
- Hyphenated Words
- Dates
- Numbers
- Numbered Lists
- XML and HTML tags
- Forward slashes and backslashes
- Punctuation
Other gray areas not covered by this tool:
- Headers
- Footers
- Hidden Text specific to Microsoft Word
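Most of the discrepancies below come down to how a tool decides where one word ends and the next begins. As a reference point, here is a minimal Python sketch of the simplest rule, the one wc (Unix) applies: a word is any run of non-whitespace characters. The function name whitespace_word_count is hypothetical. This is not TM-Town's code.

```python
def whitespace_word_count(text: str) -> int:
    """Count whitespace-separated tokens, the same rule `wc -w` applies."""
    # str.split() with no argument splits on runs of any whitespace and
    # discards empty strings, so leading/trailing spaces are harmless.
    return len(text.split())


samples = ["...", ". . .", "she/he/it", "devil-may-care"]
for s in samples:
    print(f"{s!r}: {whitespace_word_count(s)}")
```

This prints 1, 3, 1, and 1, which matches the Microsoft Word / wc (Unix) column for those samples in the tables that follow. Every other rule in this document can be read as a refinement of this baseline.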
Ellipsis
default = 'ignore'
- 'ignore': Excludes all ellipses from the word count total.
- 'no_special_treatment': Does not search the string for ellipses, so they are counted according to the other settings.
Checks for any occurrences of ellipses in your text. Writers use many different formats for ellipses, and although style guides exist, their rules are rarely followed.
Three Consecutive Periods ...
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
Four Consecutive Periods ....
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
Three Periods With Spaces . . .
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 3
Pages | 0
Four Periods With Spaces . . . .
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 4
Pages | 0
Horizontal Ellipsis …
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
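The 'ignore' setting for ellipses can be sketched as a regular expression that deletes every ellipsis before counting. The pattern below is an assumption, not TM-Town's actual rule: it matches three or more periods (optionally separated by single spaces) or one or more horizontal ellipsis characters (U+2026). The function name count_ignoring_ellipses is hypothetical.

```python
import re

# Three or more periods, each optionally followed by one whitespace
# character, or a run of horizontal ellipsis characters (U+2026).
ELLIPSIS = re.compile(r"(?:\.\s?){2,}\.|\u2026+")


def count_ignoring_ellipses(text: str) -> int:
    """Hypothetical 'ignore' behavior: drop ellipses, then count tokens."""
    # Replace with a space so "Wait...what" still splits into two words.
    cleaned = ELLIPSIS.sub(" ", text)
    return len(cleaned.split())


print(count_ignoring_ellipses(". . . ."))       # 0
print(count_ignoring_ellipses("Wait... what"))  # 2
```

Because a single sentence-ending period is never followed by two more, ordinary punctuation such as "U.S.A." or "e.g." is left alone. Note that this pattern also swallows long dotted lines, which the tool treats as a separate setting.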
Hyperlink
default = 'count_as_one'
- 'count_as_one': Counts a hyperlink as one word.
- 'no_special_treatment': Does not search the string for hyperlinks. Therefore, how a hyperlink is handled in the word count depends on other settings (mainly slashes).
- 'split_at_period': Splits a hyperlink at each period and counts each resulting token as a separate word. This mirrors Pages, which splits hyperlinks at periods.
http://www.example.com
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 4
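The three hyperlink settings interact with slash handling, which the following Python sketch tries to make concrete. It assumes a deliberately simple URL pattern (a token starting with http://, https://, or www.) and a counter that splits every non-URL token at forward slashes. The names URL, count_with_hyperlinks, and the split rule for 'split_at_period' (break at "://" and at every period, which reproduces the Pages count of 4 above) are all hypothetical, not TM-Town's implementation.

```python
import re

# Simplified URL pattern (an assumption, not TM-Town's actual rule).
URL = re.compile(r"(?:https?://|www\.)\S+")
MODES = {"count_as_one", "no_special_treatment", "split_at_period"}


def count_with_hyperlinks(text: str, hyperlink: str = "count_as_one") -> int:
    """Hypothetical counter in which ordinary tokens are split at '/'."""
    if hyperlink not in MODES:
        raise ValueError(f"unknown hyperlink mode: {hyperlink!r}")
    total = 0
    for token in text.split():
        if hyperlink != "no_special_treatment" and URL.fullmatch(token):
            if hyperlink == "count_as_one":
                total += 1
            else:
                # split_at_period: break at '://' and at every '.'
                total += len([p for p in re.split(r"://|\.", token) if p])
        else:
            # Tokens not treated as URLs fall through to the slash rule,
            # which is why an unrecognized URL gets split into pieces.
            total += len([p for p in token.split("/") if p])
    return total


url = "http://www.example.com"
for mode in ("count_as_one", "split_at_period", "no_special_treatment"):
    print(mode, count_with_hyperlinks(url, mode))
```

With 'no_special_treatment', the URL is split at "//" into "http:" and "www.example.com", giving 2. That is the "depends on other settings (mainly slashes)" effect described above.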
Contraction
default = 'count_as_one'
- 'count_as_one': Counts a contraction as one word.
- 'count_as_multiple': Splits a contraction into the words that make it up. Examples:
  - don't => do not (2 words)
  - o'clock => of the clock (3 words)
Most tools count a contraction as one word, although some might argue that a contraction is technically more than one word.
can't
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 1
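The 'count_as_multiple' setting amounts to a lookup table from each contraction to its expansion. Here is a minimal Python sketch of that idea. The EXPANSIONS dictionary is a small hypothetical sample, not the list TM-Town uses. Ambiguous forms such as "he's" (he is / he has) or "can't" (cannot / can not) are deliberately left out because their word count depends on the chosen expansion.

```python
# Hypothetical expansion table for illustration only.
EXPANSIONS = {
    "don't": "do not",
    "isn't": "is not",
    "won't": "will not",
    "o'clock": "of the clock",
}


def count_contractions(text: str, contraction: str = "count_as_one") -> int:
    total = 0
    for token in text.split():
        # Normalize curly apostrophes (U+2019) and trim edge punctuation.
        word = token.replace("\u2019", "'").lower().strip('.,!?;:"')
        if contraction == "count_as_multiple" and word in EXPANSIONS:
            total += len(EXPANSIONS[word].split())
        else:
            total += 1
    return total


print(count_contractions("at five o'clock"))                       # 3
print(count_contractions("at five o'clock", "count_as_multiple"))  # 5
```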
Hyphenated word
default = 'count_as_one'
- 'count_as_one': Counts a hyphenated word as one word.
- 'count_as_multiple': Breaks a hyphenated word at each hyphen and counts each part as a separate word. Example:
  - devil-may-care (3 words)
devil-may-care
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 3
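The two hyphen settings differ only in whether each whitespace token is split at "-" before counting. Below is a minimal Python sketch. The function name count_hyphenated is hypothetical. Empty pieces (for example, from a lone "-") are dropped so they do not inflate the count.

```python
def count_hyphenated(text: str, hyphenated_word: str = "count_as_one") -> int:
    total = 0
    for token in text.split():
        if hyphenated_word == "count_as_multiple":
            # "devil-may-care" -> ["devil", "may", "care"]
            total += len([p for p in token.split("-") if p])
        else:
            total += 1
    return total


print(count_hyphenated("devil-may-care"))                       # 1
print(count_hyphenated("devil-may-care", "count_as_multiple"))  # 3
```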
Date
default = 'no_special_treatment'
- 'no_special_treatment': Does not search the string for dates. Therefore, how a date is handled in the word count depends on other settings.
- 'count_as_one': Counts a date as one word. This is more common in computer-assisted translation (CAT) tools, where a date is treated as a placeable that can usually be translated automatically. Examples:
  - Monday, April 4th, 2011 (1 word)
  - April 4th, 2011 (1 word)
  - 04/04/2011 (1 word)
  - 04.04.2011 (1 word)
  - 2011/04/04 (1 word)
  - 2011-04-04 (1 word)
  - 2003Nov9 (1 word)
  - 2003 November 9 (1 word)
  - 2003-Nov-9 (1 word)
  - and others
Most word processing tools do not recognize dates, but CAT tools tend to treat a date as one word or placeable. TM-Town's tool checks for many date formats, including those with day or month abbreviations. A few examples are shown below (this is not an exhaustive list).
Monday, April 4th, 2011
Tool | Word Count
TM-Town | 4
Microsoft Word / wc (Unix) | 4
Pages | 4
04/04/2011
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 3
04.04.2011
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 1
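The tables above use the default 'no_special_treatment'. The 'count_as_one' setting can be sketched as "replace every recognized date with a single placeholder token, then count." The Python sketch below recognizes only two families from the example list: "[Weekday,] Month D[th], YYYY" and numeric dates separated by "/", "." or "-". Formats such as 2003Nov9 are not handled, and the numeric pattern will also match strings like version numbers "1.2.3". The names DATE and count_dates_as_one are hypothetical.

```python
import re

MONTHS = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
          r"Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|"
          r"Nov(?:ember)?|Dec(?:ember)?)")
WEEKDAYS = r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"

DATE = re.compile(
    # e.g. "Monday, April 4th, 2011" or "April 4, 2011"
    rf"\b(?:{WEEKDAYS},?\s+)?{MONTHS}\s+\d{{1,2}}(?:st|nd|rd|th)?,?\s+\d{{4}}\b"
    # e.g. "04/04/2011", "04.04.2011", "2011-04-04"
    r"|\b\d{1,4}[/.\-]\d{1,2}[/.\-]\d{1,4}\b"
)


def count_dates_as_one(text: str) -> int:
    # Each recognized date collapses into one placeholder token.
    return len(DATE.sub("DATE", text).split())


print(count_dates_as_one("Monday, April 4th, 2011"))  # 1
print(count_dates_as_one("Due 04/04/2011 at noon"))   # 4
```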
Number
default = 'count'
- 'count': Counts a number as one word.
- 'ignore': Ignores any numbers in the string (with the exception of dates and numbered_lists) and does not count them towards the word count.
Simple number 200
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 1
Number with preceding unit $200
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 1
Number with unit following 50%
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 1
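All three number forms above are one whitespace token, so every tool counts them as one word. The 'ignore' setting needs a notion of "this token is a number." Here is a minimal Python sketch under an assumed definition: an optional leading currency symbol, digits with optional "." or "," groups, and an optional trailing "%". The names NUMBER and count_numbers are hypothetical. Unlike the tool, this sketch does not exempt dates or numbered lists, so "04.04.2011" would be dropped.

```python
import re

# Assumed number shape: "$200", "200", "50%", "1,000.5" (not exhaustive).
NUMBER = re.compile(r"[$€£¥]?\d+(?:[.,]\d+)*%?")


def count_numbers(text: str, number: str = "count") -> int:
    tokens = text.split()
    if number == "ignore":
        tokens = [t for t in tokens if not NUMBER.fullmatch(t)]
    return len(tokens)


print(count_numbers("It costs $200"))            # 3
print(count_numbers("It costs $200", "ignore"))  # 2
```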
Numbered list
default = 'count'
- 'count': Counts a number in a numbered list as one word.
- 'ignore': Ignores any numbers that are part of a numbered list and does not count them towards the word count.
1. List item a
2. List item b
3. List item c
Tool | Word Count
TM-Town | 12
Microsoft Word / wc (Unix) | 12
Pages | 9
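The difference between 12 and 9 in the table above is the three list markers. Here is a minimal Python sketch of the 'ignore' setting. It assumes a marker is digits followed by "." or ")" at the start of a line. Other marker styles such as "(1)" are not recognized, and a sentence that happens to start a line with "5. " would be misread as a list item. The names LIST_MARKER and count_numbered_list are hypothetical.

```python
import re

# Digits plus "." or ")" at the start of any line (re.MULTILINE).
LIST_MARKER = re.compile(r"^[ \t]*\d+[.)][ \t]+", re.MULTILINE)


def count_numbered_list(text: str, numbered_list: str = "count") -> int:
    if numbered_list == "ignore":
        # Delete the markers so only the item text is counted.
        text = LIST_MARKER.sub("", text)
    return len(text.split())


sample = "1. List item a\n2. List item b\n3. List item c"
print(count_numbered_list(sample))            # 12
print(count_numbered_list(sample, "ignore"))  # 9
```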
XML and HTML
default = 'remove'
- 'remove': Removes any XML or HTML opening and closing tags from the string.
- 'keep': Keeps any XML or HTML in the string (no special treatment), so tags are counted according to the other settings.
<span class="large-text">Hello world <new-tag>Hello</new-tag>
Tool | Word Count
TM-Town | 3
Microsoft Word / wc (Unix) | 4
Pages | 12
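The 'remove' setting can be approximated by stripping anything that looks like a tag before counting. Here is a minimal Python sketch. The regex is an assumption and is not a real XML/HTML parser: comments, CDATA, and ">" inside attribute values are not handled. Each tag is replaced with a space rather than deleted, so "<b>Hello</b>world" stays two words. That choice is part of this sketch, not documented TM-Town behavior. The names TAG and count_without_tags are hypothetical.

```python
import re

# Opening, closing, or self-closing tags: <span class="x">, </new-tag>, <br/>.
# A "<" followed by a space (as in "a < b") is not treated as a tag.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def count_without_tags(text: str) -> int:
    return len(TAG.sub(" ", text).split())


html = '<span class="large-text">Hello world <new-tag>Hello</new-tag>'
print(count_without_tags(html))  # 3, matching the TM-Town row above
```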
Forward slash
default = 'count_as_multiple_except_dates'
- 'count_as_multiple_except_dates': Splits any token that contains a forward slash (except dates) at each slash and counts each resulting token individually. Example:
  - she/he/it 4/25/2014 (4 words)
- 'count_as_multiple': Splits any token that contains a forward slash at each slash and counts each resulting token individually. Whether dates, hyperlinks, and XHTML are also split depends on how those options are set. Example:
  - she/he/it (3 words)
- 'count_as_one': Counts any token that contains a forward slash as one word. Example:
  - she/he/it (1 word)
she/he/it
Tool | Word Count
TM-Town | 3
Microsoft Word / wc (Unix) | 1
Pages | 3
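The three forward-slash settings can be sketched in Python as follows. The sketch assumes that "date" means a purely numeric token such as 4/25/2014 or 2011/04/04. A date with trailing punctuation ("4/25/2014.") would not be recognized. In this standalone sketch, 'count_as_multiple' splits dates too, whereas in the tool that depends on the date setting. The names SLASH_DATE and count_slashes are hypothetical.

```python
import re

# Simplified numeric date with slashes (an assumption).
SLASH_DATE = re.compile(r"\d{1,4}/\d{1,2}/\d{1,4}")


def count_slashes(text: str,
                  forward_slash: str = "count_as_multiple_except_dates") -> int:
    total = 0
    for token in text.split():
        keep_whole = forward_slash == "count_as_one" or (
            forward_slash == "count_as_multiple_except_dates"
            and SLASH_DATE.fullmatch(token) is not None
        )
        if keep_whole:
            total += 1
        else:
            # Split at every slash; drop empty pieces from "//" or a lone "/".
            total += len([p for p in token.split("/") if p])
    return total


text = "she/he/it 4/25/2014"
print(count_slashes(text))                       # 4
print(count_slashes(text, "count_as_multiple"))  # 6
print(count_slashes(text, "count_as_one"))       # 2
```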
Backslash
default = 'count_as_one'
- 'count_as_one': Counts any token that contains a backslash as one word. Example:
  - c:\Users\johndoe (1 word)
- 'count_as_multiple': Splits any token that contains a backslash at each backslash and counts each resulting token individually. Example:
  - c:\Users\johndoe (3 words)
c:\Users\johndoe
Tool | Word Count
TM-Town | 1
Microsoft Word / wc (Unix) | 1
Pages | 3
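Backslash handling is the same split-and-count idea as forward slashes, minus the date exception. Here is a minimal Python sketch. The function name count_backslashes is hypothetical. Note the raw string (r"...") in the usage lines, which keeps Python from interpreting "\U" in the path as an escape sequence.

```python
def count_backslashes(text: str, backslash: str = "count_as_one") -> int:
    if backslash == "count_as_one":
        return len(text.split())
    # "count_as_multiple": split each token at every backslash and drop
    # empty pieces, so c:\Users\johndoe -> ["c:", "Users", "johndoe"].
    return sum(len([p for p in tok.split("\\") if p]) for tok in text.split())


print(count_backslashes(r"c:\Users\johndoe"))                       # 1
print(count_backslashes(r"c:\Users\johndoe", "count_as_multiple"))  # 3
```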
Dotted line
default = 'ignore'
- 'ignore': Ignores any dotted lines in the string and does not count them towards the word count.
- 'count': Counts a dotted line as one word.
.........
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
………………………
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
Dashed line
default = 'ignore'
- 'ignore': Ignores any dashed lines in the string and does not count them towards the word count.
- 'count': Counts a dashed line as one word.
-----------
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
Underscore
default = 'ignore'
- 'ignore': Ignores any series of underscores in the string and does not count them towards the word count.
- 'count': Counts a series of underscores as one word.
____________
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
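The dotted line, dashed line, and underscore settings share one pattern: recognize a token made entirely of one filler character, then either drop it ('ignore') or count it as one word ('count'). Below is a Python sketch of all three. The minimum lengths are guesses, not TM-Town's rules. Five or more periods are required so that a four-period ellipsis is not mistaken for a dotted line. The names LINE_PATTERNS and count_filler_lines are hypothetical.

```python
import re

# Hypothetical thresholds for what counts as a "line".
LINE_PATTERNS = {
    "dotted_line": re.compile(r"\.{5,}|\u2026{2,}"),
    "dashed_line": re.compile(r"-{3,}"),
    "underscore": re.compile(r"_{2,}"),
}


def count_filler_lines(text: str, dotted_line: str = "ignore",
                       dashed_line: str = "ignore",
                       underscore: str = "ignore") -> int:
    settings = {"dotted_line": dotted_line, "dashed_line": dashed_line,
                "underscore": underscore}
    total = 0
    for token in text.split():
        # Which kind of filler line (if any) is this whole token?
        kind = next((k for k, p in LINE_PATTERNS.items()
                     if p.fullmatch(token)), None)
        if kind is None or settings[kind] == "count":
            total += 1
    return total


print(count_filler_lines("Sign here ____________ date"))          # 3
print(count_filler_lines("-----------", dashed_line="count"))      # 1
```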
Stray punctuation
default = 'ignore'
- 'ignore': Ignores any punctuation mark surrounded on both sides by whitespace and does not count it towards the word count.
- 'count': Counts a punctuation mark surrounded on both sides by whitespace as one word.
?
Tool | Word Count
TM-Town | 0
Microsoft Word / wc (Unix) | 1
Pages | 0
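With whitespace tokenization, "a punctuation mark surrounded on both sides by whitespace" is simply a token that is one punctuation character long. Here is a minimal Python sketch of both settings. It assumes ASCII punctuation only (Python's string.punctuation), so marks such as "¿" or "»" are not recognized. It also treats the start and end of the text as whitespace. The function name count_stray_punctuation is hypothetical.

```python
import string


def count_stray_punctuation(text: str, stray_punctuation: str = "ignore") -> int:
    total = 0
    for token in text.split():
        # A lone ASCII punctuation character between whitespace is "stray".
        is_stray = len(token) == 1 and token in string.punctuation
        if stray_punctuation == "count" or not is_stray:
            total += 1
    return total


print(count_stray_punctuation("Really ? Yes"))           # 2
print(count_stray_punctuation("Really ? Yes", "count"))  # 3
```

Punctuation attached to a word, as in "Yes?", is part of that word's token and is unaffected by either setting.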
Additional Resources