Wednesday, April 1, 2009

Generating a Word Count of an Excel Workbook

That is the question at this forum:

Someone suggests: Save as a CSV file.
Another gives some VBA code he created. See inside.

How would I do it? Well, VBA is certainly one way to go, but why do this when a simple formula will accomplish the same thing? If you make it a macro, then you need to enable macros for that spreadsheet, and some folks are wary of macros.

How do you get a word count of words in an individual cell? Well, a similar way to the VBA approach, the annoying kludge of substitute of space for null followed by a difference in length. There are other ways to avoid this kludge, but for now we will not:

That is, imagine the text is in A1.
The formula in A2: =TRIM(A1)
This will remove any spaces before and after the text, as well as runs of spaces inside the text.
A3: =SUBSTITUTE(A2, " ", "")
A4: =1 + LEN(A1) - LEN(A3)
Since we replaced all " " with nothings, each replacement will make it one character less. The difference between these two lengths are the number of spaces. That number + 1 is the number of words.

That is for a single cell. The combined formula would be:
=1+LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1)," ",""))

Now what about for a range of cells? We just use an array formula, and take the SUM. Also, if the cell does not have text, we don't want to count it. For the range A1:D4, type the following in some other cell:
=SUM(IF(ISTEXT(A1:D4), 1+LEN(TRIM(A1:D4))-LEN(SUBSTITUTE(TRIM(A1:D4)," ","")), 0))
and press Ctrl-Enter. Thus will give you a count of all words in that range.

How do we do it for all cells on a given worksheet? Make the reference be 1:1048576.
=SUM(IF(ISTEXT(1:1048576), 1 + LEN(TRIM(1:1048576)) - LEN(SUBSTITUTE(TRIM(1:1048576), " ","")), 0))

Of course, this would not work, because Excel would run out of resources before calculating that value, and will give us an error message to that effect. Besides, if it encompasses the entire spreadsheet, you will get a circular reference. But use a smaller range and it should work fine.

Which is better, the VBA or this? I don't know.

0 comments: