Removing non-printing characters from text in Excel using VBA


Have you ever encountered a perfectly constructed MATCH or a VLOOKUP function failing for no apparent reason? Have you had VBA tell you that a file does not exist even though it does? You may have non-printing characters to blame. There are notorious space-like characters that are invisible to the naked eye, which sneak in when you are importing data from Web Pages, Word Documents or PDFs. In this post we will learn more about these characters and how to remove them from your data.

A brief history of Character Encoding

Letters, digits and punctuations that we use in human languages are represented in a format that a computer can understand through a process known as character encoding. The most commonly used systems are ASCII and Unicode. The ASCII started off with 128 (7-bits) characters to represent American English. This character set was extended to 256 characters independently in many encoding standards to include accented characters in European languages. Finally Unicode absorbed ASCII, and extended it to include all human languages in the world, and we still have room for more!

The reason for my rambling on about character encoding is, in this post, we will confine ourselves to the first 256 characters – 0 to 255 – of the Unicode character set. Remember that the first 128 characters – 0 to 127 – in Unicode character set are the same as ASCII.

Getting back to Non printing characters…

The first 32 characters – 0 to 31 – are non printing. You can use Excel’s CLEAN function to remove all of them. However, there are a few more higher value non printing characters: 127, 129, 141, 143, 144, and 157. Excel does not have a built-in function to remove all of them in one shot. We have to use the SUBSTITUTE and CHAR function to remove them one after the other.

For instance, to remove Character-129 from your text, you have to set up the following formula:

=SUBSTITUTE("Your Text here",CHAR(129),"")

To remove multiple non printing characters, we have to nest substitute formulas within each other:

=SUBSTITUTE(SUBSTITUTE("Your Text here",CHAR(129),""),CHAR(141),"")

The character that gets inserted when you hit the Space-bar on your keyboard is 32. There is another type of space called a non-breaking space, which is represented as 160. This often turns up in HTML pages. It is best to convert the 160’s to 32’s. Note that the first 32 characters include the new-line characters; and multiline text will get converted to single line text, if you use the CLEAN function. I recommend replacing them with 32

Doesn’t this sound like a lot of work?

I created a Function to get rid of all the non-printing characters you’d normally encounter in everyday life. Here it is:

Function CleanS2XL(ByVal Value As String, _
    Optional ReplaceWith As String = " ", _
    Optional TrimResult As Boolean = True) As String

'List of Non-Printing Characters:
'0 to 31 and 127, 129, 141, 143, 144, 157
'32 and 160 are space characters
'32 is the Space we know and love
'And 160 is a weird space, which I will replace with 32

Dim NonPrint() As Variant, _
    Counter As Long

CleanS2XL = Value

'While the Clean function completely removes all the non-printing
'characters, I often found that I needed to replace them with Spaces
'Therefore, instead of using the clean function, I have replaced it with
'the following loop to explicitly replace the first 32 non printing
'characters, with the ReplaceWith argument.
For Counter = 0 To 31
    CleanS2XL = Replace(CleanS2XL, Chr(Counter), ReplaceWith)
Next Counter

'Now we will replace the higher value non printing characters
NonPrint = Array(127, 129, 141, 143, 144, 157)
For Counter = LBound(NonPrint) To UBound(NonPrint)
    CleanS2XL = Replace(CleanS2XL, Chr(NonPrint(Counter)), ReplaceWith)
Next Counter

'160 is another type of space that often shows up when you are copying
'data from web pages, PDFs and word documents. I will replace it with
'the regular space, so the TRIM function can do its thing
CleanS2XL = Replace(CleanS2XL, Chr(160), Chr(32))

'I use Excel's TRIM Function here instead of VBA's because Excel's
'function converts multiple successive spaces to a Single Space
If TrimResult Then CleanS2XL = Application.Trim(CleanS2XL)

End Function

The ReplaceWith argument lets you replace non-printing character with any character you please. It defaults to the normal Space Character (32). I recommend using the Space character (32), along with the TrimResult argument set to TRUE, because, Excel’s TRIM Function, which I call in the end, removes double spaces.

Since this function removes the new-line characters (10 and 13), multiline text would get converted into single line text. Incase you’d like to retain the “multi-line-ness” of your string, you may want to call this function with the line-feed character (10).

Note that I use Excel’s TRIM Function instead of VBA’s TRIM Function. VBA’s function removes only leading and trailing spaces, while Excel’s function also replaces multiple spaces with a single space.


Download

Download the following Workbook for all the illustrative code in this post:

Download


Further Reading

Advertisements

If you liked it, let me know. If you didn't make sure you let me know!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s