Search code examples
vbscriptasp-classicnon-ascii-characters

Need to strip out invalid characters in CSV file


I am generating a CSV file from a Microsoft SQL database that was provided to me, but somehow there are invalid characters in about two dozen places throughout the text (there are many thousands of lines of data). When I open the CSV in my text editor, they display as red, upside-down question marks (there are two of them in the attached screenshot).

When I copy the character and view the "find/replace" dialog in my text editor, I see this:

\x{0D}

...but I have no idea what that means. I need to modify my script that generates the CSV so it strips these characters out, but I don't know how to identify them. My script is written in Classic ASP.

Screenshot of invalid characters in text editor


Solution

  • You can also use RegEx to remove unwanted characters:

    Set objRegEx = CreateObject(“VBScript.RegExp”)
    objRegEx.Global = True
    objRegEx.Pattern = “[^A-Za-z]”
    
    strCSV = objRegEx.Replace(strCSV, “”)
    

    This code is from the following article which explains in details what it does: How Can I Remove All the Non-Alphabetic Characters in a String?

    In your case you will want to add some characters to the Pattern:

    ^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$