Search code examples
csvescapingabapsap-basis

What is the likely meaning of this character sequence? A&#C


I'm working on an application that imports data from a CSV file. I am told that the data in the CSV file comes from SAP, which I am totally unfamiliar with.

My client indicates that there is an issue. One column of data in the CSV file contains postal addresses. Sometimes, the system doesn't see a valid address. Here is a slightly fictionalized example:

1234 MAIN ST A&#C HOUSTON

As you can see, there is a street number, a street name, and a city, all in capital letters. There is no state or zip code specified. In the CSV file, all addresses are assumed to be in the same state.

Normally, where there is text between the street name and city, it is an apartment number or letter. In the above example, we get errors when we try to use the address with other services, such as Google geolocation. One suggested fix is to simply strip out there special characters, but I believe that there must be a better way.

I want to know what this A&#C means. It looks like some sort of escape sequence, but it isn't in a format I'm familiar with. Please tell me what these strange character sequence means.


Solution

  • I'm not totally sure, but I doubt there's a "canonical" escape sequence that looks like this. In the ABAP environment, # is used to replace non-printable characters. It might be that the data was improperly sanitized when importing into the SAP system in the first place, and when writing to the output file, some non-printable character was replaced by #. Another explanation might be that one of the field contained a non-ASCII unicode character (like,   ) and the export program failed to convert that to the selected target codepage. It's hard to tell without examining the actual source dataset. Of course, it might also be some programming error or a weird custom field separator...