I have a csv file with a column GeoCodes. This uses " as text qualifier.
I am trying to import this into SQLServer using the SQL Server Import Wizard. The problem with the data is, if there is no GeoCode the csv file will sometimes output the GeoCode as " " and then several spaces. This errors when trying to import the data as it picks up the data within the text qualifier and then there are these spaces before the next comma delimiter.
An example of the data below. The Pontypandy row is the row that errors.
Place ,Geo Codes ,Type
Northpole ,"90.0000,0.0000 ",Pole
Southpole ,"-90.0000,0.0000 ",Pole
Pyramids ,"29.9765,31.1313 ",BigTriangle
France ," ",Country
Pontypandy ," " ,City
I have to use the text qualifiers as there is a comma in the GeoCodes.
I have no say on how the data is sent to me and therefore have to deal with the data as is.
As a work around I have to do a find and replace on the data in notepad first before importing. This adds an extra step to the job that hopefully isn't needed.
Is there anyway I can get around the " " spaces during the import?
As an extra note, I don't currently have access to SSIS but if it can be done in there any easier then please answer with that as it could help me justify getting SSIS (I might have to remove this comment later if I have to show it to my manager).
If your data really is the way you show above you can use fixed width format. Import the data as is and replace the " afterwards. This is not the best solution.
Much better: pipe the import file through sed before importing. This is not only much faster, but in all cases, when data is larger than your RAM the only easy way (OK, there are some other). All you need is sed at operation system level. If you can copy the executable somewhere it's all you need. If you want to replace "[any number of blanks], with ", this is the regex should be:
cat myfile.txt|sed -b -e "s/\" *,/\",/">yournewfile.txt
The regex is easy once you get the idea: - s means Substitute, - /first /second/ means look for first and replace with second, - \" is the escaped " (because of DOS) - Space and * means any number of spaces - , means ,
On a lot of systems sed is still there (cygwin). Have fun!