for a project I have to get data from a csv-file to a websystem via rest (curl) using xml format. I got almost everything to work but a thing.
The file I received for this project is a .xlsx-file. I exported it to a .csv-file (delimiter separated) and converted the encoding to UTF-8. The original file has about 2000 rows and 30 columns.
My strategy is to "cut" the fields of the .csv-file and save the output to an array with a bash (4.3.42) script (I haven't use Python before ;) ) on SLES12 SP2 (it's a test machine).
The csv is in the following format (simplified): (Note: the description which has three rows in one cell. And that is the part that breaks my script.) The curl command creates false entries with these new-lines.
About a fourth of the data has descriptions like that.
After the export to csv, this is what the file looks like if I open it with vim:
title|description|firstname
Test|Lorem ipsum dolor sit amet,
consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat
sed diam voluptua.|Chris
This is my simplified script:
file="Kontakte.csv"
# Get the values from the file
arrV[1]=$(cut -d'|' -f1 "$file" | cut -d$'\n' -f2) # Title
arrV[2]=$(cut -d'|' -f2 "$file" | cut -d$'\n' -f2) # Description
arrV[3]=$(cut -d'|' -f3 "$file" | cut -d$'\n' -f2) # First name
echo "### Values ###"
# For reference
echo "Title: " ${arrV[1]}
echo "Description: " ${arrV[2]}
echo "Name: " ${arrV[3]}
What I get is ...
### Values ###
Title: Test
Description: Lorem ipsum dolor sit amet,
Name: Chris
... instead of something like that:
### Values ###
Title: Test
Description: Lorem ipsum dolor sit amet,\n onsetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat\n sed diam voluptua.
Name: Chris
I have a vba script to get rid of the new-lines but would be a less-than-ideal solution.
It would be nice to just put some quotes around the executions to make it work, like:
arrV[2]="$(cut -d'|' -f2 "$file" | cut -d$'\n' -f2)"
But I coudn't solve the issue like that.
Do you have an idea, what to do about these new-lines? Or is getting rid of the new-lines beforehand in Excel my only option?
Thank you in advance.
You are using line-oriented tools, so it's not surprising that embedded \n
causes trouble. Pick a character like ~, then use grep to verify it does not appear in your .csv. Export to .csv with \r
(Mac) line terminators. Then use tr & mac2unix to map newlines to your reserved character and to repair the line terminators:
tr '\n' '~' < Kontakte.csv | mac2unix > New.csv
When you're finished munging the file, tr '~' '\n'
will restore matters.
Of course, it would be much better to use an appropriate tool, such as a CSV library.