Community!
Story: I am trying to upload a CSV file with a huge batch of products to my e-commerce shop. But there are many very similar products, but all with every column slightly different. And luckily the plugin I use can handle this, but it needs the same title for the entire product range or some reference to its parent product. The reference is sadly not there.
Now I want to know how I can find values in a CSV file that are nearly the same (in SQL there was something called '%LIKE%') to structure the table appropriately. I can hardly describe what I want to achieve, but here is an example for what I'm looking for.
I basically want to transform this table:
+---------------+---------------+---------------+---------------+ | ID | Title | EAN | ... | +---------------+---------------+---------------+---------------+ | 1 | AquaMat 3.6ft | 1234567890 | ... | +---------------+---------------+---------------+---------------+ | 2 | AquaMat 3.8ft | 1234567891 | ... | +---------------+---------------+---------------+---------------+ | 3 | AquaMat 4ft | 1234567892 | ... | +---------------+---------------+---------------+---------------+
into this:
+---------------+---------------+---------------+---------------+ | ID | Title | EAN | ... | +---------------+---------------+---------------+---------------+ | 1 | AquaMat | 1234567890 | ... | +---------------+---------------+---------------+---------------+ | 2 | AquaMat | 1234567891 | ... | +---------------+---------------+---------------+---------------+ | 3 | AquaMat | 1234567892 | ... | +---------------+---------------+---------------+---------------+
The extra data can be scraped. Can I do this with Excel? With Macros? With Python?
Thank you for taking time and reading this.
If you have any questions, than feel free to ask.
EDIT:
The Title column contains products with completely different names and might even contain more whitespaces. And some products might have 1 attribute but others have up to 3 attributes. But this can be sorted manually.
And with nearly the same I mean as you can see in the table. The Title's are basically the same but not identical. I want to remove the attributes from them. Also, there are no other columns with any more details, only numbers and the attributes that I am trying to cut of the title!!!
Here's an idea using Python and .split():
import csv
with open('testfile.csv', 'r', encoding="utf-8-sig") as inputfile:
csv_reader = csv.reader(inputfile, delimiter=',')
with open('outputfile.csv', 'w', newline='') as outputfile:
w = csv.writer(outputfile)
header=['ID','Title','EAN','Product','Attr1','Attr2','Attr3']
w.writerow(header)
for row in csv_reader:
if row[0]=='ID':
header_row=True
pass
else:
header_row=False
list=row[1].split()
for item in list:
# if you want, you can add some other conditions on the attribute (item) in here
row.append(item)
if not header_row:
print('row: {}'.format(row))
w.writerow(row)