I am fairly new to Python but understand what it is I need help with.
I am trying to import a csv file which looks like this:
Title of Project |
---|
name1, Info 1 |
name2, Info 2 |
name3, Info 3 |
name4, Info 4 |
name5, Info 5 |
a,b,c,d,e,f,g,h,i,j,k,l,m |
a1,b1,c1,d1,e1,f1,g1,h1,i1,j1,k1,l1,m1 |
... |
... |
an,bn,cn,dn,en,fn,gn,hn,in,jn,jn,kn,ln,mn |
All of the commas are actually delimited by '\t' and I don't know how to deal with that?
So overall, I want to remove the title and the next 5 rows and only work with the data from row 7 to the nth row in the file for plotting purposes.
The row that holds 'a,b,c,d,etc' need to be headers and the data under it should be separated accordingly into the columns.
I need help with syntax and understanding the most efficient way to do this. Any help or guidance would be much appreciated!
Thanks.
You can use Python's built-in CSV module to handle this.
My example reads your sample data with a tab-delimiter, discards the first 6 rows, and saves the rest of the rows to write out to a new comma-delimited file.
You first need to open the file for reading and create a CSV reader object:
import csv
with open('input.csv', newline='') as f:
reader = csv.reader(f, delimiter='\t')
...
You can iterate over the rows of the reader a couple of different ways.
In your case, to skip a number of rows, call next(reader)
a number of times in sequence, ignoring the data it returns:
...
for i in range(1, 7):
next(reader) # manually advance reader / discard rows
...
You can also iterate the rows of the reader with a standard for-loop. I'm creating the all_rows
list to hold the rows you really care about:
...
all_rows = []
for row in reader:
all_rows.append(row)
...
For the standard reader object, there's nothing special about headers, it's totally up to you if it exists or has any special value... to the reader, it's just a row like any other:
print(all_rows)
[
['a' , 'b' , 'c' , 'd' , 'e' , 'f' , 'g' , 'h ', 'i' , 'j' , 'k' , 'l' , 'm'],
['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1', 'h1', 'i1', 'j1', 'k1', 'l1', 'm1'],
['an', 'bn', 'cn', 'dn', 'en', 'fn', 'gn', 'hn', 'in', 'jn', 'kn', 'ln', 'mn']
]
Now, you can do whatever you need with all_rows
, feed it to a plotter, or save it as a cleaned-up CSV:
with open('output.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(all_rows)
Here's the official docs for the CSV module. It's a good reference to remind of you how it works, once you have the hang of it, but I don't think it's really very good to get a basic understanding of what's going on.
I've written my own how-to CSV w/Python guide. I take a little more time to go over more basic things, but it's the simple things I see people here on SO getting tripped up on.