Search code examples
pythoncsvreader

Keep double quotes in a text file using csv reader


Hi I have a text file with string :

hello,"foo, bar"

i want to split it into a list as:

['hello', '"foo, bar"']

Is there a way I can acheive this ?

I am trying this as of now :

for line in sys.stdin: csv_file = StringIO.StringIO(line) csv_reader = csv.reader(csv_file)

I want them to split into two string i.e:

'hello' and '"foo, bar"'


Solution

  • Say you read a row from a CSV:

    from StringIO import StringIO
    import csv
    
    infile = StringIO('hello,"foo, bar"')
    reader = csv.reader(infile)
    row = reader.next()  # row is ['hello', 'foo, bar']
    

    The second value in the row is foo, bar instead of "foo, bar". This isn't some Python oddity, it's a reasonable interpretation of CSV syntax. The quotes probably weren't placed there to be part of a value, but rather to show that foo, bar is one value and shouldn't be split into foo and bar based on the comma (,). An alternative solution would be to escape the comma when creating the CSV file, so the line would look like:

    hello,foo \,bar
    

    So it's quite a strange request to want to keep those quotes. If we know more about your use case and the bigger picture we can help you better. What are you trying to achieve? Where does the input file come from? Is it really a CSV or is it some other syntax that looks similar? For example if you know that every line consists of two values separated by a comma, and the first value never contains a comma, then you can just split on the first comma:

    print 'hello,"foo, bar"'.split(',', 1)  # => ['hello', '"foo, bar"']
    

    But I doubt the input has such restrictions which is why things like quotes are needed to resolve ambiguities.

    If you're trying to write to a CSV again, then the quotes will be recreated as you're doing so. They don't have to be there in the intermediate list:

    outfile = StringIO()
    writer = csv.writer(outfile)
    writer.writerow(row)
    print outfile.getvalue()
    

    This will print

    hello,"foo, bar"
    

    You can customise the exact CSV output by setting a new dialect.

    If you want to grab the individual values in the row with the appropriate quoting rules applied to them, it's possible, but it's a bit of a hack:

    # We're going to write individual strings, so we don't want a line terminator
    csv.register_dialect('no_line_terminator', lineterminator='')
    
    def maybe_quote_string(s):
        out = StringIO()
    
        # writerow iterates over its argument, so don't give it a plain string
        # or it'll break it up into characters
        csv.writer(out, 'no_line_terminator').writerow([s])
    
        return out.getvalue()
    
    print maybe_quote_string('foo, bar')
    print map(maybe_quote_string, row)
    

    The output is:

    "foo, bar"
    ['hello', '"foo, bar"']
    

    This is the closest I can come to answering your question. It's not really keeping the double quotes, rather it's removing them and adding them back with likely the same rules that put them there in the first place.

    I'll say it again, you're probably headed down the wrong path with this question. Others will probably agree. That's why you're struggling to get good answers. What is the bigger problem that you're trying to solve? We can help you better to achieve that.