Search code examples
pythonpandasdataframeescapingbed

Outputting DataFrame to tsv, how to ignore or override 'need to escape' error


Related to, but distinct from, this question.

I want to output my pandas dataframe to a tsv file. The first column of my data is a pattern that actually contains 3 bits of information which I'd like to separate into their own columns:

Range                 c1
chr1:2953-2965       -0.001069
chr1:35397-35409     -0.001050
chr1:37454-37466     -0.001330
chr2:37997-38009     -0.001235
chrX:44465-44477     -0.001292

So I do this:

Df = Df.reset_index()
Df["Range"] = Df["Range"].str.replace( ":", "\t" ).str.replace( "-", "\t" )
Df
    Range   c1
0   chr1\t2953\t2965    -0.001069
1   chr1\t35397\t35409  -0.001050
2   chr1\t37454\t37466  -0.001330
3   chr2\t37997\t38009  -0.001235
4   chrX\t44465\t44477  -0.001292

All I need to do now is output with no header or index, and add one more '\t' to separate the last column and I'll have my 4-column output file as desired. Unfortunately...

Df.to_csv( "~/testout.bed", 
                  header=None,
                  index=False,
                  sep="\t",
                  quoting=csv.QUOTE_NONE,
                  quotechar=""
               )
Error: need to escape, but no escapechar set

Here is where I want to ignore this error and say "No, python, actually you Don't need to escape anything. I put those tab characters in there specifically to create column separators."

I get why this error occurs. Python thinks I forgot about those tabs, and this is a safety catch, but actually I didn't forget about anything and I know what I'm doing. I know that the tab characters in my data will be indistinguishable from column-separators, and that's exactly what I want. I put them there specifically for this reason. Surely there must be some way to override this, no? Is there any way to ignore the error and force the output?


Solution

  • You can simply use str.split to split the Range column directly -

    df['Range'].str.split(r":|-", expand=True)
    #     0      1      2
    #0  chr1   2953   2965
    #1  chr1  35397  35409
    #2  chr1  37454  37466
    #3  chr2  37997  38009
    #4  chrX  44465  44477
    

    To retain all the columns, you can simply join this split with the original

    df = df.join(df['Range'].str.split(r":|-", expand=True))