Related to, but distinct from, this question.
I want to output my pandas dataframe to a tsv file. The first column of my data is a pattern that actually contains 3 bits of information which I'd like to separate into their own columns:
Range c1
chr1:2953-2965 -0.001069
chr1:35397-35409 -0.001050
chr1:37454-37466 -0.001330
chr2:37997-38009 -0.001235
chrX:44465-44477 -0.001292
So I do this:
Df = Df.reset_index()
Df["Range"] = Df["Range"].str.replace( ":", "\t" ).str.replace( "-", "\t" )
Df
Range c1
0 chr1\t2953\t2965 -0.001069
1 chr1\t35397\t35409 -0.001050
2 chr1\t37454\t37466 -0.001330
3 chr2\t37997\t38009 -0.001235
4 chrX\t44465\t44477 -0.001292
All I need to do now is output with no header or index, and add one more '\t' to separate the last column and I'll have my 4-column output file as desired. Unfortunately...
Df.to_csv( "~/testout.bed",
header=None,
index=False,
sep="\t",
quoting=csv.QUOTE_NONE,
quotechar=""
)
Error: need to escape, but no escapechar set
Here is where I want to ignore this error and say "No, python, actually you Don't need to escape anything. I put those tab characters in there specifically to create column separators."
I get why this error occurs. Python thinks I forgot about those tabs, and this is a safety catch, but actually I didn't forget about anything and I know what I'm doing. I know that the tab characters in my data will be indistinguishable from column-separators, and that's exactly what I want. I put them there specifically for this reason. Surely there must be some way to override this, no? Is there any way to ignore the error and force the output?
You can simply use str.split
to split the Range
column directly -
df['Range'].str.split(r":|-", expand=True)
# 0 1 2
#0 chr1 2953 2965
#1 chr1 35397 35409
#2 chr1 37454 37466
#3 chr2 37997 38009
#4 chrX 44465 44477
To retain all the columns, you can simply join this split with the original
df = df.join(df['Range'].str.split(r":|-", expand=True))