
KeyError in Python script used to generate Bash scripts


I have a Python script called renaming.py that I want to use to generate many Bash scripts (over 500). The Python script looks like so:

#!/usr/bin/python

#Script to make multiple Bash scripts based on a .txt file with names of files
#The .txt file contains names of files, one name per line
#The .txt file must be passed as an argument.

import os
import sys

script_tpl="""#!/bin/bash
#BSUB -J "renaming_{line}"
#BSUB -e /scratch/username/renaming_SNPs/renaming_{line}.err
#BSUB -o /scratch/username/renaming_SNPs/renaming_{line}.out
#BSUB -n 8
#BSUB -R "span[ptile=4]"
#BSUB -q normal
#BSUB -P DBCDOBZAK
#BSUB -W 168:00

cd /scratch/username/renaming_SNPs

awk '{sub(/.*/,$1 "_" $3,$2)} 1' {file}.gen > {file}.renamed.gen

"""

with open(sys.argv[1],'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        line = line.strip(".gen")
        script = script_tpl.format(line=line)
        with open('renaming_{}.sh'.format(line), 'w') as output:
            output.write(script)

The .txt file I pass as an argument to this Python script looks like so:

chr10.10.merged.no_unwanted.formatted.gen
chr10.11.merged.no_unwanted.formatted.gen
chr10.12.merged.no_unwanted.formatted.gen
chr10.13.merged.no_unwanted.formatted.gen
chr10.14.merged.no_unwanted.formatted.gen
chr10.15.merged.no_unwanted.formatted.gen
etc

When I run the Python script, I get the following error message:

Traceback (most recent call last):
  File "renaming.py", line 33, in <module>
    script = script_tpl.format(line=line)
KeyError: 'sub(/'

I am not entirely sure what is happening, but here is what I think:

  • Something is wrong with line 33, but I am not sure what. I have used very similar scripts before. On line 33, I replace every {line} instance in script_tpl with an entry from the .txt file (this happens 500 times, once for each line of the .txt file).

  • I am very confused by the KeyError. I am working on a Linux HPC server (from a Mac laptop). The awk command works with no problem when I type it directly into the terminal (as a Bash command). However, it seems that Python gets confused when I try to "print" it as a variable in the script.

Any help would be deeply appreciated.


Solution

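  • When you call .format, every { and } in the string is treated as the start or end of a replacement field. Your awk command contains literal braces, so .format takes the text after the first { as a field name, looks up sub(/ (the part before the first dot), and raises the KeyError. To get literal braces in the output, escape them by doubling: {{ and }}. Note also that your template uses {file}, but only line is passed to .format, so those placeholders become {line}: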

    script_tpl="""#!/bin/bash
    #BSUB -J "renaming_{line}"
    #BSUB -e /scratch/username/renaming_SNPs/renaming_{line}.err
    #BSUB -o /scratch/username/renaming_SNPs/renaming_{line}.out
    #BSUB -n 8
    #BSUB -R "span[ptile=4]"
    #BSUB -q normal
    #BSUB -P DBCDOBZAK
    #BSUB -W 168:00
    
    cd /scratch/username/renaming_SNPs
    
    awk '{{sub(/.*/,$1 "_" $3,$2)}} 1' {line}.gen > {line}.renamed.gen
    
    """
    

    The relevant docs are the "Format String Syntax" section of the Python documentation, which covers escaping braces.
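To see the effect in isolation, here is a minimal sketch that reproduces the error and then renders the escaped template. The template is shortened to the awk line only, and the names TEMPLATE, UNESCAPED, strip_gen_suffix and render are made up for this illustration; they are not part of the original script. The sketch also removes the ".gen" suffix explicitly rather than with line.strip(".gen"), because str.strip removes any of the given characters from both ends rather than a suffix. That happens to work for the file names shown in the question, but it would not be safe in general.

```python
# Shortened, hypothetical template: only the awk line from the answer is kept.
# Literal awk braces are doubled; {line} is the only real placeholder.
TEMPLATE = """#!/bin/bash
awk '{{sub(/.*/,$1 "_" $3,$2)}} 1' {line}.gen > {line}.renamed.gen
"""

# The same awk line with single braces, as in the question.
UNESCAPED = "awk '{sub(/.*/,$1 \"_\" $3,$2)} 1' {line}.gen"


def strip_gen_suffix(name):
    """Remove a trailing '.gen' only (hypothetical helper).

    str.strip('.gen') strips the characters '.', 'g', 'e', 'n' from both
    ends instead, so 'gene.gen'.strip('.gen') gives an empty string.
    """
    suffix = ".gen"
    return name[:-len(suffix)] if name.endswith(suffix) else name


def render(name):
    """Fill the template for one file name from the .txt list."""
    return TEMPLATE.format(line=strip_gen_suffix(name))


# Single braces: format() tries to look up a field named 'sub(/' and fails.
try:
    UNESCAPED.format(line="x")
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]

print(repr(error_key))
print(render("chr10.10.merged.no_unwanted.formatted.gen"))
```

The rendered script contains single braces again, so awk sees exactly the command that works when typed into the terminal.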