python indentation text-processing code-translation

Replace words in a file with specified replacement string and handle indent level

So I have an exercise: I have to write a Python script that finds all files with the '.prog' extension in the current folder (this part of the program already works). A .prog file looks something like this:

import sys

n = int(sys.argv[1]) ;print "Start of the program!"

LOOP i in range(1,n) [[print "The number:";print i]]

DECISION n < 5 [[print n ;print "smaller then 5"]]

The output should be this:

import sys  

n = int(sys.argv[1]) 
print "Start of the program!"

for i in range(1,n) :
    print "The number:"
    print i

if n < 5 :
    print n 
    print "smaller then 5"

So I have to replace LOOP with for and DECISION with if. There can be a space before a ';', but never after it. The '[[...]]' always contains Python statements. After for loops and if statements, the commands always have to be indented by four spaces. This is my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

def find():
    import glob, os
    os.chdir(os.getcwd())
    for file in glob.glob("*.prog"):
        ProgToPy(file)


def ProgToPy(f):
    outname = f.replace("prog","py")
    replacements = {'LOOP':'for',  'DECISION':'if', ' ;':'\n', ';':'\n    ', ' [[':' :\n    ', ']]':''}
    with open(f) as infile, open(outname, 'w') as outfile:
        for line in infile:
            for src, target in replacements.iteritems():
                line = line.replace(src, target)
            outfile.write(line)

find()

The problem with this is that my output looks like this:

import sys

n = int(sys.argv[1])
print "Start of the program!"

for i in range(1,n) :
    print "The number:"
    print i

if n < 5 :
    print n
print "smaller then 5"

And if I change the ' ;' replacement to a newline followed by four spaces, the print on the first line also starts after four spaces, and then the generated .py file doesn't work properly.

Solution

Since the semicolon behaves differently depending on where it appears, you essentially have to handle each state separately. The [[...]] represents a block scope in Python, so let's call the two states the normal state and the block state.
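To see why a single set of flat replacements cannot work, here is a small Python 3 reproduction of the question's approach. It is a sketch, not the asker's exact script: the replacements are kept in a list so the order is explicit, and applying ' ;' before ';' is an assumption (Python 2 dicts iterate in arbitrary order), chosen because it reproduces the broken output. The ' ;' rule has no idea whether it is inside [[...]], so the second print loses its indentation.

```python
# Flat replacements modeled on the question's dictionary.
# Assumed order: ' ;' is applied before ';'.
REPLACEMENTS = [
    ("LOOP", "for"),
    ("DECISION", "if"),
    (" ;", "\n"),
    (";", "\n    "),
    (" [[", " :\n    "),
    ("]]", ""),
]


def flat_translate(line):
    """Apply every replacement to the whole line, with no notion of blocks."""
    for src, target in REPLACEMENTS:
        line = line.replace(src, target)
    return line


print(flat_translate('DECISION n < 5 [[print n ;print "smaller then 5"]]'))
# if n < 5 :
#     print n
# print "smaller then 5"
```

Mapping ' ;' to an indented newline instead only moves the problem: the first line, n = int(sys.argv[1]) ;print ..., is outside any block, so its print would then be indented.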

To keep track of the state, we'll use a boolean variable in_block, and a regex to determine its value.

Here, I put all the replacement tasks in a replaceTokens function.

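# Token constants (values assumed from the .prog format in the question)
LOOP, FOR = 'LOOP', 'for'
DECISION, IF = 'DECISION', 'if'
SEMI, COLON = ';', ':'
OPEN_BRACKETS, CLOSE_BRACKETS = '[[', ']]'
NEWLINE, TABSPACE = '\n', '    '  # four spaces of indentation

def replaceTokens(statement, in_block=False):
    if in_block:
        # Inside [[...]]: every new statement is indented by four spaces
        statement = statement.replace(SEMI, NEWLINE+TABSPACE)
        statement = statement.replace(OPEN_BRACKETS, COLON+NEWLINE+TABSPACE)
    else:
        # Outside a block: statements start at column 0
        statement = statement.replace(SEMI, NEWLINE)
        statement = statement.replace(OPEN_BRACKETS, COLON+NEWLINE)

    statement = statement.replace(CLOSE_BRACKETS, NEWLINE)

    statement = statement.replace(LOOP, FOR)
    statement = statement.replace(DECISION, IF)

    return statement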

The code above looks quite tedious, but you could easily replace the repeated calls with a loop over a list of replacements. The important takeaway is the boolean variable in_block: it lets you decide whether or not to follow each newline with four spaces of indentation, depending on whether the text is inside double brackets.
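A minimal loop-based sketch of the same replacements might look like this. The name replace_tokens and the token values are assumptions for illustration; the only thing in_block changes is what a "newline" expands to. A list of pairs is used rather than a dict so the order of replacements is explicit.

```python
INDENT = "    "  # four spaces, as the exercise requires


def replace_tokens(statement, in_block=False):
    """Translate .prog tokens; in_block adds indentation after each newline."""
    newline = "\n" + INDENT if in_block else "\n"
    replacements = [
        (";", newline),         # statement separator
        ("[[", ":" + newline),  # block opener becomes a colon plus newline
        ("]]", "\n"),           # block closer ends the line
        ("LOOP", "for"),
        ("DECISION", "if"),
    ]
    for token, target in replacements:
        statement = statement.replace(token, target)
    return statement


print(replace_tokens('[[print n ;print "smaller then 5"]]', in_block=True))
print(replace_tokens('n = 1 ;print n'))
```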

To find the statements within the block scope, I use a regex:

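import os
import re

def progToPy(f):
    # Swap only the extension, so e.g. "program.prog" becomes "program.py"
    outname = os.path.splitext(f)[0] + ".py"
    with open(f, "r") as rf:
        text = rf.read()

    # Non-greedy match, so each [[...]] group is found separately
    block_regex = re.compile(r'\[\[.*?\]\]')
    mo = block_regex.findall(text)

    # Translate the block contents first, in the block state
    for match in mo:
        statement = blockScope(match)
        text = text.replace(match, statement)

    # Then translate the rest of the document, in the normal state
    text = replaceTokens(text)
    with open(outname, "w") as wf:
        wf.write(text)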
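The difference between a greedy .* and a non-greedy .*? in this pattern only shows up when one line contains more than one [[...]] group; since . does not match a newline by default, each match stays within a single line either way. A small check (the two-block line is a hypothetical input):

```python
import re

greedy = re.compile(r'\[\[.*\]\]')
lazy = re.compile(r'\[\[.*?\]\]')

text = (
    'LOOP i in range(1,n) [[print "The number:";print i]]\n'
    'DECISION n < 5 [[print n ;print "smaller then 5"]]\n'
)
# One block per line: both patterns find the same two matches
print(lazy.findall(text))

# Two blocks on one line: greedy swallows everything in between
line = 'a [[x]] b [[y]]'
print(greedy.findall(line))  # ['[[x]] b [[y]]']
print(lazy.findall(line))    # ['[[x]]', '[[y]]']
```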

The blockScope function translates only the statements within a block scope, using in_block=True, and progToPy substitutes those translated blocks into the text first. When replaceTokens is then called on the entire document, the block contents have already been translated, so the second call leaves them alone.

def blockScope(block):
    statement = replaceTokens(block, in_block=True)
    return statement
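For reference, here is a compact Python 3 sketch that combines the two-pass idea into one self-contained script. It is an adaptation rather than the answer's exact code: the names translate and convert_folder are hypothetical, re.sub with a function rewrites each [[...]] group in place instead of looping with str.replace, and pathlib's with_suffix builds the output name so only the extension changes.

```python
import re
from pathlib import Path

INDENT = "    "  # four spaces after for/if, as the exercise requires
BLOCK_RE = re.compile(r"\[\[.*?\]\]")  # one [[...]] group; '.' stops at newlines


def replace_tokens(text, in_block=False):
    """Replace .prog tokens; inside a block, new lines get an extra indent."""
    newline = "\n" + INDENT if in_block else "\n"
    text = text.replace(";", newline)
    text = text.replace("[[", ":" + newline)
    text = text.replace("]]", "\n")
    return text.replace("LOOP", "for").replace("DECISION", "if")


def translate(source):
    """Translate blocks first (block state), then the rest (normal state)."""
    source = BLOCK_RE.sub(lambda m: replace_tokens(m.group(0), in_block=True), source)
    return replace_tokens(source)


def convert_folder(folder="."):
    """Write a .py file next to every .prog file in the folder."""
    for prog in Path(folder).glob("*.prog"):
        prog.with_suffix(".py").write_text(translate(prog.read_text()))


SAMPLE = (
    "import sys\n"
    'n = int(sys.argv[1]) ;print "Start of the program!"\n'
    'LOOP i in range(1,n) [[print "The number:";print i]]\n'
    'DECISION n < 5 [[print n ;print "smaller then 5"]]\n'
)
print(translate(SAMPLE))
```

The generated code keeps the input's Python 2 print statements; only the LOOP/DECISION syntax and the indentation are translated.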