Remove a certain number of characters from the start and end of each line in a text file with Bash or Python


I was wondering if there is a way to remove a given number of characters from the start and end of each line in a text file using Bash or Python.

In this case, I am trying to remove the first 5 characters and the last 2 characters of each line.

file.txt

(2, 'https://en.wikipedia.org/wiki/Register_(sociolinguistics)')
(3, 'https://dictionary.cambridge.org/dictionary/english/register')

desired_file.txt

https://en.wikipedia.org/wiki/Register_(sociolinguistics)
https://dictionary.cambridge.org/dictionary/english/register

I've looked for similar questions, but unfortunately none of them worked for me.


Solution

  • You can try this code. It is a simple fix, but I wouldn't recommend it, since it ties the parsing to fixed character positions.

    input_file = "input.txt"
    output_file = "output.txt"
    
    with open(input_file, "r") as infile, open(output_file, "w") as outfile:
        for line in infile:
            # Drop the trailing newline first, then the first 5 and last 2 characters
            trimmed_line = line.rstrip("\n")[5:-2]
            outfile.write(trimmed_line + "\n")
    

    Instead, I would recommend basing the slice on the content of each line: start at the first occurrence of http and end at the last occurrence of '), as in the code below:

    input_file = "input.txt"
    output_file_1 = 'output1.txt'
    
    with open(input_file, "r") as infile, open(output_file_1, "w") as outfile1:
        for line in infile:
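            # Slice from the first "http" up to (not including) the last "')"
            start_index = line.find("http")
            end_index = line.rfind("')")
            # Skip lines that do not contain both markers
            if start_index == -1 or end_index == -1:
                continue
            trimmed_line = line[start_index:end_index]
            outfile1.write(trimmed_line + "\n")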
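
Since each sample line happens to look like a Python tuple literal, another option is to parse the line instead of slicing it. The sketch below uses the standard-library ast.literal_eval for that, which is a different technique from the two snippets above. The helper names extract_url and convert are made up for illustration, and the code assumes the URL is always the last item of the tuple.

```python
import ast


def extract_url(line):
    """Return the URL from a line like "(2, 'https://example.com')", or None."""
    line = line.strip()
    if not line:
        return None
    try:
        # Safely evaluate the line as a Python literal (no arbitrary code is run)
        value = ast.literal_eval(line)
    except (ValueError, SyntaxError):
        # The line is not a valid Python literal
        return None
    # Assumption: the line is a tuple whose last item is the URL string
    if isinstance(value, tuple) and value and isinstance(value[-1], str):
        return value[-1]
    return None


def convert(input_path, output_path):
    """Write the URL from each parsable line of input_path to output_path."""
    with open(input_path, "r") as infile, open(output_path, "w") as outfile:
        for line in infile:
            url = extract_url(line)
            if url is not None:
                outfile.write(url + "\n")


sample = "(2, 'https://en.wikipedia.org/wiki/Register_(sociolinguistics)')"
print(extract_url(sample))
```

With the file names from the question, the call would be convert("file.txt", "desired_file.txt"). Lines that cannot be parsed are skipped rather than written out in a damaged form.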