
How to insert random spaces in txt file?


I have a file called 'DNASeq.txt' that contains DNA sequences, one per line. I need code that reads each line and splits it at random places by inserting spaces. Each line needs to be split at different places.

EX: I have: AAACCCHTHTHDAFHDSAFJANFAJDSNFADKFAFJ And I need something like this: AAA ADSF DFAFDDSAF ADF ADSF AFD AFAD

I have tried (!!!very new to python!!):

import random

for x in range(10):
  print(random.randint(50,250))

but that just prints random numbers. Is there some way to store a generated random number in a variable?


Solution

  • You can read the file line by line, write each line character by character to a new file, and insert spaces at random:

    Create demo file without spaces:

    with open("t.txt","w") as f:
        f.write("""ASDFSFDGHJEQWRJIJG
    ASDFJSDGFIJ
    SADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFG
    SDFJGIKDSFGOROHPTLPASDMKFGDOKRAMGO""")
    

    Read and rewrite demo file:

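    import random
    max_no_space = 9  # maximum run of characters without a space
    with open("t.txt","r") as f, open("n.txt","w") as w:
        for line in f:
            no_space = 0  # characters written since the last space
            # strip the newline so it is neither counted nor followed by a space
            for c in line.rstrip("\n"):
                w.write(c)
                no_space += 1
                # roughly a 1-in-6 chance of a space after each character
                if random.randint(1,6) == 1 or no_space >= max_no_space:
                    w.write(" ")
                    no_space = 0
            w.write("\n")
    with open("n.txt") as k:
        print(k.read())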
    

    Output:

    ASDF SFD GHJEQWRJIJG 
    A SDFJ SDG FIJ
    SADFJSD FJ JDSFJIDFJG I JSRGJSDJ FIDJFG 
    

    The pattern of spaces is random. You can influence it by changing max_no_space, or remove the randomness entirely and always split after max_no_space characters.
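
    As a rough illustration of the non-random variant mentioned above, the sketch below always splits after a fixed number of characters. The function name split_fixed and its width parameter are made up for this example; width plays the role of max_no_space.

```python
def split_fixed(line, width):
    """Insert a space after every `width` characters of `line`."""
    # Slice the line into consecutive chunks of `width` characters;
    # the last chunk may be shorter.
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    return " ".join(chunks)


print(split_fixed("ASDFSFDGHJEQWRJIJG", 9))  # prints: ASDFSFDGH JEQWRJIJG
```

    Slicing whole chunks avoids the per-character counter, at the cost of every split landing at a predictable position.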


    Edit:

    Writing one character at a time is not very economical if you need blocks of 200+ characters, but the same code works once you add a minimum block length:

    with open("t.txt","w") as f:
        f.write("""ASDFSFDGHJEQWRJIJSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFGG
    ASDFJSDGFIJSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFGSADFJSDFJJDSFJIDFJGIJK
    SADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFGSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJF
    SDFJGIKDSFGOROHPTLPASDMKFGDOKRAMGSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFG""")
    
    
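    import random
    min_no_space = 10  # minimum run length before a space may be inserted
    max_no_space = 20  # maximum run length without a space
    with open("t.txt","r") as f, open("n.txt","w") as w:
        for line in f:
            no_space = 0  # characters written since the last space
            # strip the newline so it is neither counted nor followed by a space
            for c in line.rstrip("\n"):
                w.write(c)
                no_space += 1
                # split only after the minimum run; force a split at the maximum
                if no_space >= min_no_space and (random.randint(1,6) == 1 or no_space >= max_no_space):
                    w.write(" ")
                    no_space = 0
            w.write("\n")
    with open("n.txt") as k:
        print(k.read())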
    

    Output:

    ASDFSFDGHJEQ WRJIJSADFJSDF JJDSFJIDFJGIJ SRGJSDJFIDJFGG
     ASDFJSDGFIJSA DFJSDFJJDSFJIDF JGIJSRGJSDJFIDJ FGSADFJSDFJJ DSFJIDFJGIJK
    SADFJ SDFJJDSFJIDFJG IJSRGJSDJFIDJ FGSADFJSDFJJDS FJIDFJGIJSRG JSDJFIDJF
    SDFJG IKDSFGOROHPTLPASDMKFGD OKRAMGSADFJSDF JJDSFJIDFJGI JSRGJSDJFIDJFG
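
    For long blocks, an alternative is to draw the block length itself at random and cut the line with slices instead of deciding after every character. This also shows how to keep a random number in a variable, which the question asks about. It is only a sketch: split_random, min_len, max_len and rng are names invented for this example, not part of the code above.

```python
import random


def split_random(line, min_len, max_len, rng=random):
    """Split `line` into space-separated chunks of random length.

    Every chunk has between `min_len` and `max_len` characters, except
    possibly the last one, which takes whatever is left.
    """
    if min_len < 1 or max_len < min_len:
        raise ValueError("need 1 <= min_len <= max_len")
    chunks = []
    pos = 0
    while pos < len(line):
        # The random number is stored in a variable and reused as a slice length.
        size = rng.randint(min_len, max_len)
        chunks.append(line[pos:pos + size])
        pos += size
    return " ".join(chunks)


if __name__ == "__main__":
    seq = "ASDFSFDGHJEQWRJIJSADFJSDFJJDSFJIDFJGIJSRGJSDJFIDJFGG"
    print(split_random(seq, 10, 20))
```

    To apply it to a file, call split_random(line.rstrip("\n"), 50, 250) for each line read and write the result followed by "\n". The range 50 to 250 is taken from the randint call in the question.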