
Longest and shortest sequence, Python


I have this program to generate N random sequences.

import random
N = 5
def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])
for i in range(N):
    print(f'Sequence {i+1}:')
    print(randseq("ATCG", 120))

I get output like this:

Sequence 1:

TGGTACACGTGCTTAATGTTAACCTGTCTGGCGCAGGGTAACTATTTCATCCCT

Sequence 2:

CGTATATAATGCTTCCTCTTCAGGCGACCTTGCGATAGTGTCCGGCCATGTGAGTCCCTGTGGAGTGCCTTTAGATGACCTATACGTCTTTAGACTATGTTTATGGGG

Sequence 3:

CACAGCCTTCCTCCAATG . . .

Sequence N:

How can I print the longest and the shortest of the N sequences, along with their lengths?

....


Solution

  • Please check my code below. The explanations are in the comments.

    import random
    
    
    def randseq(abc, length):
        return "".join([random.choice(abc) for i in range(random.randint(1, length))])
    
    
    # Set the input value in the main part of the script,
    # after the function definition
    N = 5
    
    # Initialize longest_seq with the shortest possible string (empty),
    # so every random sequence is longer than this initial value
    longest_seq = ""
    
    # Initialize shortest_seq with a long sequence
    # (assuming randseq("ATCG", 1000) is long enough),
    # so every random sequence is shorter than this initial value
    shortest_seq = randseq("ATCG", 1000)
    
    for i in range(N):
        print(f'Sequence {i+1}:')
        seq = randseq("ATCG", 120)
        
        # If this sequence is longer than the longest so far, store it in longest_seq
        if len(seq) > len(longest_seq):
            longest_seq = seq
        
        # If this sequence is shorter than the shortest so far, store it in shortest_seq
        if len(seq) < len(shortest_seq):
            shortest_seq = seq
        
        print(seq)
       
    print("") 
    print('The longest seq is', longest_seq)
    print('The length of longest seq is', len(longest_seq))
    print('The shortest seq is', shortest_seq)
    print('The length of shortest seq is', len(shortest_seq))
    

    Example output (the sequences are random, so yours will differ when you run it):

    Sequence 1:
    CGGTGATCGCGATTACTGCCCGGCCTTGTCCACTCACAGCGATAACAGTGCTTATAGATCTCTCAAGTCTACCGTCTCACCCGTTGATTACCAA
    Sequence 2:
    AAGGTCAAGATTCGAATTCGTATCGCCGTATGGATAGGCGAAACGAGGGGTGGCTAAGGGGTAGACAGCAGAGCCGCTTTTGTACACCGTAAAACGGACGGTTCAGAACCGGAGGTACG
    Sequence 3:
    ACGGCCTCATGGATAATGCCCGGGGGAACAGGGAAGGAAAGATTTTGTCAAACTGATTCAGTTAC
    Sequence 4:
    GATACA
    Sequence 5:
    ATCGAAAGGAATATCTGTACGGGACGTTTGGTCTCGAGCCTAGCGTAAGCCGCCCGCAATTCGCTCTGATGAGCTACCG
    
    The longest seq is AAGGTCAAGATTCGAATTCGTATCGCCGTATGGATAGGCGAAACGAGGGGTGGCTAAGGGGTAGACAGCAGAGCCGCTTTTGTACACCGTAAAACGGACGGTTCAGAACCGGAGGTACG
    The length of longest seq is 119
    The shortest seq is GATACA
    The length of shortest seq is 6
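    Alternatively, you can keep all N sequences in a list and let Python's built-in max() and min() with key=len pick the longest and the shortest. This way there is no initial placeholder value to choose, which also avoids the problem described in the precaution below. This is a minimal sketch that reuses the question's randseq function and N = 5. The list name sequences is only an illustrative choice.

```python
import random

N = 5


def randseq(abc, length):
    # Same generator as in the question: a random string of 1..length characters from abc
    return "".join(random.choice(abc) for _ in range(random.randint(1, length)))


# Generate and print all N sequences, keeping them in a list
sequences = []
for i in range(N):
    seq = randseq("ATCG", 120)
    sequences.append(seq)
    print(f"Sequence {i + 1}:")
    print(seq)

# key=len makes max()/min() compare the sequences by length instead of alphabetically
longest_seq = max(sequences, key=len)
shortest_seq = min(sequences, key=len)

print()
print("The longest seq is", longest_seq)
print("The length of longest seq is", len(longest_seq))
print("The shortest seq is", shortest_seq)
print("The length of shortest seq is", len(shortest_seq))
```

    If several sequences share the longest (or shortest) length, max() and min() return the first one encountered.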
    

    Precaution:

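    In rare cases, the initial value of shortest_seq can itself be shorter than all N generated sequences, because randseq("ATCG", 1000) picks a random length between 1 and 1000. When that happens, the program does not crash, but it reports this initial string, which is not one of the N sequences, as the shortest one. Increasing the maximum length passed to randseq makes this less likely, although it cannot rule it out completely.

    For example, you can change it from: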

    shortest_seq = randseq("ATCG", 1000)
    

    to:

    shortest_seq = randseq("ATCG", 10000)
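    To remove the risk entirely instead of only making it less likely, one option is to start both variables at None and let the first generated sequence initialize them. This is a sketch under the same assumptions as the code above: the question's randseq function and N = 5.

```python
import random

N = 5


def randseq(abc, length):
    # Random string of 1..length characters drawn from abc
    return "".join(random.choice(abc) for _ in range(random.randint(1, length)))


# None means "no sequence seen yet", so no placeholder string is needed
longest_seq = None
shortest_seq = None

for i in range(N):
    seq = randseq("ATCG", 120)
    print(f"Sequence {i + 1}:")
    print(seq)

    # The first sequence always initializes both variables;
    # later ones replace them only if strictly longer / strictly shorter
    if longest_seq is None or len(seq) > len(longest_seq):
        longest_seq = seq
    if shortest_seq is None or len(seq) < len(shortest_seq):
        shortest_seq = seq

print()
print("The longest seq is", longest_seq)
print("The length of longest seq is", len(longest_seq))
print("The shortest seq is", shortest_seq)
print("The length of shortest seq is", len(shortest_seq))
```

    Because the comparisons use strict > and <, a later sequence of equal length does not replace the one already stored.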