Tags: python, if-statement, python-3.x, for-loop, dna-sequence

Determining if a sequence is a valid DNA sequence


I'm trying to write a program that reads a sequence into a string variable called sequence and determines whether it is a valid DNA sequence. I want to use a single for loop and one if-elif-else statement to decide whether the sequence is valid DNA.
This is what I have written so far:

sequence = input("Please enter a sequence: ").upper()
valid_dna = "ACGT"
sequence = sequence.replace(" ", "")

common=0
for eachletter in sequence:
    if eachletter in valid_dna:
        common +=1

print("This is a valid dna sequence")

elif sequence != valid_dna:
    print("This is not a valid DNA sequence")

else:
    print()

I don't know what to put after elif: what I have there now raises a SyntaxError.

I originally had

sequence = input().upper()
sequence= input("Please enter a sequence:  ")

which didn't work well together. Thank you to VHarisop for pointing it out!

Update: This is what I have now, and it works!

sequence = input().upper()
valid_dna = "ACGT"
sequence = sequence.replace(" ", "")

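# Assume the sequence is valid until an invalid character is found.
count = 1
for i in sequence:
    if i not in valid_dna:
        # One invalid character makes the whole sequence invalid.
        count = 0
        break
if count == 1:
    print("This is a valid DNA sequence.")
else:
    print("This is an invalid DNA sequence")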

Solution

  • First of all, you have:

    sequence = input().upper()
    # irrelevant code
    sequence= input("Please enter a sequence:  ")
    

    This asks for input twice. The first call upper-cases what you type, but the second call overwrites sequence with the raw, unconverted input, so any lower-case letters will never match valid_dna. I would recommend keeping only:

    sequence = input('Please enter a sequence: ').upper()
    

    and then using a generator expression to check validity.

    Actually, there is no need to keep a separate string for non-valid characters. Just do:

    valid_dna = 'ACGT'
    sequence = input('Please enter a sequence: ').upper()
    
    # will print True if every character in the sequence belongs to valid_dna
    print(all(i in valid_dna for i in sequence))
    

    Here, the generator expression (i in valid_dna for i in sequence) yields True for every character of the sequence that belongs to valid_dna and False for every character that does not. The built-in function all() returns True only if every value produced by the expression is True.

    If you want a proper message, you can simply check the return value of the expression and print accordingly:

    condition = all(i in valid_dna for i in sequence)
    print('Valid sequence' if condition else 'Invalid sequence')
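
For reference, the same check can be wrapped in a small reusable function. This is only a minimal sketch and not part of the original answer: the names `is_valid_dna` and `is_valid_dna_loop` are hypothetical. Treating an empty string as valid is an assumption, chosen because `all()` returns True for an empty sequence. The second function uses a single for loop and one if statement, which is closer to what the question asked for.

```python
VALID_DNA = "ACGT"


def is_valid_dna(sequence):
    """Return True if every character of sequence is A, C, G or T."""
    # Normalise the input as the question does: upper-case, spaces removed.
    cleaned = sequence.upper().replace(" ", "")
    return all(base in VALID_DNA for base in cleaned)


def is_valid_dna_loop(sequence):
    """Same check, written with a single for loop and one if statement."""
    cleaned = sequence.upper().replace(" ", "")
    for base in cleaned:
        if base not in VALID_DNA:
            # One invalid character is enough to reject the sequence.
            return False
    return True


if __name__ == "__main__":
    for candidate in ("acgt", "AC GT", "ACGX"):
        label = "Valid" if is_valid_dna(candidate) else "Invalid"
        print(candidate, "->", label, "sequence")
```

Both functions stop at the first invalid character: `all()` short-circuits on the first False value, and the loop version returns early.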