Check which names in one text file (Text B) appear in another text file (Text A) and write a value next to each name (Python)

I am new to Python and would appreciate some help.

I have two text files; let's call them Text_A.txt and Text_B.txt.

Text_A.txt contains a list of names as follows (they are tab-delimited):

Sequence_1 Sequence_2 Sequence_3 Sequence_4 Sequence_5 Sequence_6 Sequence_7 Sequence_8

and Text_B.txt contains a list of names as follows (one sequence name per line):

Sequence_1
Sequence_2
Sequence_3
Sequence_4
Sequence_5
Sequence_6
Sequence_7
Sequence_8
Sequence_9
Sequence_10
Sequence_11

What I would like to do is assign "1" next to the sequence names in Text_B.txt if the names are in Text_A.txt. And assign "0" next to the sequence names in Text_B.txt if the names are not in Text_A.txt.

So the expected output for the example above looks like this (one name and its value per line):

Sequence_1;1
Sequence_2;1
Sequence_3;1
Sequence_4;1
Sequence_5;1
Sequence_6;1
Sequence_7;1
Sequence_8;1
Sequence_9;0
Sequence_10;0
Sequence_11;0

I would like the output in .txt format.

How should I do this using Python?

I need help with this because Text_A.txt and Text_B.txt contain more than 3000 and 6000 names, respectively.

Thank you so much!

Solution

You can do the following:

# Text_A.txt: tab-delimited names, assumed to be on the first line
with open('Text_A.txt', 'r') as f:
    seqA = f.readline()

# remove the end-of-line character, then split the string on tabs
seqA = seqA.strip('\n').split('\t')

# seqA is used as a lookup, so make a set out of it
seqA = set(seqA)

# Text_B.txt: one name per line; skip blank lines
with open('Text_B.txt', 'r') as f:
    seqB = [line.strip() for line in f if line.strip()]

# iterate over each item in seqB and check if it is present in seqA
# store each result as a "name;value" line
out = []
for item in seqB:
    is_present = 1 if item in seqA else 0
    out.append('{item};{is_present}\n'.format(item=item, is_present=is_present))

# write result to file (each entry already ends with a newline)
with open('output.txt', 'w') as f:
    f.write(''.join(out))

If your sequences contain several million entries, you should think about a more advanced approach, for example processing the files line by line instead of holding all results in a list.
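One way this could look for larger inputs is sketched below. It is only an illustration: the function name `flag_names` is hypothetical, and it assumes that names contain no whitespace, that Text_A.txt may spread its names over several lines, and that Text_B.txt has one name per line. The lookup set is still built in memory, but the results are written as they are computed rather than collected first.

```python
def flag_names(path_a, path_b, path_out):
    """Write 'name;1' or 'name;0' for every name listed in path_b."""
    # Collect every whitespace-separated name from file A into a set.
    # str.split() with no argument splits on tabs, spaces and newlines.
    with open(path_a, "r") as fa:
        known = {name for line in fa for name in line.split()}

    # Stream file B line by line and write each result immediately,
    # so the output is never held in memory as a whole.
    with open(path_b, "r") as fb, open(path_out, "w") as fout:
        for line in fb:
            name = line.strip()
            if name:  # skip blank lines
                flag = 1 if name in known else 0
                fout.write("{};{}\n".format(name, flag))
```

Calling `flag_names('Text_A.txt', 'Text_B.txt', 'output.txt')` would then produce the same kind of output file as the code above.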