I need to extract text from file names that look like this: 'foo_bar_1_10.asc.gz'. I have a corresponding text list, with an entry for each of these files, that looks like this: '1 10'. This list is what I want to re-create, because I need to compare all of my files against a master list to find missing files. So ultimately I need a method to compare the two lists (diff?). Any help would be great.
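import os
newtxt = []
# Raw strings stop '\f' from being read as a form-feed escape
oldtxt = r'\foobar\master_list.txt'
wd = r'\foobar'
for filename in os.listdir(wd):
    pieces = filename.split('.')
    subpieces = pieces[0].split('_')
    numbers = ' '.join(subpieces[-2:])
    newtxt.append(numbers)
# The original printed an undefined name (txt); the list built above is newtxt
print newtxt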
@@@ Update @@@
I now have two lists with line numbers (using a function similar to nl in Unix, also named nl), and the output looks something like this: '1: 1 10' and '2: 1 12'. I need to find the values from oldtxt that are missing in newtxt. I've tried this:
s = set(nl(newtxt))
diff = [x for x in nl(oldtxt) if x not in s]
print diff
This returns a few individual text characters rather than the missing entries I expected. Any help?
It sounds like you're struggling with the string parsing part. First split the file name into pieces by calling the string's .split method, splitting on a period:
>>> file = 'foo_bar_1_10.asc.gz'
>>> pieces = file.split('.')
>>> pieces
['foo_bar_1_10', 'asc', 'gz']
Then split the first piece into subpieces on the _ character:
>>> subpieces = pieces[0].split('_')
>>> subpieces
['foo', 'bar', '1', '10']
You can then join the last two pieces back together, separated by a space, like this:
>>> numbers = ' '.join(subpieces[-2:])
>>> numbers
'1 10'
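The three steps above can be wrapped in a small function and combined with a set lookup to find the master-list entries that have no matching file. This is only a sketch: the helper names numbers_from_filename, find_missing and read_master_list are made up for illustration, and it assumes the master list holds one entry such as '1 10' per line. It compares the bare entries rather than nl-numbered lines, since a line-number prefix would stop otherwise identical entries from matching whenever the two lists are ordered differently. It also reads the file's lines instead of using the path itself; if nl simply iterates over its argument, passing the path string oldtxt would yield single characters, which may be what the update is seeing.

```python
def numbers_from_filename(filename):
    """Return the last two underscore-separated fields of the base name."""
    base = filename.split('.')[0]        # 'foo_bar_1_10'
    subpieces = base.split('_')          # ['foo', 'bar', '1', '10']
    return ' '.join(subpieces[-2:])      # '1 10'


def find_missing(master_entries, filenames):
    """Return master entries with no matching file, in master-list order."""
    found = set(numbers_from_filename(name) for name in filenames)
    return [entry for entry in master_entries if entry not in found]


def read_master_list(path):
    """Read one entry per line, normalising whitespace and skipping blanks.

    Assumes (hypothetically) that the master list is plain text with
    entries like '1 10', one per line.
    """
    with open(path) as handle:
        return [' '.join(line.split()) for line in handle if line.strip()]


# In-memory example; these file names and entries are illustrative only.
master = ['1 10', '1 11', '1 12']
names = ['foo_bar_1_10.asc.gz', 'foo_bar_1_12.asc.gz']
missing = find_missing(master, names)
print(missing)  # ['1 11']
```

With the variables from the question, the call would look something like find_missing(read_master_list(oldtxt), os.listdir(wd)), after import os. Building the set once keeps each membership test cheap even for a long master list.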