I have a list of strings as below:
['/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-130_S_4817-ses-2018-05-04_14_33_33.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_0767-ses-2019-04-08_12_52_36.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_5097-ses-2019-05-07_09_56_14.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_4061-ses-2017-09-26_14_07_37.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-002_S_1280-ses-2017-03-13_13_38_31.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-082_S_5282-ses-2019-06-17_10_11_15.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-018_S_4399-ses-2019-08-06_13_03_58.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-123_S_0106-ses-2018-10-11_12_54_59.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-141_S_2333-ses-2018-12-26_15_31_55.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-031_S_2018-ses-2019-01-24_11_26_13.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-041_S_0679-ses-2017-07-05_09_46_36.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0303-ses-2017-05-11_13_39_46.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-037_S_0454-ses-2017-09-06_09_41_25.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-068_S_2187-ses-2019-10-09_13_19_17.0.txt',
'/home/xin/Downloads/BrainImaging_UNC/out04_adni_roi_signals2/roi_signals_power264_sub-116_S_4043-ses-2018-03-02_10_03_10.0.txt',
I hope to extract the unique subject id with the pattern 'sub-???_S_????' in the list.
So far I can do it with:
unique_subject = re.search('(.*)_sub-(.*)-ses(.*).txt', all_files[0]).group(2)
But that only works for a single string. I need to do it with a loop.
unique_subject = set()
for f in all_files:
unique_subject.add(re.search('(.*)_sub-(.*)-ses(.*).txt', f).group(2))
I am wondering if there are better ways to do this. Finally I would like to get the first session for each subject. Is there a fast way to do that?
Try using this:
l = re.findall('\d{3}_S_\d{4}', ''.join(all_files))