
Python: how can I sort the list returned by glob according to the number in each filename?


I have a folder containing 240 files:

model_1.pdb ... model_240.pdb

I need to create a glob list with the files sorted by name, in the following order:

model_1.pdb, model_2.pdb, model_3.pdb, model_4.pdb ... model_240.pdb

I've tried:

pdb_list2 = [os.path.basename(p) for p in sorted(glob.glob(pdbs + '/*.pdb'))]

That gives me the wrong order:

>>> pdb_list2
['model_1.pdb', 'model_10.pdb', 'model_100.pdb', 'model_101.pdb', 'model_102.pdb', 'model_103.pdb', 'model_104.pdb', 'model_105.pdb', 'model_106.pdb', 'model_107.pdb', 'model_108.pdb', 'model_109.pdb', 'model_11.pdb', 'model_110.pdb', 'model_111.pdb', 'model_112.pdb', 'model_113.pdb', 'model_114.pdb', 'model_115.pdb', 'model_116.pdb', 'model_117.pdb', 'model_118.pdb', 'model_119.pdb', 'model_12.pdb', 'model_120.pdb', 'model_121.pdb', 'model_122.pdb', 'model_123.pdb', 'model_124.pdb', 'model_125.pdb', 'model_126.pdb', 'model_127.pdb', 'model_128.pdb', 'model_129.pdb', 'model_13.pdb', 'model_130.pdb', 'model_131.pdb', 'model_132.pdb', 'model_133.pdb', 'model_134.pdb', 'model_135.pdb', 'model_136.pdb', 'model_137.pdb', 'model_138.pdb', 'model_139.pdb', 'model_14.pdb', 'model_140.pdb', 'model_141.pdb', 'model_142.pdb', 'model_143.pdb', 'model_144.pdb', 'model_145.pdb', 'model_146.pdb', 'model_147.pdb', 'model_148.pdb', 'model_149.pdb', 'model_15.pdb', 'model_150.pdb', 'model_151.pdb', 'model_152.pdb', 'model_153.pdb', 'model_154.pdb', 'model_155.pdb', 'model_156.pdb', 'model_157.pdb', 'model_158.pdb', 'model_159.pdb', 'model_16.pdb', 'model_160.pdb', 'model_161.pdb', 'model_162.pdb', 'model_163.pdb', 'model_164.pdb', 'model_165.pdb', 'model_166.pdb', 'model_167.pdb', 'model_168.pdb', 'model_169.pdb', 'model_17.pdb', 'model_170.pdb', 'model_171.pdb', 'model_172.pdb', 'model_173.pdb', 'model_174.pdb', 'model_175.pdb', 'model_176.pdb', 'model_177.pdb', 'model_178.pdb', 'model_179.pdb', 'model_18.pdb', 'model_180.pdb', 'model_181.pdb', 'model_182.pdb', 'model_183.pdb', 'model_184.pdb', 'model_185.pdb', 'model_186.pdb', 'model_187.pdb', 'model_188.pdb', 'model_189.pdb', 'model_19.pdb', 'model_190.pdb', 'model_191.pdb', 'model_192.pdb', 'model_193.pdb', 'model_194.pdb', 'model_195.pdb', 'model_196.pdb', 'model_197.pdb', 'model_198.pdb', 'model_199.pdb', 'model_2.pdb', 'model_20.pdb', 'model_200.pdb', 'model_201.pdb', 'model_202.pdb', 'model_203.pdb', 'model_204.pdb', 
'model_205.pdb', 'model_206.pdb', 'model_207.pdb', 'model_208.pdb', 'model_209.pdb', 'model_21.pdb', 'model_210.pdb', 'model_211.pdb', 'model_212.pdb', 'model_213.pdb', 'model_214.pdb', 'model_215.pdb', 'model_216.pdb', 'model_217.pdb', 'model_218.pdb', 'model_219.pdb', 'model_22.pdb', 'model_220.pdb', 'model_221.pdb', 'model_222.pdb', 'model_223.pdb', 'model_224.pdb', 'model_225.pdb', 'model_226.pdb', 'model_227.pdb', 'model_228.pdb', 'model_229.pdb', 'model_23.pdb', 'model_230.pdb', 'model_231.pdb', 'model_232.pdb', 'model_233.pdb', 'model_234.pdb', 'model_235.pdb', 'model_236.pdb', 'model_237.pdb', 'model_238.pdb', 'model_239.pdb', 'model_24.pdb', 'model_240.pdb', 'model_25.pdb', 'model_26.pdb', 'model_27.pdb', 'model_28.pdb', 'model_29.pdb', 'model_3.pdb', 'model_30.pdb', 'model_31.pdb', 'model_32.pdb', 'model_33.pdb', 'model_34.pdb', 'model_35.pdb', 'model_36.pdb', 'model_37.pdb', 'model_38.pdb', 'model_39.pdb', 'model_4.pdb', 'model_40.pdb', 'model_41.pdb', 'model_42.pdb', 'model_43.pdb', 'model_44.pdb', 'model_45.pdb', 'model_46.pdb', 'model_47.pdb', 'model_48.pdb', 'model_49.pdb', 'model_5.pdb', 'model_50.pdb', 'model_51.pdb', 'model_52.pdb', 'model_53.pdb', 'model_54.pdb', 'model_55.pdb', 'model_56.pdb', 'model_57.pdb', 'model_58.pdb', 'model_59.pdb', 'model_6.pdb', 'model_60.pdb', 'model_61.pdb', 'model_62.pdb', 'model_63.pdb', 'model_64.pdb', 'model_65.pdb', 'model_66.pdb', 'model_67.pdb', 'model_68.pdb', 'model_69.pdb', 'model_7.pdb', 'model_70.pdb', 'model_71.pdb', 'model_72.pdb', 'model_73.pdb', 'model_74.pdb', 'model_75.pdb', 'model_76.pdb', 'model_77.pdb', 'model_78.pdb', 'model_79.pdb', 'model_8.pdb', 'model_80.pdb', 'model_81.pdb', 'model_82.pdb', 'model_83.pdb', 'model_84.pdb', 'model_85.pdb', 'model_86.pdb', 'model_87.pdb', 'model_88.pdb', 'model_89.pdb', 'model_9.pdb', 'model_90.pdb', 'model_91.pdb', 'model_92.pdb', 'model_93.pdb', 'model_94.pdb', 'model_95.pdb', 'model_96.pdb', 'model_97.pdb', 'model_98.pdb', 'model_99.pdb']

How can I fix it?


Solution

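  • The default sort is lexicographic (essentially dictionary order): strings are compared character by character, so 'model_10.pdb' comes before 'model_2.pdb' because '1' sorts before '2'.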

    A simple solution is possible by extracting the numeric component of the filenames as the sorting key, as below:

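    import glob
    import os
    import re

    # Collect the .pdb files in the current directory (glob returns them in arbitrary order).
    files = glob.glob('*.pdb')
    # Sort by the first run of digits in each filename, compared as an integer.
    # basename() keeps digits in a directory name from being picked up by mistake.
    files = sorted(files, key=lambda x: int(re.findall(r'\d+', os.path.basename(x))[0]))
    # Keep only the filenames, dropping any directory part.
    files = [os.path.basename(p) for p in files]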
    

    This makes few assumptions about the exact format of the filename: it takes the first sequence of digits in the filename and uses that as the (numeric) sorting key. It does assume that every filename contains at least one digit; a name without digits raises an IndexError.
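
    If some filenames might contain no digits, or several groups of digits, a more general "natural sort" key may be easier to live with. The following is only a sketch, not part of the original answer: natural_key is a hypothetical helper name, and the sample names stand in for the output of glob.glob. It splits each name into text and digit runs and compares the digit runs as integers.

```python
import re


def natural_key(name):
    """Split a name into text and digit runs so digit runs compare as numbers.

    re.split with a capturing group always alternates text, digits, text, ...
    so odd positions hold the digit runs and even positions hold the text.
    """
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


# Sample names standing in for glob.glob(...) output (illustrative only).
names = ['model_10.pdb', 'model_2.pdb', 'model_1.pdb', 'model_100.pdb']
ordered = sorted(names, key=natural_key)
print(ordered)
```

    With real files, the same key could be passed to sorted() on the basenames of the glob results. Note that this key lowercases the text parts, so the comparison is case-insensitive, which may or may not be what you want.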