python directory string-parsing

Parsing directories and detecting unexpected blanks


I'm trying to parse some directories and identify folders which do not follow a specific pattern. For example:

Correct: Level1\\Level2\\Level3\\Level4_ID\\Date\\Hour\\file.txt
Incorrect: Level1\\Level2\\Level3\\Level4\\Date\\Hour\\file.txt

Notice that the incorrect one does not have the _ID. My end goal is to parse the data, replacing each '\' with a delimiter so I can import it into MS Excel:

Level1;Level2;Level3;Level4;ID;Date;Hour;file.txt
Level1;Level2;Level3;Level4; ;Date;Hour;file.txt

I have successfully parsed all the correct data with the following steps. Let files be a list of all my directories:

for path in files:
    # Strip spaces and turn the underscore into a path separator, then split
    processed_str = path.replace(" ", "").replace("_", "\\")
    processed_str = processed_str.split("\\")

My issue is detecting whether or not the Level4 folder has an ID after the underscore using the same script, since "files" contains both correct and incorrect directories. Because the incorrect ones have no ID, after performing split("\\") the columns end up shifted, with no blank between Level4 and Date:

 Level1;Level2;Level3;Level4;Date;Hour;file.txt

Thanks,


Solution

  • Do the "_ID" check after splitting the directories, that way you don't lose information. Assuming the directory names themselves don't contain escaped backslashes and that the ID field is always in level 4 (counting from 1), this should do it:

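    for path in files:
        parts = path.split("\\")
        if parts[3].endswith("_ID"):
            # Keep only the folder name in column 4 and give the ID its own column
            parts[3] = parts[3][:-len("_ID")]
            parts.insert(4, "ID")
        else:
            # No ID: insert a blank so the later columns stay aligned
            parts.insert(4, " ")
        final = ";".join(parts)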
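
If "ID" in the question is a placeholder for a real value (say Level4_1234), the same idea can be sketched with str.partition, which splits on the first underscore and reports whether one was found. This is only a sketch under assumptions: the helper name split_path and its parameters id_level and blank are made up for illustration, every path is assumed to have at least four levels, and the folder name itself is assumed to contain no underscore other than the one before the ID.

```python
def split_path(path, id_level=3, blank=" "):
    """Split a backslash-separated path and give the ID its own column.

    id_level is the 0-based index of the folder that may carry "_<ID>".
    (Hypothetical helper; the name and parameters are not from the question.)
    """
    parts = path.split("\\")
    # partition() returns (head, sep, tail); sep is "" when there is no underscore
    name, sep, folder_id = parts[id_level].partition("_")
    # Replace the single folder entry with two columns: name and ID (or a blank)
    parts[id_level:id_level + 1] = [name, folder_id if sep else blank]
    return parts


files = [
    "Level1\\Level2\\Level3\\Level4_ID\\Date\\Hour\\file.txt",
    "Level1\\Level2\\Level3\\Level4\\Date\\Hour\\file.txt",
]
rows = [";".join(split_path(p)) for p in files]
for row in rows:
    print(row)
# Level1;Level2;Level3;Level4;ID;Date;Hour;file.txt
# Level1;Level2;Level3;Level4; ;Date;Hour;file.txt
```

Every row then has the same number of columns, so the Date, Hour and file name fields should line up when the delimited text is imported into Excel.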