Tags: php, regex, data-structures, language-agnostic, filesystems

Recursively mapping file paths in one folder to another folder


Let's say I have a folder (folder_1) with the following structure:

/folder_1
  /dir_1
     - file_1_1.txt
     - file_1_2.txt
  /dir_2
     - file_2_1.txt
     /dir_2_1
       - file_2_1_1.txt
  - file_1.txt

Now, let's say I have another folder (folder_2) with the following structure:

/folder_2
  /dir_1
     - file_1_1.txt
     - default.txt
  /dir_2
     - file_2_1.txt
     - default.txt
  - default.txt

I need to map every file in folder_1 to a file in folder_2 such that:

  1. /folder_1/dir_1/file_1_1.txt maps to /folder_2/dir_1/file_1_1.txt
  2. /folder_1/dir_1/file_1_2.txt maps to /folder_2/dir_1/default.txt
  3. /folder_1/dir_2/file_2_1.txt maps to /folder_2/dir_2/file_2_1.txt
  4. /folder_1/dir_2/dir_2_1/file_2_1_1.txt maps to /folder_2/dir_2/default.txt
  5. /folder_1/file_1.txt maps to /folder_2/default.txt

I'm not the best communicator, so hopefully the pattern above makes sense. The question is really language-agnostic, but an answer in PHP and/or JavaScript would be great.

So far, I have done this in PHP using FileIterator, RecursiveDirectoryIterator, and a bunch of custom classes that extract each file's path and then map it, one file at a time.

This makes me wonder if I am missing an easier way to do this simple mapping. Maybe using regex named groups or something?

Edit:

Is it possible, for each file path in folder_1, to use a regex pattern to find (reduce to) the best match among all the file paths in folder_2?

Further edit:

This is for mapping data files in folder_1 to template files in folder_2. For each file in folder_1, if folder_2 has a file at exactly the same relative path (including the filename), that file is the match. If not, we look for default.txt in the same directory. If that default.txt is not found either, we move up one directory and use the parent directory's default.txt, and we keep moving up one level at a time until we find the first default.txt.
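The lookup rule above amounts to an ordered list of candidate paths per data file, tried until one exists in folder_2. A minimal TypeScript sketch of that ordering, assuming paths are already relative to their root folder, start with "/", and use "/" as the separator; `candidateTemplates` is a hypothetical name, not an existing API:

```typescript
// Hypothetical helper: the folder_2 paths to try, in priority order, for one
// data file whose path is given relative to folder_1 (e.g. "/dir_1/file_1_2.txt").
function candidateTemplates(relPath: string, defaultName = "default.txt"): string[] {
  // "/dir_2/dir_2_1/file_2_1_1.txt" -> ["dir_2", "dir_2_1", "file_2_1_1.txt"]
  const segments = relPath.split("/").filter((s) => s.length > 0);
  const dirs = segments.slice(0, -1); // directories only, outermost first

  // 1. The exact same relative path (including the file name).
  const candidates: string[] = ["/" + segments.join("/")];

  // 2. default.txt in the file's own directory, then in each parent up to the root.
  for (let depth = dirs.length; depth >= 0; depth--) {
    candidates.push("/" + [...dirs.slice(0, depth), defaultName].join("/"));
  }
  return candidates;
}
```

For /dir_2/dir_2_1/file_2_1_1.txt this returns the exact path, then /dir_2/dir_2_1/default.txt, /dir_2/default.txt and /default.txt; the first candidate that exists in folder_2 is the template. If a data file is itself named default.txt, the first two entries coincide, which is harmless.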


Solution

  • First, use your recursive directory scanner to scan the entire folder_2 directory tree. Build a hash table containing every file path relative to folder_2 (that is, with the /folder_2 prefix removed). So your hash table would contain:

    /dir_1/file_1_1.txt
    /dir_1/default.txt
    /dir_2/file_2_1.txt
    /dir_2/default.txt
    /default.txt
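A minimal TypeScript sketch of building that table, assuming the recursive scan (not shown) has already produced a list of absolute folder_2 file paths. A `Set` serves as the hash table, and `buildTemplateIndex` is a hypothetical name:

```typescript
// Hypothetical helper: turn absolute folder_2 file paths (as a recursive scan
// would return them) into a set of paths relative to folder_2.
function buildTemplateIndex(root: string, files: string[]): Set<string> {
  const index = new Set<string>();
  for (const file of files) {
    // Skip anything outside the root; "/folder_20/..." must not match "/folder_2".
    if (file.startsWith(root + "/")) {
      index.add(file.slice(root.length)); // "/folder_2/dir_1/default.txt" -> "/dir_1/default.txt"
    }
  }
  return index;
}

const templateIndex = buildTemplateIndex("/folder_2", [
  "/folder_2/dir_1/file_1_1.txt",
  "/folder_2/dir_1/default.txt",
  "/folder_2/dir_2/file_2_1.txt",
  "/folder_2/dir_2/default.txt",
  "/folder_2/default.txt",
]);

// Exact-match lookup for a folder_1 file once its "/folder_1" prefix is stripped.
const lookupKey = "/folder_1/dir_1/file_1_1.txt".slice("/folder_1".length);
console.log(templateIndex.has(lookupKey)); // true
```

Each lookup is then a constant-time `has` call, which is why no regex is needed for the matching itself.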
    

    Now, start scanning folder_1. For each file, strip /folder_1 from the front of its path and look up the resulting string in the hash table. If it's there, you have a match.

    If it's not there, replace the last segment (the file name) with "default.txt" and try again. For example, when you begin scanning folder_1, the first file you get is:

    /folder_1/dir_1/file_1_1.txt
    

    You look up /dir_1/file_1_1.txt in the hash table and find it. You have a match.

    Next, you get /folder_1/dir_1/file_1_2.txt. You look up /dir_1/file_1_2.txt in the hash table and don't find it. So you replace file_1_2.txt with default.txt, giving you /dir_1/default.txt. You look that up in the hash table, find it, and you have a match.

    If /dir_1/default.txt did not exist, you would adjust the path again, this time removing the last directory: drop /dir_1 and look up /default.txt in the hash table. You repeat this until you find a match or have checked /default.txt at the root.

    In pseudo code it looks like this:

    for each file in folder_1
        name = strip "/folder_1" from the front of the file's path
        if name in hash table then
            match found
            continue (next file)
        end if
        replace the file name (everything after the last '/') with "default.txt"
        loop
            if name in hash table then
                match found
                continue (next file; this must leave the inner loop as well)
            end if
            if name is "/default.txt" then
                exit loop (the root default.txt has been checked; nothing left to remove)
            end if
            remove the last slash, and everything between it and the previous slash.
            (so "/dir_1/default.txt" becomes "/default.txt")
        end loop

        // if you get here, no match was found
    end for
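A TypeScript translation of the pseudocode, as a sketch under the same assumptions as before (paths start with "/" and use "/" as the separator, and folder_2 has already been loaded into a `Set` of paths relative to folder_2). `findTemplate` and `mapFolders` are hypothetical names. Putting the per-file search in its own function lets a `return` do the job of "continue (next file)" from inside the inner loop:

```typescript
// Hypothetical helper: find the template for one data file path (relative to
// folder_1), following the pseudocode. Returns null if nothing matches.
function findTemplate(name: string, templates: Set<string>): string | null {
  if (templates.has(name)) {
    return name; // exact match
  }
  // Replace the file name (everything after the last '/') with "default.txt".
  let candidate = name.slice(0, name.lastIndexOf("/") + 1) + "default.txt";
  for (;;) {
    if (templates.has(candidate)) {
      return candidate; // returning also leaves the inner loop
    }
    const last = candidate.lastIndexOf("/");
    if (last <= 0) {
      return null; // "/default.txt" was checked and is missing: give up
    }
    // Remove the last directory: "/dir_2/dir_2_1/default.txt" -> "/dir_2/default.txt".
    const previous = candidate.lastIndexOf("/", last - 1);
    candidate = candidate.slice(0, previous) + candidate.slice(last);
  }
}

// Map every folder_1 file (absolute path) to a folder_2-relative template path.
function mapFolders(root1: string, dataFiles: string[], templates: Set<string>): Map<string, string | null> {
  const mapping = new Map<string, string | null>();
  for (const file of dataFiles) {
    mapping.set(file, findTemplate(file.slice(root1.length), templates)); // strip "/folder_1"
  }
  return mapping;
}

const templates = new Set([
  "/dir_1/file_1_1.txt", "/dir_1/default.txt",
  "/dir_2/file_2_1.txt", "/dir_2/default.txt", "/default.txt",
]);
const mapping = mapFolders("/folder_1", [
  "/folder_1/dir_1/file_1_1.txt",
  "/folder_1/dir_1/file_1_2.txt",
  "/folder_1/dir_2/file_2_1.txt",
  "/folder_1/dir_2/dir_2_1/file_2_1_1.txt",
  "/folder_1/file_1.txt",
], templates);
mapping.forEach((template, dataFile) => {
  console.log(`${dataFile} -> ${template === null ? "(no match)" : "/folder_2" + template}`);
});
```

With the example trees, this reproduces the five mappings listed in the question, including /folder_1/dir_2/dir_2_1/file_2_1_1.txt falling back to /folder_2/dir_2/default.txt.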