python glob pathlib

A Python Path.rglob pattern to match all package.json files in a directory that are not nested inside a node_modules folder


I'm working with a massive monorepo, and I'm trying to write a script that needs to grab some information from all of the monorepo's package.json files, but not any package.json files that are nested inside a node_modules folder. I've tried everything apart from recursively going through the entire directory, node_modules folders included, and filtering the results with a regex afterwards. I'm aware that that's an option, but ideally I'd like to exclude those directories before the search for performance reasons. The monorepo structure looks something like:

root/
    node_modules/
    apps/
        someApp/
            node_modules/
        someApp2/
            node_modules/
    packages/
        somePackage1/
            node_modules/
        somePackage2/
            node_modules/
        somePackage3/
            node_modules/
        ...

Any help would be greatly appreciated! Thanks.


Solution

  • I would walk the whole file tree and skip every node_modules directory along the way.

    This will be much more performant than searching for all package.json files and filtering them by their path afterwards.

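    import os

    def find_package_json(root):
        # os.walk is top-down by default, so editing dir_names in place
        # controls which subdirectories it descends into next.
        for dir_path, dir_names, file_names in os.walk(root):
            # Drop node_modules so its contents are never traversed.
            dir_names[:] = [d for d in dir_names if d != 'node_modules']
            if 'package.json' in file_names:
                yield os.path.join(dir_path, 'package.json')

    for path in find_package_json("/path/to/your/repo"):
        print(path)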
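
Since the question asks about pathlib specifically: as far as I know, a Path.rglob pattern has no way to exclude a directory, so the exclusion has to happen while traversing. Below is a minimal sketch of the same idea using only Path.iterdir. The function name iter_package_json and its skip parameter are made up for illustration, and the sketch assumes symlinked directories should not be followed (which matches the default behaviour of os.walk).

```python
from pathlib import Path
from typing import Iterator


def iter_package_json(root: Path, skip: str = "node_modules") -> Iterator[Path]:
    """Yield package.json files under root without entering directories named skip."""
    # Explicit stack instead of recursion, so very deep trees cannot hit
    # Python's recursion limit.
    stack = [root]
    while stack:
        current = stack.pop()
        for entry in current.iterdir():
            if entry.is_dir():
                # A skipped (or symlinked) directory is never pushed onto the
                # stack, so its contents are never listed.
                if entry.name != skip and not entry.is_symlink():
                    stack.append(entry)
            elif entry.name == "package.json":
                yield entry
```

It would be called as iter_package_json(Path("/path/to/your/repo")) and yields Path objects in no particular order, so sort the results if the order matters.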