I wish to write a script in Python 3 that scans a folder recursively, finds all files ending in .xml extension and store file name and relative path in a dictionary.
The structure is like this:
ROOT (/home/myuser/projects/myproject)
├── src/
│ └── mypythonscript.py
│
└── import/
└── xml/
├── folder1/
│ ├── filea.json
│ ├── fileb.json
│ └── filec.xml
└── folder2/
├── filea.json
├── fileb.xml
└── filec.json
My base path is defined as:
basepath = "../import/xml/"
If I use os.path.abspath()
, I am getting an invalid absolute path of the files:
/home/myuser/projects/myproject/src/filec.xml
What I am looking for is to extract a path that looks more like this:
../import/xml/folder1/filec.xml
I have tried this so far:
for folder, subfolder, files in os.walk(basepath):
for file in files:
if os.path.splitext(file)[1] == ".xml":
print(os.path.join(folder, file))
However, this won't print me anything.
If I use print(os.path.relpath(file, basepath))
, I get some invalid paths, like:
../../src/filec.xml
The idea here is to store relative paths to xml files so that I can load them afterwards and parse them using xmltodict.
The scope is to batch parse the XML files and extract certain nodes from them to push them elsewhere, where they can be edited by non-technical people. Once that is done, I need to grab the edited data and put it back into the XML files. Thus, I need a path to those files.
Am I doing something wrong or should I just use the absolute paths?
This seems to work:
import os
basepath = "../import/xml/"
for root, dirs, files in os.walk(os.path.relpath(basepath)):
for file in files:
print(os.path.join(root, file))