
How to rename multiple .json files within subfolders using Python and pandas


I am having difficulty renaming JSON files spread across a large number of subfolders. Every one of the .json files is named message_1.json within its respective folder, so I want to replace the number in each file name with a running count.

Here Person_1, Person_2, Person_3, ..., Person_n are individual subfolders inside the Inbox folder.

Example file structure

- C:/abc/def/ghi/klmn/opq/rst/uvw/xyz/messages/Inbox:
   - Person_1:
      - message_1.json
   - Person_2:
      - message_1.json
   - Person_3:
      - message_1.json
   .
   .
   .
   - Person_n:
      - message_1.json

Additionally, I want to load them into a single pandas DataFrame and later export it as a CSV file, so that I can work further with the data.

Here is what I have tried so far, and where I am stuck:

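import os
import sys

# folder containing the running script; paths below are made relative to it
directory = os.path.dirname(os.path.realpath(sys.argv[0]))
for root, dirs, files in os.walk("C:/abc/def/ghi/klmn/opq/rst/uvw/xyz/messages/inbox/"):
    for name in files:
        if name.endswith(".json"):
            folder_names = os.path.relpath(root, directory)
            # relative path of each JSON file; nothing is renamed yet
            json_files = os.path.join(folder_names, name)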

Output I want to get:

- Person_1:
   - message_1.json
- Person_2:
   - message_2.json
- Person_3:
   - message_3.json
.
.
.
- Person_n:
   - message_n.json

OR

All JSON files renamed, and then a single CSV file containing the data from all of them.

Any help will be deeply appreciated; I'm not able to wrap my head around how to do this.


Solution

  • Use pathlib to collect the file paths into a DataFrame, then rename the files from it.

    from pathlib import Path
    import pandas as pd
    
    pth = Path("C:/abc/def/ghi/klmn/opq/rst/uvw/xyz/messages/inbox/")
    
    # rglob searches all subfolders; sorted() makes the numbering repeatable
    # (the order is lexicographic, so Person_10 comes before Person_2)
    data = [(f, f.parent, f.stem, f.suffix)
            for f in sorted(pth.rglob('*.json'))]
    
    # load into dataframe
    df = pd.DataFrame(data=data, columns=['pth', 'dname', 'fname', 'suffix'])
    
    # create new filename: text before the first '_' + running count + suffix
    df['new_name'] = (
      df['fname'].str.split('_').str[0] +
      '_' +
      (df.index + 1).astype(str) +
      df['suffix']
    )
    
    # now rename each file in place, keeping it in its own folder
    for row in df.itertuples():
        row.pth.rename(Path(row.dname) / row.new_name)
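
  • The question also asks for a single DataFrame exported as a CSV. A minimal sketch of one way to do that follows. The function name combine_json_files and the added "folder" and "file" columns are hypothetical, and it assumes each file holds a JSON object (or a list of objects) that pd.json_normalize can flatten. The real layout of the message files is not shown in the question, so the resulting columns may need adjusting.

```python
import json
from pathlib import Path

import pandas as pd


def combine_json_files(inbox, csv_path):
    """Load every .json file under `inbox` into one DataFrame and write it to CSV.

    Assumption: each file contains a JSON object or a list of JSON objects.
    """
    frames = []
    for f in sorted(Path(inbox).rglob("*.json")):
        with f.open(encoding="utf-8") as fh:
            content = json.load(fh)
        # json_normalize flattens nested objects into dotted column names
        frame = pd.json_normalize(content)
        # hypothetical helper columns recording where each row came from
        frame["folder"] = f.parent.name
        frame["file"] = f.name
        frames.append(frame)
    # concat fails on an empty list, so fall back to an empty DataFrame
    df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    df.to_csv(csv_path, index=False)
    return df
```

    With pth defined as above, calling combine_json_files(pth, "messages.csv") would return the combined DataFrame and write the CSV to the current working directory. Values that are themselves lists stay as single cells, so they may need further unpacking depending on the data.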