Merging Two CSV Files Based on Many Criteria

I have two CSV files. They have a same column but each of rows in the same column are not unique, like this:

gpo_full.csv:
    Date           hearing_sub_type   topic      Specific_Date
    January,1997   Oversight          weather    January 12,1997
    June,2000      General            life       June 5,2000
    January,1997   General            forest     January 1,1997
    April,2001     Oversight          people     NaN 
    June,2000      Oversight          depressed  June 6,2000
    January,1997   General            weather    January 1,1997
    June,2000      Oversight          depressed  June 5,2000

CAP_cols.csv:
    majortopic   id     Chamber   topic           Date           Specific_Date
    21           79846  1         many forest     January,1997   January 1,1997
    4            79847  2         emotion         June,2000      June 6,2000
    13           79848  1         NaN             May,2001       NaN
    7            79849  2         good life       June,2000      June 5,2000
    21           79850  1         good weather    January,1997   January 1,1997
    25           79851  1         rain & cloudy   January,1997   January 12,1997
    6            79852  2         sad & depressed June,2000      June 5,2000

I want to use three criteria to match these data: Specific_Date, Date, and topic.
First, I want to use "Date" column to group these data. Next, I try to use "Specific_Date" column to narrow down the scope since some data are lost in this column. Finally, I want to use "topic" column by similar words like word-embedding to make sure which rows in gpo_full can be corresponding with a unique row in CAP_cols.
I have tried to use "Date" column to group the data and merge them into JSON file. However, I am trapped in achieving the next step to narrow down the scope by specific date and topic.
My thought for this output would be like:

{
"Date": "January,1997",
"Specific_Date": "January 12,1997"
"Topic": {"GPO": "weather", "CAP": "rain & cloudy"}
"GPO": {
    "hearing_sub_type": "Oversight",
    and other columns
}
"CAP": {
    "majortopic": "25",
    "id": "79851",
    "Chamber": "1"
}
},
{
"Date": "January,1997",
"Specific_Date": "January 1,1997"
"Topic": {"GPO": "forest", "CAP": "many forest"}
"GPO": {
    "hearing_sub_type": "General",
    and other columns
}
"CAP": {
    "majortopic": "21",
    "id": "79846",
    "Chamber": "1"
}
and similar for others}

I have been thinking for three days and have no idea. Any idea for achieving this would be very helpful! Greatly appreciated!

Solution

There's a couple of issues with the matching of topics, so you'll need to expand the match_topic() method I used, but I added some logic to see what didn't match at the end.

The results variable contains a list of dict which you can easily save as a JSON file.

Check the inline comments for the reasoning of the logic I used.

Sidenote:

I would slightly restructure the JSON if I were you. Putting the topic as a key/value pair under the GPO and CAP keys makes more sense to me than having a Topic key with a separate GPO and CAP key/value pair...

import csv
from pprint import pprint
import json


# load gpo_full.csv into a list of dict using
# csv.DictReader & list comprehension
with open("path/to/file/gpo_full.csv") as infile:
    gpo_full = [item for item in csv.DictReader(infile)]


# do the same for CAP_cols.csv
with open("path/to/file/CAP_cols.csv") as infile:
    cap_cols = [item for item in csv.DictReader(infile)]


def match_topic(gpo_topic: str, cap_topic: str) -> bool:
    """We need a function as some of the mapping is not simple

    Args:
        gpo_topic (str): gpo topic
        cap_topic (str): CAP topic

    Returns:
        bool: True if topics match
    """
    # this one is simple
    if gpo_topic in cap_topic:
        return True
    # you need to repeat the below conditional check
    # for each custom topic matching
    elif gpo_topic == "weather" and cap_topic == "rain & cloudy":
        return True 
    # example secondary topic matching
    elif gpo_topic == "foo" and cap_topic == "bar":
        return True 
    # finally return false for no matches
    return False


# we need this later
gpo_length = len(gpo_full)
results = []
cap_left_over = []
# do the actual mapping
# this could've been done above, but I separated it intentionally
for cap in cap_cols:
    found = False
    # first find the corresponding gpo
    for index, gpo in enumerate(gpo_full):
        if (
            gpo["Specific_Date"] == cap["Specific_Date"] # check by date
            and match_topic(gpo["topic"], cap["topic"]) # check if topics match
        ):
            results.append({
                "Date": gpo["Date"],
                "Specific_Date": gpo["Specific_Date"],
                "Topic": {
                    "GPO": gpo["topic"],
                    "CAP": cap["topic"]
                },
                "GPO": {
                    "hearing_sub_type": gpo["hearing_sub_type"]
                },
                "CAP": {
                    "majortopic": cap["majortopic"],
                    "id": cap["id"],
                    "Chamber": cap["Chamber"]
                }
            })
            # pop & break to remove the gpo item
            # this is so you're left over with a list of
            # gpo items that didn't match
            # it also speeds up further matches
            gpo_full.pop(index)
            found = True
            break
    # this is to check if there's stuff left over
    if not found:
        cap_left_over.append(cap)


with open('path/to/file/combined_json.json', 'w') as outfile:
    json.dump(results, outfile, indent=4)


pprint(results)
print(f'\nLength:\n  Results: {len(results)}\n  CAP: {len(cap)}\n  GPO: {gpo_length}')
print('\nLeftover GPO:')
pprint(gpo_full)
print('\nLeftover CAP:')
pprint(cap_left_over)

OUTPUT
I've removed the pprint(results) from the output, see the JSON further down

Length:
  Results: 5
  CAP: 6
  GPO: 7

Leftover GPO:
[{'Date': 'April,2001',
  'Specific_Date': 'NaN ',
  'hearing_sub_type': 'Oversight',
  'topic': 'people'},
 {'Date': 'June,2000',
  'Specific_Date': 'June 6,2000',
  'hearing_sub_type': 'Oversight',
  'topic': 'depressed'}]

Leftover CAP:
[{'Chamber': '2',
  'Date': 'June,2000',
  'Specific_Date': 'June 6,2000',
  'id': '79847',
  'majortopic': '4',
  'topic': 'emotion'},
 {'Chamber': '1',
  'Date': 'May,2001',
  'Specific_Date': 'NaN',
  'id': '79848',
  'majortopic': '13',
  'topic': 'NaN'}]

path/to/file/gpo_full.csv

Date,hearing_sub_type,topic,Specific_Date
"January,1997",Oversight,weather,"January 12,1997"
"June,2000",General,life,"June 5,2000"
"January,1997",General,forest,"January 1,1997"
"April,2001",Oversight,people,NaN 
"June,2000",Oversight,depressed,"June 6,2000"
"January,1997",General,weather,"January 1,1997"
"June,2000",Oversight,depressed,"June 5,2000"

path/to/file/CAP_cols.csv

majortopic,id,Chamber,topic,Date,Specific_Date
21,79846,1,many forest,"January,1997","January 1,1997"
4,79847,2,emotion,"June,2000","June 6,2000"
13,79848,1,NaN,"May,2001","NaN"
7,79849,2,good life,"June,2000","June 5,2000"
21,79850,1,good weather,"January,1997","January 1,1997"
25,79851,1,rain & cloudy,"January,1997","January 12,1997"
6,79852,2,sad & depressed,"June,2000","June 5,2000"

path/to/file/combined_json.json

[
    {
        "Date": "January,1997",
        "Specific_Date": "January 1,1997",
        "Topic": {
            "GPO": "forest",
            "CAP": "many forest"
        },
        "GPO": {
            "hearing_sub_type": "General"
        },
        "CAP": {
            "majortopic": "21",
            "id": "79846",
            "Chamber": "1"
        }
    },
    {
        "Date": "June,2000",
        "Specific_Date": "June 5,2000",
        "Topic": {
            "GPO": "life",
            "CAP": "good life"
        },
        "GPO": {
            "hearing_sub_type": "General"
        },
        "CAP": {
            "majortopic": "7",
            "id": "79849",
            "Chamber": "2"
        }
    },
    {
        "Date": "January,1997",
        "Specific_Date": "January 1,1997",
        "Topic": {
            "GPO": "weather",
            "CAP": "good weather"
        },
        "GPO": {
            "hearing_sub_type": "General"
        },
        "CAP": {
            "majortopic": "21",
            "id": "79850",
            "Chamber": "1"
        }
    },
    {
        "Date": "January,1997",
        "Specific_Date": "January 12,1997",
        "Topic": {
            "GPO": "weather",
            "CAP": "rain & cloudy"
        },
        "GPO": {
            "hearing_sub_type": "Oversight"
        },
        "CAP": {
            "majortopic": "25",
            "id": "79851",
            "Chamber": "1"
        }
    },
    {
        "Date": "June,2000",
        "Specific_Date": "June 5,2000",
        "Topic": {
            "GPO": "depressed",
            "CAP": "sad & depressed"
        },
        "GPO": {
            "hearing_sub_type": "Oversight"
        },
        "CAP": {
            "majortopic": "6",
            "id": "79852",
            "Chamber": "2"
        }
    }
]