
Watson Assistant: How can I check that all referenced URLs still work?


I have a large skill that contains URLs in its context variables and in the responses shown to the end user.

I would like to check all of these URLs to see whether they still work, so that if one fails we can fix it as quickly as possible. Is there a way to do this?


Solution

  • The following code snippet does this. Replace SKILL_FILE_NAME_HERE with the name of the skill JSON file downloaded from Watson Assistant.

    Because it scans the raw JSON text, it should work with both dialog-based and action-based skills.

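    import os
    import re

    import pandas as pd
    import requests
    from tqdm import tqdm

    # Replace with the path to the skill JSON file downloaded from Watson Assistant.
    file_name = 'SKILL_FILE_NAME_HERE'

    # Load the whole skill as a single string.
    with open(file_name, 'r', encoding='utf-8') as file:
        data = file.read()

    # Extract URLs, stopping at whitespace, quotes, angle brackets and backslashes
    # so URLs inside JSON strings or HTML links are not over-matched.
    found = re.findall(r'https?://[^\s"\'<>\\]+', data)
    # Drop trailing sentence punctuation, then de-duplicate keeping the order.
    urls = list(dict.fromkeys(u.rstrip('.,;:!?') for u in found))

    records = []
    print('Checking URLs')
    for url in tqdm(urls):
        try:
            # A timeout stops one unresponsive server from hanging the whole run.
            response = requests.get(url, timeout=10)
            status_code = response.status_code
        except requests.exceptions.RequestException:
            # Connection errors, timeouts, invalid URLs, too many redirects, etc.
            status_code = 'Error'

        records.append({
            'url': url,
            'status': status_code
        })

    df = pd.DataFrame(records)

    # skill.json -> skill.csv; splitext avoids overwriting the input file
    # when its name does not end in ".json".
    df.to_csv(os.path.splitext(file_name)[0] + '.csv', index=False)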
    

    It does the following:

    • Loads the skill as a single string.
    • Uses a regex to extract the URLs in the string into a list.
    • Iterates through the list, requesting each URL to get its HTTP status code.
    • If the request fails, it sets status_code to "Error".
    • Creates a record of the information.
    • When finished, converts the records to a pandas DataFrame.
    • Saves the DataFrame to a CSV file with the same name as the downloaded skill.
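Because the script above reads the skill as flat text, its CSV shows which URLs fail but not where they live in the skill. A minimal sketch, assuming the export is ordinary JSON, is to parse it and walk every string value, recording a JSON-path-like location (something like `$.dialog_nodes[3].output.text` for a dialog skill) for each URL. The names `find_urls`, `load_url_locations`, `URL_PATTERN` and `TRAILING_PUNCTUATION` are hypothetical, and the characters that end a URL are an assumption that suits URLs in JSON strings and simple HTML links.

```python
import json
import re
from typing import Any, Dict, Iterator, List, Tuple

# Assumption: a URL ends at whitespace, a quote, an angle bracket or a backslash,
# which suits URLs stored in JSON strings and in simple HTML links.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>\\]+")
# Sentence punctuation stripped from the end of a match ("see https://x.com.").
TRAILING_PUNCTUATION = ".,;:!?"


def find_urls(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, url) for every URL found in a string value of the skill."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_urls(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_urls(item, f"{path}[{index}]")
    elif isinstance(node, str):
        for match in URL_PATTERN.finditer(node):
            yield path, match.group(0).rstrip(TRAILING_PUNCTUATION)
    # Numbers, booleans and None cannot contain URLs, so they are skipped.


def load_url_locations(file_name: str) -> Dict[str, List[str]]:
    """Map each distinct URL in the skill file to every path where it appears."""
    with open(file_name, encoding="utf-8") as file:
        skill = json.load(file)
    locations: Dict[str, List[str]] = {}
    for path, url in find_urls(skill):
        locations.setdefault(url, []).append(path)
    return locations
```

For example, `load_url_locations('skill.json')` returns a dict from each distinct URL to the paths where it appears, which can be matched against the `url` column of the CSV to see which node or step needs editing. URLs assembled at runtime from context variables (for instance ones containing `$name`) appear literally in the export and are likely to fail a check for that reason alone.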
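The script above records every status code, including 404s, so broken links still have to be picked out of the CSV afterwards. The sketch below swaps `requests` for the standard library's `urllib.request` so it has no third-party dependencies. It checks each distinct URL once with a timeout and keeps only those that look broken: no response at all, or a status of 400 or above. `fetch_status` and `find_broken` are hypothetical helpers, and the `fetch` parameter is an assumption that lets another checker (for example one built on `requests.get(url, timeout=10)`) be plugged in.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Union

# A status is an HTTP status code, or the string "Error" when no response arrived.
Status = Union[int, str]


def fetch_status(url: str, timeout: float = 10.0) -> Status:
    """Return the final HTTP status for url (after redirects), or "Error"."""
    try:
        # Built inside the try because a malformed URL raises ValueError here.
        request = urllib.request.Request(url, headers={"User-Agent": "url-checker"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx/5xx responses as exceptions; keep their status code.
        return error.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failures, refused connections, timeouts, protocol errors, bad URLs.
        return "Error"


def find_broken(urls: Iterable[str],
                fetch: Callable[[str], Status] = fetch_status) -> Dict[str, Status]:
    """Check each distinct URL once and return only the ones that look broken."""
    broken: Dict[str, Status] = {}
    for url in dict.fromkeys(urls):  # removes duplicates, keeps first-seen order
        status = fetch(url)
        if status == "Error" or status >= 400:
            broken[url] = status
    return broken
```

Called as `find_broken(urls)` on the `urls` list from the script, it returns only the failing URLs with their status codes. `urllib` follows redirects by default, so a moved page that redirects to a working one is reported by its final status. Some servers may answer unfamiliar clients with a 403, so a few reported failures can be false alarms worth checking in a browser.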