Watson Assistant: How can I check all the URLs referenced still work?

I have a large skill which has URL references in the context variables and responses to the end user.

I would like to check all of these URLs to see whether they still work, so that if one fails we can fix it as quickly as possible. Is there a way to do this?

Solution

The following Python snippet does this. Replace SKILL_FILE_NAME_HERE with the path to the downloaded JSON file of the skill.

It should work with both dialog-based and action-based skills, because it scans the raw text of the export rather than relying on any particular structure.

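import os
import re

import pandas as pd
import requests
from tqdm import tqdm

file_name = 'SKILL_FILE_NAME_HERE'

# Load the whole skill export as a single string.
with open(file_name, 'r', encoding='utf-8') as file:
    data = file.read()

# Match http(s) URLs, stopping at whitespace, quotes, angle brackets and
# backslashes so that JSON escapes such as \" and \n are not captured.
url_pattern = re.compile(r'https?://[^\s"\'<>\\]+')

# Trim trailing sentence punctuation and drop duplicates, keeping order.
urls = list(dict.fromkeys(
    url.rstrip('.,;:!?') for url in url_pattern.findall(data)
))

records = []
print('Checking URLs')
for url in tqdm(urls):
    try:
        # The timeout stops one unresponsive host from hanging the whole run.
        response = requests.get(url, timeout=10)
        status_code = response.status_code
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, invalid URLs, redirect loops, etc.
        status_code = 'Error'

    records.append({
        'url': url,
        'status': status_code
    })

df = pd.DataFrame(records)

# e.g. skill.json -> skill.csv; os.path.splitext also copes with names
# that do not end in .json (str.replace would then reuse the input name).
df.to_csv(os.path.splitext(file_name)[0] + '.csv', index=False)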

It does the following:

Loads the skill as a single string.
Uses a regex to extract the URLs referenced in the string into a list.
Iterates through the list, requesting each URL to get its status_code.
If the request fails, it sets status_code to "Error".
Creates a record of the URL and its status.
When finished, converts the records to a pandas DataFrame.
Saves the DataFrame to a CSV file with the same name as the downloaded skill, but with a .csv extension.
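Scanning the raw file finds the URLs but does not tell you where in the skill each one lives, which is what you need to fix a broken link quickly. A minimal alternative sketch, assuming the export is valid JSON, parses it with the standard json module and walks every string value, reporting a path such as $.dialog_nodes[0].output.text alongside each URL. The helper name find_urls and the $-style path format are hypothetical, chosen only for illustration; they are not part of any Watson Assistant API, and the key names in the inline sample are made up.

```python
import json
import re

# Hypothetical pattern: stop at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+')


def find_urls(node, path='$'):
    """Yield (path, url) for every URL found in any string inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_urls(value, f'{path}.{key}')
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_urls(value, f'{path}[{index}]')
    elif isinstance(node, str):
        for match in URL_PATTERN.findall(node):
            # Drop trailing sentence punctuation, e.g. "see https://x.com."
            yield path, match.rstrip('.,;:!?')


if __name__ == '__main__':
    # Tiny inline sample; for a real skill use json.load() on the export file.
    skill = json.loads(
        '{"dialog_nodes": [{"output": {"text": "See <a href=\\"https://example.com/a\\">here</a>."}},'
        ' {"context": {"link": "https://example.com/b"}}]}'
    )
    for location, url in find_urls(skill):
        print(location, url)
```

Because json decodes escapes such as \" before the regex runs, the pattern never sees JSON backslashes. The (path, url) pairs can be passed to the same checking loop and written to the CSV as an extra column.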
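For a large skill, checking URLs one after another can be slow because each request waits on the network. A minimal sketch, assuming the target hosts tolerate a few parallel requests, uses concurrent.futures.ThreadPoolExecutor from the standard library to run the same GET-and-record logic concurrently. The names check_url and check_all, and the values max_workers=8 and timeout=10, are illustrative choices rather than anything from the original answer.

```python
from concurrent.futures import ThreadPoolExecutor

import requests


def check_url(url, timeout=10):
    """Return (url, status) where status is the HTTP code or 'Error'."""
    try:
        response = requests.get(url, timeout=timeout)
        return url, response.status_code
    except requests.exceptions.RequestException:
        # Connection failures, timeouts, invalid or schemeless URLs, etc.
        return url, 'Error'


def check_all(urls, max_workers=8):
    """Check each unique URL once, using a small pool of worker threads."""
    unique_urls = list(dict.fromkeys(urls))  # de-duplicate, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as unique_urls
        return list(pool.map(check_url, unique_urls))


if __name__ == '__main__':
    for url, status in check_all(['https://www.ibm.com', 'https://www.ibm.com']):
        print(url, status)
```

The returned list of (url, status) tuples can be turned into the same CSV with pd.DataFrame(results, columns=['url', 'status']). Keep max_workers modest so the check does not look like a flood of traffic to the sites you link to.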