python json beautifulsoup converters

Make a JSON file from an HTML table


There is an HTML table on this website: http://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/

It has a column named "Disease Name" followed by a column named "Symptoms". I want to extract the data from that table as JSON in the format below, and also remove the "UMLS:C00080" prefix from the strings.

data = [
    {
        "disease_name": "name",
        "symptoms": ["symptom 1", "symptom 2"]
    }
]

Is there a way to do this with Python?


Solution

  • With BS4

    import requests
    from bs4 import BeautifulSoup
    
    r = requests.get(
        "http://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html")
    r.raise_for_status()  # stop early on an HTTP error
    
    soup = BeautifulSoup(r.text, 'html.parser')
    
    # Print the text of every <p class="MsoNormal"> that starts with a UMLS code.
    for item in soup.find_all("p", {'class': 'MsoNormal'}):
        text = item.get_text(strip=True)
        if text.startswith("UMLS"):
            print(text)
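
The snippet above only prints the coded strings. To get from there to the JSON shape asked for in the question, something along these lines might work. It is a rough sketch, not a tested scraper for that page: it assumes each table row has the disease in its first cell and the symptom in its last cell, that the disease cell is filled in only on the first row of each group, and that codes look like "UMLS:C0008031_". The names `clean`, `rows_to_records`, `extract_rows` and the sample rows are made up for illustration.

```python
import json
import re

# Assumed prefix format, e.g. "UMLS:C0008031_"; adjust if the page differs.
UMLS_PREFIX = re.compile(r"UMLS:C\d+_")


def clean(text):
    """Remove UMLS code prefixes and collapse runs of whitespace."""
    return " ".join(UMLS_PREFIX.sub("", text).split())


def rows_to_records(rows):
    """Group symptom cells under the most recent coded disease cell.

    `rows` is an iterable of cell-text lists, one list per table row.
    Assumes the disease is in the first cell and the symptom in the last.
    """
    records = []
    for cells in rows:
        if len(cells) < 2:
            continue
        # A new disease starts only where the first cell carries a UMLS code,
        # which also skips a plain-text header row.
        if UMLS_PREFIX.search(cells[0]):
            records.append({"disease_name": clean(cells[0]), "symptoms": []})
        symptom = clean(cells[-1])
        if records and symptom:
            records[-1]["symptoms"].append(symptom)
    return records


def extract_rows(html):
    """Return the text of every <td> in every <tr> (requires beautifulsoup4)."""
    # Imported here so the helpers above need only the standard library.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [[td.get_text() for td in tr.find_all("td")]
            for tr in soup.find_all("tr")]


# Hypothetical rows standing in for the real table.
sample_rows = [
    ["Disease", "Symptom"],
    ["UMLS:C0000001_disease a", "UMLS:C0000002_symptom one"],
    ["\xa0", "UMLS:C0000003_symptom two"],
    ["UMLS:C0000004_disease b", "UMLS:C0000005_symptom three"],
]
print(json.dumps(rows_to_records(sample_rows), indent=2))
```

For the real page you would pass `extract_rows(r.text)` to `rows_to_records` and write the result with `json.dump(records, f, indent=2)`. Keeping the grouping separate from the HTML parsing means you can check the logic on a few hand-written rows before pointing it at the site.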