There is an HTML table on this website: http://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/
It has a column named "Disease Name" followed by a column named "Symptoms". I want to extract the data from that table as JSON in the format shown below, and also remove the "UMLS:C00080"-style codes from the strings.
data = [
    {
        disease_name: 'name',
        symptoms: [symptoms]
    }
]
Is there any way to do this with Python?
With BS4
import requests
from bs4 import BeautifulSoup

r = requests.get(
    "http://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html")
soup = BeautifulSoup(r.text, 'html.parser')

# Print the text of every MsoNormal paragraph that starts with a UMLS code
for item in soup.find_all("p", {'class': 'MsoNormal'}):
    text = item.get_text(strip=True)
    if text.startswith("UMLS"):
        print(text)
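The snippet above only prints the raw cell text. One possible way to get from there to the requested JSON is sketched below. It is not tested against the live page and rests on several assumptions: that each code looks like `UMLS:C` followed by digits and an optional underscore, that the first table row is a header, that the disease is in the first cell and the symptom in the last cell of each row, and that a blank disease cell means "same disease as the row above". The helper names (`clean_label`, `rows_from_html`, `group_rows`, `to_json`) are made up for this example.

```python
import json
import re

# Assumed code format, e.g. "UMLS:C0000000_"; adjust if the page differs.
UMLS_PREFIX = re.compile(r"UMLS:C\d+_?")


def clean_label(cell_text):
    """Remove UMLS code prefixes and collapse whitespace (including &nbsp;)."""
    text = UMLS_PREFIX.sub("", cell_text)
    return " ".join(text.split())


def rows_from_html(html, disease_col=0, symptom_col=-1):
    """Yield (disease_text, symptom_text) for each table row after the header.

    The column positions are assumptions about the page layout.
    """
    # Imported here so the helpers below work without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    for tr in soup.find_all("tr")[1:]:  # assumes the first row is the header
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            yield cells[disease_col], cells[symptom_col]


def group_rows(rows):
    """Group symptoms under the most recent non-blank disease cell."""
    records = []
    for disease_text, symptom_text in rows:
        disease = clean_label(disease_text)
        symptom = clean_label(symptom_text)
        if disease:  # a non-blank disease cell starts a new record
            records.append({"disease_name": disease, "symptoms": []})
        if symptom and records:
            records[-1]["symptoms"].append(symptom)
    return records


def to_json(rows):
    """Serialize the grouped records as a JSON array of objects."""
    return json.dumps(group_rows(rows), indent=2)


# Usage, reusing the response `r` from the snippet above:
#   print(to_json(rows_from_html(r.text)))
```

The result is a JSON array of objects rather than the nested braces in the question, since JSON has no way to hold unnamed objects directly inside an object.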