I am trying to create a sheet where I want to scrape values from www.tapology.com of individual fighters and fill columns created like fighter's name, his record, where he is from, etc.
I would like to scrape some values of this fighter so I started with name:
https://www.tapology.com/fightcenter/fighters/119825-dominik-mazur
I put together something like this:
=IMPORTXML("https://www.tapology.com/fightcenter/fighters/119825-dominik-mazur", "//div[@id='stats']/ul[1]/li[1]/span[1]")
The XPath should lead to the name of the fighter which is Dominik Mazur, but for some reason it returns a different fighters name at random.
I guess it gets all the names that have the same structure, but I have no idea how to single out the value that I'm looking for.
Generally my goal is to create a google sheet where I paste a link to any fighter's tapology and it fills my sheet with the information I want instead of me manually typing everything.
Unfortunately I am not a programmer or very technical person so I would really appreciate your help. If anything is unclear please let me know.
Try this
=arrayformula(split(transpose(split(scraping("yourURLhere"),"♥")),"♦"))
with custom function
function scraping(url){
var source = UrlFetchApp.fetch(url).getContentText().replace(/(\r\n|\n|\r|\t| )/gm,"")
var data = source.match(/<div class='details details_two_columns' id='stats'>[\s\S\w]+?<\/div>/g)
return (data[0].replace(/<li>/g,'♥').replace(/<span>/g,'♦').replace(/<\/li>|<strong>|<\/strong>|<\/span>/g,'').replace(/(<([^>]+)>)/g,''))
}
function infoTapology(url){
var source = UrlFetchApp.fetch(url).getContentText().replace(/(\r\n|\n|\r|\t| )/gm,"")
var data = source.match(/<div class='details details_two_columns' id='stats'>[\s\S\w]+?<\/div>/g)
var infos = data[0].replace(/<li>/g,'♥').replace(/(<([^>]+)>)/g,'').replace(/'/g,'’').replace(/"/g,'”').split('♥')
var result = []
infos.forEach(function(info){
var x = info.split(':')[0]
if (info) {x=info.replace(x+':','')}
result.push(x)
})
return([result])
}
https://docs.google.com/spreadsheets/d/1se1lWH4g4OSvAW2NldjGiTDS5XB19_5QwSBpMqVRy84/copy