I'm scraping a website that was written in Polish, meaning it contains characters such as ź and ę.
When I try to parse the HTML, either with the html package or by splitting the response body string myself, I get output like this:
���~♦�♀�����r�▬֭��↔��q���y���<p��19��lFۯ☻→Z�7��
I'm currently using
bodyBytes, err := ioutil.ReadAll(resp.Body)
if err != nil {
	// handle the error
}
bodyString := string(bodyBytes)
to get the body as a string.
How can I get the text in a readable format?
Update:
The Content-Encoding of the response was gzip, so the body had to be decompressed before converting it to a string. The code below gave me the response as a printable string:
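// Wrap the raw body in a gzip reader to decompress it.
gReader, err := gzip.NewReader(resp.Body)
if err != nil {
	return err
}
// Close the gzip reader on every return path, including read errors.
defer gReader.Close()
gBytes, err := ioutil.ReadAll(gReader)
if err != nil {
	return err
}
bodyStr := string(gBytes)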