c, utf-8, ascii, libcurl, libiconv

Compare HTML from libcurl with text from file


I'm using libcurl to connect to a website and download its HTML, and LibTidy to extract the text from it. My goal is to check whether a sentence from a text file appears in that HTML.

Thanks to LibTidy I have all the page text as a single char*. I'm using char *strstr(const char *one, const char *two) to compare the two strings. The first one is the string produced by libcurl and LibTidy parsing, and the second one is a string read from a text file.

When I call strstr(..), I get NULL as the result. The debugger shows me that the two strings aren't encoded in the same way.


I tried to find where the problem was in the string coming from the Internet connection, and I tried several code samples to fix it.

The example code from the libcurl website gives me the same problem: the char *memory buffer doesn't look correctly encoded, and I can't compare it properly. https://curl.haxx.se/libcurl/c/getinmemory.html

I also tried the code here: https://stackoverflow.com/a/2329792/10160890, and the char *ptr has the same problem.

I expect to be able to compare the string from libcurl with the string from the text file.


Solution

  • There is no need to convert anything. Any ASCII text is already valid UTF-8 text, so you can search for it in UTF-8 data as-is using strstr. This is pretty much the whole point of UTF-8.
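
As a rough illustration of that point, here is a minimal sketch, assuming both buffers are NUL-terminated and the sentence from the file is plain ASCII. The helper names page_contains and strip_line_ending are hypothetical, not part of libcurl or LibTidy. The second helper covers one guess about why strstr might still return NULL: a line read with fgets keeps its trailing line ending, which will not match text in the page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `sentence` occurs in `page_text`.
 * Both arguments must be NUL-terminated. strstr compares raw bytes, so an
 * ASCII sentence is found inside UTF-8 text without any conversion. */
bool page_contains(const char *page_text, const char *sentence)
{
    if (page_text == NULL || sentence == NULL)
        return false;
    return strstr(page_text, sentence) != NULL;
}

/* Hypothetical helper: truncates `line` in place at the first '\r' or '\n',
 * which removes the line ending that fgets leaves on a single line. */
void strip_line_ending(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

With these, a call such as page_contains(text_from_tidy, line_from_file), made after strip_line_ending(line_from_file), does a plain byte search. The variable names in that call are placeholders. If the sentence itself contains non-ASCII characters and the file is not saved as UTF-8, the bytes will differ and this sketch will not find a match.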