Tags: python, encoding, factor-lang

What's the effect of my mobile network on encoding?


I have a smartphone with a mobile hotspot: essentially a portable WiFi network that shares my phone's internet access with my laptop.

On my laptop, I have Python 3 and the requests library. Here's Python and requests fetching google.com over my phone's hotspot (the result is exactly the same over "real" WiFi):

>>> x = requests.get("http://google.com")
>>> x.apparent_encoding; x.text[:100]
'ISO-8859-2'
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content'

Good! Everything is going as planned.

Also on my laptop, I have Factor, which has an easy-to-use HTTP getter, http-get, in its standard library. Here's http-get working on a "normal" WiFi network:

IN: scratchpad "http://google.com" http-get nip

--- Data stack:
"<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org..."

Success!

Well, no. http-get on my phone's hotspot:

IN: scratchpad "http://google.com" http-get nip

--- Data stack:
"\x1f\b\0\0\0\0\0\0\x03Å<ëzÛ¶ÿÏSÐH+K+\"u\x17eÚ&iâÓ¤Ik§i7Íú\x03IHbÄIʲ#ë]öQw\x06\0..."

Uh.
And it's not just Google: http-getting Stack Overflow, or any other website, over my phone's network gives similar results.
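Before blaming the character encoding, it can help to check whether the bytes are text at all. Every gzip stream starts with the magic bytes 0x1f 0x8b, followed by the compression method byte 0x08 (deflate), and the Factor output above does begin with \x1f. A minimal Python sketch of that check; the helper name looks_gzipped is hypothetical:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_gzipped(body: bytes) -> bool:
    """Return True if body starts with a gzip header (magic bytes + deflate method)."""
    # The third byte is the compression method; 8 (deflate) is the only one in use.
    return len(body) >= 3 and body[:2] == GZIP_MAGIC and body[2] == 8


html = b'<!doctype html><html itemscope="" lang="en"><head>'
print(looks_gzipped(html))                 # False: plain HTML
print(looks_gzipped(gzip.compress(html)))  # True: same HTML, gzip-compressed
```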

Printing that string:


...

No? Ah, well, OK.


Factor is 100% UTF-8 by default. ISO-8859 text should be transcodable to UTF-8, and indeed it is when I'm not using my phone's internet.
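The transcoding half of that reasoning holds: ISO-8859-1 maps every byte to exactly one code point, so genuine ISO-8859-1 text converts to UTF-8 without loss. The catch is that compressed bytes are not ISO-8859 text at all, yet decoding them as Latin-1 still "succeeds" and yields accented gibberish much like the output above. A short Python illustration; the sample HTML is made up, and ISO-8859-1 stands in for the ISO-8859-2 that requests guessed:

```python
import gzip

html = '<!doctype html><html lang="en"><head><meta charset="ISO-8859-1"></head>é</html>'

# Genuine ISO-8859-1 text: decoding it and re-encoding as UTF-8 is lossless.
latin1_bytes = html.encode("latin-1")
as_utf8 = latin1_bytes.decode("latin-1").encode("utf-8")
print(as_utf8.decode("utf-8") == html)  # True

# Gzip-compressed bytes: Latin-1 decoding never raises (every byte maps to a
# code point), so the result is gibberish rather than an error.
compressed = gzip.compress(latin1_bytes)
garbage = compressed.decode("latin-1")
print(repr(garbage[:20]))

# Decompressing first, then decoding, recovers the original text.
print(gzip.decompress(compressed).decode("latin-1") == html)  # True
```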

I know mobile service providers have a reputation for injecting Bad Things into served content. But if the encoding is the same, Python treats both responses the same, and Python reports the same encoding for both... what's going on here?


Factor is built from HEAD. Python is 3.5. The laptop runs Ubuntu 15.10, Android is 5.1.something, and, probably most importantly, my mobile service provider is StraightTalk.

As the Python demonstration shows, I don't normally experience issues with page content.


Solution

  • https://github.com/factor/factor/issues/1589

    I didn't think to look at the headers.

    The answer?

    content-encoding: Accept-Encoding on normal WiFi.

    content-encoding: gzip on hotspot.

    Now, how to gunzip the response body in Factor is another question.
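The fix amounts to undoing the Content-Encoding before decoding the charset. requests does this automatically, which would explain why the Python session never showed a difference. The sketch below is Python rather than Factor. It assumes something on the network path gzips responses regardless of what the client asked for (the question does not establish whether that is Google or the carrier), and simulates it with a hypothetical local server, AlwaysGzipHandler, so it runs offline. decode_body is likewise a hypothetical helper, not a library API.

```python
import gzip
import http.client
import threading
import zlib
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b'<!doctype html><html lang="en"><head><title>demo</title></head></html>'


class AlwaysGzipHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server or middlebox that gzips every response."""

    def do_GET(self):
        body = gzip.compress(PAGE)
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=ISO-8859-1")
        self.send_header("Content-Encoding", "gzip")  # sent even though not requested
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def decode_body(headers, raw: bytes) -> str:
    """Undo Content-Encoding first, then decode the declared charset."""
    coding = (headers.get("Content-Encoding") or "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        raw = gzip.decompress(raw)
    elif coding == "deflate":
        raw = zlib.decompress(raw)  # HTTP "deflate" is zlib-wrapped deflate data
    elif coding != "identity":
        raise ValueError(f"unsupported Content-Encoding: {coding}")
    charset = headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


def fetch(port: int):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")  # http.client itself sends Accept-Encoding: identity
    resp = conn.getresponse()
    raw = resp.read()
    conn.close()
    return resp.headers, raw


server = HTTPServer(("127.0.0.1", 0), AlwaysGzipHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

headers, raw = fetch(server.server_address[1])
print(headers["Content-Encoding"])      # gzip
print(raw[:3])                          # b'\x1f\x8b\x08': what a naive client sees
print(decode_body(headers, raw)[:15])   # <!doctype html>

server.shutdown()
server.server_close()
```

A client that skips the first step of decode_body, as http-get apparently did here, hands the raw gzip bytes straight to the charset decoder.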