The goal is to continuously read from stdin and enforce utf8 in both Python2 and Python3.
I've tried the solutions from the answers linked in the code comments:
#!/usr/bin/env python
from __future__ import print_function, unicode_literals
import io
import sys
# Supports Python2 read from stdin and Python3 read from stdin.buffer
# https://stackoverflow.com/a/23932488/610569
user_input = getattr(sys.stdin, 'buffer', sys.stdin)
# Enforcing utf-8 in Python3
# https://stackoverflow.com/a/16549381/610569
with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
    for line in fin:
        # Reads the input line by line
        # and do something, for e.g. just print line.
        print(line)
The code works in Python3, but in Python2 the TextIOWrapper cannot wrap the stdin file object, because that object has no readable method, and it throws:
Traceback (most recent call last):
  File "testfin.py", line 12, in <module>
    with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
AttributeError: 'file' object has no attribute 'readable'
That's because in Python3 the user_input, i.e. sys.stdin.buffer, is an _io.BufferedReader object, and its attributes include readable:
<class '_io.BufferedReader'>
['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_dealloc_warn', '_finalizing', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'mode', 'name', 'peek', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
While in Python2 the user_input is a file object, and its attributes don't include readable:
<type 'file'>
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
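The difference shown in the two listings above can be checked with a few lines like the following. This is only a sketch; the helper name describe_stream is made up for illustration and is not part of the original script.

```python
import sys


def describe_stream(stream):
    """Return the stream's type and whether it has a `readable` attribute."""
    return type(stream), hasattr(stream, 'readable')


# Python 3 exposes the binary layer as sys.stdin.buffer; Python 2 has no such
# attribute, so fall back to sys.stdin itself.
user_input = getattr(sys.stdin, 'buffer', sys.stdin)
stream_type, has_readable = describe_stream(user_input)
print(stream_type)
print('has readable: %s' % has_readable)
```

Running it under each interpreter shows which type io.TextIOWrapper would be handed, and whether that type offers the readable method the wrapper asks for.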
If you don't need a fully-fledged io.TextIOWrapper, but just a decoded stream for reading, you can use codecs.getreader() to create a decoding wrapper:
import codecs

reader = codecs.getreader('utf8')(user_input)
for line in reader:
    # do whatever you need...
    print(line)
codecs.getreader('utf8') creates a factory for a codecs.StreamReader, which is then instantiated using the original stream.
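To make the two steps (factory, then instance) visible, here is a small sketch that uses an in-memory io.BytesIO as a stand-in for the binary stdin stream. The sample bytes and the variable names are assumptions made for the illustration.

```python
import codecs
import io

# Stand-in for the binary stdin stream: two UTF-8 encoded lines.
raw = io.BytesIO(u'h\xe9llo\nw\xf6rld\n'.encode('utf-8'))

# Step 1: getreader() returns the StreamReader class for the codec
# (the "factory").
reader_factory = codecs.getreader('utf8')

# Step 2: calling the factory wraps the byte stream in a decoding reader.
reader = reader_factory(raw)

# Iterating yields decoded text lines, line endings included.
lines = [line for line in reader]
print(lines)
```

The reader only needs read-style methods on the wrapped object, which is why it also works with the Python2 file object.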
I'm not sure the StreamReader supports the with context, but this might not be strictly necessary (there's no need to close STDIN after reading, I guess...).
I've successfully used this solution in situations where the underlying stream only offers a very limited interface.
From the comments, it became clear that you actually need an io.TextIOWrapper to have proper line buffering etc. in interactive mode; codecs.StreamReader only works for piped input and the like.
Using this answer, I was able to get interactive input to work properly:
#!/usr/bin/env python
# coding: utf8
from __future__ import print_function, unicode_literals
import io
import sys
user_input = getattr(sys.stdin, 'buffer', sys.stdin)
with io.open(user_input.fileno(), encoding='utf8') as f:
    for line in f:
        # do whatever you need...
        print(line)
This creates an io.TextIOWrapper with enforced encoding from the binary STDIN buffer.
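One way to convince yourself of this without typing into a terminal is to try the same call on a pipe. The sketch below is only an illustration: the read end of an os.pipe() plays the role of user_input.fileno(), and the sample text is made up.

```python
import io
import os

# Hypothetical stand-in for stdin: the read end of a pipe takes the place of
# user_input.fileno().
read_fd, write_fd = os.pipe()
os.write(write_fd, u'caf\xe9\n'.encode('utf-8'))
os.close(write_fd)

# io.open() accepts an integer file descriptor and, in text mode, returns an
# io.TextIOWrapper that decodes with the requested encoding.
with io.open(read_fd, encoding='utf8') as f:
    wrapper_type = type(f)
    lines = list(f)

print(wrapper_type)
print(lines)
```

Note that leaving the with block closes the underlying descriptor; if it should stay open, io.open() also accepts closefd=False.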