From PEP 263:
To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors):
#!/usr/bin/python
# -*- coding: <encoding name> -*-
What happens when licensing information occupies the top-most lines, as in https://github.com/google/seq2seq/blob/master/seq2seq/training/utils.py:
# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
"""Miscellaneous training utility functions.
"""
Would the encoding declaration still be "magically" accepted by the Python interpreter? It would be great if the answer explained why it must be in the first two lines, and a pointer to the interpreter code would be awesome!
In Python 2, where the default source encoding is ASCII and a coding declaration is required for UTF-8 source, a declaration placed after the second line is not honored. If the file also contains any non-ASCII characters, you will get an error like this:
File "encoded.py", line 5
SyntaxError: Non-ASCII character '\xe1' in file encoded.py on line 5, but
no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
If the file contains only ASCII characters, it will still work even if the UTF-8 coding declaration comes later than line 2. ASCII is a subset of UTF-8, so the late declaration is simply ignored without consequence. (This seems to be the case for the specific utils.py you referenced.)
Many parsers and other file processors require such magic comments at the start of the file, because the declaration has to be found before the rest of the file can be interpreted properly. Allowing it anywhere would be inefficient: the entire file would have to be scanned to find a few "magic" special cases.
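The two-line limit can be checked from Python 3's standard library: tokenize.detect_encoding() applies the PEP 263 lookup and reads at most two lines. Here is a minimal sketch. The helper name declared_encoding and the sample byte strings are made up for illustration.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding that the PEP 263 lookup finds for `source`."""
    # detect_encoding() calls readline at most twice, then stops looking.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Hypothetical sources: the same declaration on line 1, line 2 and line 3.
on_line_1 = b"# -*- coding: latin-1 -*-\nx = 1\n"
on_line_2 = b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nx = 1\n"
on_line_3 = b"# Copyright notice\n#\n# -*- coding: latin-1 -*-\nx = 1\n"

print(declared_encoding(on_line_1))  # iso-8859-1
print(declared_encoding(on_line_2))  # iso-8859-1
print(declared_encoding(on_line_3))  # utf-8 (declaration ignored, default used)
```

Because the lookup stops after two lines, its cost does not depend on the length of the file.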
You get some leeway in Python 3, which assumes UTF-8 by default. If your file is encoded some other way, though, you still need to include the declaration, and it still has to be in the first two lines.
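A rough way to see the Python 3 behaviour is to compile source given as bytes, which makes compile() perform the same encoding lookup as it would for a file. This is only a sketch: the helper run_source and the three sample sources are hypothetical, and ValueError is caught purely as a defensive assumption alongside SyntaxError.

```python
def run_source(source: bytes):
    """Compile `source` from raw bytes; return the value it binds to `name`,
    or None if the bytes cannot be compiled."""
    try:
        code = compile(source, "<demo>", "exec")
    except (SyntaxError, ValueError):
        return None
    namespace = {}
    exec(code, namespace)
    return namespace.get("name")


body = 'name = "\u00e9"\n'  # a string literal containing one non-ASCII character

# UTF-8 bytes with no declaration at all: Python 3's default applies.
utf8_no_declaration = body.encode("utf-8")

# Latin-1 bytes with the declaration on line 1.
latin1_declared_early = b"# -*- coding: latin-1 -*-\n" + body.encode("latin-1")

# Latin-1 bytes with the declaration pushed down to line 3 by a header.
latin1_declared_late = (
    b"# license line 1\n# license line 2\n# -*- coding: latin-1 -*-\n"
    + body.encode("latin-1")
)

for label, src in [
    ("utf-8, no declaration", utf8_no_declaration),
    ("latin-1, declared on line 1", latin1_declared_early),
    ("latin-1, declared on line 3", latin1_declared_late),
]:
    print(label, "->", repr(run_source(src)))
```

The first two sources should compile and yield the accented character. The third should fail, because the late declaration is ignored and the Latin-1 bytes are not valid UTF-8.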