I am currently trying to get away from print()'s and start with centralized log collection using the ELK stack and the structlog module to generate structured json log lines. This is working perfectly fine for modules that I wrote myself using a loggingHelper module that I can import and use with
logger = Logger()
in other modules and scripts. This is the loggingHelper module class:
class Logger:
"""
Wrapper Class to import within other modules and scripts
All the config and log binding (script
"""
def __init__(self):
self.__log = None
logging.basicConfig(level=logging.DEBUG, format='%(message)s')
structlog.configure(logger_factory=LoggerFactory(),
processors=[structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.JSONRenderer()])
logger = structlog.get_logger()
main_script = os.path.basename(sys.argv[0]) if sys.argv[0] else None
frame = inspect.stack()[1]
log_invocation = os.path.basename(frame[0].f_code.co_filename)
user = getpass.getuser()
"""
Who executed the __main__, what was the executed __main__ file,
where did the log event happen?
"""
self.__log = logger.bind(executedScript = main_script,
logBirth = log_invocation,
executingUser = user)
def info(self, msg, **kwargs):
self.__log.info(msg, **kwargs)
def debug(self, msg, **kwargs):
self.__log.debug(msg, **kwargs)
def error(self, msg, **kwargs):
self.__log.error(msg, **kwargs)
def warn(self, msg, **kwargs):
self.__log.warning(msg, **kwargs)
This produces nicely formatted output (one JSON per line) that filebeat is able to read and forward to Elasticsearch. However, third-party librariers completely crush the well-formatted logs.
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "context": "SELECT displayname FROM point_of_sale WHERE name = '123'", "level": "debug", "timestamp": "2019-03-15T12:52:42.792398Z", "message": "querying local"}
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "level": "debug", "timestamp": "2019-03-15T12:52:42.807922Z", "message": "query successful: got 0 rows"}
building service object
auth version used is: v4
Traceback (most recent call last):
File "logAlot.py", line 26, in <module>
ef.EfoodDataControllerMerchantCenter().get_displayname(123)
File "/home/xyz/src/toolkit/commons/connectors/efood.py", line 1126, in get_displayname
return efc.select_from_local(q)['displayname'].values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
As you can see both info level and error level messages from the third party librara (googleapiclient) are printed without going through the logging processors.
What would be the best way (and most pythonic) of capturing and formatting everything that happens within execution of one script using the loggingHelper module I wrote? Is this even best practice?
Edit: Currently the logger indeed writes to stdout itself, which is then redirected to a file in crontab using >> and 2>&1. This looks like bad practice to me if I want to redirect everything that is written to stdout/stderr by third-party library logging, because this would lead to a loop, correct? Thus my goal is not redirecting, but rather capturing everything in my logging processor. Changed the title accordingly.
Also, here is a rough overview of what I am trying to achieve. I am very open to general criticism and suggestions that diviate from this.
logging
moduleAs you already figured out,
structlog
requires configuration of the logging functionality already existing in python.
logging.basicConfig
supports options for stream
and filename
here
https://docs.python.org/3/library/logging.html#logging.basicConfig.
Either you specify a filename which the logger will create a handle to and direct all its output. Depending on how you are set up maybe this would be the file you normally redirect to
import logging
logging.basicConfig(level=logging.DEBUG, format='%(message)s', filename='output.txt')
Or you can pass a StringIO object to the builder, which you can later read from and then redirect to your wished output destination
import logging
import io
stream = io.StringIO()
logging.basicConfig(level=logging.DEBUG, format='%(message)s', stream=stream)
More about StringIO can be read here
As @bruno pointed out in his answer, do not do this in an __init__
as you may end up calling this piece of code several times in the same process.