
Capture all stdout/stderr within structlog to generate JSON logs


I am currently trying to get away from print()'s and start with centralized log collection using the ELK stack and the structlog module to generate structured json log lines. This is working perfectly fine for modules that I wrote myself using a loggingHelper module that I can import and use with

logger = Logger()

in other modules and scripts. This is the loggingHelper module class:

import getpass
import inspect
import logging
import os
import sys

import structlog
from structlog.stdlib import LoggerFactory


class Logger:
    """
    Wrapper class to import within other modules and scripts.
    All the config and log binding (script name, calling module,
    executing user) happens here.
    """
    def __init__(self):
        self.__log = None
        logging.basicConfig(level=logging.DEBUG, format='%(message)s')
        structlog.configure(logger_factory=LoggerFactory(),
                            processors=[structlog.stdlib.add_log_level,
                                        structlog.processors.TimeStamper(fmt="iso"),
                                        structlog.processors.JSONRenderer()])
        logger = structlog.get_logger()
        main_script = os.path.basename(sys.argv[0]) if sys.argv[0] else None
        frame = inspect.stack()[1]
        log_invocation = os.path.basename(frame[0].f_code.co_filename)
        user = getpass.getuser()

        """
        Who executed the __main__, what was the executed __main__ file, 
        where did the log event happen?
        """
        self.__log = logger.bind(executedScript = main_script,
                                 logBirth = log_invocation,
                                 executingUser = user)

    def info(self, msg, **kwargs):
        self.__log.info(msg, **kwargs)

    def debug(self, msg, **kwargs):
        self.__log.debug(msg, **kwargs)

    def error(self, msg, **kwargs):
        self.__log.error(msg, **kwargs)

    def warn(self, msg, **kwargs):
        self.__log.warning(msg, **kwargs)

This produces nicely formatted output (one JSON object per line) that filebeat is able to read and forward to Elasticsearch. However, third-party libraries completely break the well-formatted logs:

{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "context": "SELECT displayname FROM point_of_sale WHERE name = '123'", "level": "debug", "timestamp": "2019-03-15T12:52:42.792398Z", "message": "querying local"}
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "level": "debug", "timestamp": "2019-03-15T12:52:42.807922Z", "message": "query successful: got 0 rows"}
building service object
auth version used is: v4
Traceback (most recent call last):
  File "logAlot.py", line 26, in <module>
    ef.EfoodDataControllerMerchantCenter().get_displayname(123)
  File "/home/xyz/src/toolkit/commons/connectors/efood.py", line 1126, in get_displayname
    return efc.select_from_local(q)['displayname'].values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

As you can see, both info-level and error-level messages from the third-party library (googleapiclient) are printed without going through the logging processors.

What would be the best way (and most pythonic) of capturing and formatting everything that happens within execution of one script using the loggingHelper module I wrote? Is this even best practice?

Edit: Currently the logger indeed writes to stdout itself, which is then redirected to a file in crontab using >> and 2>&1. This looks like bad practice to me if I want to redirect everything that is written to stdout/stderr by third-party library logging, because this would lead to a loop, correct? Thus my goal is not redirecting, but rather capturing everything in my logging processor. Changed the title accordingly.

Also, here is a rough overview of what I am trying to achieve. I am very open to general criticism and suggestions that deviate from this. (architecture diagram omitted)


Solution

  • Configuring the logging module

    As you already figured out, structlog builds on top of Python's existing logging module, which needs to be configured as well:

    http://www.structlog.org/en/stable/standard-library.html

    logging.basicConfig supports the stream and filename options, documented here:

    https://docs.python.org/3/library/logging.html#logging.basicConfig.

    Either you specify a filename, which the logger opens and directs all its output to. Depending on your setup, this could be the file you currently redirect to in crontab:

    import logging
    
    logging.basicConfig(level=logging.DEBUG, format='%(message)s', filename='output.txt')
    

    Or you can pass a StringIO object, which you can later read from and then redirect to your desired output destination:

    import logging
    import io
    
    stream = io.StringIO()
    
    logging.basicConfig(level=logging.DEBUG, format='%(message)s', stream=stream)
    

    More about StringIO can be read here

    https://docs.python.org/3/library/io.html#io.TextIOBase

    As @bruno pointed out in his answer, do not do this in an __init__ as you may end up calling this piece of code several times in the same process.
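A minimal stdlib-only sketch of such a configure-once guard (my own illustration, not code from the answer). basicConfig is already a no-op when the root logger has handlers, but an explicit guard makes the intent visible:

```python
import logging

def configure_logging(level=logging.DEBUG):
    """Configure the root logger at most once per process."""
    root = logging.getLogger()
    if root.handlers:   # someone (possibly a previous call) configured it already
        return False
    logging.basicConfig(level=level, format='%(message)s')
    return True

configure_logging()
print(configure_logging())  # -> False: second call is a no-op
```

In the wrapper above, the same guard would wrap both logging.basicConfig and structlog.configure at module level, leaving Logger.__init__ to only call structlog.get_logger() and bind().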