apache-spark, pyspark, ibm-cloud, data-science-experience, dsx

How to log to the kernel-pyspark-*.log file from a scheduled notebook?


In my notebook, I have set up a utility for logging so that I can debug DSX scheduled notebooks:

# utility method for logging
log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger("CloudantRecommender")

def info(*args):

    # sends output to notebook
    print(args)

    # sends output to kernel log file
    LOGGER.info(args)

Using it like so:

info("some log output")

If I check the log files, I can see my log output is getting written:

! grep 'CloudantRecommender' $HOME/logs/notebook/*pyspark* 

kernel-pyspark-20170105_164844.log:17/01/05 10:49:08 INFO CloudantRecommender: [Starting load from Cloudant: , 2017-01-05 10:49:08]
kernel-pyspark-20170105_164844.log:17/01/05 10:53:21 INFO CloudantRecommender: [Finished load from Cloudant: , 2017-01-05 10:53:21]

However, when the notebook runs as a scheduled job, the log output doesn't seem to be written to the kernel-pyspark-*.log file.

How can I write log output in DSX scheduled notebooks for debugging purposes?


Solution

  • The logging code actually works fine. The problem was that the schedule was pointing to an older version of the notebook that did not have any logging statements in it.
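
One way to catch this kind of stale-version problem earlier is to log a version marker at the top of the notebook, so the kernel log records which revision the scheduler actually executed. This is only a minimal sketch reusing the info() helper above; the NOTEBOOK_VERSION name and value are an illustrative convention, not part of DSX:

# hypothetical version marker; bump it each time the notebook is saved
NOTEBOOK_VERSION = "2017-01-06-01"

info("Notebook version:", NOTEBOOK_VERSION)

If the scheduled run's kernel-pyspark-*.log shows an older (or missing) version string, the schedule is not picking up the latest saved notebook.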