python apache logparser

python - check a bot against a list of bots (bot_list)


We have an apache_log parser. We tried several ways to check the bot against a list of bots (bot_list), but without success. We tried comparing two lists, but the bot value is not a list.

What we want to achieve is that each bot is first checked against bot_list, so that only the bots that are not in bot_list come through.

log = apache_log(lines)

for r in log: 
    bot = r['bot']



bot_list = [ "Googlebot/2.1", 
             "AhrefsBot/5.0", 
             "bingbot/2.0", 
             "DotBot/1.1", 
             "MJ12bot/v1.4.5", 
             "SearchmetricsBot", 
             "YandexBot/3.0", 
             ]

It works for one bot this way:

bot = r['bot'].strip()
if not bot.startswith("Googlebot/2.1"):

This is, so to speak, our filter: bot.startswith. But how can we make the bot go through bot_list first?

We hope someone can point us in the right direction.


Solution

  • If I understand your problem, you want to check whether the bot is not in bot_list. I would suggest getting the bot name from the log line:

    bot_name = r["bot"].split(" ")[22]
    if bot_name not in bot_list:
    

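    Here 22 stands for the position of the User-Agent field in your log line, which you may already have customized. Note that this is an exact match: bot_name must be identical to an entry in bot_list.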

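    If the position is not clear, you can check whether any entry of bot_list occurs anywhere in the string:

    # any() works on Python 2 and 3; len(filter(...)) raises TypeError on Python 3
    if not any(x in r["bot"] for x in bot_list):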
    

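    This has the same effect as the loop below, written as a function so that the return statement is valid (the name count_matching_bots is only illustrative). A result of 0 means no entry of bot_list matched:

    def count_matching_bots(bot, bot_list):
        return_list = []
        for i in bot_list:
            if i in bot:
                return_list.append(i)
        return len(return_list)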
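
To tie this back to the startswith filter from the question, here is a minimal sketch of both variants side by side. It relies on the fact that str.startswith accepts a tuple of prefixes (not a list). The sample_log records and the helper names is_known_prefix and is_known_substring are made up for illustration; the real records would come from apache_log(lines), whose output format is assumed here to be dicts with a 'bot' key, as in the question.

```python
# Hypothetical stand-in for the records yielded by apache_log(lines).
# These user-agent strings are invented for the example.
sample_log = [
    {"bot": "Googlebot/2.1 (example)"},
    {"bot": " SomeNewCrawler/0.9 "},
    {"bot": "Mozilla/5.0 (compatible; bingbot/2.0; example)"},
]

bot_list = [
    "Googlebot/2.1",
    "AhrefsBot/5.0",
    "bingbot/2.0",
    "DotBot/1.1",
    "MJ12bot/v1.4.5",
    "SearchmetricsBot",
    "YandexBot/3.0",
]

# str.startswith takes a single string or a tuple of strings,
# so the list is converted once up front.
KNOWN_PREFIXES = tuple(bot_list)


def is_known_prefix(bot):
    """True if the stripped bot string starts with any entry of bot_list."""
    return bot.strip().startswith(KNOWN_PREFIXES)


def is_known_substring(bot):
    """True if any entry of bot_list occurs anywhere in the bot string."""
    return any(name in bot for name in bot_list)


# Keep only the bots that are NOT in bot_list, once per matching strategy.
unknown_by_prefix = [
    r["bot"].strip() for r in sample_log if not is_known_prefix(r["bot"])
]
unknown_by_substring = [
    r["bot"].strip() for r in sample_log if not is_known_substring(r["bot"])
]
```

The prefix variant is the direct generalization of the single-bot startswith check, but it misses bots whose name appears later in the string (the third sample record). The substring variant catches those as well, at the cost of also matching a listed name that happens to appear inside some other user agent.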