python slack slack-api

Pulling historical channel messages with Python


I am attempting to create a small dataset by pulling messages/responses from a Slack channel I am a part of. I would like to use Python to pull the data from the channel, but I am having trouble figuring out my API key. I have created an app on Slack, and I can see my client secret, signing secret, and verification token, but I can't find my API key.

Here is a basic example of what I believe I am trying to accomplish:

import slack
sc = slack.SlackClient("api key")
sc.api_call(
  "channels.history",
  channel="C0XXXXXX"
)

I am willing to just download the data manually if that is possible as well. Any help is greatly appreciated.


Solution

    messages

    See below for example code that pulls messages from a channel in Python.

    • It uses the official Python Slack library and calls conversations_history with paging. It will therefore work with any type of channel and can fetch large numbers of messages if needed.
    • The result will be written to a file as a JSON array.
    • You can specify the channel and the maximum number of messages to retrieve.

    threads

    Note that the conversations.history endpoint will not return thread messages. Those have to be retrieved additionally, with one call to conversations.replies for every thread you want to retrieve messages for.

    Threads can be identified by checking each message in the channel for the thread_ts property. If it exists, the message belongs to a thread; on the thread's parent message, thread_ts equals the message's own ts. See this page for more details on how threads work.
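
    The thread lookup described above could be sketched roughly as follows. This is a minimal sketch, not part of the original script: the function name fetch_thread_replies and the pause_seconds parameter are made up for illustration, and client is assumed to be a WebClient-like object whose conversations_replies method accepts channel, ts, limit and cursor.

```python
import time


def fetch_thread_replies(client, channel, messages, page_size=200, pause_seconds=1.0):
    """Return {thread_ts: [reply, ...]} for every thread parent in `messages`.

    `client` is assumed to behave like the Slack WebClient; the helper
    name and the pause_seconds parameter are illustrative only.
    """
    replies_by_thread = {}
    for message in messages:
        thread_ts = message.get("thread_ts")
        # Only a thread's parent message has thread_ts equal to its own ts.
        if thread_ts is None or thread_ts != message.get("ts"):
            continue

        collected = []
        cursor = None
        while True:
            time.sleep(pause_seconds)  # stay below the rate limit
            response = client.conversations_replies(
                channel=channel, ts=thread_ts, limit=page_size, cursor=cursor
            )
            collected.extend(response["messages"])
            metadata = response.get("response_metadata") or {}
            cursor = metadata.get("next_cursor")
            if not cursor:  # empty or missing cursor means no more pages
                break

        # The parent message is returned along with the replies; drop it.
        replies_by_thread[thread_ts] = [
            reply for reply in collected if reply["ts"] != thread_ts
        ]
    return replies_by_thread
```

    Calling fetch_thread_replies(client, CHANNEL, messages_all) after the history download would give one list of replies per thread, which can be written to JSON the same way as the channel messages.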

    IDs

    This script will not replace IDs with names, though. If you need that, here are some pointers on how to implement it:

    • You need to replace IDs for users, channels, bots, usergroups (if on a paid plan)
    • You can fetch the lists of users, channels and usergroups from the API with users_list, conversations_list and usergroups_list respectively. Bots need to be fetched one by one with bots_info (if needed).
    • IDs occur in many places in messages:
      • user top level property
      • bot_id top level property
      • as link in any property that allows text, e.g. <@U12345678> for users or <#C1234567> for channels. Those can occur in the top level text property, but also in attachments and blocks.
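
    As a rough illustration of the link replacement, the sketch below rewrites user and channel links in a text string. The helper names build_lookup and replace_ids are hypothetical, the lookups are assumed to come from the "members" array of users_list and the "channels" array of conversations_list, and only plain text is handled; attachments and blocks would need the same treatment applied to their text fields.

```python
import re

# <@U12345678> or <@U12345678|label> for users, <#C1234567> or <#C1234567|name> for channels
USER_LINK = re.compile(r"<@([UW][A-Z0-9]+)(?:\|[^>]*)?>")
CHANNEL_LINK = re.compile(r"<#(C[A-Z0-9]+)(?:\|[^>]*)?>")


def build_lookup(items):
    """Map ID -> name for a list of objects that each have 'id' and 'name'."""
    return {item["id"]: item["name"] for item in items}


def replace_ids(text, user_names, channel_names):
    """Replace user and channel links in `text` with @name / #name.

    Unknown IDs are kept as they are, so no information is lost.
    """
    def user_sub(match):
        user_id = match.group(1)
        return "@" + user_names.get(user_id, user_id)

    def channel_sub(match):
        channel_id = match.group(1)
        return "#" + channel_names.get(channel_id, channel_id)

    text = USER_LINK.sub(user_sub, text)
    return CHANNEL_LINK.sub(channel_sub, text)
```

    The same lookups can be used for the top-level user property, for example user_names.get(message.get("user")).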

    Example code

    import json
    import os
    from time import sleep

    # slack_sdk is the current official package (pip install slack_sdk).
    # With the older slackclient v2 package, use "import slack" and slack.WebClient instead.
    from slack_sdk import WebClient

    CHANNEL = "C12345678"
    MESSAGES_PER_PAGE = 200
    MAX_MESSAGES = 1000

    # init web client
    # SLACK_TOKEN holds the app's OAuth token (xoxb-... or xoxp-...),
    # not the client secret, signing secret or verification token
    client = WebClient(token=os.environ["SLACK_TOKEN"])

    # get first page
    page = 1
    print("Retrieving page {}".format(page))
    response = client.conversations_history(
        channel=CHANNEL,
        limit=MESSAGES_PER_PAGE,
    )
    assert response["ok"]
    messages_all = response["messages"]

    # get additional pages if below max messages and if there are any
    while len(messages_all) + MESSAGES_PER_PAGE <= MAX_MESSAGES and response["has_more"]:
        page += 1
        print("Retrieving page {}".format(page))
        sleep(1)   # need to wait 1 sec before next call due to rate limits
        response = client.conversations_history(
            channel=CHANNEL,
            limit=MESSAGES_PER_PAGE,
            cursor=response["response_metadata"]["next_cursor"],
        )
        assert response["ok"]
        messages_all.extend(response["messages"])

    print(
        "Fetched a total of {} messages from channel {}".format(
            len(messages_all),
            CHANNEL,
        )
    )

    # write the result to a file as a JSON array
    with open("messages.json", "w", encoding="utf-8") as f:
        json.dump(
            messages_all,
            f,
            sort_keys=True,
            indent=4,
            ensure_ascii=False,
        )