Tags: python, langchain, confluence, document-loader

langchain.document_loaders.ConfluenceLoader.load giving AttributeError: 'str' object has no attribute 'get' while reading all documents from space


When I try the sample code given here:

from langchain.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url="<my confluence link>", username="<my user name>", 
    api_key="<my token>"
)
documents = loader.load(space_key="<my space>", include_attachments=True, limit=1, max_pages=1)

I get an error:

AttributeError: 'str' object has no attribute 'get'

Here is the last part of the stack trace:

    554     """
    555     Get all pages from space
    556 
   (...)
    568     :return:
    569     """
    570     return self.get_all_pages_from_space_raw(
    571         space=space, start=start, limit=limit, status=status, expand=expand, content_type=content_type
--> 572     ).get("results")

Any ideas? I see an issue here, but it is still open.

I have now also opened a bug specifically for this issue.

Here is a summary of the fixes required in the original code:

  1. Do not suffix the URL with /wiki/home.
  2. Suffix the user name with @ and your domain name (that is, use your full account email).
  3. Use the space key as it appears in the URL, not the space's display name.

With these changes, it works. Otherwise, the error handling is too poor to point to any of these issues.
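The three fixes above can be checked up front, before constructing the loader. This is a minimal sketch: `check_loader_args` is a hypothetical helper (not part of LangChain), and it assumes Confluence Cloud, where the username is the account email and the space key is the identifier shown in page URLs.

```python
from typing import List
from urllib.parse import urlparse


def check_loader_args(url: str, username: str, space_key: str) -> List[str]:
    """Hypothetical pre-flight check mirroring the three fixes above.

    Returns a list of human-readable problems; an empty list means
    none of these known pitfalls were detected.
    """
    problems = []

    # Fix 1: the base URL should not carry a page path such as /wiki/home.
    path = urlparse(url).path.rstrip("/")
    if path.endswith("/wiki/home"):
        problems.append("url should not end with /wiki/home")

    # Fix 2: on Confluence Cloud the username is the full account email.
    if "@" not in username:
        problems.append("username should be the account email (name@domain)")

    # Fix 3: space keys contain no spaces; a display name such as
    # 'My Team Space' usually does.
    if not space_key or " " in space_key:
        problems.append("space_key should be the key from the URL, not the display name")

    return problems


print(check_loader_args("https://example.atlassian.net/wiki/home", "jdoe", "My Team Space"))
print(check_loader_args("https://example.atlassian.net", "jdoe@example.com", "~61dc5d78e67ea2006b1efbc0"))
```

The first call reports all three problems from the question's setup; the second, matching the working configuration in the answer, reports none.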


Solution

  • ConfluenceLoader uses atlassian-python-api (GitHub source and documentation reference). Its confluence.py expects a successful response from Confluence (see its examples).
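The traceback ends in `get_all_pages_from_space_raw(...).get("results")`. My understanding, which is an assumption worth checking against the atlassian-python-api source, is that when Confluence answers with something that is not JSON (for example an HTML login or error page reached through a wrong URL), the client hands back the raw response text as a `str`, and calling `.get` on it raises exactly this AttributeError. The sketch below reproduces that failure mode and shows a defensive check; `results_or_raise` is a hypothetical helper, not part of either library.

```python
def results_or_raise(raw):
    """Return raw["results"], or raise an error that explains the likely cause."""
    if isinstance(raw, dict):
        return raw.get("results")
    # A str (or None) here usually means the server did not return JSON.
    preview = str(raw)[:80]
    raise RuntimeError(
        "Expected a JSON object from Confluence but got "
        f"{type(raw).__name__}: {preview!r}. Check the base URL, "
        "username (email) and API token."
    )


# A successful JSON response, already parsed into a dict.
print(results_or_raise({"results": [{"id": "65676"}]}))

# What effectively happens with a text response: .get on a str.
html_body = "<html><body>Log in to continue</body></html>"
try:
    html_body.get("results")
except AttributeError as exc:
    print("original error:", exc)

# The defensive version fails with a message that points at the setup.
try:
    results_or_raise(html_body)
except RuntimeError as exc:
    print("clearer error:", exc)
```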

    1. Verify that the API token is still valid (see API tokens).
    2. If the page URL in your browser is https://simpleappdesigner.atlassian.net/wiki/spaces/~61dc5d78e67ea2006b1efbc0/pages/65676/Debug+the+python+issue, then use confluence_link='https://simpleappdesigner.atlassian.net' and space_key="~61dc5d78e67ea2006b1efbc0".
    3. The loader is then:
    loader = ConfluenceLoader(
        url=confluence_link, username="<your_email@your_domain>",
        api_key=api_key
    )
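Step 2 can be automated: given a page URL copied from the browser, split it into the base URL (without /wiki) and the space key. This is a minimal standard-library sketch that assumes Confluence Cloud URLs of the form https://<site>/wiki/spaces/<SPACE_KEY>/...; `split_page_url` is a hypothetical helper.

```python
from urllib.parse import unquote, urlparse


def split_page_url(page_url: str):
    """Split a Confluence Cloud page URL into (base_url, space_key)."""
    parts = urlparse(page_url)
    # Keep only scheme and host: no /wiki, no /wiki/home.
    base_url = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    # Expect .../wiki/spaces/<SPACE_KEY>/...
    try:
        idx = segments.index("spaces")
        space_key = unquote(segments[idx + 1])  # decodes e.g. %7E back to ~
    except (ValueError, IndexError):
        raise ValueError(f"no /spaces/<key>/ segment in {page_url!r}") from None
    return base_url, space_key


confluence_link, space_key = split_page_url(
    "https://simpleappdesigner.atlassian.net/wiki/spaces/"
    "~61dc5d78e67ea2006b1efbc0/pages/65676/Debug+the+python+issue"
)
print(confluence_link)
print(space_key)
```

For the example URL this yields the same confluence_link and space_key values as step 2.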
    

    and load the documents with: documents = loader.load(space_key=space_key, include_attachments=True, limit=5, max_pages=5)

    With the above changes, I was able to run the following code:

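    from langchain.document_loaders import ConfluenceLoader

    confluence_link = 'https://simpleappdesigner.atlassian.net'  # base URL only, no /wiki/home
    space_key = "~61dc5d78e67ea2006b1efbc0"  # space key taken from the page URL
    api_key = "<your API token>"  # Atlassian API token

    loader = ConfluenceLoader(
        url=confluence_link, username="<your_email@your_domain>",  # full account email
        api_key=api_key
    )
    documents = loader.load(space_key=space_key, include_attachments=True, limit=5, max_pages=5)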
    

    While debugging, you can also call the API through Postman; note the API URL used in the request.
    [Screenshot: Postman request showing the Confluence API URL]
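The same check can be scripted instead of using Postman. The sketch below builds (but does not send) the kind of request that, as far as I can tell, atlassian-python-api issues when listing a space: a GET to <base>/wiki/rest/api/content with spaceKey, type and limit parameters, authenticated with HTTP Basic auth of email:API token. The endpoint and parameters are my reading of the library and the Confluence Cloud REST API, so treat them as assumptions; `build_space_content_request` is a hypothetical helper.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_space_content_request(base_url, email, api_token, space_key, limit=5):
    """Build (without sending) a GET request for pages in one Confluence Cloud space."""
    query = urlencode({"spaceKey": space_key, "type": "page", "limit": limit})
    url = f"{base_url.rstrip('/')}/wiki/rest/api/content?{query}"
    # Confluence Cloud accepts Basic auth with "email:api_token".
    token = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}",
                                 "Accept": "application/json"})


req = build_space_content_request(
    "https://simpleappdesigner.atlassian.net", "jdoe@example.com", "<api token>",
    "~61dc5d78e67ea2006b1efbc0",
)
print(req.full_url)
# To actually send it (needs network access and a valid token):
# import json, urllib.request
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["results"])
```

A 401 or an HTML body from this request points at the credentials or the base URL rather than at LangChain.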

    Hope this helps. I'm happy to help further if you have questions.