Tags: timeout, confluence, confluence-rest-api

How can I use `rest/api/content` to download all the 800k pages of my Confluence wiki without timing out?


I want to download all the 800k pages of my Confluence wiki.

I'd like to use:

    curl -u wikiusername:wikipassword "https://wiki.hostname.com/rest/api/content?start=1"

and simply increase `start` from 1 to 800,000.
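The plan above amounts to offset-based paging. Here is a minimal Python 3 sketch of that loop, assuming the `start` (0-based offset) and `limit` query parameters and HTTP basic auth as in the curl call. `make_fetch` and `iter_content` are hypothetical helper names. It only shows the mechanics: every request still makes the server skip `start` items, so it inherits the slowdown described in the question.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# fetch(start, limit) -> parsed JSON of one /rest/api/content response
FetchPage = Callable[[int, int], dict]


def make_fetch(base_url: str, user: str, password: str,
               timeout: float = 300) -> FetchPage:
    """Hypothetical helper: the curl call from the question, with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(start: int, limit: int) -> dict:
        query = urllib.parse.urlencode({"start": start, "limit": limit})
        request = urllib.request.Request(
            f"{base_url}/rest/api/content?{query}",
            headers={"Authorization": f"Basic {token}"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)

    return fetch


def iter_content(fetch: FetchPage, limit: int = 100) -> Iterator[dict]:
    """Offset paging: request successive pages until an empty one comes back."""
    start = 0  # `start` is a 0-based offset into the full result list
    while True:
        results = fetch(start, limit).get("results", [])
        if not results:
            return
        yield from results
        # Advance by what was actually returned, in case the server
        # caps `limit` below the requested value.
        start += len(results)
```

With `limit=100`, covering 800,000 pages takes 8,000 requests instead of the 32,000 needed at a page size of 25 (which I believe is the endpoint's default). The server may cap `limit` below what you request, which is why the loop advances by the number of results actually returned rather than by `limit`.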

However, the response time grows as `start` increases, and from around 150,000 the requests begin to time out:

| `start` | Response time (s) |
|--------:|------------------:|
| 1       | 0.4               |
| 1,000   | 2.5               |
| 10,000  | 9                 |
| 50,000  | 112               |
| 100,000 | 286               |
| 200,000 | timeout           |

How can I use rest/api/content to download all the 800k pages of my Confluence wiki without timing out?


Solution

  • Option 1: Use the `limit` parameter, as described in developer.atlassian.com/server/confluence/… (suggested in a comment by Elazaron).

  • Option 2: Download space by space, as this Python 2 script does; it exports Confluence spaces and pages recursively via the REST API: https://github.com/siemens/confluence-dumper (mirror).

    I can confirm that option 2 works.
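Options 1 and 2 can be combined. The following is a minimal Python 3 sketch (not the confluence-dumper code) that lists spaces via `/rest/api/space` and then pages through each space with `/rest/api/content?spaceKey=KEY&type=page`, using a larger `limit`. The assumption is that splitting by space keeps each `start` offset well below the global values that timed out; that only holds if no single space is itself very large. The helper names `make_fetch`, `paginate`, and `iter_pages_by_space` are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# fetch(path, params) -> parsed JSON of GET {base_url}{path}?{params}
Fetch = Callable[[str, dict], dict]


def make_fetch(base_url: str, user: str, password: str,
               timeout: float = 300) -> Fetch:
    """Hypothetical helper: authenticated GET returning parsed JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(path: str, params: dict) -> dict:
        url = f"{base_url}{path}?{urllib.parse.urlencode(params)}"
        request = urllib.request.Request(
            url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)

    return fetch


def paginate(fetch: Fetch, path: str, params: dict,
             limit: int = 100) -> Iterator[dict]:
    """Offset paging over one collection endpoint until an empty page."""
    start = 0
    while True:
        page = fetch(path, {**params, "start": start, "limit": limit})
        results = page.get("results", [])
        if not results:
            return
        yield from results
        start += len(results)  # handles server-side caps on `limit`


def iter_pages_by_space(fetch: Fetch, limit: int = 100) -> Iterator[dict]:
    """Yield every page, one space at a time, so offsets restart per space."""
    for space in paginate(fetch, "/rest/api/space", {}, limit):
        yield from paginate(
            fetch, "/rest/api/content",
            {"spaceKey": space["key"], "type": "page"}, limit)
```

For example, `iter_pages_by_space(make_fetch("https://wiki.hostname.com", "wikiusername", "wikipassword"))` would yield page summaries. Adding an `expand` parameter such as `body.storage` to the content request would include the page bodies.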