Customizing handles/requests for the CURL package


I am trying to interact with an SFTP server from inside R. The curl package came highly recommended (curl, not RCurl).

One of the things I am trying to do is get a list of the directories/files at an address. This is the code I have working so far:

# create a new curl handle 
han <- new_handle()

# set options for SFTP
handle_setopt(han, verbose = TRUE)

# execute the request 
result <- curl_fetch_memory(url = "{SFTP URL here}",handle = han)

# get the response data 
response <- rawToChar(result$content)

The SFTP server at this URL does not require a password. The remote uses SFTP protocol version 3.

The above code almost does what I am looking for. curl_fetch_memory(url = "{SFTP URL here}", handle = han) returns a list that includes, among other things, result$content, which holds the list of directories/files. However, every entry carries everything (file names, dates and permission data) in a single character string.

  1. How can I customize the request/handle to get the list of files in a cleaner form, just a plain list of file names akin to ls on SFTP servers, if this is at all possible? (Copies of result and response are attached below.)

  2. If customizing the requests is not possible, is there a way to customize CURL objects to make them a bit more human readable?

Output for result

$url
[1] "sftp://data.cyverse.org/shared/"

$status_code
[1] 0

$type
[1] NA

$headers
raw(0)

$modified
[1] "2020-02-20 16:05:33 CST"

$times
     redirect    namelookup       connect   pretransfer starttransfer 
     0.000000      0.000029      0.000000      0.230600      0.000000 
        total 
     0.230608 

$content
  [1] 64 72 77 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20
 [26] 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 20 44 65 63 20 33 31 20
 [51] 20 31 39 36 39 20 2e 0a 64 72 77 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30
 [76] 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30
[101] 20 44 65 63 20 33 31 20 20 31 39 36 39 20 2e 2e 0a 64 72 77 78 72 2d 78 72
[126] 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20
[151] 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20 61 6c
[176] 69 67 6e 6d 65 6e 74 73 5f 61 6e 64 5f 74 72 65 65 73 0a 64 72 77 78 72 2d
[201] 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20
[226] 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20
[251] 67 65 6e 65 5f 66 61 6d 69 6c 79 5f 65 76 6f 6c 75 74 69 6f 6e 0a 64 72 77
[276] 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20
[301] 20 20 20 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30
[326] 32 30 20 6d 61 70 73 5f 73 63 72 69 70 74 73 0a 64 72 77 78 72 2d 78 72 2d
[351] 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20
[376] 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20 74 72 61
[401] 6e 73 63 72 69 70 74 5f 61 73 73 65 6d 62 6c 69 65 73 0a 64 72 77 78 72 2d
[426] 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20
[451] 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20
[476] 77 68 6f 6c 65 5f 67 65 6e 6f 6d 65 5f 64 75 70 6c 69 63 61 74 69 6f 6e 73
[501] 0a 2d 72 77 2d 72 2d 2d 72 2d 2d 20 20 20 20 31 20 30 20 20 20 20 20 20 20
[526] 20 30 20 20 20 20 20 20 20 20 20 20 20 20 20 36 36 39 20 4f 63 74 20 31 32
[551] 20 20 32 30 31 39 20 67 65 6e 65 5f 66 61 6d 69 6c 69 65 73 5f 6f 72 74 68
[576] 6f 66 69 6e 64 65 72 2e 74 78 74 0a 2d 72 77 2d 72 2d 2d 72 2d 2d 20 20 20
[601] 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20 20 20 20 20
[626] 31 32 37 33 20 4f 63 74 20 31 32 20 20 32 30 31 39 20 72 65 61 64 6d 65 2e
[651] 74 78 74 0a

Output for response

'drwxr-xr-x    1 0        0               0 Dec 31  1969 .\ndrwxr-xr-x    1 0        0               0 Dec 31  1969 ..\ndrwxr-xr-x    1 0        0               0 Nov 7  2020 curated\n'

Solution

  • You can set CURLOPT_DIRLISTONLY (dirlistonly in the curl package) to list names only. Alternatively, you can parse the default response as regular tabular text, e.g. with read.table() or readr::read_table(). Options for the curl package are the general libcurl options from upstream, so the libcurl documentation can be used as a reference: https://curl.se/libcurl/c/easy_setopt_options.html

    Using the Rebex demo server as an example:

    library(curl)
    #> Using libcurl 7.84.0 with Schannel
    # https://test.rebex.net/
    SFTP_DEMO <- "sftp://demo:password@test.rebex.net:22"
    han <- new_handle()
    
    # list all libcurl options that include "list"
    curl_options("list")
    #>            cookielist           dirlistonly proxy_ssl_cipher_list 
    #>                 10135                    48                 10259 
    #>       ssl_cipher_list 
    #>                 10083
    # set dirlistonly
    handle_setopt(han, dirlistonly = TRUE)
    
    # dirlistonly request: 
    file_list <- curl_fetch_memory(url = SFTP_DEMO, handle = han)[["content"]] |> rawToChar()
    
    cat(file_list)
    #> .
    #> ..
    #> pub
    #> readme.txt
    read.table(text = file_list)
    #>           V1
    #> 1          .
    #> 2         ..
    #> 3        pub
    #> 4 readme.txt
    strsplit(file_list, "\n") |> unlist()
    #> [1] "."          ".."         "pub"        "readme.txt"
    
    # you can do the same with detailed file list:
    handle_setopt(han, dirlistonly = FALSE)
    curl_fetch_memory(url = SFTP_DEMO,
                      handle = han)[["content"]] |>
      rawToChar() |>
      read.table(text = _)
    #>           V1 V2   V3    V4  V5  V6 V7    V8         V9
    #> 1 drwx------  2 demo users   0 Mar 31 17:52          .
    #> 2 drwx------  2 demo users   0 Mar 31 17:52         ..
    #> 3 drwx------  2 demo users   0 Mar 31 17:52        pub
    #> 4 -rw-------  1 demo users 405 Dec 17  2021 readme.txt
    

    Created on 2023-05-12 with reprex v2.0.2
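
One caveat with the read.table() approach above: it splits on any whitespace, so a file name that contains spaces can be misparsed. Below is a minimal sketch of an alternative that captures the name as "everything after the eighth field". It assumes the listing keeps the nine-field layout shown in the question's output; parse_listing() is a hypothetical helper written for illustration, not part of the curl package, and the sample text is copied from the question so no server is needed.

```r
# Sample long-format listing, copied from the question's output.
listing <- paste0(
  "drwxr-xr-x    1 0        0               0 Dec 31  1969 .\n",
  "drwxr-xr-x    1 0        0               0 Dec 31  1969 ..\n",
  "drwxr-xr-x    1 0        0               0 Nov 7  2020 curated\n"
)

# Hypothetical helper (not part of curl): turn an ls-style listing into a
# data frame. Assumes nine fields per line, with the name as the last one.
parse_listing <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]  # drop empty lines

  # permissions, links, owner, group, size, month, day, year-or-time, name
  pattern <- paste0(
    "^(\\S+)\\s+(\\d+)\\s+(\\S+)\\s+(\\S+)\\s+(\\d+)",
    "\\s+(\\S+)\\s+(\\d+)\\s+(\\S+)\\s+(.*)$"
  )

  # Prototype giving the column names and types of the result.
  proto <- data.frame(
    perms = character(), links = integer(), owner = character(),
    group = character(), size = numeric(), month = character(),
    day = integer(), when = character(), name = character(),
    stringsAsFactors = FALSE
  )

  utils::strcapture(pattern, lines, proto, perl = TRUE)
}

parsed <- parse_listing(listing)

# Drop the "." and ".." entries and flag directories by the leading "d".
entries <- parsed[!parsed$name %in% c(".", ".."), ]
entries$is_dir <- startsWith(entries$perms, "d")

entries$name
#> [1] "curated"
```

With a live handle, the same helper could be applied to rawToChar(result$content) from the question's code. Lines that do not match the assumed layout come back as rows of NA, so it is worth checking for those before relying on the result.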