httr: translating a working Python connection to R


For a project, I am trying to access data produced by an online model/algorithm. The owners provide Python code to access this data, as follows:

import requests
import random
import os
import pandas as pd
from rdkit import Chem

upload_url=r'site name'

def predict_pka(smi):
    param={"Smiles" : ("tmg", smi)}
    headers={'token':'tokenstring'}
    response=requests.post(url=upload_url, files=param, headers=headers)
    jsonbool=int(response.headers['ifjson'])
    if jsonbool==1:
        res_json=response.json()
        if res_json['status'] == 200:
            pka_datas = res_json['gen_datas']
            return pka_datas
        else:
            raise RuntimeError("Error for prediction")
    else:
        raise RuntimeError("Error for prediction")
        
if __name__=="__main__":
    smi = "CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl"
    data_pka = predict_pka(smi)
    print(data_pka)

I took out the actual URL and the token, since I don't know if it's responsible to share those. This code works when I run it with Python from RStudio, and I can get the data.

However, I want to get the data using an R script, so I tried translating the code to R:

getPKA = function(){
  upload_url="site name"
  
  param = rjson::toJSON(list('Smiles' = c("tmg", "CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl")))

  response = httr::POST(url = upload_url, 
                        httr::add_headers(                          
                        'token' = 'tokenstring'
                        ),
                        encode = c("multipart", "form", "json", "raw"),
                        httr::content_type_json(),
                        body = param, 
                        httr::verbose()
  )

  return(response)
}

When I run the R code, I get the following output:

-> POST /modules/upload0/ HTTP/1.1
-> Host: host
-> User-Agent: libcurl/7.84.0 r-curl/5.0.0 httr/1.4.5
-> Accept-Encoding: deflate, gzip
-> Accept: application/json, text/xml, application/xml, */*
-> token: tokenstring
-> Content-Type: application/json
-> Content-Length: 56
-> 
>> {"Smiles":["tmg","CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl"]}

<- HTTP/1.0 500 INTERNAL SERVER ERROR
<- Content-Type: text/html; charset=utf-8
<- X-XSS-Protection: 0
<- Connection: close
<- Server: Werkzeug/1.0.1 Python/3.6.12
<- Date: Sat, 13 May 2023 10:28:53 GMT
<- 

Once again, I redacted the token and the host.

I arrived at this R code by reading up a bit on both the Python requests package and the httr package. However, I don't know much about API connections or web connections in general, and I only need it for this data.

I think it might have to do with the param format. When I print param in the Python code, I get this: {'Smiles': ('tmg', 'CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl')}, while if I print it in the R code, I get this: {"Smiles":["tmg","CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl"]}. The Python version uses parentheses and the R version uses square brackets.

I don't know if this is actually the problem or how to change it. I tried using different container types (vector, list), and I tried passing the line {'Smiles': ('tmg', 'CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl')} from the Python code directly as a character string in the body, but I get the same error.

If I print the response itself (or its content using content(response)), the returned HTML contains: AttributeError: 'NoneType' object has no attribute 'filename'.
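
One possible reading of that error, sketched in Python with only the standard library. It assumes the server looks up an uploaded file in the form field Smiles and then reads its filename; the server code is not shown, so this is a guess based on the error text. Under that assumption a JSON body contains no file parts, so the lookup yields None. The helper find_uploaded_file, the Upload tuple, and the boundary value are hypothetical stand-ins, not the server's actual code.

```python
import json
from collections import namedtuple
from email.parser import BytesParser

# Hypothetical stand-in for whatever object the server uses for an uploaded file.
Upload = namedtuple("Upload", ["filename", "content"])


def find_uploaded_file(content_type, body, field):
    """Return the file part named `field`, or None if the body has no such part."""
    if not content_type.startswith("multipart/form-data"):
        return None  # e.g. application/json: there are no file parts to find
    # Prepend the Content-Type header so the MIME parser knows the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser().parsebytes(raw)
    if not message.is_multipart():
        return None
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        if name == field and filename is not None:
            content = part.get_payload(decode=True).decode("utf-8")
            return Upload(filename, content)
    return None


SMILES = "CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl"
BOUNDARY = "sketchboundary123"  # arbitrary illustrative value

# Body shaped like a multipart file upload (what the Python client sends).
multipart_body = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="Smiles"; filename="tmg"\r\n'
    "\r\n"
    f"{SMILES}\r\n"
    f"--{BOUNDARY}--\r\n"
).encode("ascii")

# Body shaped like the JSON that the R attempt sends.
json_body = json.dumps({"Smiles": ["tmg", SMILES]}).encode("ascii")

from_multipart = find_uploaded_file(
    f"multipart/form-data; boundary={BOUNDARY}", multipart_body, "Smiles"
)
from_json = find_uploaded_file("application/json", json_body, "Smiles")

print(from_multipart.filename)

# Reading .filename on the missing upload reproduces the kind of error reported.
try:
    from_json.filename
except AttributeError as err:
    error_text = str(err)
    print(error_text)
```

If the assumption holds, the bracket style in the printed param is not the issue; the body type is.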

I do see a lot of similar questions on Stack Overflow, and I tried copying their code and adapting it to my needs, but it does not really change anything.

Thank you for your time!


Solution

  • That requests call POSTs a multipart-encoded file (via the files= argument), not a JSON body, and the request looks something like this:

    POST / HTTP/1.1
    Host: localhost:1234
    User-Agent: python-requests/2.28.2
    Accept-Encoding: gzip, deflate, br
    Accept: */*
    Connection: keep-alive
    token: tokenstring
    Content-Length: 176
    Content-Type: multipart/form-data; boundary=2202e29dea10e9ab00dcf55c67ed1817
    
    --2202e29dea10e9ab00dcf55c67ed1817
    Content-Disposition: form-data; name="Smiles"; filename="tmg"
    
    CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl
    --2202e29dea10e9ab00dcf55c67ed1817--
    

    With httr2 / curl, this should be close enough:

    library(httr2)
    
    getPKA <- function(smi){
      upload_url <- "site name"
      
      # It seems we need to read an actual file from disk to include filename="tmg"
      tmg_path <- file.path(tempdir(),"tmg")
      write(smi,tmg_path) 
      tmg_form_data <- curl::form_file(tmg_path, type = "text/plain")
      
      request(upload_url) %>% 
        req_headers(token = "tokenstring") %>% 
        req_body_multipart(Smiles = tmg_form_data) %>% 
        req_timeout(5) %>% 
        req_perform(verbosity = 2) %>% 
        resp_body_json()
    }
    getPKA("CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl")  
    
    #> -> POST / HTTP/1.1
    #> -> Host: localhost:1234
    #> -> User-Agent: httr2/0.2.2 r-curl/5.0.0 libcurl/7.84.0
    #> -> Accept: */*
    #> -> Accept-Encoding: deflate, gzip
    #> -> token: tokenstring
    #> -> Content-Length: 220
    #> -> Content-Type: multipart/form-data; boundary=------------------------744c5a08426eb63b
    #> -> 
    #> >> --------------------------744c5a08426eb63b
    #> >> Content-Disposition: form-data; name="Smiles"; filename="tmg"
    #> >> Content-Type: text/plain
    #> >> 
    #> >> CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl
    #> >> 
    #> >> --------------------------744c5a08426eb63b--
    #> Error:
    #> ! Timeout was reached: [localhost:1234] Operation timed out after 5001 milliseconds with 0 bytes received
    

    Created on 2023-05-13 with reprex v2.0.2
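
To make the difference between the two request bodies concrete, here is a minimal Python sketch that builds a multipart body by hand and compares it with the compact JSON body from the question. The function encode_files is a hypothetical illustration and not how requests implements files= internally; the boundary is copied from the python-requests dump above, and any unique string would do.

```python
import json


def encode_files(files, boundary):
    """Encode {field: (filename, content)} as a multipart/form-data body."""
    lines = []
    for field, (filename, content) in files.items():
        lines.append(f"--{boundary}")
        lines.append(
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"'
        )
        lines.append("")  # blank line separates part headers from part content
        lines.append(content)
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")  # body ends with a final CRLF
    return "\r\n".join(lines).encode("utf-8")


smi = "CCOP(=S)(OCC)OC1=NC(=C(C=C1Cl)Cl)Cl"
boundary = "2202e29dea10e9ab00dcf55c67ed1817"

# Same shape as the `files` argument in the Python code from the question:
# the dict key becomes the field name, the tuple is (filename, content).
multipart_body = encode_files({"Smiles": ("tmg", smi)}, boundary)

# Compact JSON, like the body shown in the httr verbose output.
json_body = json.dumps({"Smiles": ["tmg", smi]}, separators=(",", ":")).encode("utf-8")

print(len(multipart_body))  # 176, the Content-Length in the python-requests dump
print(len(json_body))       # 56, the Content-Length in the httr verbose output
print(multipart_body.decode("utf-8"))
```

The field name and the filename travel in the Content-Disposition header of the part, which is the piece the JSON body has no equivalent for.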