r sparklyr reticulate databricks-connect

Installing Databricks Connect fails because "LINK : fatal error LNK1181: cannot open input file 'R.lib'"


I am trying to set up database access to connect to my company's Databricks install, so I have been following the instructions here:

https://docs.databricks.com/en/dev-tools/databricks-connect/r/index.html

I have reached this line of the instructions:

pysparklyr::install_databricks(cluster_id = "<cluster-id>")

But I am running into issues:

<snip>
Collecting rpy2
  Using cached rpy2-3.5.17.tar.gz (220 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [49 lines of output]
      test_pw_r.c
      LINK : fatal error LNK1181: cannot open input file 'R.lib'

Google suggests that this has something to do with the Visual Studio Build Tools, so I reinstalled those in case a newer version resolved it, but that made no difference. I also swapped in an older version of Python, which didn't help either:

reticulate::install_python(version = "3.12.3")

became

reticulate::install_python(version = "3.11")

I think I have run up against similar issues in the past and got past them by simply sidestepping them and letting sparklyr do the install as one of the steps in spark_connect:

sc <- sparklyr::spark_connect(
  cluster_id = Sys.getenv("DATABRICKS_CLUSTER_ID"),
  method     = "databricks_connect"
)

But if that was the solution last time, it isn't helping this time around: it throws the same error.

All of the info I'm finding online seems to assume I'm writing C++ code or similar, which... is making it not terribly applicable to my issue, heh.

Is this a bug, or am I making some sort of boneheaded error that is an easy fix to someone who knows what's going on?

Oh, in case it's relevant / not obvious, I'm on a Windows machine.


Solution

  • A friend of mine managed to find a GitHub issue regarding this:

    https://github.com/mlverse/pysparklyr/issues/125

    Apparently it's a known problem, but the fix is still only in the development version. I uninstalled the release version and replaced it with the development version:

    remotes::install_github("mlverse/pysparklyr@updates")

    And the error stopped appearing.
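
For anyone following along, the whole sequence might look roughly like the sketch below. Only the install_github() line and the install_databricks() call come from the posts above; the remove.packages() call, the check for the remotes package, and the restart are assumptions about how the swap could be done, not something the answer spells out. The branch name is taken from the answer as-is.

```r
# Sketch only: assumes the release build of pysparklyr is currently installed
# and is not loaded in the running R session.

# Assumption: the release version is removed with base R's remove.packages().
if ("pysparklyr" %in% rownames(installed.packages())) {
  remove.packages("pysparklyr")
}

# remotes provides install_github(); install it first if it is missing.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# From the answer: install the development branch that carries the fix.
remotes::install_github("mlverse/pysparklyr@updates")

# Restart R at this point so the new build is the one that gets loaded,
# then re-run the step from the Databricks instructions that was failing.
pysparklyr::install_databricks(cluster_id = "<cluster-id>")
```

Running packageVersion("pysparklyr") after the restart is one way to check that the development build, rather than the release, is the one in use.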