Tags: r, hadoop, azure-hdinsight, microsoft-r

rxHadoopCopyFromLocal from Windows


What is the right syntax to copy from Windows to a remote HDFS?

I'm trying to copy a file from my local machine to a remote Hadoop cluster using RStudio:

rxHadoopCopyFromLocal("C:/path/to/file.csv", "/target/on/hdfs/")

This throws

copyFromLocal '/path/to/file.csv': no such file or directory

Notice that the C:/ prefix disappeared.

This syntax also fails

rxHadoopCopyFromLocal("C:\\path\\to\\file.csv", "/target/on/hdfs/")

with error

-copyFromLocal: Can not create a Path from a null string

Solution

  • This is a common mistake.

    It turns out that rxHadoopCopyFromLocal is a wrapper around hadoop fs -copyFromLocal. All it does is copy from a local filesystem to an HDFS target.

    In this case, the compute context had been set to a remote cluster with rxSetComputeContext(remotehost). The copy therefore runs on the remote machine, where there is no C:\path\to\file.csv, so it fails.
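
    For context, the failing setup looks roughly like the sketch below. This is a minimal, illustrative sketch only: the RxSpark object (you might use RxHadoopMR instead, depending on your cluster), the host name, and the user name are placeholders, not details from the original post.

        library(RevoScaleR)

        # Illustrative remote compute context -- host and user are placeholders
        remotehost <- RxSpark(sshUsername = "user", sshHostname = "edge-node.example.com")
        rxSetComputeContext(remotehost)

        # With this context active, the underlying `hadoop fs -copyFromLocal`
        # executes on the remote edge node, where C:/path/to/file.csv does not
        # exist -- hence the errors shown in the question.
        rxHadoopCopyFromLocal("C:/path/to/file.csv", "/target/on/hdfs/")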

    Here are a couple of ways to get the files there.

    Configure the local hdfs-site.xml for the remote HDFS cluster

    • Ensure you have hadoop tools installed on your local machine
    • Edit your local hdfs-site.xml to point to the remote cluster
    • Ensure rxSetComputeContext("local")
    • Run rxHadoopCopyFromLocal("C:\local\path\to\file.csv", "/target/on/hdfs/")

    SCP and Remote Compute Context

    • Copy your file to the remote machine with scp C:\local\path\to\file.csv user@remotehost:/tmp
    • Ensure rxSetComputeContext(remotehost)
    • Run rxHadoopCopyFromLocal("/tmp/file.csv", "/target/on/hdfs/")