What is the right syntax to copy from Windows to a remote HDFS?
I'm trying to copy a file from my local machine to a remote Hadoop cluster using RStudio:
rxHadoopCopyFromLocal("C:/path/to/file.csv", "/target/on/hdfs/")
This throws
copyFromLocal '/path/to/file.csv': no such file or directory
Notice the C:/ disappeared.
This syntax also fails
rxHadoopCopyFromLocal("C:\\path\\to\\file.csv", "/target/on/hdfs/")
with error
-copyFromLocal: Can not create a Path from a null string
This is a common mistake.
It turns out that rxHadoopCopyFromLocal is a wrapper around hadoop fs -copyFromLocal. All it does is copy from a local filesystem to an HDFS target, where "local" means local to the machine that actually runs the command.
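In this case, the compute context had been set to a remote cluster with rxSetComputeContext(remotehost), so the copy command ran on the remote machine and looked for the source file there. On the remote machine, there is no C:\path\to\file.csv.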
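As a small illustration of the point above, here is a sketch in base R that flags a source path that can only exist on a Windows machine before it is handed to a remote compute context. The helper name looks_like_windows_path is hypothetical and not part of RevoScaleR; it only inspects the string and does not touch the filesystem.

```r
# Hypothetical helper (not part of RevoScaleR): TRUE when a path starts
# with a Windows drive letter such as "C:/" or "C:\".
looks_like_windows_path <- function(path) {
  grepl("^[A-Za-z]:[/\\\\]", path)
}

src <- "C:/path/to/file.csv"
if (looks_like_windows_path(src)) {
  # With a remote compute context, this path is resolved on the remote node
  message("'", src, "' is a Windows path; a remote node will not find it.")
}
```

A check like this could run before rxHadoopCopyFromLocal whenever the compute context is not local.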
Here are a couple of ways to get the files there.
Configure the local hdfs-site.xml for the remote HDFS cluster
rxSetComputeContext("local")
rxHadoopCopyFromLocal("C:/local/path/to/file.csv", "/target/on/hdfs/")
SCP and Remote Compute Context
scp C:\local\path\to\file.csv user@remotehost:/tmp
rxSetComputeContext(remotehost)
rxHadoopCopyFromLocal("/tmp/file.csv", "/target/on/hdfs/")
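The SCP approach can also be sketched from R. The helper below, plan_staged_copy, is hypothetical: it only derives the staging path on the remote machine and the scp arguments from the local file name, and it does not run scp or touch HDFS itself. The host user@remotehost and the /tmp staging directory are taken from the example above.

```r
# Hypothetical helper: work out where the file lands on the remote machine.
plan_staged_copy <- function(local_path, remote, hdfs_target, stage_dir = "/tmp") {
  # Turn backslashes into slashes only to extract the file name portably
  file_name <- basename(gsub("\\\\", "/", local_path))
  list(
    scp_args    = c(local_path, paste0(remote, ":", stage_dir)),
    staged_path = paste0(stage_dir, "/", file_name),
    hdfs_target = hdfs_target
  )
}

plan <- plan_staged_copy("C:/local/path/to/file.csv",
                         "user@remotehost", "/target/on/hdfs/")

# Step 1, on the Windows machine (assumes an scp client is on the PATH):
#   system2("scp", plan$scp_args)
# Step 2, with the remote compute context set:
#   rxHadoopCopyFromLocal(plan$staged_path, plan$hdfs_target)
```

The two commented calls are left out of the runnable part because they need a reachable remote host and a RevoScaleR session.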