I have a few large .rdata files that were generated with the R programming language. I have uploaded them to Azure Data Lake using Azure Storage Explorer, but I need to convert these .rdata files to Parquet format and then reinsert them into the data lake. How would I go about doing this? I can't seem to find any information about converting from .rdata to Parquet.
If you can use Python, there are libraries such as pyreadr that load .rdata files as pandas DataFrames. You can then write to Parquet with pandas directly, or convert to a PySpark DataFrame first. Something like this:
import pyreadr

# read_r returns a dict-like mapping of R object names to pandas DataFrames
result = pyreadr.read_r('input.rdata')
print(result.keys())    # check the name(s) of the object(s) stored in the file
df = result["object"]   # extract the DataFrame by its R object name

# 'spark' assumes an existing SparkSession (e.g., in Databricks or Synapse)
sdf = spark.createDataFrame(df)
sdf.write.parquet("output")
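
If you don't have a Spark session available, here is a minimal pandas-only sketch, assuming pyarrow (or fastparquet) is installed as the Parquet engine and using the same placeholder "object" key as above:

import pyreadr

result = pyreadr.read_r('input.rdata')
df = result["object"]   # placeholder R object name; check result.keys()

# to_parquet requires a Parquet engine such as pyarrow or fastparquet
df.to_parquet('output.parquet')

You can then upload the resulting .parquet file back to the data lake with Azure Storage Explorer, the same way you uploaded the .rdata files. If your Spark session is already attached to the data lake (e.g., in Azure Synapse or Databricks), you can instead write directly to an abfss:// path rather than a local one.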