I have a dataset I've read in from Hive/ORC in Spark, but I'm getting all kinds of errors that I didn't get when reading from a CSV. How can I tell Spark to convert that dataset to a format that isn't ORC, without hitting the disk? Right now I'm using this:
import org.apache.hadoop.fs.{FileSystem, Path}

// Remove any previous output directory, then round-trip the table through JSON on disk
FileSystem.get(sc.hadoopConfiguration).delete(new Path(name), true)
loadedTbl.write.json(name)
val q = hc.read.json(name)
You can write the DataFrame out in any format Spark supports and then read it back from that:
df.write.json("json_file_name")
df.write.parquet("parquet_file_name")
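For example, here is a minimal sketch of the full round trip through Parquet. The names are placeholders: df stands for your loaded DataFrame, spark for your SparkSession (use hc / sqlContext on older versions), and outPath is just a hypothetical output location.

// Hypothetical output path; replace with a location on your filesystem
val outPath = "/tmp/converted_parquet"

// Write the ORC-backed DataFrame out as Parquet, overwriting any previous run
df.write.mode("overwrite").parquet(outPath)

// Read it back as a fresh, Parquet-backed DataFrame
val parquetDf = spark.read.parquet(outPath)

Parquet is usually a safer choice than JSON for this kind of round trip, since it stores the schema with the data instead of relying on schema inference when you read it back.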