I set up my own HDFS as a single-node cluster on Windows, following this link.
I have already uploaded my Parquet files, but I can't read them from another computer.
Here is my Python code:
import pyarrow as pa
import pyarrow.parquet as pq
hdfs_path = "hdfs://10.35.105.35:9820/tampo/oee_tampo.parquet"
fs = pa.hdfs.connect()
table = pq.read_table(hdfs_path, filesystem=fs)
import pandas as pd
df = table.to_pandas()
fs.close()
Error:
1522 # pipe will not close when the child process exits and the
1523 # ReadFile will hang.
1524 self._close_pipe_fds(p2cread, p2cwrite,
1525 c2pread, c2pwrite,
1526 errread, errwrite)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Can anyone help me fix this, or is there another way to read the Parquet file from HDFS?
Have you tried pandas' read_parquet()?
import pandas as pd
df = pd.read_parquet('hdfs://10.35.105.35:9820/tampo/oee_tampo.parquet')
df
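If that still fails with the same WinError 2, the cause may be on the client side rather than in the read call. The traceback ends inside a subprocess launch, which suggests pyarrow tried to start the `hadoop` command to build its classpath and could not find it on the machine running the script. As far as I know, `pd.read_parquet()` with an `hdfs://` URL goes through the same native HDFS client, so it may hit the same requirement. Below is a minimal sketch, assuming a Hadoop client is installed on the reading computer and that `HADOOP_HOME` or `PATH` points to it. The helper names `split_hdfs_uri`, `hadoop_client_found` and `read_parquet_from_hdfs` are made up for illustration; `pyarrow.fs.HadoopFileSystem` is the newer pyarrow interface, used here in place of `pa.hdfs.connect()`.

```python
import os
import shutil
from urllib.parse import urlparse


def split_hdfs_uri(uri):
    """Split 'hdfs://host:port/path' into (host, port, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected hdfs://host:port/path, got {uri!r}")
    return parsed.hostname, parsed.port, parsed.path


def hadoop_client_found():
    """Return True if a Hadoop client looks reachable from this machine.

    Assumption: the client is located through HADOOP_HOME or through a
    'hadoop' executable on PATH.
    """
    return bool(os.environ.get("HADOOP_HOME")) or shutil.which("hadoop") is not None


def read_parquet_from_hdfs(uri):
    """Read a Parquet file from HDFS into a pandas DataFrame."""
    # Imported lazily so the two helpers above work without pyarrow installed.
    import pyarrow.fs as pafs
    import pyarrow.parquet as pq

    host, port, path = split_hdfs_uri(uri)
    # Connect to the NameNode explicitly instead of relying on defaults.
    hdfs = pafs.HadoopFileSystem(host, port)
    # Pass only the path part; the filesystem object already knows the host.
    return pq.read_table(path, filesystem=hdfs).to_pandas()
```

Call `hadoop_client_found()` first on the other computer. If it returns False, installing a Hadoop client there and setting `HADOOP_HOME` is probably the first thing to try. After that, `read_parquet_from_hdfs("hdfs://10.35.105.35:9820/tampo/oee_tampo.parquet")` should return the DataFrame, provided the NameNode is reachable from that machine.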