I am trying to read a '.tif' image of shape [m, n, 4] (rows, columns, channels) with data type 'uint16' from HDFS in PySpark, using a library such as 'tifffile', with the following code:
import tifffile as tiff
img = tiff.imread('hdfs://master:9000/image1.tif')
However, I always get this error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/spark_files/tfos/hdfs:/master:9000/image1.tif'
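The HDFS path of the image, hdfs://master:9000/image1.tif, is correct, and 'tifffile' works well when reading from the local file system instead of HDFS.
It looks like the image library does not understand HDFS paths: judging by the error message, it treats the URI as a relative local path under the working directory.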
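The mangled path in the error can be reproduced with the standard library alone. This is only a sketch of what seems to be happening, inferred from the error message rather than from the 'tifffile' source; the helper name resolve_as_local_path is made up for illustration:

```python
import posixpath


def resolve_as_local_path(cwd, path):
    """Treat `path` as a plain local path: join it onto the working
    directory and normalize it, which collapses '//' into '/'."""
    return posixpath.normpath(posixpath.join(cwd, path))


# Working directory taken from the error message above.
resolved = resolve_as_local_path(
    "/home/user/spark_files/tfos",
    "hdfs://master:9000/image1.tif",
)
print(resolved)
# -> /home/user/spark_files/tfos/hdfs:/master:9000/image1.tif
```

The result matches the path in the FileNotFoundError, which suggests the 'hdfs://' scheme is never interpreted as a URI at all.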
How can I solve this, given that the Spark API can't read this kind of image?
Finally, I solved this problem using the hdfs module from pyarrow together with the imagecodecs library:
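from pyarrow import hdfs
import imagecodecs

# Connect to the HDFS NameNode (host "master", port 9000)
connect = hdfs.connect("master", 9000)
# Open the image in binary mode and read its raw bytes into memory
img_file = connect.open('/img1.tif', mode='rb')
img_bytes = img_file.read()
img_file.close()
# Decode the TIFF bytes into a NumPy array
numpy_img = imagecodecs.tiff_decode(img_bytes)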
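Note that pyarrow's legacy hdfs module has been deprecated in newer pyarrow releases in favor of pyarrow.fs.HadoopFileSystem. Below is a minimal sketch of the same idea with the newer API, decoding with 'tifffile' from an in-memory buffer. The function names read_all_bytes and load_tiff_from_hdfs are my own, the host and port are taken from the question, and I have not run this against a real cluster:

```python
import io


def read_all_bytes(filesystem, path):
    """Read a whole file into memory from any object that offers
    open_input_file(path), such as a pyarrow.fs filesystem."""
    with filesystem.open_input_file(path) as f:
        return f.read()


def load_tiff_from_hdfs(path, host="master", port=9000):
    """Fetch a TIFF from HDFS and decode it into a NumPy array."""
    # Imported here so the imports also resolve on Spark executors.
    from pyarrow import fs
    import tifffile

    hdfs = fs.HadoopFileSystem(host, port)
    data = read_all_bytes(hdfs, path)
    # tifffile accepts a binary stream, so no local temp file is needed.
    return tifffile.imread(io.BytesIO(data))
```

With a working Hadoop client configuration, calling load_tiff_from_hdfs('/img1.tif') should return the same kind of array as the imagecodecs version above.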