python numpy spark-koalas

Facing issues installing Koalas for Python 3.8.10 (AttributeError: module 'numpy' has no attribute 'bool')


I installed Koalas following the installation guide at https://koalas.readthedocs.io/en/latest/getting_started/install.html

System info:

numpy   1.24.3  
koalas  1.8.2 
pyspark 3.4.0 
Python  3.8.10  

I get the following error when trying to read a CSV file:

import databricks.koalas as ks
import time
import numpy as np
df_koalas = ks.read_csv('train.csv')

AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:

Solution

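  • Koalas hasn't been maintained as a standalone project for some time, because its functionality was incorporated directly into PySpark as of Spark 3.2.0 (as the pyspark.pandas module). Koalas 1.8.2 still references the np.bool alias, which NumPy deprecated in 1.20 and removed in 1.24, so it is not compatible with recent NumPy versions such as 1.24.3. You need to migrate to the pandas API on Spark that ships with PySpark.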
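
As a rough illustration of that migration, the read from the question could be rewritten against pyspark.pandas, which provides a read_csv function like the Koalas one. This is a minimal sketch: the helper name load_train_csv is hypothetical, it assumes PySpark 3.2 or later with pandas and PyArrow installed, and it has not been checked against the exact versions listed in the question.

```python
# Sketch of moving from Koalas to the pandas API on Spark.
# Assumptions: PySpark >= 3.2 with pandas and PyArrow installed.
# load_train_csv is a hypothetical helper name, not part of any library.


def load_train_csv(path="train.csv"):
    """Read a CSV file into a pandas-on-Spark DataFrame.

    Stands in for the original ks.read_csv('train.csv') call.
    """
    # Imported inside the function so that a missing or broken PySpark
    # install surfaces when the helper is called, not when it is defined.
    import pyspark.pandas as ps  # replaces: import databricks.koalas as ks

    return ps.read_csv(path)
```

Calling df = load_train_csv('train.csv') should then give a DataFrame that behaves much like the Koalas one. In most scripts the only required change is the import line, since pyspark.pandas keeps the Koalas-style API.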