python dataframe pyspark azure-synapse spark-koalas

AttributeError: 'DataFrame' object has no attribute 'randomSplit'

I am trying to split my data into train and test sets. The data is a Koalas DataFrame. However, when I run the code below, I get this error:

AttributeError: 'DataFrame' object has no attribute 'randomSplit'

Here is the code I am using:

splits = Closed_new.randomSplit([0.7,0.3])

I also tried the usual approach: converting the Koalas DataFrame to pandas and splitting it with scikit-learn. However, this takes a long time to run in Synapse. The code is below:

from sklearn.model_selection import train_test_split

state = 12
test_size = 0.30

X_train, X_val, y_train, y_val = train_test_split(
    Closed_new, labels, test_size=test_size, random_state=state)

Solution

Unfortunately, at the time of this question, PySpark's randomSplit does not have an equivalent in Koalas.

One workaround is to convert the Koalas DataFrame to a Spark DataFrame, call randomSplit on it, and convert the two resulting subsets back to Koalas.

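# to_spark() returns the underlying Spark DataFrame, which does have randomSplit
splits = Closed_new.to_spark().randomSplit([0.7, 0.3], seed=12)
# to_koalas() is available on Spark DataFrames once databricks.koalas has been imported
df_train = splits[0].to_koalas()
df_test = splits[1].to_koalas()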
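
Note that randomSplit treats the weights as probabilities, so the resulting subsets are only approximately 70% and 30% of the rows. The pure-Python sketch below illustrates the idea; it is a simplified model and not Spark's actual implementation, and the function name random_split_sketch is made up for this example. Each row gets one uniform random draw and goes to the split whose cumulative-weight range contains that draw.

```python
import random
from itertools import accumulate


def random_split_sketch(rows, weights, seed=None):
    """Illustrative only: assign each row to a split by a weighted random draw."""
    total = sum(weights)
    # Normalize the weights and turn them into cumulative upper bounds.
    upper_bounds = list(accumulate(w / total for w in weights))
    upper_bounds[-1] = 1.0  # guard against floating-point rounding
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        for i, bound in enumerate(upper_bounds):
            if draw < bound:
                splits[i].append(row)
                break
    return splits


train, test = random_split_sketch(range(1000), [0.7, 0.3], seed=12)
print(len(train), len(test))  # roughly 700 and 300, not exactly
```

Passing the same seed reproduces the same split, which is why the answer above sets seed=12. If you need exact subset sizes, this kind of probabilistic split will not guarantee them.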