python, dask, cudf

How to create a unique ID column in dask_cudf


How do I create a unique ID column in a dask_cudf dataframe across all the partitions? So far I am using the following technique, but if I increase the data to more than 100 million (10 crore) rows it gives me a memory error.

import cupy

def unique_id(df):
    # cupy.arange (not "arrange") builds the sequential IDs on the GPU
    rag = cupy.arange(len(df))
    df['unique_id'] = rag
    return df

part = data.npartitions
data = data.repartition(npartitions=1)
cols_meta = {c: str(data[c].dtype) for c in data.columns}
data = data.map_partitions(unique_id, meta={**cols_meta, 'unique_id': 'int64'})
data = data.repartition(npartitions=part)

If there's any other way, or any modification in code, please suggest. Thank you for help


Solution

  • I was doing that because I wanted to create IDs sequentially, up to the length of the data.

    The other suggestions will likely work. However, one of the easiest ways to do this is to create a temporary column with the value 1 and use cumsum, like the following:

    import cudf
    import dask_cudf

    df = cudf.DataFrame({
        "a": ["dog"]*10
    })
    ddf = dask_cudf.from_cudf(df, 3)

    ddf["temp"] = 1
    ddf["monotonic_id"] = ddf["temp"].cumsum()
    del ddf["temp"]

    print(ddf.partitions[2].compute())
         a  monotonic_id
    8  dog             9
    9  dog            10
    

    As expected, the two rows in partition index 2 have IDs 9 and 10. If you need the IDs to start at 0, you can subtract 1.
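    For example, a zero-based variant of the same cumsum trick (a sketch reusing the setup from the snippet above; `zero_based_id` is an illustrative name) would be:

    ```python
    import cudf
    import dask_cudf

    df = cudf.DataFrame({"a": ["dog"] * 10})
    ddf = dask_cudf.from_cudf(df, 3)

    # Same cumsum trick, shifted so the IDs run 0..9 instead of 1..10
    ddf["temp"] = 1
    ddf["zero_based_id"] = ddf["temp"].cumsum() - 1
    del ddf["temp"]
    ```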