Spark RAPIDS - Operation not replaced with GPU version

I am new to RAPIDS and I am having trouble understanding which operations are supported.

I have data in the following format:

+------------+----------+
|        kmer|source_seq|
+------------+----------+
|TGTCGGTTTAA$|         4|
|ACCACCACCAC$|         8|
|GCATAATTTCC$|         1|
|CCGTCAAAGCG$|         7|
|CCGTCCCGTGG$|         6|
|GCGCTGTTATG$|         2|
|GAGCATAGGTG$|         5|
|CGGCGGATTCT$|         0|
|GGCGCGAGGGT$|         3|
|CCACCACCAC$A|         8|
|CACCACCAC$AA|         8|
|CCCAAAAAAAAA|         0|
|AAGAAAAAAAAA|         5|
|AAGAAAAAAAAA|         0|
|TGTAAAAAAAAA|         0|
|CCACAAAAAAAA|         8|
|AGACAAAAAAAA|         7|
|CCCCAAAAAAAA|         0|
|CAAGAAAAAAAA|         5|
|TAAGAAAAAAAA|         0|
+------------+----------+

I am trying to find out which "source_seq" values each "kmer" appears with, using the following code:

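import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, min}

// x is the DataFrame shown above
val w = Window.partitionBy("kmer")
x.withColumn("source_seqs", collect_list("source_seq").over(w))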

// Result is something like this:
+------------+----------+-----------+                                           
|        kmer|source_seq|source_seqs|
+------------+----------+-----------+
|AAAACAAGACCA|         2|        [2]|
|AAAACAAGCAGC|         4|        [4]|
|AAAACCACGAGC|         3|        [3]|
|AAAACCGCCAAA|         7|        [7]|
|AAAACCGGTGTG|         1|        [1]|
|AAAACCTATATC|         5|        [5]|
|AAAACGACTTCT|         6|        [6]|
|AAAACGCGCAAG|         3|        [3]|
|AAAAGGCCTATT|         7|        [7]|
|AAAAGGCGTTCG|         3|        [3]|
|AAAAGGCTGTGA|         1|        [1]|
|AAAAGGTCTACC|         2|        [2]|
|AAAAGTCGAGCA|         7|     [7, 0]|
|AAAAGTCGAGCA|         0|     [7, 0]|
|AAAATCCGATCA|         0|        [0]|
|AAAATCGAGCGG|         0|        [0]|
|AAAATCGTTGAA|         7|        [7]|
|AAAATGGACAAG|         1|        [1]|
|AAAATTGCACCA|         3|        [3]|
|AAACACCGCCGT|         3|        [3]|
+------------+----------+-----------+

The Spark RAPIDS supported operators documentation says that collect_list is supported only in window operations, which, as far as I know, is what my code does.

However, the query plan shows that collect_list is not executed on the GPU (the Window operator has no Gpu prefix):

scala> x.withColumn("source_seqs", collect_list("source_seq").over(w)).explain
== Physical Plan ==
Window [collect_list(source_seq#302L, 0, 0) windowspecdefinition(kmer#301, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS max_source#658], [kmer#301]
+- GpuColumnarToRow false
   +- GpuSort [kmer#301 ASC NULLS FIRST], false, RequireSingleBatch, 0
      +- GpuCoalesceBatches RequireSingleBatch
         +- GpuShuffleCoalesce 2147483647
            +- GpuColumnarExchange gpuhashpartitioning(kmer#301, 200), ENSURE_REQUIREMENTS, [id=#1496]
               +- GpuFileGpuScan csv [kmer#301,source_seq#302L] Batched: true, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/cloud-user/phase1/example/1620833755/part-00000], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<kmer:string,source_seq:bigint>

Compare this with a similar query that uses a different function, where the windowing is executed on the GPU (GpuWindow):

scala> x.withColumn("min_source", min("source_seq").over(w)).explain
== Physical Plan ==
GpuColumnarToRow false
+- GpuWindow [gpumin(source_seq#302L) gpuwindowspecdefinition(kmer#301, gpuspecifiedwindowframe(RowFrame, gpuspecialframeboundary(unboundedpreceding$()), gpuspecialframeboundary(unboundedfollowing$()))) AS max_source#648L], [kmer#301], false
   +- GpuSort [kmer#301 ASC NULLS FIRST], false, RequireSingleBatch, 0
      +- GpuCoalesceBatches RequireSingleBatch
         +- GpuShuffleCoalesce 2147483647
            +- GpuColumnarExchange gpuhashpartitioning(kmer#301, 200), ENSURE_REQUIREMENTS, [id=#1431]
               +- GpuFileGpuScan csv [kmer#301,source_seq#302L] Batched: true, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/cloud-user/phase1/example/1620833755/part-00000], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<kmer:string,source_seq:bigint>

Am I understanding the supported operations documentation wrong somehow, or have I written the code in a wrong way? Any help for this would be appreciated.

Solution

Yes, as Mithun mentioned, spark.rapids.sql.expression.CollectList defaults to true starting with the 0.5 release. In the 0.4 release it defaults to false, so collect_list stays on the CPU unless you enable it: https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/docs/configs.md
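If upgrading is not an option, the expression can in principle be switched on explicitly. The snippet below is a minimal sketch, not something tested on 0.4: it assumes a spark-shell session where `spark` is the active SparkSession and the RAPIDS plugin is already loaded. The second setting, spark.rapids.sql.explain, is optional and asks the plugin to log why an operator was left on the CPU.

```scala
// Assumption: `spark` is the SparkSession provided by spark-shell,
// started with the RAPIDS Accelerator plugin enabled.

// Enable the GPU version of collect_list (off by default in 0.4).
spark.conf.set("spark.rapids.sql.expression.CollectList", "true")

// Optional: log the reason whenever an operator is not placed on the GPU.
spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")
```

Before enabling it on 0.4, check the configs page linked above for the reason the expression is disabled by default in that release.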

Here is the plan I got when testing on a 0.5+ version:

val w = Window.partitionBy("name")
val resultdf=dfread.withColumn("values", collect_list("value").over(w))
resultdf.explain

== Physical Plan ==
GpuColumnarToRow false
+- GpuWindow [collect_list(value#134L, 0, 0) gpuwindowspecdefinition(name#133, gpuspecifiedwindowframe(RowFrame, gpuspecialframeboundary(unboundedpreceding$()), gpuspecialframeboundary(unboundedfollowing$()))) AS values#138], [name#133], false
   +- GpuCoalesceBatches RequireSingleBatch
      +- GpuSort [name#133 ASC NULLS FIRST], false, com.nvidia.spark.rapids.OutOfCoreSort$@28e73bd1
         +- GpuShuffleCoalesce 2147483647
            +- GpuColumnarExchange gpuhashpartitioning(name#133, 200), ENSURE_REQUIREMENTS, [id=#563]
               +- GpuFileGpuScan csv [name#133,value#134L] Batched: true, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/tmp/df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,value:bigint>
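
To check plans like the ones above without reading them by eye, a small helper can list the operators whose names do not start with Gpu. This is only an illustrative sketch: PlanCheck and cpuOperators are hypothetical names, not part of Spark or the RAPIDS plugin, and the parsing only handles simple linear plans such as the ones in this post (no join branches).

```scala
// Hypothetical helper: lists operators in an explain-style plan string
// that are not prefixed with "Gpu", i.e. that ran on the CPU.
object PlanCheck {
  def cpuOperators(plan: String): List[String] =
    plan.linesIterator
      .map(_.trim.stripPrefix("+- ").trim)          // drop tree indentation
      .filter(l => l.nonEmpty && !l.startsWith("==")) // skip "== Physical Plan ==" and blanks
      .map(_.takeWhile(c => !c.isWhitespace))        // keep only the operator name
      .filterNot(_.startsWith("Gpu"))
      .toList
}
```

In spark-shell the plan text can be obtained with resultdf.queryExecution.executedPlan.toString and passed to PlanCheck.cpuOperators. An empty list means every operator in the plan has a Gpu prefix.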