Tags: python, apache-spark, pyspark

Is it possible to take an arbitrary number of elements from an array in PySpark?


My dataframe has two array columns. I want to take the elements of the first column whose indices are listed in the second column. For example, given the following dataset:

df = spark.createDataFrame(
    [
        {
            'text': ['0', '1', '2', '3', '4', '5'],
            'indices': [0, 2, 4],
        },
    ]
)

So I want to get a column with the value `['0', '2', '4']`.

Is it possible to achieve this without writing a UDF?


Solution

  • You can use the expr function with TRANSFORM and element_at to select elements from the first array based on the indices in the second array. Note that element_at uses 1-based indexing, while your indices column is 0-based, so each index has to be shifted by one.

    E.g.:

    from pyspark.sql.functions import expr
    
    # element_at is 1-based, so shift each 0-based index with i + 1
    df = df.withColumn(
        "selected_text",
        expr("TRANSFORM(indices, i -> element_at(text, i + 1))")
    )
    df.show()
    # selected_text now contains ['0', '2', '4']
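
    If you would rather avoid the SQL expression string, here is a minimal sketch of the same idea using the native DataFrame API with pyspark.sql.functions.transform (available since Spark 3.1), assuming the same df as above:

    from pyspark.sql.functions import col, element_at, transform
    
    # The same 1-based shift applies here
    df = df.withColumn(
        "selected_text",
        transform(col("indices"), lambda i: element_at(col("text"), i + 1))
    )
    df.show()

    Both variants rely only on built-in functions, so no UDF is needed.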