I have a PySpark DataFrame and I want to apply "Initcap" on a specific column. However, I want this transformation only on records whose value is not all capitals. For example ,in the sample dataset below, I don't want to apply "Initcap" on USA:
# Prepare Data
data = [(1, "Italy"), \
(2, "italy"), \
(3, "USA"), \
(4, "China"), \
(5, "china")
]
# Create DataFrame
columns= ["ID", "Country"]
df = spark.createDataFrame(data = data, schema = columns)
df.show(truncate=False)
The expected output will be:
ID | Country |
---|---|
1 | 'Italy' |
2 | 'Italy' |
3 | 'USA' |
4 | 'China' |
5 | 'China' |
df.withColumn('Country',when(df.Country==upper(df.Country),df.Country).otherwise( initcap('Country'))).show(truncate=False)
+---+-------+
|ID |Country|
+---+-------+
|1 |Italy |
|2 |Italy |
|3 |USA |
|4 |China |
|5 |China |
+---+-------+