I have created a Spark SQL temp view named old_ticket_df
. In the temp view old_ticket_df
I have a column "approver" which appears to have whitespace. I have tried to remove the whitespace using trim()
, but it seems to have no effect. If I try instead to remove the whitespace using trim(BOTH '\t' from approver)
it appears to remove the whitespace.
I have created a Spark dataframe called compare_df
from the compare_query
below, where I have unioned two examples together applying the two different versions of the trim
function to the same field and value. The output from the code is below.
Can someone please tell me what the difference is between TRIM(approver)
and TRIM(BOTH '\t' FROM approver)
in Spark SQL?
Code:
compare_query="""
select * from (
select 'ver1' as data_source, TRIM(BOTH '\t' FROM approver) as approver
from old_ticket_df
where ticketid='xxxxx'
and ticketlineid='yyyyyyy'
and storeid='00'
and dateclosed='2020-06-06'
union
select 'ver2' as data_source, TRIM(approver) as approver
from old_ticket_df
where ticketid='xxxxx'
and ticketlineid='yyyyyyy'
and storeid='00'
and dateclosed='2020-06-06'
)
"""
compare_df=spark.sql(compare_query)
compare_df.show(truncate=False)
Output:
+-----------+------------------------+
|data_source|approver |
+-----------+------------------------+
|ver2 |user1@GMAIL.COM |
|ver1 |user1@GMAIL.COM |
+-----------+------------------------+
trim(str)
removes only space characters.
trim(BOTH trimStr FROM str)
removes characters which you specify in trimStr
.
Example:
df = spark.createDataFrame([(' SPACES ', ), ('\tTABS\t', )], ['str'])
df.createOrReplaceTempView("sql_df")
spark.sql("""
SELECT str,
TRIM(str) as trim,
TRIM(BOTH '\t' FROM str) as trim_both_tab,
TRIM(BOTH ' \t' FROM str) as trim_both_tab_space
FROM sql_df
""").show()
#+--------+--------+-------------+-------------------+
#| str| trim|trim_both_tab|trim_both_tab_space|
#+--------+--------+-------------+-------------------+
#| SPACES | SPACES| SPACES | SPACES|
#|\tTABS\t|\tTABS\t| TABS| TABS|
#+--------+--------+-------------+-------------------+
str
is the original with value " SPACES " (having surrounding space characters) and value "\tTABS\t" (having surrounding tab characters).trim
shows the result of trim(str)
. Space characters were trimmed, but tab characters remain.trim_both_tab
shows the result of TRIM(BOTH '\t' FROM str)
. Tab characters were trimmed, but space characters remain.trim_both_tab_space
shows the result of TRIM(BOTH ' \t' FROM str)
. Both space and tab characters were removed, because they both were provided.In your case, TRIM(approver)
only removed spaces, so in ver2
line you still have tabs remaining. While TRIM(BOTH '\t' FROM approver)
only removed tabs leaving spaces untouched.