Tags: python, pyspark, pycharm

PySpark show() output is cut off in PyCharm


I recently updated to PyCharm 2022.1.1 (Community Edition) and noticed that PySpark's show() function no longer works properly. When I try to show 10 rows from a DataFrame, the output looks like this:

df.show(10)
+-------------------+------------+----------+----------+
|               date|     user_id|   prod_id|    counts|
+-------------------+------------+----------+----------+
|2022-05-31 00:00:00|       UUU91|     88888|       234|
|2022-05-31 00:00:00|       UUU92|     99999|       234|
|2022-05-31 00:00:00|       UUU93|     00000|       ...

I expected to see all 10 rows (the DataFrame has millions of rows, so I'm sure the data is there), but as you can see the output is cut off partway through. When I select only two columns, though, the results show up correctly:

df.select('user_id', 'prod_id').show(10)
+------------+----------+
|     user_id|   prod_id|
+------------+----------+
|       UUU91|     88888|
|       UUU92|     99997|
|       UUU93|     99995|
|       UUU94|     99949|
|       UUU95|     99989|
|       UUU96|     99909|
|       UUU97|     99919|
|       UUU98|     99929|
|       UUU99|     99939|
|       UUU90|     99949|
+------------+----------+

This worked fine before I switched to the newer version, and I don't know whether this is a PySpark issue or a PyCharm issue, so any help is much appreciated.


Solution

  • This seems to be a known issue with the PyCharm version mentioned in the question, as per: https://youtrack.jetbrains.com/issue/PY-53983/Debug-console-cuts-off-truncates-output

    So I had to install an older version of PyCharm (PyCharm 2021.3.3 (Community Edition)) to fix this for now.
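
    If downgrading is not an option, one possible way to at least inspect the full output is to bypass the console and write it to a file. The sketch below assumes that show() prints through Python's standard output (so it can be captured with contextlib.redirect_stdout); I have not verified this against every PySpark version. The helper name dump_printed_output and the file name show_output.txt are made up for illustration.

```python
import contextlib
import io
from pathlib import Path


def dump_printed_output(print_fn, path):
    """Call print_fn(), capture what it prints to stdout, and save it to path.

    Hypothetical helper: it only works for functions that print via
    Python's sys.stdout.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_fn()
    text = buffer.getvalue()
    # Write the captured text to a file so it can be opened in any editor,
    # independent of how the console renders long output.
    Path(path).write_text(text, encoding="utf-8")
    return text


# Hypothetical usage with the DataFrame from the question:
# dump_printed_output(lambda: df.show(10), "show_output.txt")
```

    This does not fix the console itself; it only gives you a way to see the complete table until the PyCharm issue is resolved.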