Search code examples
Multiple conditions on the same column in SQL or in PySpark...


mysql sql pyspark apache-spark-sql

Is there a difference between PySpark and SparkSQL? If so, what's the difference?...


pyspark apache-spark-sql

Regex output in Hive is different from Spark SQL regex output...


python regex pyspark hive

Performing filtering in PySpark...


pyspark filtering

How to partition by groups of N in PySpark...


apache-spark pyspark group-by grouping

weekofyear() returning seemingly incorrect results for January 1...


apache-spark pyspark apache-spark-sql week-number

PySpark columns incorrectly converted to string after unnesting...


python pyspark

PySpark - How to calculate the average on text data...


python pyspark

PySpark replace column value with another column value on multiple conditions...


python dataframe pyspark

Py4JJavaError: Cannot detect ES version when connecting PySpark to Elasticsearch...


python apache-spark elasticsearch pyspark

GCP Cloud Composer 2.1.15, getting Exception: Java gateway process exited before sending its port nu...


google-cloud-platform pyspark google-cloud-composer

Reading highest version of delta lake table...


pyspark delta-lake

Convert array column to struct with indices from static list in PySpark...


apache-spark pyspark

AWS Glue Pyspark Python UDFRunner timing info total/boot/init/finish...


python apache-spark pyspark user-defined-functions aws-glue

Apply custom function to cells of selected columns of a data frame in PySpark...


python apache-spark pyspark apache-spark-sql

PySpark: check if the consecutive values of a column are the same...


python dataframe pyspark apache-spark-sql

Is it faster to cast within filter(), or to cast in a new withColumn() and then filter, in Spark?...


apache-spark pyspark filter casting

Programmatically cancelling a PySpark Dataproc batch job...


go google-cloud-platform pyspark google-cloud-dataproc google-cloud-dataproc-serverless

PySpark: filter the DataFrame if a row is part of a list's values...


python apache-spark pyspark databricks

net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for num...


python pyspark databricks parquet geopandas

How to use date values in a spark-sql query?...


sql apache-spark pyspark apache-spark-sql

Regular expression to extract hyphenated text including other special characters between multiple hy...


regex pyspark

No module named 'pyspark.testing' when running CI/CD...


apache-spark pyspark azure-devops continuous-integration

How to create a new column dynamically in pandas, as with PySpark's withColumn...


python pandas pyspark

AssertionError: all exprs should be Column...


python apache-spark pyspark

botocore.exceptions.NoRegionError: You must specify a region for EmrServerlessCreateApplicationOpera...


amazon-web-services pyspark airflow amazon-emr emr-serverless

Rank based on date with repeating elements...


pyspark

Count words from a list within array columns without invoking a shuffle...


arrays apache-spark pyspark apache-spark-sql spark-shuffle

Longest common substring in PySpark...


dataframe apache-spark pyspark apache-spark-sql substring

Need assistance writing a recursive CTE in PySpark on Azure Databricks, given the SQL format below...


apache-spark pyspark databricks azure-databricks databricks-sql
