I am having some issues trying to suppress PySpark warnings, specifically from the pandas-on-Spark API. Here is what I currently have:
import warnings
warnings.simplefilter(action='ignore', category=Warning)
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
import pyspark.pandas as ps
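For comparison, a filter scoped to just the pandas-on-Spark module, rather than the blanket ignores above, would look roughly like the sketch below. The module pattern and the FutureWarning category are assumptions based on the messages shown further down, and a filter registered this way only covers warnings raised in the driver process:

# Hedged sketch: a narrower filter keyed to warning category and module,
# instead of ignoring every warning globally. The module regex is an assumption
# about where pyspark.pandas emits these warnings.
warnings.filterwarnings(
    "ignore",
    category=FutureWarning,
    module=r"pyspark\.pandas",
)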
%%capture
spark = SparkSession.builder\
    .master("local[32]")\
    .config("spark.driver.memory", "150g")\
    .config("spark.driver.maxResultSize", "40g")\
    .config("spark.python.worker.memory", "1g")\
    .config("spark.num.executors", "(3x-2)")\
    .config("spark.num.executor.cores", "5")\
    .config("spark.driver.cores", "5")\
    .appName("Analysis")\
    .getOrCreate()
spark.sparkContext.setLogLevel("OFF")
This is then followed by the actual data analysis:
spark.catalog.clearCache()

enc = ps.read_parquet("/example_path/")
enc.columns = [i.lower() for i in enc.columns]

print(enc.en_end_date.min())
print(enc.en_end_date.max())

# Extract year/month/day as integers so the comparisons below behave as expected
enc['year'] = enc.en_end_date.apply(lambda x: int(x.strftime('%Y')) if pd.notnull(x) else np.nan)
enc['month'] = enc.en_end_date.apply(lambda x: int(x.strftime('%m')) if pd.notnull(x) else np.nan)
enc['day'] = enc.en_end_date.apply(lambda x: int(x.strftime('%d')) if pd.notnull(x) else np.nan)

enc[(enc.year >= 2024) & (enc.month >= 1) & (enc.day >= 1)]
And here is where the actual issue is happening. I am getting absolutely bombarded with:
/example/miniconda/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
/example/miniconda/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
Hundreds of times. I would just like to turn this off. Any suggestions?
To anyone having this problem: roll back your pandas version until the warnings stop; unfortunately, there is no other way to suppress them.
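A minimal sketch of that workaround, assuming the message comes from the iteritems deprecation that landed in pandas 1.5.0, so any earlier release should avoid it:

# Assumed version boundary: DataFrame.iteritems was deprecated in pandas 1.5.0,
# so pinning below that release avoids the FutureWarning that pyspark.pandas triggers.
#   pip install "pandas<1.5"
import pandas as pd
print(pd.__version__)  # should report a pre-1.5 release after the downgrade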