I have defined the following helper functions:
import pandas as pd

def load_excel(file_path: str, sheet_name: str = ''):
    if sheet_name == '':
        df = pd.read_excel(file_path).fillna('').apply(lambda x: x.astype(str).str.lower())
    else:
        df = pd.read_excel(file_path, sheet_name).fillna('').apply(lambda x: x.astype(str).str.lower())
    return df
def build_score_dict(keywords_df: pd.DataFrame, tokens: list):
    """
    Returns a tuple of two dictionaries, i.e. tuple[dict, dict]
    """
    matched_keywords_by_cat_dict = {}
    score_dict = {}
    cnt_cols = keywords_df.shape[1]
    for col_idx in range(0, cnt_cols):
        # Each column holds the keywords of one parent category.
        keyword_list = list(keywords_df.iloc[:, col_idx])
        matched_keywords = []
        parent_cat = 0
        for j in range(0, len(tokens)):
            token = tokens[j]
            if token in keyword_list:
                parent_cat = parent_cat + 1
                matched_keywords.append(token)
        parent_cat_name = keywords_df.columns[col_idx]
        matched_keywords_by_cat_dict[parent_cat_name] = matched_keywords
        score_dict[parent_cat_name] = parent_cat
    return matched_keywords_by_cat_dict, score_dict
My call to build_score_dict, as shown below,
third_level_closing=load_excel(input_dir+'third_level_keywords.xlsx',sheet_name='closing')
_, level3_score_dict = build_score_dict(third_level_closing, tokens)
produces the following warning/error from Pylance in VSCode. What is happening here, and how do I fix it?
Argument of type "Series[Dtype]" cannot be assigned to parameter "keywords_df" of type "DataFrame" in function "build_score_dict"
"Series[Dtype]" is incompatible with "DataFrame" Pylance (reportGeneralTypeIssues)
If you give axis a value in your call to apply, it should fix the issue:
def load_excel(file_path: str, sheet_name: str = ''):
    if sheet_name == '':
        df = pd.read_excel(file_path).fillna('').apply(lambda x: x.astype(str).str.lower(), axis='index')
    else:
        df = pd.read_excel(file_path, sheet_name).fillna('').apply(lambda x: x.astype(str).str.lower(), axis='index')
    return df
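As a quick runtime check of the same chain, here is a minimal sketch that uses a small in-memory DataFrame instead of an Excel file. The sample data and the helper name normalise are made up for illustration; only fillna, apply, astype and str.lower come from the code above.

```python
import pandas as pd

# Hypothetical in-memory data standing in for the Excel sheet.
raw = pd.DataFrame({"closing": ["Thanks", None], "greeting": ["Hello", "HI"]})


def normalise(df: pd.DataFrame) -> pd.DataFrame:
    """Replace missing values with '' and lower-case every cell."""
    # axis="index" applies the lambda to each column in turn.
    return df.fillna("").apply(lambda col: col.astype(str).str.lower(), axis="index")


cleaned = normalise(raw)
print(type(cleaned).__name__)
print(cleaned)
```

At runtime the result is a DataFrame with or without axis; the explicit argument only matters for what the type checker infers.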
If you add a return type annotation to the function load_excel, you will see that the type checker considers df a Series and not a DataFrame.
And if we write the function code as follows, we can quickly spot that the apply method is the origin of the problem:
def load_excel(file_path: str, sheet_name: str = "") -> pd.DataFrame:
    if sheet_name == "":
        df: pd.DataFrame = pd.read_excel(file_path)
    else:
        df: pd.DataFrame = pd.read_excel(file_path, sheet_name)
    df = df.fillna("")
    df = df.apply(lambda x: x.astype(str).str.lower())
    return df
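A different workaround, not the one proposed in this answer, is to narrow the type with an isinstance check after calling apply. The sketch below assumes that approach; the function name lower_all and the sample data are hypothetical, and whether Pylance accepts it without further complaint may depend on the installed pandas stubs.

```python
import pandas as pd


def lower_all(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case every cell and narrow the result type for static checkers."""
    result = df.fillna("").apply(lambda col: col.astype(str).str.lower())
    # Runtime guard: it also tells the type checker that, past this point,
    # result is a DataFrame.
    if not isinstance(result, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(result).__name__}")
    return result


# Hypothetical in-memory data in place of the Excel file.
sample = pd.DataFrame({"closing": ["Bye", "REGARDS"]})
lowered = lower_all(sample)
print(lowered)
```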
If we Ctrl-click on apply in VSCode (on Windows), we can see that the method has more than one definition in the type stubs.
This shows that if the only argument the apply method receives is f, the type checker can't tell which version of the apply method is the one you want. It seems the Pylance implementation goes for the first definition it finds, and I guess this is why the return value of apply is assumed to be a Series. When you add the axis argument, the type checker can go for the second definition, which returns a DataFrame.
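To illustrate the mechanism, here is a deliberately simplified sketch built on typing.overload. ToySeries and ToyFrame are hypothetical stand-ins, and the two signatures are not the actual pandas stubs; they only mimic the situation where one definition lacks axis and the other requires it.

```python
from typing import Callable, Optional, Union, overload


class ToySeries:
    """Hypothetical stand-in for pandas.Series."""


class ToyFrame:
    """Hypothetical stand-in for pandas.DataFrame."""

    # Definition 1: no axis argument, declared to return a ToySeries.
    @overload
    def apply(self, f: Callable[[object], object]) -> ToySeries: ...

    # Definition 2: explicit axis argument, declared to return a ToyFrame.
    @overload
    def apply(self, f: Callable[[object], object], axis: str) -> "ToyFrame": ...

    def apply(
        self, f: Callable[[object], object], axis: Optional[str] = None
    ) -> Union[ToySeries, "ToyFrame"]:
        # Only this implementation runs; the overloads above exist purely
        # to guide static type checkers.
        return ToyFrame()


frame = ToyFrame()
# A type checker matches definition 1 here and infers ToySeries.
without_axis = frame.apply(lambda col: col)
# With axis given, only definition 2 matches, so it infers ToyFrame.
with_axis = frame.apply(lambda col: col, axis="index")

print(type(without_axis).__name__, type(with_axis).__name__)
```

Both calls return a ToyFrame at runtime, yet a checker would report different static types for them, which mirrors the mismatch between what Pylance reports and what apply actually returns.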