python python-3.x pandas dataframe pylance

Argument of type "Series[Dtype]" cannot be assigned to parameter of type "DataFrame"


I have defined the following helper functions:

def load_excel(file_path: str, sheet_name: str = ''):
    if sheet_name == '':
        df = pd.read_excel(file_path).fillna('').apply(lambda x: x.astype(str).str.lower())
    else:
        df = pd.read_excel(file_path, sheet_name).fillna('').apply(lambda x: x.astype(str).str.lower())
        
    return df

def build_score_dict(keywords_df: pd.DataFrame, tokens: list):
    """
    Returns a tuple of two dictionaries, i.e. tuple[dict, dict].
    """
    matched_keywords_by_cat_dict={}
    score_dict={}

    cnt_cols = keywords_df.shape[1]
    
    for col_idx in range(0, cnt_cols):
        keyword_list=list(keywords_df.iloc[:,col_idx])
        matched_keywords=[]
        parent_cat=0
        for j in range(0,len(tokens)):
            token = tokens[j]
            if token in keyword_list:
                parent_cat= parent_cat + 1
                matched_keywords.append(token)
                parent_cat_name = keywords_df.columns[col_idx]
                matched_keywords_by_cat_dict[parent_cat_name]=matched_keywords
                score_dict[parent_cat_name]=parent_cat
    
    return matched_keywords_by_cat_dict, score_dict

My call to build_score_dict, shown below,

third_level_closing=load_excel(input_dir+'third_level_keywords.xlsx',sheet_name='closing')     
_, level3_score_dict = build_score_dict(third_level_closing, tokens)

gives me the following warning/error from Pylance in VSCode. What is happening here, and how can I fix it?

Argument of type "Series[Dtype]" cannot be assigned to parameter "keywords_df" of type "DataFrame" in function "build_score_dict"
  "Series[Dtype]" is incompatible with "DataFrame" Pylance (reportGeneralTypeIssues)

Solution

  • Workaround

    Passing an explicit axis argument in your call to apply should fix the issue:

    def load_excel(file_path: str, sheet_name: str = ''):
        if sheet_name == '':
            df = pd.read_excel(file_path).fillna('').apply(lambda x: x.astype(str).str.lower(), axis='index')
        else:
            df = pd.read_excel(file_path, sheet_name).fillna('').apply(lambda x: x.astype(str).str.lower(), axis='index')
            
        return df
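
As a quick sanity check, the sketch below suggests that the workaround does not change runtime behaviour. It assumes pandas is installed and uses a small made-up frame (the column names and values are hypothetical) in place of the Excel file. Since axis='index' is equivalent to the default axis=0 of DataFrame.apply, both calls should yield the same DataFrame; only the type checker's view differs.

```python
import pandas as pd

# Made-up data standing in for the contents of the Excel sheet.
raw = pd.DataFrame({"greeting": ["Hello", None], "closing": ["BYE", "Regards"]})


def to_lower(column: pd.Series) -> pd.Series:
    # Same transformation as the lambda used in load_excel.
    return column.astype(str).str.lower()


cleaned = raw.fillna("")
implicit = cleaned.apply(to_lower)                # axis left at its default
explicit = cleaned.apply(to_lower, axis="index")  # axis spelled out for the type checker
```

Comparing the two results with implicit.equals(explicit) is one way to confirm that the extra argument is purely for the benefit of the type checker.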
    

    Explanation

    If you add a return type annotation (-> pd.DataFrame) to the function load_excel, the type checker reports that it considers df a Series and not a DataFrame.

    If we rewrite the function with one step per line, as follows, we can quickly see that the apply method is the origin of the problem:

    def load_excel(file_path: str, sheet_name: str = "") -> pd.DataFrame:
        if sheet_name == "":
            df: pd.DataFrame = pd.read_excel(file_path)
        else:
            df: pd.DataFrame = pd.read_excel(file_path, sheet_name)
    
        df = df.fillna("")
        df = df.apply(lambda x: x.astype(str).str.lower())
    
        return df
    


    If we Ctrl-click apply in VSCode (on Windows), we are taken to its type declarations, which contain more than one definition of the method.

    This shows that if the only argument the apply method receives is f, the type checker cannot tell which definition of apply you intend. Pylance appears to pick the first definition it finds, which is presumably why the return value of apply is assumed to be a Series. Once you add the axis argument, the type checker can select the second definition, which returns a DataFrame.
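
The overload behaviour described above can be illustrated with a minimal, pandas-free sketch. The classes Frame and Column are hypothetical stand-ins for DataFrame and Series, and the two overloads are a simplified assumption about the shape of the stub declarations, not a copy of the real pandas stubs.

```python
from typing import TYPE_CHECKING, Callable, Literal, overload


class Column:
    """Hypothetical stand-in for pandas.Series."""


class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    # First overload: only f is given; declared to return a Column.
    @overload
    def apply(self, f: Callable[..., object]) -> Column: ...

    # Second overload: axis is given explicitly; declared to return a Frame.
    @overload
    def apply(
        self, f: Callable[..., object], axis: Literal["index", "columns"]
    ) -> "Frame": ...

    def apply(self, f, axis="index"):
        # Runtime implementation: the overloads above are only seen by the
        # type checker and have no effect on what is actually returned.
        return Frame()


frame = Frame()
without_axis = frame.apply(lambda col: col)
with_axis = frame.apply(lambda col: col, axis="index")

if TYPE_CHECKING:
    # Never executed. A checker such as Pyright is expected to match the
    # first overload for without_axis and the second for with_axis.
    reveal_type(without_axis)
    reveal_type(with_axis)
```

At runtime both calls return a Frame, which mirrors the situation in the question: the code works, and only the statically inferred type is wrong until axis is passed.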