
DataFrame reported by Pylint as unsubscriptable or not supporting item assignment, but working as expected


I have a DataFrame called topmentions. Here is the output of info() and head() for it:

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 30 entries, 22 to 29
    Data columns (total 2 columns):
     #   Column     Non-Null Count  Dtype 
    ---  ------     --------------  ----- 
     0   reference  30 non-null     object
     1   freq       30 non-null     int64 
    dtypes: int64(1), object(1)
    memory usage: 720.0+ bytes
    None
                        reference  freq
    22         Giorgia Meloni|PER     4
    16      Matteo Piantedosi|PER     3
    10           Donald Trump|PER     3
    28  Gianfranco Baruchello|PER     3
    3        Tomaso Montanari|PER     3

Although it is a valid DataFrame and the code below works as expected, Pylint flags the variable topmentions with the message Value 'topmentions' is unsubscriptable.

Here is the code that gets flagged:

    json_string = topmentions[
        topmentions["freq"].cumsum() < topmentions["freq"].sum() / 2
    ].to_json(orient="records")

All three topmentions variable names in the snippet are flagged as errors. What's wrong?

PS: I know I can suppress those errors by adding # pylint: disable=unsubscriptable-object, but I'd rather not resort to such a trick.


Solution

  • Nothing is wrong with your code. This seems to be an ongoing issue with Pylint, which apparently fails to recognize the object as a DataFrame and so wrongly reports it as unsubscriptable, or as an object that "does not support item assignment (i.e. doesn't define __setitem__ method)".

    Rather than disabling the warning, you can use the pandas loc indexer, which is probably preferable anyway (see Note here and this post).

    So, in the following (hopefully) reproducible example (as of the date of this answer, using Python 3.10.9, Pandas 1.5.2, Pylint 2.15.9):

    import pandas as pd
    
    df = pd.DataFrame({"col0": [1, 2], "col1": ["a", "b"]})
    df = df.set_index("col0")
    print(df[df["col1"] == "a"])
    
    # Output
         col1
    col0     
    1       a
    

    Running Pylint on the script prints out:

    script.py:4:6: E1136: Value 'df' is unsubscriptable (unsubscriptable-object)
    script.py:4:9: E1136: Value 'df' is unsubscriptable (unsubscriptable-object)
    ------------------------------------------------------------------
    Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)
    

    Now, if you replace df[df["col1"] == "a"] with df.loc[df.loc[:, "col1"] == "a", :] and run Pylint again, everything is fine:

    
    --------------------------------------------------------------------
    Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
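
A quick way to check that the swap does not change behavior is to compare both selections on the same toy frame. This is only a sketch: the names bracket_result, loc_result and same_rows are made up for the illustration, and it says nothing about what Pylint reports on your setup.

```python
import pandas as pd

# Same toy frame as in the example above.
df = pd.DataFrame({"col0": [1, 2], "col1": ["a", "b"]})
df = df.set_index("col0")

# Original bracket-based selection (the form Pylint flagged in the answer's test).
bracket_result = df[df["col1"] == "a"]

# Equivalent selection written with .loc for both the mask and the row filter.
loc_result = df.loc[df.loc[:, "col1"] == "a", :]

# DataFrame.equals compares shape, values, dtypes and index.
same_rows = bracket_result.equals(loc_result)
print(same_rows)
```

Both forms select rows by the same boolean mask, so only the spelling of the indexing changes.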
    

    Similarly:

    df["col2"] = ["c", "d"]
    

    Pylint reports:

    script.py:4:9: E1137: 'df' does not support item assignment (unsupported-assignment-operation)
    

    But df.loc[:, "col2"] = ["c", "d"] is not flagged.
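
Applied to the snippet from the question, the same rewrite could look like the sketch below. The frame is rebuilt from only the five rows shown in the question (the real one has 30), and the intermediate names freq and mask are introduced here for readability. Going by the tests above, Pylint should stay quiet on the .loc version, but that may depend on your Pylint and pandas versions.

```python
import pandas as pd

# Sample rows copied from the question; the real frame has 30 rows.
topmentions = pd.DataFrame(
    {
        "reference": [
            "Giorgia Meloni|PER",
            "Matteo Piantedosi|PER",
            "Donald Trump|PER",
            "Gianfranco Baruchello|PER",
            "Tomaso Montanari|PER",
        ],
        "freq": [4, 3, 3, 3, 3],
    },
    index=[22, 16, 10, 28, 3],
)

# Select the column through .loc instead of topmentions["freq"].
freq = topmentions.loc[:, "freq"]

# Keep rows whose running total is still below half of the overall total.
mask = freq.cumsum() < freq.sum() / 2

# Filter rows through .loc instead of topmentions[mask].
json_string = topmentions.loc[mask, :].to_json(orient="records")
print(json_string)
```

With these five rows the total is 16, so only the first two rows (running totals 4 and 7) fall below the threshold of 8.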