Tags: python, pandas, numpy, pandas-groupby, pandas-datareader

How can I perform the task below in pandas so that it runs faster and does not raise 'SettingWithCopyWarning'?


I am reading timestamps from my data. If several timestamps have the same value, I want to change them: if two are the same, add 2 to the second; if three are the same, add 2 to the second and 4 to the third, and so on. My code produces this warning:

/anaconda/lib/python3.6/site-packages/ipykernel/main.py:8: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

My problem is that the task takes too long. Is there a faster, more idiomatic way to do this in pandas? Please keep in mind that I am new to pandas.

dftime = df.time
for i in range(len(dftime)):
    if i != 0:
        if dftime[i] == dftime[i-1]:
            dftime[i] = dftime[i] + 2
        if dftime[i] < dftime[i-1]:
            dftime[i] = dftime[i-1] + 2

Solution

  • Generally, you should never use the for i in range(len(collection)) construct to iterate over a collection in Python since you can simply use for item in collection.

    Particularly in pandas, you rarely have to iterate over series and if you do, you should never modify something you are iterating over. Depending on the data types, the iterator might return a copy and writing to it will have no effect. Instead, you should opt for operations on entire arrays.

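    A vectorized rewrite of your code would be the following. Note that each comparison is made against the shifted values in a single pass, so unlike your loop, a change to one element does not carry over to the next one; a run of three or more equal timestamps is only partly handled (for example, 1, 1, 1 becomes 1, 3, 3 rather than 1, 3, 5).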

    dftime = df.time
    dftime[dftime == dftime.shift()] += 2
    dftime[dftime < dftime.shift()] += 2
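
If you need longer runs of equal timestamps handled as described in the question (add 2 to the second, 4 to the third, and so on), here is a minimal sketch of two options. It assumes the `time` column holds plain numbers rather than datetime values, and the function names `spread_duplicate_runs` and `bump_sequentially` are made up for illustration. The first function is fully vectorized and applies the stated rule within each run of consecutive equal values. The second reproduces the question's loop exactly, including the carry-over from one element to the next, but runs on a NumPy array copy, which avoids the per-element pandas indexing that makes the original slow.

```python
import pandas as pd


def spread_duplicate_runs(times: pd.Series, step: int = 2) -> pd.Series:
    """Add 0, step, 2*step, ... within each run of consecutive equal values."""
    # A new run starts wherever a value differs from its predecessor.
    run_id = (times != times.shift()).cumsum()
    # Position inside the run: 0 for the first element, 1 for the second, ...
    position = times.groupby(run_id).cumcount()
    return times + step * position


def bump_sequentially(times: pd.Series, step: int = 2) -> pd.Series:
    """Same logic as the question's loop, applied to a NumPy array copy."""
    values = times.to_numpy().copy()
    for i in range(1, len(values)):
        # Equal to or smaller than the (possibly updated) previous value.
        if values[i] <= values[i - 1]:
            values[i] = values[i - 1] + step
    return pd.Series(values, index=times.index, name=times.name)


# Hypothetical sample data; only the column name `time` comes from the question.
df = pd.DataFrame({"time": [10, 10, 10, 15, 15, 20]})
df["time"] = spread_duplicate_runs(df["time"])
print(df["time"].tolist())
```

Both functions return a new Series instead of writing into a slice, so assigning the result back with `df["time"] = ...` should not trigger `SettingWithCopyWarning`, provided `df` is not itself a slice of another DataFrame. The two functions can give different results: for 1, 1, 1, 2 the first yields 1, 3, 5, 2, while the second yields 1, 3, 5, 7, so pick the one that matches what you actually want.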