
Is there a more efficient way to apply this custom function to the entire dataset?


I have a dataset that looks like this with IP addresses (for security's sake, these are all made up):

0            1            2
100.0.200.0  160.60.30.0  NaN
NaN          101.60.10.0  10.0.0.1

I want to apply a function that takes these IP addresses (where they exist) and returns a sliced version with the fourth octet removed. The result should look like this:

0          1          2
100.0.200  160.60.30  NaN
NaN        101.60.10  10.0.0

I have written the code below, which does the job, but it is very slow because it loops over every cell in Python, and I want to be able to do this faster.

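def sliceip(row):
    # Note: str(NaN) is "nan", so missing cells come back as the string "nan".
    row = str(row)
    # Drop everything after the last dot (the fourth octet).
    return row.rsplit(".", 1)[0]

def applysliceip(rowx):
    # Slice every cell of the Series, one Python call per cell.
    for i, item in enumerate(rowx):
        rowx[i] = sliceip(item)
    return rowx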


# And I apply this to the entire dataframe as such:

split_IPs = IPs.apply(lambda row: applysliceip(row))

So my question is: is there a more Pythonic and faster way to accomplish the above and return the same output without using so much memory?


Solution

  • You can use a regular expression to match and replace instead of using a custom function.

    IPs.replace(r"(\d+\.\d+\.\d+)\.\d+", r"\1", regex=True)
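
    For anyone who wants to try this, here is a minimal, self-contained sketch. The sample frame is rebuilt from the made-up addresses in the question, and the names split_regex and split_str are only illustrative. The ^ and $ anchors are an addition to the pattern above so that only cells that are entirely an IPv4-looking string get changed. The second variant, using the .str accessor per column, is an alternative I am assuming would also suit this data; it needs every column to hold strings (it would fail on a column that is all NaN).

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the made-up addresses in the question.
IPs = pd.DataFrame(
    [
        ["100.0.200.0", "160.60.30.0", np.nan],
        [np.nan, "101.60.10.0", "10.0.0.1"],
    ]
)

# Regex replace: keep the first three octets and drop the fourth.
# The ^ and $ anchors are an addition so only whole-cell matches change.
# NaN cells are not strings, so they are left as NaN.
split_regex = IPs.replace(r"^(\d+\.\d+\.\d+)\.\d+$", r"\1", regex=True)

# Alternative sketch: vectorised string split on the last dot, per column.
# Assumes each column contains at least some strings (the .str accessor
# is not available on an all-NaN float column).
split_str = IPs.apply(lambda col: col.str.rsplit(".", n=1).str[0])

print(split_regex)
print(split_str)
```

    Both variants keep missing values as NaN rather than turning them into the string "nan", which is a small difference from the loop-based version in the question.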