Tags: python, regex, pandas, dataframe, strip

How to strip/replace "domain\" from Pandas DataFrame Column?


I have a pandas DataFrame read in from a CSV. One column holds computer hostnames prefixed with the domain they belong to, alongside a bunch of other columns. I'm trying to strip out the domain information so that I'm left with ONLY the hostname.

DataFrame ex:

name
domain1\computername1
domain1\computername45
dmain3\servername1
dmain3\computername3
domain1\servername64
....

I've tried using both str.strip() and str.replace(), with a regex as well as a string literal, but I can't seem to target the domain information correctly.

Examples of what I've tried thus far:

df['name'].str.strip('.*\\')

df['name'].str.replace('.*\\', '', regex = True)

df['name'].str.replace(r'[.*\\]', '', regex = True)

df['name'].str.replace('domain1\\\\', '', regex = False)
df['name'].str.replace('dmain3\\\\', '', regex = False)

None of these seem to make any changes when I spit the DataFrame out using logging.debug(df)


Solution

  • You are already close to the answer. Just use:

    df['name'] = df['name'].str.replace(r'.*\\', '', regex = True)
    

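    This is one of your attempts with two changes: the pattern is a raw string (r'...'), and the result is assigned back to df['name'], because str.replace returns a new Series rather than modifying the column in place.

    Without the r prefix, Python processes the escape first, so '.*\\' is the three-character string .*\ and the regex engine receives a lone trailing backslash, which is an incomplete escape rather than a literal backslash. With the r prefix, r'.*\\' keeps both backslashes (it is the same string as '.*\\\\'), and the regex engine reads the pair \\ as one literal backslash. The pattern therefore matches everything up to and including the last backslash, which is the domain prefix.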

    Output:

    0     computername1
    1    computername45
    2       servername1
    3     computername3
    4      servername64
    Name: name, dtype: object
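
The raw-string point can be checked directly. The following is a minimal sketch, assuming sample data modelled on the question (the CSV itself is not shown, so the DataFrame is built by hand). The names plain, raw, plain_is_valid and alt are illustrative only, and the rsplit variant is an alternative that was not part of the answer.

```python
import re

import pandas as pd

# Sample data modelled on the question; raw strings keep each backslash literal.
df = pd.DataFrame({
    "name": [
        r"domain1\computername1",
        r"domain1\computername45",
        r"dmain3\servername1",
        r"dmain3\computername3",
        r"domain1\servername64",
    ]
})

plain = '.*\\'   # 3 characters: dot, star, one backslash
raw = r'.*\\'    # 4 characters: dot, star, two backslashes

# The plain literal ends in a lone backslash, which re rejects as a pattern.
try:
    re.compile(plain)
    plain_is_valid = True
except re.error:
    plain_is_valid = False

# str.replace returns a new Series, so assign the result back to the column.
df["name"] = df["name"].str.replace(raw, "", regex=True)

# Alternative without a regex (an assumption, not from the answer):
# split once from the right on a literal backslash and keep the last piece.
alt = pd.Series([r"domain1\computername1"]).str.rsplit("\\", n=1).str[-1]

print(df["name"])
```

Because .* is greedy, the regex removes everything up to the last backslash in each value. The rsplit variant has the same effect and avoids escaping rules inside a pattern, at the cost of being less flexible if the prefix format varies.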