I am working on a DataFrame that contains computer names, and I am trying to anonymize them. Here is an example of the DataFrame I am working with:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'computer_name': [u'LENOVO 09 X32H0GB', u'LENOVO vmhsbpmh613.xyz.biz', u'Dell Inc. PowerEdge R910 XKF2S75', u'HP ppesfesxb203.corp.123.com', 'IBM SoftLayer 13 L89P4567']})
Here is what is required to anonymize it:
Take the last space-separated token, i.e. everything after the first space counting from the right. For example, for "LENOVO vmhsbpmh613.xyz.biz" it would be "vmhsbpmh613.xyz.biz".
From that token, e.g. "vmhsbpmh613.xyz.biz", remove everything from the first dot (.) onward, which gives "vmhsbpmh613". If the token contains no dot, keep it as is. It is important to strip the dot suffix only from this last token; otherwise, for a name like "Dell Inc. PowerEdge R910 XKF2S75", everything after the dot in "Dell Inc." would be removed.
Lastly, replace the first 3 characters of the result with xxx, giving e.g. xxxsbpmh613.
Here is what the output should look like:
df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'computer_name': [u'LENOVO 09 xxxH0GB', u'LENOVO xxxsbpmh613', u'Dell Inc. PowerEdge R910 xxx2S75', u'HP xxxsfesxb203', 'IBM SoftLayer 13 xxxP4567']})
I hope I was able to articulate the requirement clearly. Thanks.
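Use Series.str.replace with a regular expression. Pass regex=True explicitly, because since pandas 2.0 the pattern is treated as a literal string by default:
df['computer_name'].str.replace(r'\S{3}(\S+?)(?:\.\S+|$)', r'xxx\1', regex=True)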
0 LENOVO 09 xxxH0GB
1 LENOVO xxxsbpmh613
2 Dell Inc. PowerEdge R910 xxx2S75
3 HP xxxsfesxb203
4 IBM SoftLayer 13 xxxP4567
Name: computer_name, dtype: object
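The same pattern can be sanity-checked outside pandas with the standard re module. This is only a sketch: the names PATTERN, names and anonymized are illustrative and not part of the original answer. It prints what the whole match consumes and what the capture group keeps for each sample string.

```python
import re

# Same pattern as in the pandas call above.
PATTERN = re.compile(r'\S{3}(\S+?)(?:\.\S+|$)')

names = [
    'LENOVO 09 X32H0GB',
    'LENOVO vmhsbpmh613.xyz.biz',
    'Dell Inc. PowerEdge R910 XKF2S75',
    'HP ppesfesxb203.corp.123.com',
    'IBM SoftLayer 13 L89P4567',
]

for name in names:
    match = PATTERN.search(name)
    # group(0) is the text that gets replaced, group(1) is the part kept after 'xxx'.
    print(repr(match.group(0)), '->', repr('xxx' + match.group(1)))

# Apply the substitution to every sample string.
anonymized = [PATTERN.sub(r'xxx\1', name) for name in names]
print(anonymized)
```

For 'LENOVO vmhsbpmh613.xyz.biz', the whole match is 'vmhsbpmh613.xyz.biz' and the capture group is 'sbpmh613', so the replacement yields 'xxxsbpmh613'.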
Regex details
\S{3} : Matches any non-whitespace character exactly 3 times
(\S+?) : Capturing group; matches any non-whitespace character between 1 and unlimited times, but as few times as possible (lazy match)
(?: : Beginning of non-capturing group
\. : Matches a literal . character
\S+ : Matches any non-whitespace character one or more times
| : Or
$ : Asserts position at the end of the line
) : End of non-capturing group
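If a regex feels too dense, the three steps from the question can also be written out with plain string methods. The sketch below is an alternative, not the method used above; the function name anonymize is hypothetical. It assumes names are separated by single spaces, and it differs from the regex in two ways: it only ever touches the last token, and it masks a last token shorter than four characters down to just 'xxx', whereas the regex would leave such a token unchanged.

```python
def anonymize(name: str) -> str:
    """Mask the last space-separated token of a computer name."""
    # Step 1: split off the last token (everything after the last space).
    head, sep, last = name.rpartition(' ')
    # Step 2: keep only the part of that token before its first dot.
    host = last.split('.', 1)[0]
    # Step 3: replace the first three characters with 'xxx'.
    return head + sep + 'xxx' + host[3:]


samples = [
    'LENOVO 09 X32H0GB',
    'LENOVO vmhsbpmh613.xyz.biz',
    'Dell Inc. PowerEdge R910 XKF2S75',
    'HP ppesfesxb203.corp.123.com',
    'IBM SoftLayer 13 L89P4567',
]
results = [anonymize(s) for s in samples]
print(results)
```

With the DataFrame from the question, this could be applied as df['computer_name'].map(anonymize).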