I'm in desperate need of help have a data that I would like to split into columns using regex(python), it must use regular expression
Jan 9 01:04:49 syntax sshd(pam_unix)[21354]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=120-123-141-4.hinet-ip.hinet.com
Jul 10 04:17:11 syntax su(pam_unix)[95367]: session opened for user abc by (uid=0)
May 1 14:06:19 syntax su(pam_unix)[95367]: session closed for user abc
Oct 23 18:08:26 syntax logrotate: ALERT exited abnormally with [1]
Jun 14 21:42:52 syntax su(pam_unix)[95367]: session opened for user cbx by (uid=0)
It's actually from a url, I made it into a pandas dataFrame and tried using re.split but it gives me error
*ValueError: 1 columns passed, passed data had 24 columns*
Hope can I get the output I need?
Use regex as follows
Data
df=pd.DataFrame({'Text':['Jan 9 01:04:49 syntax sshd(pam_unix)[21354]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=120-123-141-4.hinet-ip.hinet.com','Jul 10 04:17:11 syntax su(pam_unix)[95367]: session opened for user abc by (uid=0)','May 1 14:06:19 syntax su(pam_unix)[95367]: session closed for user ab']})
regex= ([A-Za-z]+\s+\d+\s+\d+:\d+:\d+)\s+|(?<=\])[:\s+]+|(?<=[x])\s+
df2=df.Text.str.split('([A-Za-z]+\s+\d+\s+\d+:\d+:\d+)\s+|(?<=\])[:\s+]+|(?<=[x])\s+', n=3, expand=True)
df2.rename(columns=({0:'DROP1',1:'Timestamp', 2:'A1', 3:'DROP', 4:'A2', 5:'DROP2',6:'A3'}),inplace=True)#Rename columns
df2.drop(columns=['DROP2','DROP1','DROP'],inplace=True)#Drop unwanted columns
Basically;
(?<=\])[:\s+]+
Split by space that comes after ]:
or -|
(?<=[x])\s+
Split by space that comes after x
or- |
([A-Za-z]+\s+\d+\s+\d+:\d+:\d+)\s+
Split the timestamp
Outcome