apache-pig bigdata

Pig Processing log file using


I have the following logs. Can anyone tell me how I can process them using Pig Latin?

**

SYSTEM IP:192.168.68.78 
Distro info:Red Hat Enterprise Linux Server release 6.6 (Santiago)
Kernel:Linux bugzilla-blr-in 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
Uptime:12:27:42 up 8 days, 17:57,  0 users,  load average: 0.00, 0.00, 0.00
Memory:Total:1869Mb Memory:Used:1567Mb  Memory:Free:302Mb
Swap:Total:1999Mb   Swap:Used:0Mb   Swap:Free: 1999Mb
Architecture:x86_64
  Processor:0:Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
Date:Wed Jun 29 12:27:42 IST 2016

SCRIPT USER
User:aimsadm (uid:503)
Groups:aimsadm
Working dir:/home/aimsadm
Home dir:/home/aimsadm

NETWORK DETAILS
Hostname:bugzilla-blr-in
IP (    ):127.0.0.1/8
IP (eth0):192.168.68.78/24
Gateway:192.168.68.1
Name Server:8.8.8.8
Name Server:192.168.68.80

LIST OF USERS:sdudam,sudutha,djegathesa,aimsadm,krishnang,

CLAMD STATUS: CLAM AV service is stopped or not installed

NAGIOS STATUS: Nagios service is running

OSSEC STATUS: Ossec service is stopped or not installed

NTPD STATUS: NTP service is running

HARDENING STATUS:Hardening Done

AD INTEGRATION STATUS:AD Integration Not Done

HARDWARE/PLATFORM DETAILS
Hardware Platform:64Bit
Hardware Info :DMI 2.3 present.
DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012

OS DETAILS
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Linux bugzilla-blr-in 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

CPU INFO
model name  : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz

MEMORY INFO
MemTotal:        1914776 kB
RAM:1 GB

HARD DISK DETAILS

MOUNT DETAILS
Filesystem:/dev/mapper/vg_bugzillablrin-LogVol00,Type:ext4,Total Size:22G,Used:2.4G,Avail:19G,Use%:12%,Mounted on:/
Filesystem:tmpfs,Type:tmpfs,Total Size:981M,Used:0,Avail:981M,Use%:0%,Mounted on:/dev/shm
Filesystem:/dev/sda1,Type:ext4,Total Size:297M,Used:95M,Avail:186M,Use%:34%,Mounted on:/boot
Filesystem:/dev/mapper/vg_bugzillablrin-LogVol01,Type:ext4,Total Size:21G,Used:5.8G,Avail:14G,Use%:30%,Mounted on:/var

LSBLK OUTPUT
NAME:sr0,
MAJ:MIN:11:0,RM:1,SIZE:1024M,RO:0,TYPE:rom,MOUNTPOINT::
NAME:sda,
MAJ:MIN:8:0,RM:0,SIZE:60G,RO:0,TYPE:disk,MOUNTPOINT::
NAME:sda1,
MAJ:MIN:8:1,RM:0,SIZE:300M,RO:0,TYPE:part,MOUNTPOINT::/boot
NAME:sda2,
MAJ:MIN:8:2,RM:0,SIZE:59.7G,RO:0,TYPE:part,MOUNTPOINT::

RUNNING SERVICES
auditd running...
crond running...
messagebus running...
nrpe running...
ntpd running...
rhnsd running...
rhsmcertd running...
rpcbind running...
openssh-daemon running...




**

SYSTEM IP:192.168.68.35 
Distro info:CentOS release 6.6 (Final)
Kernel:Linux altifin-ci-app 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Uptime:12:28:06 up 48 days, 20:31,  0 users,  load average: 0.00, 0.00, 0.00
Memory:Total:11903Mb    Memory:Used:1277Mb  Memory:Free:10625Mb
Swap:Total:8191Mb   Swap:Used:0Mb   Swap:Free: 8191Mb
Architecture:x86_64
  Processor:0:Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
  Processor:1:Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Date:Wed Jun 29 12:28:06 IST 2016

SCRIPT USER
User:aimsadm (uid:509)
Groups:aimsadm
Working dir:/home/aimsadm
Home dir:/home/aimsadm

NETWORK DETAILS
Hostname:altifin-ci-app
IP (lo):127.0.0.1/8
IP (eth0):192.168.68.35/24
Gateway:192.168.68.1
Name Server:192.168.68.10
Name Server:192.168.68.4

LIST OF USERS:altipay,aramesh,sdudam,nagios,kpankaj,sudutha,miyappan,skosanam,djegathesa,aimsadm,

CLAMD STATUS: CLAM AV service is stopped or not installed

NAGIOS STATUS: Nagios service is running

OSSEC STATUS: Ossec service is stopped or not installed

NTPD STATUS: NTP service is running

HARDENING STATUS:Hardening Done

AD INTEGRATION STATUS:AD Integration Not Done

HARDWARE/PLATFORM DETAILS
Hardware Platform:64Bit
Hardware Info :DMI 2.3 present.
DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012

OS DETAILS
CentOS release 6.6 (Final)
Linux altifin-ci-app 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

CPU INFO
model name  : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
model name  : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

MEMORY INFO
MemTotal:       12189032 kB
RAM:11 GB

HARD DISK DETAILS

MOUNT DETAILS
Filesystem:/dev/mapper/vg_altifinci-LogVol01,Type:ext4,Total Size:203G,Used:80G,Avail:113G,Use%:42%,Mounted on:/
Filesystem:tmpfs,Type:tmpfs,Total Size:6.3G,Used:0,Avail:6.3G,Use%:0%,Mounted on:/dev/shm
Filesystem:/dev/sda1,Type:ext4,Total Size:500M,Used:64M,Avail:410M,Use%:14%,Mounted on:/boot

LSBLK OUTPUT
NAME:sr0,
MAJ:MIN:11:0,RM:1,SIZE:1024M,RO:0,TYPE:rom,MOUNTPOINT::
NAME:sda,
MAJ:MIN:8:0,RM:0,SIZE:200G,RO:0,TYPE:disk,MOUNTPOINT::
NAME:sda1,
MAJ:MIN:8:1,RM:0,SIZE:500M,RO:0,TYPE:part,MOUNTPOINT::/boot
NAME:sda2,
MAJ:MIN:8:2,RM:0,SIZE:199.5G,RO:0,TYPE:part,MOUNTPOINT::

RUNNING SERVICES
abrtd running...
abrt-dump-oops running...
acpid running...
atd running...
auditd running...
automount running...
crond running...
cupsd running...
hald running...
mcelog running...
messagebus running...
MySQL but
rpc.statd running...
nrpe running...
ntpd running...
rpcbind running...
openssh-daemon running...

Solution

  • Yes, there is a way.
    Although the sample data counts as 'unstructured', there is always something specific we want from it: the line or lines that hold the data you need.
    To get at it, identify the pattern those lines follow in the sample data and use an appropriate regular expression (RegEx) to pull the values out.
    Pig also ships with the 'piggybank' jar, which supports various predefined file formats as well as unstructured input like yours.
    Try the 'RegExLoader' class from piggybank (package org.apache.pig.piggybank.storage): https://pig.apache.org/docs/r0.15.0/api/

    Also, please tell us the exact output you are looking for.
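
As a rough sketch of the pattern-plus-regex idea, the script below pulls two kinds of lines out of the sample. It makes several assumptions: the input path `system_report.log` and all relation and field names are hypothetical, and it uses Pig's built-in `TextLoader`, `REGEX_EXTRACT` and `REGEX_EXTRACT_ALL` rather than piggybank's `RegExLoader`, so no extra jar has to be registered. The regular expressions are written against the lines shown in the question and may need adjusting for other reports.

```pig
-- Hypothetical input path; TextLoader yields one chararray field per line.
raw = LOAD 'system_report.log' USING TextLoader() AS (line:chararray);

-- 1) One value per line: the "SYSTEM IP:" header of each report.
ip_lines = FILTER raw BY line MATCHES 'SYSTEM IP:.*';
system_ips = FOREACH ip_lines GENERATE
    REGEX_EXTRACT(line, 'SYSTEM IP:([0-9.]+).*', 1) AS system_ip;

-- 2) Several values per line: the comma-separated "Filesystem:" lines
--    under MOUNT DETAILS. Each capture group becomes one output field.
mount_lines = FILTER raw BY line MATCHES 'Filesystem:.*';
mounts = FOREACH mount_lines GENERATE FLATTEN(
    REGEX_EXTRACT_ALL(line,
        'Filesystem:([^,]+),Type:([^,]+),Total Size:([^,]+),Used:([^,]+),Avail:([^,]+),Use%:([^,]+),Mounted on:(.*)'))
    AS (filesystem:chararray, fstype:chararray, total_size:chararray,
        used:chararray, avail:chararray, use_pct:chararray,
        mount_point:chararray);

DUMP system_ips;
DUMP mounts;
```

Note that this treats every line independently. Each report in the sample spans many lines, so tying a mount entry back to the SYSTEM IP of the report it came from would need extra work, for example pre-processing the file so that each report sits on one line or writing a custom loader. Which route makes sense depends on the output you need.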