My log.txt looks like the sample below: each ID comes with its own timestamp (detection_time), and the file is updated continuously. The ID is unpredictable. It can be any number from 0000 to 9999, and the same ID can appear in log.txt more than once.
My goal is to pick out the IDs that appear again in log.txt within 15 seconds of their first appearance, using a shell script. Can anyone help me with this?
ID = 4231
detection_time = 1595556730
ID = 3661
detection_time = 1595556731
ID = 2654
detection_time = 1595556732
ID = 3661
detection_time = 1595556733
To be clearer: in the log.txt above, ID 3661 first appears at time 1595556731 and then appears again at 1595556733, which is just 2 seconds after its first appearance. That matches my condition, which is an ID that appears again within 15 seconds. I would like my shell script to pick out this ID 3661.
The output after running the shell script should be
ID = 3661
My problem is that I don't know how to implement this algorithm in a shell script.
Here is what I tried, using the ID_new and ID_previous variables, but ID_previous=$(ID_new) and detection_previous=$(detection_new) are not working:
input="/tmp/log.txt"
ID_previous=""
detection_previous=""
while IFS= read -r line
do
    ID_new=$(echo "$line" | grep "ID =" | awk -F " " '{print $3}')
    echo $ID_new
    detection_new=$(echo "$line" | grep "detection_time =" | awk -F " " '{print $3}')
    echo $detection_new
    ID_previous=$(ID_new)
    detection_previous=$(detection_new)
done < "$input"
EDIT
The data in log.txt actually comes in sets containing ID, detection_time, Age and Height. Sorry for not mentioning this in the first place.
ID = 4231
detection_time = 1595556730
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735
Age = 24
Height = 184
I've tried the Awk solution. The result is
4231
3661
2654
3852
4231
which are all the IDs in log.txt.
The correct output should be
4231
3661
From this, I think the Age and Height data might affect the Awk solution, because they are inserted between the data of interest, which are ID and detection_time.
Assuming the time stamps in the log file are increasing monotonically, you only need a single pass with Awk. For each id, keep track of the latest time it was reported (use an associative array t where the key is the id and the value is the latest timestamp). If you see the same id again and the difference between the time stamps is at most 15 seconds, report it.
For good measure, keep a second array p of the ones we have already reported so we don't report them twice.
awk '/^ID = / { id=$3; next }
    # Skip if this line is neither ID nor detection_time
    !/^detection_time = / { next }
    (id in t) && (t[id] >= $3-15) && !(p[id]) { print id; ++p[id]; next }
    { t[id] = $3 }' /tmp/log.txt
If you really insist on doing this natively in Bash, I would refactor your attempt to
declare -A dtime printed
while read -r field _ value
do
    case $field in
        ID) id=$value;;
        detection_time)
            if [[ dtime["$id"] -ge $((value - 15)) ]]; then
                [[ -v printed["$id"] ]] || echo "$id"
                printed["$id"]=1
            fi
            dtime["$id"]=$value ;;
    esac
done < /tmp/log.txt
Notice how read -r can split a line on whitespace just as well as Awk can, as long as you know how many fields to expect. But a while read -r loop is typically an order of magnitude slower than Awk, and you'll have to agree that the Awk attempt is more succinct and elegant, as well as portable to older systems. (Associative arrays were introduced in Bash 4.)
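To illustrate the field splitting mentioned above, here is a minimal sketch that feeds read -r a single line from the sample log through a here-document. The variable names field and value are the ones used in the Bash loop; the underscore is just a throwaway name for the = sign.

```shell
# read -r assigns the first word to "field", the second ("=") to the
# throwaway variable "_", and the rest of the line to "value"
read -r field _ value <<'EOF'
detection_time = 1595556733
EOF

printf '%s -> %s\n' "$field" "$value"
```

Because the last variable receives everything that is left on the line, this only works cleanly when each line has the expected three fields.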
Tangentially, anything that looks like grep 'x' | awk '{ y }' can be refactored to awk '/x/ { y }'; see also useless use of grep.
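As a small sketch of that refactoring, applied to the ID extraction from the question (the variable names with_grep and awk_only are made up for this illustration):

```shell
line='ID = 3661'

# Style from the question: grep selects the line, awk prints the third field
with_grep=$(printf '%s\n' "$line" | grep "ID =" | awk '{ print $3 }')

# Same result with awk alone: the /ID =/ pattern does the selecting
awk_only=$(printf '%s\n' "$line" | awk '/ID =/ { print $3 }')

printf '%s\n' "$with_grep" "$awk_only"
```

Both variables should end up holding the same value, with one process fewer in the second pipeline.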
Also, notice that $(foo) attempts to run foo as a command. To simply refer to the value of the variable foo, the syntax is $foo (or, optionally, ${foo}, but the braces add no value here). Usually you will want to double-quote the expansion, "$foo"; see also When to wrap quotes around a shell variable.
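A minimal sketch of the difference, reusing the variable names from the question. It assumes no command called ID_new exists on your system; the name wrong is made up for this illustration.

```shell
ID_new=3661

# Command substitution: the shell tries to run a command named ID_new,
# finds none, and the captured output is empty
wrong=$(ID_new 2>/dev/null) || true

# Variable expansion: copies the value, double-quoted as recommended
ID_previous="$ID_new"

printf 'wrong=[%s] ID_previous=[%s]\n' "$wrong" "$ID_previous"
```

The || true is only there so the failed command substitution does not abort a script that runs with set -e.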
Your script would only remember a single earlier event; the associative array allows us to remember all the ID values we have seen previously (until we run out of memory).
Nothing prevents us from using human-readable variable names in Awk either; feel free to substitute printed for p and dtime for t to have complete parity with the Bash alternative.
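For reference, here is a sketch of the Awk script with those substitutions made, run against the sample data from the question's EDIT. The temporary file and the repeats variable are only scaffolding for this example; with the real log you would pass /tmp/log.txt instead.

```shell
# Sample data copied from the question, written to a temporary file
log=$(mktemp)
cat > "$log" <<'EOF'
ID = 4231
detection_time = 1595556730
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735
Age = 24
Height = 184
EOF

# Same logic as the script above, with dtime for t and printed for p
repeats=$(awk '
    /^ID = / { id = $3; next }
    # Skip Age, Height and anything else that is not detection_time
    !/^detection_time = / { next }
    (id in dtime) && (dtime[id] >= $3 - 15) && !printed[id] { print id; ++printed[id]; next }
    { dtime[id] = $3 }
' "$log")

printf '%s\n' "$repeats"
rm -f "$log"
```

On this sample the script prints 3661 followed by 4231, in the order in which each repeat is detected.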