Search code examples
jsonjq

How to add consecutive numbers of messages in jq command


I couldn't figure out how to do this, so I asked here.

[
  {
    "message" : "This is",
    "Identity" : "1"
  },
  {
    "message" : " a car",
    "Identity" : "2"
  },
  {
    "message" : "My Job",
    "Identity" : "11"
  },
  {
    "message" : "Is a ",
    "Identity" : "12"
  },
  {
    "message" : "Student",
    "Identity" : "13"
  }
]



1 and 2, 11 and 12 are made up of consecutive numbers.
I would like to group messages made up of consecutive numbers like this and add them starting from the smallest order.
[
  {
    "message" : "This is a car"
  },
  {
    "message" : "My Job Is a Student"
  }
]

I can't remember the mathematical formula, so I would appreciate if you could help me with the formula.




# cat a.json
{
    "events": [
        {
            "message": "# Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760882\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574;", 
            "eventId": "38492793916066929644073647519700311240525639687799308288"
        }, 
        {
            "message": "INSERT INTO test_table (c1, c2, c3) ",
            "eventId": "38492793916066929644073647519700311240525639687799308289"
        },
        {
            "message": "# Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760883\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574;", 
            "eventId": "38492793916066929644073647519700311240525639687799308291"
        }, 
        {
            "message": "INSERT INTO test_table (c1, c2, c3) ",
            "eventId": "38492793916066929644073647519700311240525639687799308292"
        },
        {
            "message": "# Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760884\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574;", 
            "eventId": "38492793916066929644073647519700311240525639687799308293"
        }
    ]
}

# cat b.json
{
    "events": [
        {
            "message": "# Time: 2024-09-11T16:00:09.408435Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436754\n# Query_time: 0.612487  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130154 Thread_id: 5436754 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65078 Read_next: 130154 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:08.795948Z End: 2024-09-11T16:00:09.408435Z\nuse testdb;\nSET timestamp=1726070408;\nselect * from test_table;",
            "eventId": "38492656394931242928022607514130903113177258647838851072"
        }, 
        {
            "message": "# Time: 2024-09-11T16:00:09.408435Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436201\n# Query_time: 0.610625  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130154 Thread_id: 5436201 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65078 Read_next: 130154 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:08.797810Z End: 2024-09-11T16:00:09.408435Z\nSET timestamp=1726070408;\nselect * from test_table;",
            "eventId": "38492656394931242928022607514130903113177258647838851073"
        }, 
        {
            "message": "# Time: 2024-09-11T16:00:12.461401Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436209\n# Query_time: 0.528123  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130156 Thread_id: 5436209 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65079 Read_next: 130156 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:11.933278Z End: 2024-09-11T16:00:12.461401Z\nSET timestamp=1726070411;\nselect * from test_table;",
            "eventId": "38492656463015418019136599965239450999572706325597061122"
        }
    ]
}

This is when I ran the script provided in the guide.
The ending digits 88, 89 and 91, 92, and 93 of the eventId of a.json are consecutive.
So I should get 2 results, but I get 3.
# cat c.sh
cat a.json | /usr/local/bin/jq -r '
.events | map(.eventId |= tonumber) | sort_by(.eventId) | reduce .[1:][] as $i (.[:1];
    if ((last.eventId + 1 != $i.eventId)) then . += [$i]
    else last.eventId = last.eventID + 1 | last.message += " " + $i.message end
)'
#) | del(.[].eventId)' 

echo "=========================================================================================="
echo "=========================================================================================="
echo "=========================================================================================="

cat b.json | /usr/local/bin/jq -r '
.events | map(.eventId |= tonumber) | sort_by(.eventId) | reduce .[1:][] as $i (.[:1];
    if ((last.eventId + 1 != $i.eventId)) then . += [$i]
    else last.eventId = last.eventID + 1 | last.message += " " + $i.message end
)'
#) | del(.[].eventId)' 


# sh c.sh
[
  {
    "message": "# Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760882\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574; INSERT INTO test_table (c1, c2, c3) ",
    "eventId": 1
  },
  {
    "message": "# Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760883\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574; INSERT INTO test_table (c1, c2, c3) ",
    "eventId": 1
  },
  {
    "message": "# Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760884\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574;",
    "eventId": 38492793916066929644073647519700311240525639687799308293
  }
]
==========================================================================================
==========================================================================================
==========================================================================================
[
  {
    "message": "# Time: 2024-09-11T16:00:09.408435Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436754\n# Query_time: 0.612487  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130154 Thread_id: 5436754 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65078 Read_next: 130154 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:08.795948Z End: 2024-09-11T16:00:09.408435Z\nuse testdb;\nSET timestamp=1726070408;\nselect * from test_table; # Time: 2024-09-11T16:00:09.408435Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436201\n# Query_time: 0.610625  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130154 Thread_id: 5436201 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65078 Read_next: 130154 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:08.797810Z End: 2024-09-11T16:00:09.408435Z\nSET timestamp=1726070408;\nselect * from test_table;",
    "eventId": 1
  },
  {
    "message": "# Time: 2024-09-11T16:00:12.461401Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436209\n# Query_time: 0.528123  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130156 Thread_id: 5436209 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65079 Read_next: 130156 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:11.933278Z End: 2024-09-11T16:00:12.461401Z\nSET timestamp=1726070411;\nselect * from test_table;",
    "eventId": 38492656463015418019136599965239450999572706325597061122
  }
]

Second, it has continuous eventID values, but I tried to separate them when the message starts at # Time.
a.json does not apply, but b.json produces the desired result.
For this reason, version 1.7 does not seem to be able to calculate large numbers well.

# cat c.sh
#!/bin/bash
cat a.json | /usr/local/bin/jq -r '
.events | map(.eventId |= tonumber) | sort_by(.eventId) | reduce .[1:][] as $i (.[:1];
    if ((last.eventId + 1 != $i.eventId) or (last.message | contains("Time"))) then . += [$i]
    else last.eventId = last.eventID + 1 | last.message += " " + $i.message end
)'
#) | del(.[].eventId)' 

echo "=========================================================================================="
echo "=========================================================================================="
echo "=========================================================================================="

cat b.json | /usr/local/bin/jq -r '
.events | map(.eventId |= tonumber) | sort_by(.eventId) | reduce .[1:][] as $i (.[:1];
    if ((last.eventId + 1 != $i.eventId) or (last.message | contains("Time"))) then . += [$i]
    else last.eventId = last.eventID + 1 | last.message += " " + $i.message end
)'
#) | del(.[].eventId)' 



# sh c.sh
[
  {
    "message": "# Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760882\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574;",
    "eventId": 38492793916066929644073647519700311240525639687799308288
  },
  {
    "message": "INSERT INTO test_table (c1, c2, c3)  # Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760883\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574;",
    "eventId": 1
  },
  {
    "message": "INSERT INTO test_table (c1, c2, c3)  # Time: 2024-09-11T17:42:56.069052Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 1760884\n# Query_time: 1.090218  Lock_time: 0.000012 Rows_sent: 0  Rows_examined: 0 Thread_id: 1760883 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T17:42:54.978834Z End: 2024-09-11T17:42:56.069052Z\nuse testdb;\nSET timestamp=1726076574;",
    "eventId": 1
  }
]
==========================================================================================
==========================================================================================
==========================================================================================
[
  {
    "message": "# Time: 2024-09-11T16:00:09.408435Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436754\n# Query_time: 0.612487  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130154 Thread_id: 5436754 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65078 Read_next: 130154 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:08.795948Z End: 2024-09-11T16:00:09.408435Z\nuse testdb;\nSET timestamp=1726070408;\nselect * from test_table;",
    "eventId": 38492656394931242928022607514130903113177258647838851072
  },
  {
    "message": "# Time: 2024-09-11T16:00:09.408435Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436201\n# Query_time: 0.610625  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130154 Thread_id: 5436201 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65078 Read_next: 130154 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:08.797810Z End: 2024-09-11T16:00:09.408435Z\nSET timestamp=1726070408;\nselect * from test_table;",
    "eventId": 38492656394931242928022607514130903113177258647838851073
  },
  {
    "message": "# Time: 2024-09-11T16:00:12.461401Z\n# User@Host: test_account[test_account] @  [127.0.0.1]  Id: 5436209\n# Query_time: 0.528123  Lock_time: 0.000003 Rows_sent: 1  Rows_examined: 130156 Thread_id: 5436209 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 0 Read_first: 0 Read_last: 0 Read_key: 65079 Read_next: 130156 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-09-11T16:00:11.933278Z End: 2024-09-11T16:00:12.461401Z\nSET timestamp=1726070411;\nselect * from test_table;",
    "eventId": 38492656463015418019136599965239450999572706325597061122
  }
]

However, when I saw it separated, it didn't seem like it would work, so I'm continuing to test it.
Thanks a lot for your help.

In conclusion, a consecutive number of eventIds are grouped. At this time, if the message field is # Time, a new field is created.


Solution

  • Convert all .Identity strings into numbers, and remove leading or trailing spaces from the .message fields (for clean spacing as in the desired output). Then iterate over the array and either append an item's message to the last message (with a re-included single space in between), or the item itself to the array, depending on the difference in their .Identity fields. Finally, remove all leftover .Identity fields.

    map(.Identity |= tonumber | .message |= (ltrimstr(" ") | rtrimstr(" ")))
    | reduce .[] as $i ([];
        if last.Identity != $i.Identity - 1 then . += [$i] else
          last.Identity += 1 | last.message += " " + $i.message
        end
      )
    | del(.[].Identity)
    
    [
      {
        "message": "This is a car"
      },
      {
        "message": "My Job Is a Student"
      }
    ]
    

    Demo

    This assumes that the input array is already sorted. If not, perform the sorting initially by starting with map(.Identity |= tonumber | .message |= (…)) | sort_by(.Identity) | reduce … instead.