Tags: arrays, bash, awk, sed, text-manipulation

Dynamic continuous numbering in bash


I have a text file that acts as a database for my script. Each record starts with a unique "ID" column.

The database has a format of UID:Item Name:Quantity:Price:Date Added

cat FirstDB.txt

output:

0001:Fried Tarantula:45:100:2017-08-03
0002:Wasp Crackers:18:25:2017-08-04
0003:Century Egg:19:50:2017-08-05
0004:Haggis Flesh:20:90:2017-08-06
0005:Balut (Egg):85:15:2017-08-07
0006:Bear Claw:31:550:2017-08-08
0007:Durian Fruit:70:120:2017-08-09
0008:Live Cobra heart:20:375:2017-08-10
0009:Monkey Brains:30:200:2017-08-11
0010:Casu Marzu:25:1030:2017-08-12

The feature I'm creating allows a user to add new entries to the text file in the same format (I have already created this). However, the real trick is that the user can also delete an item. For example, if the user deletes Century Egg from the text file, the result would be this:

0001:Fried Tarantula:45:100:2017-08-03
0002:Wasp Crackers:18:25:2017-08-04
0004:Haggis Flesh:20:90:2017-08-06
0005:Balut (Egg):85:15:2017-08-07
0006:Bear Claw:31:550:2017-08-08
0007:Durian Fruit:70:120:2017-08-09
0008:Live Cobra heart:20:375:2017-08-10
0009:Monkey Brains:30:200:2017-08-11
0010:Casu Marzu:25:1030:2017-08-12

Then, if the user adds a new item to the database, I would like it to take the UID 0003, since that one is now free. How do I go about achieving this? I'm stuck at the moment. I believe awk can be useful here, but I'm keeping my options open. I'm pretty new to scripting and not yet good with awk, so if your solution uses awk, please walk me through it as well. Thank you very much!


Solution

  • awk to the rescue!

    Assuming the records may no longer be in order after edits:

    awk -F: '{a[$1+0]} END{for(i=1;i<=NR;i++) if(!(i in a)) print i}'
    

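    This prints every number in the range 1..NR that is missing from the first column (it assumes the field is numeric); with a single gap, that is the first missing number. The $1+0 forces a numeric value, so 0003 is stored as 3.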

    test

    create a shuffled list of formatted sequence numbers with "0003" missing.

    awk 'BEGIN{for(i=1;i<=10;i++) printf "%04d\n",i}' | shuf | awk '$1!=3' 
    
    0009
    0001
    0006
    0004
    0002
    0005
    0008
    0010
    0007
    

    pipe to the script

    ... | awk -F: '{a[$1+0]} END{for(i=1;i<=NR;i++) if(!(i in a)) print i}'
    

    returns as expected

    3
    

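    However, this prints nothing if your list has no gaps. To handle that case, you need to return the largest number + 1. With this modification (plus an exit after the first hit, so that only the first gap is printed), the test case and script become the following; note that the test list here has no gap: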

    $ awk 'BEGIN{for(i=1;i<=10;i++) printf "%04d\n",i}' | 
      shuf | 
      awk -F: '{a[$1+0]} $1>max{max=$1} 
           END {for(i=1;i<=NR;i++) if(!(i in a)) {print i; exit} 
                print max+1}'
    
    11
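
A minimal sketch of how this could be applied to the question's FirstDB.txt, assuming the new UID should be zero-padded back to four digits to match the file. The function name next_uid is made up for illustration, and looping up to max instead of NR is only a small variation on the script above, not a different technique.

```shell
#!/usr/bin/env bash
# next_uid FILE (hypothetical helper): print the lowest unused UID in FILE,
# zero-padded to 4 digits. Assumes colon-separated records whose first
# field is a numeric UID starting at 1; the records may be in any order.
next_uid() {
    awk -F: '
        # $1 + 0 turns "0003" into the number 3; referencing used[id] creates the key
        { id = $1 + 0; used[id]; if (id > max) max = id }
        END {
            # first gap below the largest UID, if there is one
            for (i = 1; i <= max; i++)
                if (!(i in used)) { printf "%04d\n", i; exit }
            # no gap: continue after the largest UID
            printf "%04d\n", max + 1
        }
    ' "$1"
}

# Usage: next_uid FirstDB.txt
```

With the question's file after Century Egg has been deleted, next_uid FirstDB.txt should print 0003; with the full ten records it should print 0011.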
    

    Note: if you sort the file after each record insertion, you can avoid much of this complexity.
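
To illustrate the sorted case, here is a rough sketch under the assumption that the file is kept in ascending UID order with no duplicate UIDs and no blank lines. In that situation the first free UID is simply the first line number that does not match its UID, so no array is needed. The function name next_uid_sorted is hypothetical.

```shell
#!/usr/bin/env bash
# next_uid_sorted FILE (hypothetical helper): print the lowest unused UID,
# zero-padded to 4 digits. Assumes FILE is sorted ascending by UID,
# with unique UIDs starting at 1 and no blank lines.
next_uid_sorted() {
    awk -F: '
        # on a gap-free prefix, record number and UID are equal
        $1 + 0 != NR { printf "%04d\n", NR; found = 1; exit }
        # exit still runs END, so only print here when no gap was found
        END          { if (!found) printf "%04d\n", NR + 1 }
    ' "$1"
}

# Usage: next_uid_sorted FirstDB.txt
```

One way to keep the file in order after appending a record would be sort -t: -k1,1n -o FirstDB.txt FirstDB.txt, which sorts numerically on the first colon-separated field and writes the result back to the same file.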