I have following file and I want to sort it alphanumerically based on 6 th column such that an E1 is followed by I1 and then E2 and so on of a specific ID before the ' : ', when I do sort -V -k6 file it puts all the ID:Is at the end and not where they should be.However when I do sort -k6 it does put the Es and Is of the IDs together but with some IDs belonging to different series interspersed (I have highlighted them here), how can I get the sorting such that no two IDs are mixed and the column is in the order it should be:
chr1 259017 259121 104 - ENSG00000228463:E2
chr1 259122 267095 7973 - ENSG00000228463:I1
chr1 267096 267253 157 - ENSG00000228463:E1
chr1 317720 317781 61 + ENSG00000237094:E1
chr1 317782 320161 2379 + ENSG00000237094:I1
chr1 320162 320653 491 + ENSG00000237094:E2
chr1 320654 320880 226 + ENSG00000237094:I2
chr1 320881 320938 57 + ENSG00000237094:E3
chr1 320939 321031 92 + ENSG00000237094:I3
chr1 321032 321290 258 + ENSG00000237094:E4
chr1 321291 322037 746 + ENSG00000237094:I4
chr1 322038 322228 190 + ENSG00000237094:E5
chr1 322229 322671 442 + ENSG00000237094:I5
chr1 322672 323073 401 + ENSG00000237094:E6
chr1 323074 323860 786 + ENSG00000237094:I6
chr1 323861 324060 199 + ENSG00000237094:E7
chr1 324061 324287 226 + ENSG00000237094:I7
chr1 324288 324345 57 + ENSG00000237094:E8
chr1 324346 324438 92 + ENSG00000237094:I8
chr1 324439 326514 2075 + ENSG00000237094:E9
**chr1 326096 326569 473 + ENSG00000250575:E1**
chr1 326515 327551 1036 + ENSG00000237094:I9
**chr1 326570 327347 777 + ENSG00000250575:I1**
**chr1 327348 328112 764 + ENSG00000250575:E2**
chr1 327552 328453 901 + ENSG00000237094:E10
chr1 328454 329783 1329 + ENSG00000237094:I10
**chr1 329431 329620 189 - ENSG00000233653:E2**
**chr1 329621 329949 328 - ENSG00000233653:I1**
chr1 329784 329976 192 + ENSG00000237094:E11
Original answer:
sed 's/:[EI]/&_ /' foo.txt | #separate the number at the end with a space
sort -k6 | sort -n -k7 | #sort by code, then by [EI] number
sed 's/_ //' #remove the underscore space
I like to do things like this by 'protecting' strings with a placeholder to isolate what I'm interested in, then replacing them later.
Closer:
sed 's/:[EI]/_ &_ /' foo.txt | sort -n -k8 | sort -k6,6 | sed 's/_ //g'
But this naively assumes that sort works in a very specific way that it doesn't... so sometimes E2 will come before E1...
I'm not sure it can be done with sort alone, awk might be the way to go...