How to remove the last character from a line only if it's a letter in R or Linux

I have a list of ~28,000 gene transcripts, e.g.:

4R79.1b 
4R79.2b 
AC3.1a 
AC3.2 
AC3.3 
AC3.5a

I need to get the gene names by removing the last character, but only if it is a letter. I've been googling for days and haven't found anything that even remotely helps; I must be missing something.

I thought there must be a simple solution, but my best attempt, sed 's/[[:alpha:]]$//' transcripts.txt > genes.txt, did not seem to do anything: genes.txt came out the same size as the original file.

Solution

With awk:

$ echo '4R79.1b 4R79.2b AC3.1a AC3.2 AC3.3 AC3.5a' | 
awk '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1'

Prints:

4R79.1 4R79.2 AC3.1 AC3.2 AC3.3 AC3.5
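The awk program loops over every whitespace-separated field on a line, so the same command works whether the IDs sit on one line (as in the echo above) or one per line (as in the question's sample). A minimal sketch, assuming the input file is called transcripts.txt as in the question:

```shell
# Recreate the question's sample, one transcript per line.
printf '%s\n' 4R79.1b 4R79.2b AC3.1a AC3.2 AC3.3 AC3.5a > transcripts.txt

# For each field, strip one trailing letter; the final "1" prints the rebuilt line.
awk '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1' transcripts.txt > genes.txt

cat genes.txt
# 4R79.1
# 4R79.2
# AC3.1
# AC3.2
# AC3.3
# AC3.5
```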

Or sed:

sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g'
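The sed version matches a letter followed either by whitespace or by the end of the line, and \1 puts that whitespace back. That also covers lines ending in a trailing space or a Windows carriage return (\r), either of which would stop the question's 's/[[:alpha:]]$//' from matching. Whether transcripts.txt really has such line endings is an assumption; the sketch below builds a small hypothetical sample.txt to show the difference:

```shell
# Hypothetical sample: one line with a trailing space, one with a Windows CR.
printf '4R79.1b \nAC3.5a\r\n' > sample.txt

# The question's pattern: the letter is never the last byte, so nothing changes.
sed 's/[[:alpha:]]$//' sample.txt > before.txt
cmp -s sample.txt before.txt && echo "question's pattern changed nothing"

# The answer's pattern: letter + (whitespace or end of line); \1 keeps the whitespace.
sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' sample.txt > after.txt
od -c after.txt
```

If the file does turn out to have Windows line endings, converting it first (for example with tr -d '\r' or dos2unix) lets the question's original pattern match lines that end in a letter.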

For a new file, just redirect:

sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' file > new_file
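Since the question judged the result by file size, it can help to check the output directly: every stripped letter removes one byte, and paste shows each original ID next to its gene name. A small sketch using the question's file names (transcripts.txt, genes.txt) on a three-line sample:

```shell
printf '%s\n' 4R79.1b AC3.2 AC3.5a > transcripts.txt
sed -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' transcripts.txt > genes.txt

# Two letters were stripped, so genes.txt should be two bytes smaller.
wc -c transcripts.txt genes.txt

# Side by side: original transcript ID, tab, derived gene name.
paste transcripts.txt genes.txt
```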

If you want to edit the file in place, you can use sed's -i option with a backup suffix:

sed -i.bak -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' file
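GNU sed only treats the suffix as a backup extension when it is attached directly to -i, while BSD/macOS sed requires a suffix argument, so -i.bak (no space) is a form both accept. A minimal sketch on a throwaway copy of the sample, using the question's file name:

```shell
printf '%s\n' 4R79.1b AC3.2 AC3.5a > transcripts.txt

# Edit in place; the untouched original is kept as transcripts.txt.bak.
sed -i.bak -E 's/[[:alpha:]]([[:space:]]|$)/\1/g' transcripts.txt

cat transcripts.txt      # gene names
cat transcripts.txt.bak  # original transcript IDs
```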

Or, with awk, redirect the output to a temporary file and then move it over the original (which is essentially what sed -i does behind the scenes):

awk '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1' file > TEMP_FILE && mv -f TEMP_FILE file
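A fixed name like TEMP_FILE can clash with an existing file. One variation, sketched here with mktemp (widely available on Linux and macOS, though not part of POSIX) and a variable named tmp chosen purely for illustration, creates a uniquely named temporary file and cleans it up if anything fails:

```shell
printf '%s\n' 4R79.1b AC3.2 AC3.5a > transcripts.txt

# Create a uniquely named temporary file instead of a fixed TEMP_FILE.
tmp=$(mktemp)

# Replace the original only if awk succeeds; otherwise remove the temp file.
awk '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1' transcripts.txt > "$tmp" &&
  mv -f "$tmp" transcripts.txt || rm -f "$tmp"

cat transcripts.txt
```

mktemp creates the file readable only by its owner, and that mode carries over to transcripts.txt after the mv, so adjust it with chmod if others need to read the file.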

GNU awk (gawk 4.1 and later) can also edit files in place through its inplace extension, loaded with gawk -i inplace.
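With gawk, -i inplace loads the bundled inplace extension, which writes each input file's output back into that file. A sketch assuming gawk is installed (it is not the default awk on macOS or on Debian/Ubuntu, so it may need to be installed separately), again using the question's file name:

```shell
printf '%s\n' 4R79.1b AC3.2 AC3.5a > transcripts.txt

# -i inplace: gawk's output for each input file replaces that file.
gawk -i inplace '{for(i=1;i<=NF;i++) sub(/[[:alpha:]]$/,"",$i)} 1' transcripts.txt

cat transcripts.txt
```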