I have a text file with approximately 25 million lines. The data on the lines looks like this:
12ertwrtrdfger
897erterterte
545ret3w2trewt345
968587563453345
89753647565344553
I want to find the most frequent prefixes and suffixes. In the example above, two lines start with 897 and two lines end with 345; I want to see which prefixes and suffixes occur most often. I would also like to get the results as a bar or pie chart. Is there a data analysis program that does this kind of analysis?
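I've solved the prefix part of my problem with the command below, which counts the 5-character prefixes that occur more than once and sorts them by frequency: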
cut -c 1-5 abc.txt | sort | uniq -cd | sort -nbr > pre5.txt
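The same idea can probably be extended to suffixes and to a rough text bar chart. Below is a minimal sketch, not a tested solution for a 25-million-line file. The function names count_suffixes and bar_chart are made up for illustration, and it assumes a POSIX awk plus the usual sort and uniq tools. A real pie chart would still need a plotting tool.

```shell
# Hypothetical helper: count the N-character suffixes of each line of a file.
# Usage: count_suffixes FILE [N]   (N defaults to 5, an assumed value)
count_suffixes() {
    # Print the last n characters of every line; shorter lines are kept whole.
    awk -v n="${2:-5}" '{
        start = length($0) - n + 1
        if (start < 1) start = 1
        print substr($0, start)
    }' "$1" |
        sort | uniq -cd | sort -nbr   # keep only repeated suffixes, most frequent first
}

# Hypothetical helper: draw "count value" lines (the uniq -c format) as text bars.
# Usage: bar_chart FILE [WIDTH]    (WIDTH is the length of the longest bar)
bar_chart() {
    awk -v width="${2:-50}" '
        # Input is assumed to be sorted descending, so line 1 holds the largest count.
        NR == 1 { max = $1 }
        {
            len = int($1 * width / max)   # scale each bar against the largest count
            bar = ""
            for (i = 0; i < len; i++) bar = bar "#"
            printf "%-10s %8d %s\n", $2, $1, bar
        }' "$1"
}
```

For example, `count_suffixes abc.txt 3 > suf3.txt` followed by `bar_chart suf3.txt` should list the repeated 3-character suffixes with one bar per suffix. Piping through `head` before charting keeps the output short.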