
Sublime duplicate removal (Array)


Ran into an issue that I couldn't solve. Say I have a text file with thousands of entries such as:

12.04.2013 krispy
11.2.2013 krispy
11.2.2013 peter
11.2.2013 william
23.4.2014 krispy

How can I select and permute unique so that only one date for krispy is kept (it doesn't matter which), giving this output:

12.04.2013 krispy
11.2.2013 peter
11.2.2013 william

In other words, I'd somehow select the second value (the one after the space character) and permute on it so that the entire duplicate line is removed.

Any help would be great, thanks!


Solution

  • This is (normally) not a job for an editor; you should do it in a programming language. But since you agreed to also consider other solutions, let's go for it.

    bash

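    Just use the sort command:

    sort -k2,2 -u filename -o filename
    

    This sorts the lines by the second column (-k2,2; a plain -k2 would make the key run from the second column to the end of the line) and keeps only one line for each distinct value of that column (-u). It reads the file filename and writes the result back to the same file (-o filename), overwriting it. Note that the output is ordered by name, not in the original line order.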

    If you are on a non-Unix system, you can use Git Bash or Cygwin to run Unix commands.

    python

    Otherwise you could use the omnipresent Python to accomplish this. Sublime Text plugins are written in Python, so it's straightforward to turn this code into a plugin for Sublime.

    removedups.py

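    import fileinput
    import sys
    
    seen = set()  # names that have already been written out
    filename = sys.argv[1]
    
    # With inplace=True, fileinput redirects standard output into the file,
    # so every line written below replaces the file's previous contents.
    for line in fileinput.input(filename, inplace=True):
        fields = line.split()
        name = fields[1] if len(fields) > 1 else None
        if name is None:
            sys.stdout.write(line)  # blank or malformed line: keep it as-is
        elif name not in seen:
            seen.add(name)  # first line for this name: keep it
            sys.stdout.write(line)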
    

    Then you can use it like this:

    python removedups.py filename
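
To try the idea without touching a file, the same first-occurrence logic can be sketched as a small generator. This is only an illustration: the name unique_by_field and its field parameter are made up here and are not part of the original answer. It assumes whitespace-separated columns and passes lines with too few columns through unchanged.

```python
def unique_by_field(lines, field=1):
    """Yield each line whose whitespace-separated `field` has not been seen yet.

    `field` is a zero-based index; 1 selects the name in "date name" lines.
    Lines with too few fields are passed through unchanged.
    (Hypothetical helper, written for illustration only.)
    """
    seen = set()
    for line in lines:
        parts = line.split()
        if len(parts) <= field:
            yield line  # blank or short line: nothing to compare, keep it
            continue
        key = parts[field]
        if key not in seen:
            seen.add(key)  # first occurrence of this key: keep the line
            yield line


# Sample data taken from the question.
sample = [
    "12.04.2013 krispy",
    "11.2.2013 krispy",
    "11.2.2013 peter",
    "11.2.2013 william",
    "23.4.2014 krispy",
]

result = list(unique_by_field(sample))
print("\n".join(result))
```

Unlike the sort approach, this keeps the lines in their original order and always keeps the first line seen for each name. Because it only takes and returns plain lines, it is probably the part that would be reused if the script were turned into a Sublime plugin; the plugin wiring itself is not shown here.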