Search code examples
javaperformancesortinglinked-listsearch-engine

An efficient algorithm to sort millions of words in a DoubleLinkedList


I'm working on a simple search engine in Java.

The search engine takes first as input the name of the directory that contains the files(txt files) to be searched , and inside each file many words.

The search engine then creates an inverted index for all the words encountered in the directory. The engine reads each file and insert each word in the doubleLinkedList.

The problem is, when I deal with a directory that contains 100 .txt files:

Indexing Time: ~201ms sort Time: 2463ms


sorting a directory contains 1000 files

Indexing Time: 2461ms sort Time: 922654ms


sorting a directory contains 10000 files

around 10 hours :(


  1. Is there any possible way to reduce the execution time?

  2. I used the insertion sort, so any suggestions for the sort algorithm?

The implementation of the DoubleLinkedList class

public class DoubleLinkedList<T> {
    private Node<T> head;
    private Node<T> current;

    public DoubleLinkedList(){
        head = current = null;
    }
    public boolean empty(){
        return head == null;
    }
    public boolean last(){
        return current.next==null;
    }
    public boolean first(){
        return current.previous == null;
    }
    public boolean full(){
        return false;
    }
    public void findFirst(){
        current = head;
    }
    public void findNext(){
        current = current.next;
    }
    public void findPrevious(){
        current = current.previous;
    }
    public T retrieve(){
        return current.data;
    }
    public void update(T val){
        current.data = val;
    }
    public void insert(T val){
        if(head == null){
            head = current = new Node<T>(val);
        }else{
            Node<T> tmp = new Node<T>(val);
            tmp.next = current.next;
            tmp.previous = current;
            if(current.next != null)
                current.next.previous = tmp;
            current.next = tmp;
            current = tmp;
        }
    }
    public void remove(){
        if(current == head){
            head = head.next;
            if(head!=null){
                head.previous=null;
            }
        }else{
            current.previous.next = current.next;
            if(current.next!=null){
                current.next.previous = current.previous;
            }
        }
        if(current.next == null){
            current = head;
        }else{
            current = current.next;
        }
    }


}

Solution

  • Insertion sort runs in (worst case) O(n^2) time.

    You could try something like Mergesort, QuickSort, or HeapSort, which run in (IIRC) O(nlogn) time. This will be much faster.