Search code examples
javascriptarraysperformancesortingprimitive

Why is it faster to sort javascript strings than numbers?


I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers. This is based on something I read (and now can't find), which appears to be erroneous, which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this:

JavaScript's String type is used to represent textual data. It is a set of "elements" of 16-bit unsigned integer values. Each element in the String occupies a position in the String. The first element is at index 0, the next at index 1, and so on. The length of a String is the number of elements in it.

If we define the "size" of an element (number or string) as the length of its textual representation (so size = String(x).length for either a numeric element or a string element), then for a large array of elements of the same size (one numeric, and one strings), I was expecting the sorting of the strings to be equal to or slightly slower than sorting the arrays, but when I ran a simple test (code below), it turned out that the strings were about twice as fast to sort.

I want to know what it is about strings and numbers, and how javascript does its sorting, that makes string sorting faster than numeric sorting. Perhaps there is something I am misunderstanding.

Results:

~/sandbox > node strings-vs-ints.js 10000 16
Sorting 10000 numbers of magnitude 10^16
Sorting 10000 strings of length 16
Numbers: 18
Strings: 9
~/sandbox > node strings-vs-ints.js 1000000 16
Sorting 1000000 numbers of magnitude 10^16
Sorting 1000000 strings of length 16
Numbers: 3418
Strings: 1529
~/sandbox > node strings-vs-ints.js 1000000 32
Sorting 1000000 numbers of magnitude 10^32
Sorting 1000000 strings of length 32
Numbers: 3634
Strings: 1474

Source:

"use strict";
const CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghjijklmnopqrstuvwxyz0123456789:.";

function generateString(L) {
    const chars = [];
    while(chars.length < L) {
        chars.push(CHARSET[Math.floor(Math.random() * CHARSET.length)]);
    }
    return chars.join("");
}

function generateNumber(L) {
    return Math.floor(Math.random() * Math.pow(10, (L - 1))) + Math.pow(10, L - 1);
}

function generateList(generator, L, N) {
    const elements = [];
    while(elements.length < N) {
        elements.push(generator.call(null, L));
    }
    return elements;
}

function now() {
    return Date.now();
}

function getTime(baseTime) {
    return now() - baseTime;
}

function main(count, size) {
    console.log(`Sorting ${count} numbers of magnitude 10^${size}`);
    const numbers = generateList(generateNumber, size, count);
    const numBaseTime = now();
    numbers.sort();
    const numTime = getTime(numBaseTime);

    console.log(`Sorting ${count} strings of length ${size}`);
    const strings = generateList(generateString, size, count);
    const strBaseTime = now();
    strings.sort();
    const strTime = getTime(strBaseTime);

    console.log(`Numbers: ${numTime}\nStrings: ${strTime}`);
}

main(process.argv[2], process.argv[3]);

Solution

  • I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers.

    That's true, string comparison is way more costly than number comparison.

    This is based on something I read which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this.

    Yes, what you read seems to be erroneous indeed. Strings are just sequences of characters (each character being a 16 bit value), so they're usually stored as arrays of integers, or rather pointers to them. Your array of strings could indeed be treated as an array of an arrays though.

    When I ran a simple test, it turned out that the strings were about twice as fast to sort.

    The problem with your code is that you are sorting your numbers as strings, which casts every number to a string and then compares that. See How to sort an array of integers correctly. When you fix that, notice that the call to the comparison function still has quite a bit of overhead to the builtin string comparison, so if you really benchmarked relational operators (<, ==, >) on the different types I'd expect numbers to perform even better.