Haskell vs C performance in a linspace function

Implementing a linspace function, which takes Start, End and optionally Steps as its arguments and outputs a list of evenly distributed (numerical) values with first element being Start and the last being End. Length of resulting list is (Steps + 1). It's similar to MATLAB's linspace.

One implementation is in Haskell, the other is in C. Guess who's billion times faster. Need help understanding what is wrong with the Haskell code.

Update for people, who asked some relevant information.
To compile C: gcc -o linspace linspace.c
gcc: 12.2.0
To compile Haskell: ghc -O2 -o linspace linspace.hs
ghc: 9.0.2

For benchmark use sh's time linspace 0 10 100000. My results are as follows.
C (test 1):

real    0m0.187s
user    0m0.076s
sys 0m0.012s

C (test 2):

real    0m0.132s
user    0m0.068s
sys 0m0.010s

Only two tests for C, because result is consistently within the range of these two.
Haskell:

*takes forever*

Haskell code.

{-# LANGUAGE BangPatterns #-}

import System.Environment

linspace :: Fractional a => a -> a -> Maybe Int -> [a]
linspace start end maybe = linspace' [] 0
  where
    stepCount =
        case maybe of
            Just i
                | i > 0 -> i
                | otherwise -> errorWithoutStackTrace "Number of steps must be > 0"
            Nothing -> 100
    stepSize = abs $ (end - start) / (fromIntegral stepCount)
    linspace' !acc n
        | n > stepCount = acc
        | otherwise =
            let !el = start + stepSize * (fromIntegral n)
             in linspace' (acc ++ [el]) (n + 1)

main = do
    argv <- getArgs
    let argc = length argv
    if argc /= 2 && argc /= 3
        then errorWithoutStackTrace "linspace <start> <end> [steps]"
        else return ()
    let start = read $ argv !! 0 :: Double
        end = read $ argv !! 1 :: Double
        steps =
            if argc == 3
                then Just $ read (argv !! 2) :: Maybe Int
                else Nothing
    putStrLn . show $ linspace start end steps

C code.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <math.h>

int
main (int argc, char **argv) {
    if (argc != 3 && argc != 4) {
        fprintf(stderr, "Usage: linspace <start> <end> [steps]\n");
        return EXIT_FAILURE;
    }
    errno = 0;
    double start, end;
    int steps;
    start = strtod(argv[1], NULL);
    if (errno) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }
    errno = 0;
    end = strtod(argv[2], NULL);
    if (errno) {
        perror(argv[2]);
        return EXIT_FAILURE;
    }
    errno = 0;
    if (argc == 4) {
        steps = (int) strtol(argv[3], NULL, 10);
        if (errno) {
            perror(argv[3]);
            return EXIT_FAILURE;
        } else if (steps <= 0) {
            fprintf(stderr, "Number of steps must be > 0\n");
            return EXIT_FAILURE;
        }
    } else
        steps = 100;
    
    double stepSize;
    stepSize = fabs((start - end) / steps);
    double *vals;
    vals = malloc(sizeof(double) * (steps + 1));
    for (int i = 0; i <= steps; i++) {
        vals[i] = start + stepSize * i;
    }
    printf("[");
    for (int i = 0; i <= steps; i++) {
        if (i == steps)
            printf("%f", vals[i]);
        else
            printf("%f, ", vals[i]);
    }
    printf("]\n");
}

Don't mind that I'm taking absolute values. Lets say it's not indented to be used with negative values.

Solution

The main problem with your Haskell code is here:

linspace' !acc n
    | n > stepCount = acc
    | otherwise =
        let !el = start + stepSize * (fromIntegral n)
         in linspace' (acc ++ [el]) (n + 1)

This is an O(n^2) algorithm to construct the list because each acc ++ [e1] expression requires the entire acc list to be traversed before the [e1] singleton can be added to the end.

If you delete the definition of linspace' and replace the first linspace line with:

linspace start end maybe = [start + fromIntegral n * stepSize | n <- [0..stepCount]]

you'll find that the resulting Haskell program is a little more competitive with the C version, maybe around 5 times slower instead of a billion.