Context: At work, I am compiling a set of packages for Intel64 and ARM64, and bundling them up into Linux packages (.rpm, .deb, .apk). I'm building out a whole pipeline to enable this, which will feed into our Artifactory installation.
We're building out self-hosted, native ARM64 runners for our GitHub Enterprise Server system, to pair alongside our existing self-hosted Intel64 runners. Our runners are built atop Amazon EC2 instances. Both are compute-optimized.
| CPU Arch | Instance type | vCPUs | Memory | `uname -m` |
|---|---|---|---|---|
| amd64 / x86_64 | c5.2xlarge | 8 | 16 GB | x86_64 |
| arm64 / aarch64 | c6g.2xlarge | 8 | 16 GB | aarch64 |
| arm64 / aarch64 | c7g.2xlarge | 8 | 16 GB | aarch64 |
Hosts are EKS clusters running EKS-optimized Amazon Linux 2.
Our "runners" are K8S/EKS pods (~Docker containers) that die/re-spawn after each individual workflow. The Docker image is a multi-platform image — same software, same configuration, multiple CPUs. The container OS is Ubuntu "Focal Fossa" 20.04 LTS.
Using our self-hosted GitHub Actions runners, I wrote a GHA workflow that downloads the source of a Rust project and compiles it: once on the Intel runner, and once on the ARM64 runner. As the first step of the workflow, I run `uname -m` and output the result, and I see what I'm expecting. I also run `file` against the compiled binary, and again I see what I'm expecting.

(I'm working very hard to have an apples-to-apples comparison here.)
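For reference, that sanity check amounts to a couple of commands (a minimal sketch; the binary path assumes the lychee build script below):

```shell
# First step of the workflow: confirm which architecture this runner reports.
uname -m    # x86_64 on the Intel runner, aarch64 on the Graviton runners

# After the build, confirm the binary's architecture matches the runner:
#   file target/release/lychee
#   (expect "x86-64" or "ARM aarch64" in the output, depending on the runner)
```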
I'm building https://github.com/lycheeverse/lychee as a test project for the pipeline. I've not (yet) tested other compilations, but this felt complex enough to put the new ARM64 runners through their paces.
Here is the build script (`${ARCH}` is either `x86_64` or `aarch64`, as appropriate):
```shell
sudo apt-get -y update
sudo apt-get -y install --no-install-recommends \
    build-essential \
    ca-certificates \
    curl \
    file \
    git \
    gpg \
    gpg-agent \
    gzip \
    libssl-dev \
    openssh-client \
    pkg-config \
    software-properties-common \
    tar \
    wget \
;

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "${HOME}/.cargo/env"

# shellcheck disable=2154
wget --header "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://github.com/lycheeverse/lychee/archive/refs/tags/v0.15.1.tar.gz"
tar zxvf "v0.15.1.tar.gz"

### Start measuring time
cd "lychee-0.15.1/" || exit 1
cargo fetch --target="${ARCH}-unknown-linux-gnu" --locked
cargo install cargo-auditable --locked

# The `mold` linker is pre-installed.
mold -run cargo auditable build --timings --frozen --release
sudo install -Dm755 target/release/lychee -t "/usr/local/bin/"
### Stop measuring time
```
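For the curious, the "Start/Stop measuring time" markers are where the timing happens. A minimal sketch of that kind of window timer, using bash's built-in `SECONDS` counter (an illustration, not necessarily the exact harness I use):

```shell
#!/usr/bin/env bash
# Measure a window of work with bash's built-in SECONDS counter.
start=${SECONDS}

sleep 1    # stand-in for the cargo fetch / cargo build steps

elapsed=$(( SECONDS - start ))
printf 'Build took %dm %02ds\n' $(( elapsed / 60 )) $(( elapsed % 60 ))
# prints something like: Build took 0m 01s
```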
The Intel runner is one generation older than the `c6g` Graviton runner, and two generations older than the `c7g`. Same vCPUs, same amount of memory available. In the measured section of the script above, I'm seeing these results (average of 5 builds):

- Intel (`c5`): 6m 18s (baseline)
- Graviton2 (`c6g`): 12m 2s (~2x)
- Graviton3 (`c7g`): 9m 36s (~1.5x)

I was expecting parity between the two CPU architectures, or maybe a slight edge for ARM64, seeing that the Graviton instances are a generation or two newer. I also know that languages like Haskell are still working on bringing things to ARM64, and I wonder if the same is true for Rust.
For Rustaceans: are there parts of the Rust build pipeline that are not yet optimized for ARM64 on glibc-based Linuxes?
Next, I'm going to try building a significant project in Go, just to try another language that I know is optimized for ARM64, and attempt to rule out issues with the Graviton processors. I'm also going to set up another representative Rust project to see whether I get different results.
Update (same day): I performed the same test on a Go project (OpenTofu). It has over 300,000 lines of code, and it also depends on several external dependencies that have to be downloaded and compiled.

- Intel (`c5`): 3m 35s (baseline)
- Graviton3 (`c7g`): 2m 18s (~0.64x)

Here, `arm64` was a 36% improvement over the two-generations-older Intel instance. So I don't think my Rust issue is related to Amazon lying about Graviton price-performance. I think it has to do with something about Rust, or about lychee specifically.
To improve the performance of Rust on Graviton, you should enable the Large System Extensions (LSE) via `RUSTFLAGS` before building your project. LSE arrived with the Armv8.1-A architecture and improves overall system throughput. Graviton2 and later, based on Arm's Neoverse CPU line, all include the LSE feature.

Enable LSE by exporting the following before your `cargo build --release` command:

```shell
export RUSTFLAGS="-Ctarget-feature=+lse"
```
So your final script should look like this, after your `sudo apt-get` installs:

```shell
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "${HOME}/.cargo/env"

# shellcheck disable=2154
wget --header "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://github.com/lycheeverse/lychee/archive/refs/tags/v0.15.1.tar.gz"
tar zxvf "v0.15.1.tar.gz"

### Start measuring time
cd "lychee-0.15.1/" || exit 1
cargo fetch --target="${ARCH}-unknown-linux-gnu" --locked
cargo install cargo-auditable --locked

# Set RUSTFLAGS to enable LSE
export RUSTFLAGS="-Ctarget-feature=+lse"

# The `mold` linker is pre-installed.
mold -run cargo auditable build --timings --frozen --release
sudo install -Dm755 target/release/lychee -t "/usr/local/bin/"
### Stop measuring time
```
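Since a mistyped `RUSTFLAGS` value can silently do nothing, it's worth confirming that LSE atomics actually made it into the binary. One way (a sketch; assumes `binutils` is installed and uses the binary path from the script above; the helper name is mine) is to count LSE mnemonics such as `ldadd`, `swp`, and `cas` in the disassembly:

```shell
# count_lse: count lines containing LSE atomic mnemonics in a disassembly
# stream read from stdin. (Hypothetical helper, not part of the build script.)
count_lse() {
  grep -cE '\b(ldadd|ldclr|ldeor|ldset|swp|cas)[a-z]*\b'
}

# Usage against the built binary (run on the arm64 runner):
#   objdump -d target/release/lychee | count_lse
# A build with +lse should report a substantial count; without it, near zero.
```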