
Good packages for large panels with many 'fixed effects' / individual specific intercepts


I have had very good experiences with plm() on small datasets like this one:

library("data.table")
library("plm")
set.seed(123)
smalldata <- data.table(id = rep(1:100, each = 10), x = rnorm(1000))
smalldata[, y := x + 0.3*x + (rnorm(1000)/100)]

plm(smalldata,
    formula = y ~ x,
    effect = c("individual"),
    model = c("within"),
    index = "id")

But with a larger dataset, I quickly run into RAM issues and things become very(!) slow:

set.seed(123)
largedata <- data.table(id = rep(1:100000, each = 10), x = rnorm(1000000))
largedata[, y := x + 0.3*x + (rnorm(1000000)/100)]

time_pre <- Sys.time()
plm(largedata,
    formula = y ~ x,
    effect = c("individual"),
    model = c("within"),
    index = "id")
time_post <- Sys.time()
time_post-time_pre

This already takes longer than a minute on my machine, and my real-life datasets are much larger than 1 million observations. If you want to try it out, just add a few zeros and wait until your RAM fills up and the code simply stops running. Of course, the problem gets worse as more levels of fixed effects are added (in plm you can do that via dummy regressions with as.factor(), as sketched below), but it is already visible in this simple example.
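
A minimal sketch of what I mean by dummy regressions. The second grouping variable grp is hypothetical and only added here for illustration; it makes the design matrix larger and the memory problem worse:

# hypothetical second grouping variable, varying within each id
largedata[, grp := rep(1:10, times = 100000)]

plm(largedata,
    formula = y ~ x + as.factor(grp),
    effect = c("individual"),
    model = c("within"),
    index = "id")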

This slowness is also odd because the plm() documentation says it uses within transformations, which should be fast. They must be implementing that in a complicated way, because here is the same 'fixed effects' panel estimated with a data.table() within transformation followed by lm():

time_pre_dt <- Sys.time()
largedata[, x_demeaned := x - mean(x), by = id]
largedata[, y_demeaned := y - mean(y), by = id]
lm(largedata, formula = y_demeaned ~ -1 + x_demeaned)
time_post_dt <- Sys.time()
time_post_dt - time_pre_dt
as.double(time_post - time_pre, units = "secs")/as.double(time_post_dt - time_pre_dt, units = "secs")

On my machine, this is 70-125 times faster.

So the ingredients for much faster panel estimation are there, but plm() does not seem to take advantage of them. Are there other panel packages that are better suited to large 'fixed effects' panels? I am hesitant about workarounds (like this within transformation) because I am not always sure how standard errors are calculated in the various coeftest() variants etc. (the same problem exists in Stata with small-sample adjustments and so on). An example of the kind of call I mean is below. I am looking for mature packages and would like to avoid coding everything from scratch.
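
A sketch of such a call, using plm's own vcovHC method on the small example; the type and cluster choices here are just one possibility, and fe_small is only a name for this illustration:

library("lmtest")

fe_small <- plm(smalldata,
                formula = y ~ x,
                effect = c("individual"),
                model = c("within"),
                index = "id")

# cluster-robust standard errors; the result depends on the chosen type/cluster
coeftest(fe_small, vcov. = vcovHC(fe_small, type = "HC1", cluster = "group"))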

edit:

For benchmarking against @Helix123's answer below, here is the attempt with plm's 'fast' mode:

install.packages("collapse")
install.packages("fixest")
install.packages("lfe")

# (restart R session)

options("plm.fast" = TRUE)

time_pre_fast <- Sys.time()
plm(largedata,
    formula = y ~ x,
    effect = c("individual"),
    model = c("within"),
    index = "id")
time_post_fast <- Sys.time()

time_post_fast - time_pre_fast

as.double(time_post_fast - time_pre_fast, units = "secs")/as.double(time_post_dt - time_pre_dt, units = "secs")

The within-transformation example is still about 90 times faster on my machine. However, felm() does speed things up considerably. Here is the example:

library("lfe")

time_pre_felm <- Sys.time()
felm(largedata, formula = y ~ x | id)
time_post_felm <- Sys.time()

as.double(time_post_felm - time_pre_felm, units = "secs")/as.double(time_post_dt - time_pre_dt, units = "secs")

This takes only twice as long as the data.table() within transformation followed by lm().
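
For what it's worth, lfe can also report cluster-robust standard errors via a fourth formula part; clustering by id here is just one illustrative choice:

# parts of the felm formula: regressors | fixed effects | IV (none) | cluster
summary(felm(y ~ x | id | 0 | id, data = largedata))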

feols() from the fixest package is the winner; see the answer below.


Solution

  • Based on @Helix123's hint to look into the fixest package, here is the answer:

    library("fixest")
    
    time_pre_fixest <- Sys.time()
    feols(y ~ x | id, largedata)
    time_post_fixest <- Sys.time()
    
    as.double(time_post_fixest - time_pre_fixest, units = "secs")/as.double(time_post_dt - time_pre_dt, units = "secs")
    

    Takes a tenth of the time of the fastest version in my question.

    edit: You can still run into RAM issues, of course, but they are much less likely.
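
    To address the standard-error concern from the question: fixest can also report cluster-robust standard errors directly. A minimal sketch (clustering by id is just one possible choice):

    # estimate once, then request clustered standard errors in the summary
    est <- feols(y ~ x | id, largedata)
    summary(est, cluster = ~id)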