Matched Sampling package or function for large dataset

I need an R package or function that lets me match controls to cases in a large dataset of about 5 million subjects. I have tried several packages; the problems I ran into are summarized below. So far I have only matched on a single covariate, but I will most likely need to match on several.

Package MatchIt: the nearest neighbor, optimal, and genetic methods all run for hours without finishing. The "cem" method runs very quickly, but I need to know which cases were matched or unmatched so I can do further analysis on the matched subset. Running match.data() on the cem result only supplies the weights to be used in a regression, not the matched subset. The pair() function in cem would work if I wanted one-to-one matching, but I want to retain as many controls as possible.
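For reference, a minimal sketch of flagging matched units from a MatchIt CEM run, assuming a recent MatchIt (4.x): the matchit object stores a weight for every row, and units discarded by CEM get weight 0, so `weights > 0` marks the matched units, and match.data() returns only those rows. The small simulated data set and the column names `treat`, `cem_weight` and `matched` are my own, chosen for illustration.

```r
library(MatchIt)

# Small simulated stand-in with the same shape as the fake data in this question
set.seed(100)
dat <- data.frame(
  Group = factor(rep(c("Control", "Treatment"), c(1400, 600))),
  value = c(rnorm(1400, 95, 2), rnorm(500, 92, 2), rnorm(100, 50, 5))
)
# MatchIt needs a binary treatment: 1 = Treatment, 0 = Control
dat$treat <- as.integer(dat$Group == "Treatment")

m <- matchit(treat ~ value, data = dat, method = "cem")

# Matched subset: by default match.data() keeps only rows with nonzero weight
matched_dat <- match.data(m)

# One weight per original row; weight 0 means CEM dropped the unit
dat$cem_weight <- m$weights
dat$matched <- m$weights > 0
table(dat$Group, dat$matched)
```

The treated units simulated around 50 fall into coarsened bins that contain no controls, so they should show up as unmatched.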

matchControls() in the e1071 package: runs for a long time and then returns "not able to allocate vector of size 1352 GB".
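A quick way to see why memory runs out: any routine that materialises one distance per treated-control pair needs 8 bytes per double-precision cell. I have not checked whether matchControls() builds exactly such a matrix internally, so the sketch below is only the arithmetic for an approach that does; `dense_matrix_gib()` is a hypothetical helper.

```r
# Memory (in GiB, 1024^3 bytes) needed for a dense matrix of doubles,
# assuming 8 bytes per cell.
dense_matrix_gib <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 1024^3
}

# One distance per treated-control pair in the fake data:
# 600,000 treated rows x 1,400,000 control columns
gib <- dense_matrix_gib(600000, 1400000)
round(gib)  # about 6258 GiB
```

At this scale, only approaches that never form all pairwise distances at once (for example, coarsening units into strata) can fit in memory.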

Match() from the Matching package: just runs and runs without finishing.
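The Matching documentation notes that version = "fast" skips the Abadie-Imbens standard errors and that setting ties = FALSE (or replace = FALSE) gives additional speed on large data. A minimal sketch on a small simulated sample, assuming those arguments and a caliper of 0.25 standard deviations (my choice, not from the question), which also shows how to flag matched rows from `index.treated` and `index.control`. It may still be too slow for 5 million rows, so trying it on a subsample first seems wise.

```r
library(Matching)

set.seed(100)
dat <- data.frame(
  Group = factor(rep(c("Control", "Treatment"), c(1400, 600))),
  value = c(rnorm(1400, 95, 2), rnorm(500, 92, 2), rnorm(100, 50, 5))
)
tr <- as.integer(dat$Group == "Treatment")

# No outcome is needed just to find the matches.
# ties = FALSE breaks ties randomly; version = "fast" is the faster routine.
m <- Match(Tr = tr, X = dat$value, estimand = "ATT", M = 1,
           caliper = 0.25, ties = FALSE, version = "fast")

# Rows used as a treated or control unit in at least one matched pair
dat$matched <- seq_len(nrow(dat)) %in% c(m$index.treated, m$index.control)
table(dat$Group, dat$matched)
```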

quickmatch() from the quickmatch package: it runs quickly, but I am not sure whether I'm using the function correctly or how to extract the matched data from the returned "qm_matching" object. Below is my attempt using quickmatch on fake data.

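library(MatchIt)    # packages used in the other attempts described above
library(cem)
library(Matching)
library(rgenoud)
library(quickmatch)
library(distances)  # distances() comes from the distances package

# Fake data: 1.4 million controls and 600,000 treated units,
# 100,000 of which lie far from the controls on 'value'
set.seed(100)
control_df <- data.frame(Group = factor("Control"), value = rnorm(1400000, 95, 2))
set.seed(101)
treatment_df <- data.frame(Group = factor("Treatment"),
                           value = c(rnorm(500000, 92, 2), rnorm(100000, 50, 5)))
dat <- rbind(control_df, treatment_df)

# Balance between the groups before matching
covariate_balance(dat$Group, dat$value, matching = NULL,
                  normalize = TRUE, all_differences = TRUE)

# Match on the single covariate 'value'
my_distances <- distances(dat, dist_variables = c("value"))
matchedDat <- quickmatch(my_distances, dat$Group)
matchedDat.df <- data.frame(matchedDat)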
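As far as I can tell from the quickmatch/scclust documentation, a qm_matching object is an integer vector of group ("cluster") labels with one entry per row of the data, and NA for any unit left unassigned. A minimal sketch under that assumption, on a small simulated sample: store the labels as a column and flag a row as matched when its group contains both Control and Treatment units. The columns `cluster` and `matched` and this definition of "matched" are my own.

```r
library(quickmatch)
library(distances)

set.seed(100)
dat <- data.frame(
  Group = factor(rep(c("Control", "Treatment"), c(1400, 600))),
  value = c(rnorm(1400, 95, 2), rnorm(500, 92, 2), rnorm(100, 50, 5))
)
my_distances <- distances(dat, dist_variables = "value")
matching <- quickmatch(my_distances, dat$Group)

# Drop the class attributes to get plain integer labels (NA = unassigned)
dat$cluster <- as.integer(matching)

# For each group label: does the group contain both treatments?
has_both <- tapply(dat$Group, dat$cluster, function(g) length(unique(g)) == 2)
in_mixed_group <- as.vector(has_both[as.character(dat$cluster)])

# Unassigned rows give NA above and are flagged FALSE
dat$matched <- !is.na(in_mixed_group) & in_mixed_group

matched_dat <- dat[dat$matched, ]
```

The `cluster` column can then serve as the matched-set identifier in later analyses.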

I'm not sure what to do with the returned object, but I think quickmatch may be the most viable option. The covariate_balance() output shows considerable imbalance between the Control and Treatment groups, so matching should be able to improve balance.
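To put a number on that imbalance independently of any package, a standardized mean difference is easy to compute in base R. This sketch scales by the pooled standard deviation of the two groups, which is my choice; covariate_balance() with normalize = TRUE may use a different scale, so the values need not agree exactly. `smd()` is a hypothetical helper; passing matching weights as `w` gives the post-matching balance.

```r
# Standardized mean difference of covariate x between treated and control,
# optionally weighted (e.g. with matching weights). The scale is the pooled
# SD of the unweighted groups.
smd <- function(x, treated, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treated], w[treated])
  m0 <- weighted.mean(x[!treated], w[!treated])
  s <- sqrt((var(x[treated]) + var(x[!treated])) / 2)
  (m1 - m0) / s
}

# Smaller version of the fake data in the question
set.seed(100)
control_values <- rnorm(1400, 95, 2)
set.seed(101)
trt_values <- c(rnorm(500, 92, 2), rnorm(100, 50, 5))

x <- c(control_values, trt_values)
is_treated <- rep(c(FALSE, TRUE), c(1400, 600))
smd(x, is_treated)
```

With treated mean near 85 and control mean near 95, the result is strongly negative, well beyond the 0.1 threshold often used to flag imbalance.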

Specifically, how do I obtain the matched results, i.e., flag the subjects that were successfully matched between the Control and Treatment groups? The cluster_label column in matchedDat.df suggests that the function creates a large number of clusters; can I restrict this, and if so, how?
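On the number of clusters: generalized full matching is designed to produce many small groups. My understanding of the quickmatch documentation is that `treatment_constraints`, a named vector giving the minimum number of units of each treatment per group, controls group size, so raising it should yield fewer, larger groups. A sketch under that assumption; the constraint of 4 controls and 1 treated unit per group is an arbitrary illustrative choice, and `count_groups()` is a hypothetical helper.

```r
library(quickmatch)
library(distances)

set.seed(100)
dat <- data.frame(
  Group = factor(rep(c("Control", "Treatment"), c(1400, 600))),
  value = c(rnorm(1400, 95, 2), rnorm(500, 92, 2), rnorm(100, 50, 5))
)
my_distances <- distances(dat, dist_variables = "value")

# Default constraints: at least one unit of each treatment per group
default_matching <- quickmatch(my_distances, dat$Group)

# Assumed stricter constraints: at least 4 controls and 1 treated unit per group
bigger_groups <- quickmatch(my_distances, dat$Group,
                            treatment_constraints = c("Control" = 4L, "Treatment" = 1L))

# Number of distinct group labels, ignoring unassigned (NA) units
count_groups <- function(m) length(unique(na.omit(as.integer(m))))
c(default = count_groups(default_matching), bigger = count_groups(bigger_groups))
```

With 1,400 controls and at least 4 per group, the stricter run can produce at most 350 groups.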

Any suggestions for speeding up the functions above, or for other approaches, would be appreciated.

Solution

After reading the cem documentation more carefully, I think I have a solution to my problem using either the MatchIt package or the cem package.

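library(cem)
library(tidyverse)

# Same fake data as above, with the row names stored in a 'rowname' column
set.seed(100)
control_df <- data.frame(Group = factor("Control"), value = rnorm(1400000, 95, 2))
set.seed(101)
treatment_df <- data.frame(Group = factor("Treatment"),
                           value = c(rnorm(500000, 92, 2), rnorm(100000, 50, 5)))
dat <- rbind(control_df, treatment_df) %>% rownames_to_column()

# Coarsened exact matching; 'rowname' is an identifier, not a covariate, so drop it
cem.match <- cem(treatment = "Group", baseline.group = "Control", data = dat,
                 keep.all = TRUE, drop = "rowname")

# cem() returns one entry per row of dat: group, matched flag and CEM weight.
# Join them back by row name (Group.check confirms the alignment) and keep matched rows.
matchedData <- data.frame(Group.check = cem.match$groups,
                          matched = cem.match$matched,
                          weights = cem.match$w) %>%
  rownames_to_column() %>%
  inner_join(dat, by = "rowname") %>%
  filter(matched)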
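The same idea can also be written directly in base R, which makes it straightforward to match on several covariates and runs in time linear in the number of rows. This is a sketch of coarsened exact matching under my own assumptions: equal-width bins from cut() with 10 bins per covariate, and the usual CEM weights as I understand them (treated units get weight 1; a matched control in stratum s gets (m_C / m_T) * (n_T,s / n_C,s), so matched control weights sum to the number of matched controls). It is not the cem package's implementation, and its default cutpoints differ. `cem_by_hand()` and the `age` covariate are hypothetical.

```r
# Coarsened exact matching by hand (illustrative sketch, not cem's algorithm).
cem_by_hand <- function(data, group, covariates, treated_level, n_bins = 10) {
  # Coarsen each covariate into equal-width bins (assumed binning rule)
  bins <- lapply(data[covariates], function(x) cut(x, breaks = n_bins, labels = FALSE))
  stratum <- interaction(bins, drop = TRUE)   # one stratum per bin combination
  idx <- as.integer(stratum)                  # positions match levels(stratum)
  is_treated <- data[[group]] == treated_level

  n_t <- tapply(is_treated, stratum, sum)     # treated units per stratum
  n_c <- tapply(!is_treated, stratum, sum)    # controls per stratum
  matched <- as.vector(n_t[idx] > 0 & n_c[idx] > 0)

  # Weights: 0 if unmatched, 1 for matched treated,
  # (m_C / m_T) * (n_T,s / n_C,s) for matched controls
  m_t <- sum(is_treated & matched)
  m_c <- sum(!is_treated & matched)
  ratio <- as.vector(n_t[idx] / n_c[idx])
  w <- ifelse(!matched, 0, ifelse(is_treated, 1, (m_c / m_t) * ratio))

  data$stratum <- stratum
  data$matched <- matched
  data$weights <- w
  data
}

set.seed(100)
dat <- data.frame(
  Group = factor(rep(c("Control", "Treatment"), c(1400, 600))),
  value = c(rnorm(1400, 95, 2), rnorm(500, 92, 2), rnorm(100, 50, 5)),
  age = round(runif(2000, 20, 80))   # hypothetical second covariate
)
res <- cem_by_hand(dat, "Group", c("value", "age"), treated_level = "Treatment")
matched_dat <- res[res$matched, ]
table(matched_dat$Group)
```

Unlike one-to-one pairing, every control that shares a stratum with a treated unit is retained, with the weights accounting for unequal stratum sizes.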