LLVM Optimization Using C++ API

I'm trying to apply the full -O3 optimization pipeline to an LLVM Module through the C++ API. I've tried the following, but I'm not sure that every optimization is actually being applied (inlining, for example).

//take string "llvm" (LLVM IR) and return "output_llvm" (optimized LLVM IR)
static string optimize(string llvm) {
    LLVMContext &ctx = getGlobalContext();
    SMDiagnostic err;
    Module *ir = ParseIR(MemoryBuffer::getMemBuffer(llvm), err, ctx);
    PassManager *pm = new PassManager();
    PassManagerBuilder builder;
    builder.OptLevel = 3;
    builder.populateModulePassManager(*pm);
    pm->run(*ir);
    delete pm;
    string output_llvm;
    raw_string_ostream buff(output_llvm);
    ir->print(buff, NULL);
    return buff.str(); // str() flushes the stream's buffer into output_llvm first
}

Is there anything else I can do to improve the performance of the output LLVM IR?

EDIT: I have tried to add all of the optimizations from the AddOptimizationPasses() function in opt.cpp, as shown below:

PassManager *pm = new PassManager();
int optLevel = 3;
int sizeLevel = 0;
PassManagerBuilder builder;
builder.OptLevel = optLevel;
builder.SizeLevel = sizeLevel;
builder.Inliner = createFunctionInliningPass(optLevel, sizeLevel);
builder.DisableUnitAtATime = false;
builder.DisableUnrollLoops = false;
builder.LoopVectorize = true;
builder.SLPVectorize = true;
builder.populateModulePassManager(*pm);
pm->run(*module);

I also create a FunctionPassManager before the PassManager, add several passes to it, and run it over every function like so:

FunctionPassManager *fpm = new FunctionPassManager(module);
// add several passes
fpm->doInitialization();
for (Function &f : *module)
    fpm->run(f);
fpm->doFinalization();

However, the resulting code performs the same as what I get from -O1 on the command line, whereas -O3 on the command line gives much better performance. Any suggestions?

Solution

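Follow the logic of the AddOptimizationPasses function in opt.cpp (tools/opt in the LLVM source tree). It is the source of truth for what opt runs at each optimization level. Note that it populates both the FunctionPassManager and the module PassManager from the same PassManagerBuilder, rather than adding function passes by hand.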