c++ assembly gcc clang compiler-explorer

How to generate godbolt like clean assembly locally?


I want to generate clean assembly locally, like Compiler Explorer does. I read "How to remove “noise” from GCC/clang assembly output?" before attempting this, but the output from that method isn't as clean or dense as godbolt's and still contains a lot of asm directives and unused labels.

How can I get clean assembly output without any unused labels or directives?


Solution

  • A while ago, I needed something like this locally so I wrote a small tool to make the asm readable.

    It is written in C++ and attempts to clean up the asm output from gcc so that it is readable. It does something similar to Compiler Explorer: it tries to remove all the directives and unused labels. Only the standard library is used, plus <cxxabi.h> for demangling.

    Some things I should mention:

    • Will only work with gcc and clang
    • Only tested with C++ code
    • Compile with -S -fno-asynchronous-unwind-tables -fno-dwarf2-cfi-asm -masm=intel (remove -masm=intel if you want AT&T syntax). AT&T syntax will probably work, but I didn't test it much. The other two options remove the .cfi directives. The code below can strip them too, but the compiler itself does a much better job of this. See the answer by Peter Cordes above.
    • This program can work standalone, but I would highly recommend reading this SO answer to tune your asm output first, and then processing it with this program to remove unused labels, directives, etc.
    • abi::__cxa_demangle() is used for demangling
    • Disclaimer: This isn't a perfect solution, and hasn't been tested extensively.

    The strategy used for cleaning the asm (there are probably better, faster, more efficient ways to do this):

    1. Collect all the labels
    2. Go through the asm line by line and check if the labels are used/unused
    3. If the labels are unused, they get deleted
    4. Every line beginning with '.' gets deleted, unless it is a label that is used somewhere

    Update 1: Not all static data gets removed now.

    #include <algorithm>
    #include <cstdlib>
    #include <cxxabi.h>
    #include <fstream>
    #include <iostream>
    #include <regex>
    #include <sstream>
    #include <string>
    #include <string_view>
    #include <unordered_map>
    #include <vector>
    
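    // return a view of s with leading and trailing whitespace removed
    // (s itself is not modified, so callers must use the return value)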
    std::string_view trim(std::string_view s)
    {
        s.remove_prefix(std::min(s.find_first_not_of(" \t\r\v\n"), s.size()));
        s.remove_suffix(std::min(s.size() - s.find_last_not_of(" \t\r\v\n") - 1, s.size()));
        return s;
    }
    
    static inline bool startsWith(const std::string_view s, const std::string_view searchString)
    {
        return (s.rfind(searchString, 0) == 0);
    }
    
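    std::string demangle(std::string &&asmText)
    {
        std::size_t last = 0;
        while (true) {
            // find the next mangled name; Itanium ABI symbols start with "_Z"
            const std::size_t next = asmText.find("_Z", last);
            if (next == std::string::npos)
                break;
            // the token ends at the first delimiter (or at the end of the text)
            std::size_t tokenEnd = asmText.find_first_of(":,.@[]() \n", next + 1);
            if (tokenEnd == std::string::npos)
                tokenEnd = asmText.size();
            const std::size_t len = tokenEnd - next;
            const std::string tok = asmText.substr(next, len);
            int status = 0;
            char* name = abi::__cxa_demangle(tok.c_str(), nullptr, nullptr, &status);
            if (status != 0 || name == nullptr) {
                std::cerr << "Demangling of: " << tok << " failed, status: " << status << '\n';
                // skip this token, otherwise the same "_Z" would be found forever
                last = tokenEnd;
                continue;
            }
            std::string demangledName{name};
            std::free(name);
            demangledName.insert(demangledName.begin(), ' ');
            asmText.replace(next, len, demangledName);
            // resume the search after the text that was just inserted
            last = next + demangledName.size();
        }
        return std::move(asmText);
    }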
    
    std::string clean_asm(const std::string& asmText)
    {
        std::string output;
        output.reserve(asmText.length());
        std::stringstream s{asmText};
    
        //1. collect all the labels
        //2. go through the asm line by line and check if the labels are used/unused
        //3. if the labels are unused, they get deleted
        //4. every line beginning with '.' gets deleted, unless it is a used label
    
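        // any line whose first non-blank character is '.'
        std::regex directiveRe { "^\\s*\\..*$" };
        // compiler-generated labels such as ".L3:" or "foo1:"
        std::regex labelRe { "^\\.*[a-zA-Z]+[0-9]+:$" };
        // a line that starts with a letter (an instruction or a named label)
        std::regex hasOpcodeRe { "^\\s*[a-zA-Z]" };
        // single-digit numeric labels such as "1:"
        std::regex numericLabelsRe { "\\s*[0-9]:" };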
    
        const std::vector<std::string> allowedDirectives =
        {
            ".string", ".zero", ".byte", ".value", ".long", ".quad", ".ascii"
        };
    
        //<label, used>
        std::unordered_map<std::string, bool> labels;
    
        //1
        std::string line;
        while (std::getline(s, line)) {
            // trim() returns a view, so the result has to be assigned back
            line = std::string(trim(line));
            if (std::regex_match(line, labelRe)) {
                // remove ':'
                line.pop_back();
                labels[line] = false;
            }
        }
    
        s.clear();
        s.str(asmText);
        line = "";
    
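        //2
        while (std::getline(s, line)) {
            // regex_search, not regex_match: hasOpcodeRe only describes the start of the line
            if (std::regex_search(line, hasOpcodeRe)) {
                for (auto& [label, used] : labels) {
                    // find() returns npos (not 0 or false) when the label is absent
                    if (line.find(label) != std::string::npos) {
                        used = true;
                    }
                }
            }
        }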
    
        //remove false labels from labels hash-map
        for (auto it = labels.begin(); it != labels.end();) {
            if (it->second == false)
                it = labels.erase(it);
            else
                ++it;
        }
    
        s.clear();
        s.str(asmText);
        line = "";
    
        std::string currentLabel;
    
        //3
        while (std::getline(s, line)) {
            // trim() returns a view, so the result has to be assigned back
            line = std::string(trim(line));
            if (line.empty())
                continue;
    
            if (std::regex_match(line, labelRe)) {
                auto l = line;
                l = l.substr(0, l.size() - 1);
                currentLabel = "";
                if (labels.find(l) != labels.end()) {
                    currentLabel = line;
                    output += line + "\n";
                }
                continue;
            }
    
            if (std::regex_match(line, directiveRe)) {
                //if we are in a label, keep the data directives that belong to it
                if (!currentLabel.empty()) {
                    for (const auto& allowedDir : allowedDirectives) {
                        if (startsWith(line, allowedDir)) {
                            output += '\t';
                            output += line;
                            output += '\n';
                            break;
                        }
                    }
                }
                continue;
            }
    
            if (std::regex_match(line, numericLabelsRe)) {
                continue;
            }
    
            if (line == "endbr64") {
                continue;
            }
    
            // any remaining line containing ':' is treated as a label
            // (find() is also safe on an empty line, unlike line[line.size() - 1])
            if (line.find(':') != std::string::npos) {
                currentLabel = line;
                output += line + '\n';
                continue;
            }
    
            line.insert(line.begin(), '\t');
    
            output += line + '\n';
        }
    
        return output;
    }
    
    int main(int argc, char* argv[])
    {
        if (argc < 2) {
            std::cerr << "Please provide the asm filename you want to process.\n";
            return 1;
        }
        std::ifstream file(argv[1]);
        if (!file.is_open()) {
            std::cerr << "Could not open '" << argv[1] << "'\n";
            return 1;
        }
        std::cout << "File '" << argv[1] << "' is opened\n";
        std::string output;
        std::string line;
        while (std::getline(file, line)) {
            output += line + '\n';
        }
    
        output = demangle(std::move(output));
        output = clean_asm(output);
    
        // replace the input file's extension with ".asm"
        std::string fileName = argv[1];
        auto dotPos = fileName.rfind('.');
        if (dotPos != std::string::npos)
            fileName.erase(fileName.begin() + dotPos, fileName.end());
    
        std::cout << "Asm processed. Saving as '" << fileName << ".asm'\n";
        std::ofstream out(fileName + ".asm");
        out << output;
    
        return 0;
    }