c++ sorting ip weblog

C++ sort same IPs together, web log


I need to sort a web log file by IP, so that lines with the same IP end up directly under each other. I'm lazy, but I want to learn how to do this in C++, so I don't want to sort it in Excel. I made some changes to the log: for example, after the IP in every line there is now a marker of 8 q's (qqqqqqqq), and after that comes another address. Because IPs don't all have the same length, my idea was to copy only the first 16 characters of each line into an array and compare those. At least I thought that would be a good idea.

Example of log:

85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,
105.216.xx.xx   qqqqqqqq    - bla,bla,bla,bla,bla,bla,bla,
85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,

The log has more than 60,000 lines, and I already used C++ to erase the robots.txt, .js, .gif, .jpg etc. lines, so I would like to reuse the old code. Here is the example that deletes the "robots.txt" lines:

#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main()
{
    ifstream infile("C:\\ips.txt");
    // open the output file once, before the loop
    ofstream myfile("C:\\ipout.txt");
    string line;

    // copy every line that does NOT mention robots.txt to the new file
    while (getline(infile, line)) {
        if (line.find("robots.txt") == string::npos)
            myfile << line << "\n";
    }

    infile.close();
    myfile.close();

    cout << " \n";
    cin.get();

    return 0;
}

I know this code looks horrible, but it did its job. I'm still learning, and of course I want to keep the old file and write the result to another (new) file.

I found some help on this topic, but it didn't really fit what I was trying to do...

I'm thinking about changing the "if" statement to read only the first 16 characters of each line, compare them, and put matching lines under each other. Of course the whole line should stay intact, if that is possible.


Solution

  • I'm not sure I really understood the log format but I guess you can adapt this to fit your needs.

    This assumes a line-based log format where each line starts with the key that you want to group on (the IP address, for example). It uses an unordered_map, but you can try a normal map too (a std::map iterates its keys in sorted order, while the order of an unordered_map is unspecified). The key in the map is the IP address, and the rest of the line is put in a vector of strings.

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <unordered_map>
    #include <vector>
    
    // alias for the map
    using logmap = std::unordered_map<std::string, std::vector<std::string>>;
    
    logmap readlog(std::istream& is) {
        logmap rv;
        std::string line;
        while(std::getline(is, line)) {
            // put the line in a stringstream to extract ip and the rest
            std::stringstream ss(line);
            std::string ip;
            std::string rest;
            ss >> ip >> std::ws;
            std::getline(ss, rest);
            // add your filtering here 
            // put the entry in the map using ip as key
            rv[ip].push_back(rest);
        }
        return rv;
    }
    
    int main() {
        logmap lm = readlog(std::cin);
        for(const auto& m : lm) {
            std::cout << m.first << "\n";
            for(const auto& l : m.second) {
                std::cout << " " << l << "\n";
            }
        }
    }
    

    Given this input:

    127.0.0.1 first ip first line
    192.168.0.1 first line of second ip
    127.0.0.1 this is the second for the first ip
    192.168.0.1 second line of second ip
    127.0.0.1 and here's the third for the first
    192.168.0.1 third line of second ip
    

    This is a possible output:

    192.168.0.1
     first line of second ip
     second line of second ip
     third line of second ip
    127.0.0.1
     first ip first line
     this is the second for the first ip
     and here's the third for the first
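
If the whole line should stay intact, as the question asks, another option is to keep the lines in a vector and sort them by their first field. The following is only a minimal sketch under that assumption: the helper names `first_token` and `group_lines_by_key` are made up for illustration, and the key is assumed to be the first whitespace-delimited token of each line. Keys are compared as plain strings, so "105.x" sorts before "85.x"; that is enough to put equal IPs under each other, but it is not a numeric ordering of addresses.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the first whitespace-delimited token of a
// line, which is assumed here to be the IP address.
std::string first_token(const std::string& line) {
    std::istringstream ss(line);
    std::string key;
    ss >> key;
    return key;
}

// Hypothetical helper: reads every line from the stream and returns the
// lines sorted by their key, with each line left unchanged.
std::vector<std::string> group_lines_by_key(std::istream& is) {
    std::vector<std::string> lines;
    for (std::string line; std::getline(is, line); ) {
        lines.push_back(line);
    }
    // stable_sort keeps lines with equal keys in their original relative
    // order, so the entries for one IP stay in log order.
    std::stable_sort(lines.begin(), lines.end(),
        [](const std::string& a, const std::string& b) {
            return first_token(a) < first_token(b);
        });
    return lines;
}
```

To use it, pass an opened std::ifstream (or std::cin) to `group_lines_by_key` and write each returned line, followed by "\n", to an std::ofstream. Re-extracting the key on every comparison is simple rather than fast, but it should be acceptable for a log of roughly 60,000 lines.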