I need to sort a web log file by IP address, so that lines with the same IP end up directly under each other. I could do it in Excel, but I want to learn how to do it in C++. I modified the log so that in every line the IP is followed by a marker of eight q characters (qqqqqqqq), and after that comes another address. Because IPs don't all have the same length, my idea was to copy only the first 16 characters of each line into an array and compare those. At least I thought that would be a good idea.
Example of log:
85.xx.xx.58 qqqqqqqq 85.xx.xx.58.xxxxxxxxx bla,bla,bla,bla,
105.216.xx.xx qqqqqqqq - bla,bla,bla,bla,bla,bla,bla,
85.xx.xx.58 qqqqqqqq 85.xx.xx.58.xxxxxxxxx bla,bla,bla,bla,
The log has more than 60,000 lines. I already used C++ to remove the lines for robots.txt, .js, .gif, .jpg and so on, so I would like to reuse that code. Here is the version that deletes the robots.txt lines:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    ifstream infile("C:\\ips.txt");
    ofstream myfile("C:\\ipout.txt"); // open the output once, before reading
    string line;
    while (getline(infile, line)) {
        // keep only the lines that do not mention robots.txt
        if (line.find("robots.txt") == string::npos)
            myfile << line << "\n";
    }
    infile.close();
    myfile.close();
    cout << " \n";
    cin.get(); // wait for Enter so the console window stays open
    return 0;
}
I know this code looks horrible, but it did its job. I'm still learning. Of course, I want to keep the old file and write the result to another (new) file.
I found some help on this topic, but it did not quite fit my case.
I'm thinking about changing the if statement to read only the first 16 characters of each line, compare them, and put matching lines under each other. The whole line should stay intact, if that is possible.
I'm not sure I really understood the log format but I guess you can adapt this to fit your needs.
This assumes a line-based log format where each line starts with the key that you want to group on (the IP address, for example). It uses an unordered_map, but you can try a normal map too. A std::map iterates its keys in sorted (text) order, while an unordered_map gives no particular order. The key in the map is the IP address and the rest of the line is put in a vector of strings.
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>
// alias for the map
using logmap = std::unordered_map<std::string, std::vector<std::string>>;
logmap readlog(std::istream& is) {
    logmap rv;
    std::string line;
    while(std::getline(is, line)) {
        // put the line in a stringstream to extract ip and the rest
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;
        std::getline(ss, rest);
        // add your filtering here
        // put the entry in the map using ip as key
        rv[ip].push_back(rest);
    }
    return rv;
}
int main() {
    logmap lm = readlog(std::cin);
    for(const auto& m : lm) {
        std::cout << m.first << "\n";
        for(const auto& l : m.second) {
            std::cout << " " << l << "\n";
        }
    }
}
Given this input:
127.0.0.1 first ip first line
192.168.0.1 first line of second ip
127.0.0.1 this is the second for the first ip
192.168.0.1 second line of second ip
127.0.0.1 and here's the third for the first
192.168.0.1 third line of second ip
This is a possible output:
192.168.0.1
 first line of second ip
 second line of second ip
 third line of second ip
127.0.0.1
 first ip first line
 this is the second for the first ip
 and here's the third for the first
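If the goal is one sorted file with every line left intact, another option is to skip the map and stable-sort the lines by their first field. This is only a minimal sketch, and it assumes that each line starts with the IP followed by whitespace. The names first_field and sort_lines_by_ip are made up for this example. A stable sort keeps lines with the same IP in their original order.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the first whitespace-delimited field of a line,
// which is assumed to be the IP address.
std::string first_field(const std::string& line) {
    std::istringstream ss(line);
    std::string key;
    ss >> key;
    return key;
}

// Hypothetical helper: reads all lines from the stream and stable-sorts them
// by their first field, so lines sharing an IP keep their original order.
std::vector<std::string> sort_lines_by_ip(std::istream& is) {
    std::vector<std::string> lines;
    for (std::string line; std::getline(is, line); ) {
        lines.push_back(line);
    }
    std::stable_sort(lines.begin(), lines.end(),
        [](const std::string& a, const std::string& b) {
            // plain text comparison of the keys, not a numeric IP comparison
            return first_field(a) < first_field(b);
        });
    return lines;
}
```

Note that this compares the IPs as text, so 105.216.xx.xx would sort before 85.xx.xx.58. To get a new file and keep the old one, you could pass in a std::ifstream opened on the log and write each returned line to a std::ofstream.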