Tags: c++, xml, class, coding-style, program-entry-point

What is the right way to parse a large amount of input for the main function in C++?


Suppose I have to pass 30 numbers to an executable. With that much input, it is not reasonable to pass them on the command line. One standard way is to save them in a single XML file and use an XML parser such as tinyxml2 to read them. The problem is that if I use tinyxml2 to parse the input directly in main, I end up with a very bloated main function, which seems to contradict common good practice.

For example:

#include <iostream>
#include "tinyxml2.h"

int main(int argc, char **argv){

  int a[30];      

  tinyxml2::XMLDocument doc_xml;

  // Stop early if no file was given or it could not be loaded
  if (argc < 2 || doc_xml.LoadFile(argv[1])){
    std::cerr << "failed to load input file\n";
    return 1;
  }
  else {
    tinyxml2::XMLHandle xml(&doc_xml);

    tinyxml2::XMLHandle a0_xml =
        xml.FirstChildElement("INPUT").FirstChildElement("A0");

    if (a0_xml.ToElement()) {
      a0_xml.ToElement()->QueryIntText(&a[0]);
    }
    else {
      std::cerr << "A0 missing";
    }

    tinyxml2::XMLHandle a1_xml =
        xml.FirstChildElement("INPUT").FirstChildElement("A1");

    if (a1_xml.ToElement()) {
      a1_xml.ToElement()->QueryIntText(&a[1]);
    }
    else {
      std::cerr << "A1 missing";
    }

    // parsing all the way to A29 ... 
  }

  // do something with a

  return 0;
}

On the other hand, if I write an extra class just to parse this specific type of input in order to shorten main, that doesn't seem right either: the class is useless except in conjunction with this main function, since it can't be reused elsewhere.

int main(int argc, char **argv){

  int a[30];      

  ParseXMLJustForThisExeClass ParseXMLJustForThisExeClass_obj;

  ParseXMLJustForThisExeClass_obj.Run(argv[1], a);

  // do something with a

  return 0;
}

What is the best way to deal with it?


Solution

  • Note that besides reading XML files, you can also pass lots of data through stdin. It's common practice to write e.g. mycomplexcmd | hexdump -C, where hexdump reads from stdin through the pipe.

    Now to the rest of the question: there is a reason to go with your multiple-functions example (whether they are constructors or ordinary functions is not very important here). It is the same reason you would want any function to be smaller: readability. That said, I don't know about the "common good practice", and I've seen many terminal utilities with a very big main().

    Imagine someone new reading the first variant of main(). They would have to jump through the hoops of figuring out all these handles, queries, children, and parents, when all they wanted was to look at the part after // do something with a. That's because they don't know whether the parsing is relevant to their problem or not. In the second variant they'll quickly figure it out: "Aha, it's the parsing logic, it's not what I am looking for".

    Of course, you can break up the logic with detailed comments. But now imagine something went wrong, someone is debugging the code, and they have pinned the problem down to this function (admittedly funny, given that the function is main(); maybe they have just started debugging). The bug turns out to be very subtle, so they check everything in the function. Because you're dealing with a language with mutable state, you often find yourself thinking "oh, maybe it's something with this variable; where is it being changed?". First you look up every use of the variable throughout this large function, then the conditions that lead to the blocks where it's changed. Then you work out what some other big block relevant to the condition does (one that could have been extracted into a separate function) and which variables it uses. By the time you've figured out what it does, you've forgotten half of what you were looking at before!

    Of course, sometimes big functions are unavoidable. But since you're asking the question, that's probably not your case.

    Rule of thumb: when you see a function doing two different things that have little in common, break it into two separate functions. In your case that's parsing the XML and "doing something with a". If the second part is only a few lines, though, it's probably not worth extracting; use your judgment. Don't worry about the overhead; compilers are good at optimizing. You can either use LTO (link-time optimization), or define the function only in the .cpp file as static (file-scope static, not a class static member), and depending on the optimization options the compiler may inline the code.
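
To make the split concrete, here is one possible sketch of the two steps as file-local static functions. Everything named here (Inputs, Lookup, ParseInputs, Process) is hypothetical, and the XML access is hidden behind a callback so the example stays self-contained; summing the values is only a stand-in for "doing something with a".

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

constexpr std::size_t kInputCount = 30;
using Inputs = std::array<int, kInputCount>;

// Hypothetical lookup: stores the integer found under `name` in `value`
// and returns false if the element is missing or not an integer.
using Lookup = std::function<bool(const std::string& name, int& value)>;

// Parsing step: requests "A0" .. "A29" in a loop instead of 30 pasted blocks.
// `static` keeps the function local to this .cpp file.
static bool ParseInputs(const Lookup& lookup, Inputs& a) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    const std::string name = "A" + std::to_string(i);
    if (!lookup(name, a[i])) {
      std::cerr << name << " missing or invalid\n";
      return false;
    }
  }
  return true;
}

// Placeholder for the "do something with a" step: here it only sums the values.
static long Process(const Inputs& a) {
  long sum = 0;
  for (int v : a) sum += v;
  return sum;
}
```

With tinyxml2, the lookup would presumably wrap the same FirstChildElement(...) and QueryIntText(...) calls the question already uses, and main() shrinks to loading the document, calling ParseInputs, and calling Process.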

    P.S.: you seem to be at a stage where it would be very useful to learn and play with some Haskell. You don't have to use it for serious projects, but the insights you'd get can be applied anywhere. It forces you into better design; in particular, you'd quickly develop a feel for when a function needs to be broken up (among many other things).