Sort two text files while keeping indented child lines with their parents


I would like to compare two log files, generated before and after an implementation change, to see whether the change has affected anything. However, the log entries do not always appear in the same order. Because the log files also contain multi-level indented lines, a plain sort reorders every line independently. I would like to keep each child line together with its parent. The indentation uses spaces, not tabs.

Any help would be greatly appreciated. Either a Windows or a Linux solution is fine.

Example of the file:

#This is a sample code

Parent1 to be verified

    Child1 to be verified

    Child2 to be verified
        Child21 to be verified
        Child23 to be verified
        Child22 to be verified
            Child221 to be verified

    Child4 to be verified

    Child5 to be verified
        Child53 to be verified
        Child52 to be verified
            Child522 to be verified
            Child521 to be verified

    Child3 to be verified

Solution

  • Here is another answer that sorts the file hierarchically, using Python.

    The idea is to prefix each line with its chain of parents, so that the children of the same parent are sorted together.

    See the Python script below:

    """Attach parent to children in an indentation-structured text"""
    from typing import Tuple, List
    import sys
    
    # Separator joining parent and child in each line; it must not occur in the input
    SEPARATOR = '@'
    # The indentation
    INDENT = '    '
    
    def parse_line(line: str) -> Tuple[int, str]:
        """Parse a line into indentation level and its content
        with indentation stripped
    
        Args:
            line (str): One of the lines from the input file, with newline ending
    
        Returns:
            Tuple[int, str]: The indentation level and the content with
                indentation stripped.
    
        Raises:
            ValueError: If the line is incorrectly indented.
        """
        # strip the leading white spaces
        lstripped_line = line.lstrip()
        # get the indentation
        indent = line[:-len(lstripped_line)]
    
        # Let's check if the indentation is correct
        # meaning it should be N * INDENT
        n = len(indent) // len(INDENT)
        if INDENT * n != indent:
            raise ValueError(f"Wrong indentation of line: {line}")
    
        return n, lstripped_line.rstrip('\r\n')
    
    
    def format_text(txtfile: str) -> List[str]:
        """Format the text file by attaching the parent to its children
    
        Args:
            txtfile (str): The text file
    
        Returns:
            List[str]: A list of formatted lines
        """
        formatted = []
        par_indent = par_line = None
    
        with open(txtfile) as ftxt:
            for line in ftxt:
                # skip blank lines so they don't become empty top-level parents
                if not line.strip():
                    continue
                # get the indentation level and line without indentation
                indent, line_noindent = parse_line(line)
    
                # level 1 parents
                if indent == 0:
                    par_indent = indent
                    par_line = line_noindent
                    formatted.append(line_noindent)
    
                # children
                elif indent > par_indent:
                    formatted.append(par_line +
                                     SEPARATOR * (indent - par_indent) +
                                     line_noindent)
    
                    par_indent = indent
                    par_line = par_line + SEPARATOR + line_noindent
    
                # siblings or dedentation
                else:
                    # We just need first `indent` parts of parent line as our prefix
                    prefix = SEPARATOR.join(par_line.split(SEPARATOR)[:indent])
                    formatted.append(prefix + SEPARATOR + line_noindent)
                    par_indent = indent
                    par_line = prefix + SEPARATOR + line_noindent
    
        return formatted
    
    def sort_and_revert(lines: List[str]):
        """Sort the formatted lines and revert the leading parents
        into indentations
    
        Args:
            lines (List[str]): list of formatted lines
    
        Prints:
            The sorted and reverted lines
        """
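        # Compare component by component so a sibling whose name extends
        # another's (e.g. "child1" and "child1-2") cannot split a subtree
        sorted_lines = sorted(lines, key=lambda line: line.split(SEPARATOR))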
        for line in sorted_lines:
            if SEPARATOR not in line:
                print(line)
            else:
                leading, _, orig_line = line.rpartition(SEPARATOR)
                print(INDENT * (leading.count(SEPARATOR) + 1) + orig_line)
    
    def main():
        """Main entry"""
        if len(sys.argv) < 2:
            print(f"Usage: {sys.argv[0]} <file>")
            sys.exit(1)
    
        formatted = format_text(sys.argv[1])
        sort_and_revert(formatted)
    
    if __name__ == "__main__":
        main()
    
    

    Let's save it as format.py and create a test file, say test.txt:

    parent2
        child2-1
            child2-1-1
        child2-2
    parent1
        child1-2
            child1-2-2
            child1-2-1
        child1-1
    

    Let's test it:

    $ python format.py test.txt
    parent1
        child1-1
        child1-2
            child1-2-1
            child1-2-2
    parent2
        child2-1
            child2-1-1
        child2-2
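
Since the original goal is to compare two logs, the two sorted outputs can then be diffed. Here is a minimal sketch, assuming the sorted output of each run has already been collected into a list of lines. The function name `diff_sorted` and the file labels are hypothetical; `difflib.unified_diff` from the standard library does the comparison.

```python
import difflib
from typing import List


def diff_sorted(before: List[str], after: List[str]) -> List[str]:
    """Return unified-diff lines between two already-sorted line lists."""
    return list(difflib.unified_diff(
        before, after,
        fromfile="before.txt",  # hypothetical labels, used only in the header
        tofile="after.txt",
        lineterm="",            # the input lines carry no trailing newline
    ))


# Illustrative data, not taken from a real log
before = ["parent1", "    child1-1", "    child1-2"]
after = ["parent1", "    child1-1", "    child1-3"]
for diff_line in diff_sorted(before, after):
    print(diff_line)
```

On Linux, redirecting the output of each run to a file and comparing the two files with `diff` should give an equivalent result.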
    

    If you wonder how the format_text function formats the text, here are the intermediate results, which also explain why sorting them produces the order we want:

    parent2
    parent2@child2-1
    parent2@child2-1@child2-1-1
    parent2@child2-2
    parent1
    parent1@child1-2
    parent1@child1-2@child1-2-2
    parent1@child1-2@child1-2-1
    parent1@child1-1
    

    Each child has its parents attached, all the way up to the root, so the children under the same parent are sorted together.
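
The same idea also works without a separator character, by keeping each line's chain of parents as a tuple and sorting on that. This sidesteps any concern about the separator appearing in the data or about how it compares with other characters. The following is only a sketch, assuming a fixed four-space indent and no blank lines in the input; `sort_hierarchy` is a hypothetical name and not part of the script above.

```python
from typing import List, Tuple

INDENT = "    "  # assumed indentation unit, as in the script above


def sort_hierarchy(lines: List[str]) -> List[str]:
    """Sort indented lines so that children stay under their parents."""
    paths: List[Tuple[str, ...]] = []
    stack: List[str] = []  # ancestors of the current line, root first
    for line in lines:
        content = line.lstrip(" ")
        level = (len(line) - len(content)) // len(INDENT)
        # Drop entries at this level or deeper, then add the current line
        del stack[level:]
        stack.append(content)
        paths.append(tuple(stack))
    # Tuples compare element by element, so a parent sorts before its children
    return [INDENT * (len(path) - 1) + path[-1] for path in sorted(paths)]


sample = [
    "parent2",
    "    child2-1",
    "parent1",
    "    child1-2",
    "    child1-1",
]
print("\n".join(sort_hierarchy(sample)))
```

Because each path is compared one component at a time, a name that happens to be a prefix of a sibling's name cannot pull another subtree apart.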