python list slicers

Reconciling an array slicer


I've built a function to cut the extraneous garbage out of text entries. It uses an array slicer. I now need to reconcile the lines removed by my cleanup function, so that lines_lost + lines_kept = total lines. Source code below:

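def header_cleanup(entry_chunk):
    # Removes duplicate headers due to page-continuations
    # Collapse blank lines, then split the text into a list of lines
    lines = entry_chunk.replace("\r\n\r\n", "\r\n").splitlines()
    headers = lines[1:5]  # header lines are taken from indices 1 through 4
    # Drop every line that matches a header line, then put one copy of the headers back on top
    lines[:] = [x for x in lines if not any(header == x for header in headers)]
    lines = headers + lines
    return "\n".join(lines)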

How could I count the lines that no longer show up in lines after the slice/mutation? Something like this:

original_length = len(lines)
lines = lines.remove_garbage
garbage = lines.garbage_only_plz
if len(lines) + len(garbage) == original_length:
    print("Good!")
else:
    print("Bad!  ;(")

Final answer ended up looking like this:

import sys

def header_cleanup(entry_chunk):
    # Split into a list of lines so the slice and comprehensions work on lines, not characters
    lines = entry_chunk.replace("\r\n\r\n", "\r\n").splitlines()
    line_length = len(lines)
    headers = lines[1:5]
    # Partition the lines: header matches go to bad_lines, everything else is kept
    saved_lines = [x for x in lines if not any(header == x for header in headers)]
    bad_lines = [x for x in lines if any(header == x for header in headers)]
    total_lines = len(saved_lines) + len(bad_lines)
    if total_lines == line_length:
        print("Yay!")
    else:
        print("Boo.")
        print(f"{rando_trace_info}")  # placeholder: substitute your own trace/debug info
        sys.exit()
    final_lines = headers + saved_lines
    return "\n".join(final_lines)

Okokokokok - I know what you're thinking: that's redundant, but it's required. I'm open to edits after the solution for anything more Pythonic. Thanks for your consideration.


Solution

  • Don't reuse the lines variable. Assign the cleaned result to a different variable, so the original lines is still available when you extract the garbage.

    clean_lines = remove_garbage(lines)
    garbage = garbage_only(lines)
    if len(clean_lines) + len(garbage) == len(lines):
        print("Good!")
    else:
        print("Bad!")
    

    You might want to have a single function that returns both:

    clean_lines, garbage = filter_garbage(lines)
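
    A minimal sketch of what such a combined function could look like. filter_garbage is only named above, not defined, so the body and the extra headers parameter here are assumptions for illustration; the matching rule (a line is garbage if it equals one of the header lines) is taken from the question's code.

```python
def filter_garbage(lines, headers):
    """Split lines into (clean_lines, garbage) in a single pass.

    Hypothetical helper: a line counts as garbage when it equals one of
    the header lines, mirroring the rule used in the question.
    """
    header_set = set(headers)  # set membership avoids rescanning headers per line
    clean_lines, garbage = [], []
    for line in lines:
        if line in header_set:
            garbage.append(line)
        else:
            clean_lines.append(line)
    return clean_lines, garbage


def header_cleanup(entry_chunk):
    # Collapse blank lines, then split into a list of lines
    lines = entry_chunk.replace("\r\n\r\n", "\r\n").splitlines()
    headers = lines[1:5]
    clean_lines, garbage = filter_garbage(lines, headers)
    # Every line lands in exactly one of the two lists, so the counts must add up
    assert len(clean_lines) + len(garbage) == len(lines)
    # Put a single copy of the headers back on top, as in the question
    return "\n".join(headers + clean_lines)
```

    Because each line is appended to exactly one of the two lists, the reconciliation check can never fail with this layout; it is kept only because the question says the check is required. Note that garbage also contains the first copy of the headers, which header_cleanup then adds back.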