I want to make a copy of a file of fixed-width records with several column ranges removed. For example, the file has fixed-width records of width 1600, and I want to keep columns 0-83, 89-1517, and 1526 to the end. This is part of a larger program, so standalone utilities like cut and awk won't help here.
I currently apply the following to each line/record. It works okay, but I wonder whether there is anything obviously better.
    "".join([full[:84], full[89:1518], full[1526:]])
In particular, I'd find it more natural to specify what to cut rather than what to keep. Is there a standard-library function, or a quick, easy-to-read function, that works more like this?
    # hypothetical
    cut(line, [ [84,88], [1519, 1525] ])
ADDITION
Building on the accepted answer: sort the list of cuts first, so the caller can give them in any order. It would be nice to add overlap detection as well.
    def cut(line, cuts):
        # Sort by cut start so the caller can pass the cuts in any order.
        sorted_cuts = sorted(cuts, key=lambda x: x[0])
        return ''.join(line[slice(keep_start, keep_end)]
                       for keep_start, keep_end in zip(
                           [None] + [cut_end for cut_start, cut_end in sorted_cuts],
                           [cut_start for cut_start, cut_end in sorted_cuts] + [None]))

    origline = "0123456789"
    assert (cut(origline, [[1,2], [3,4]]) ==
            cut(origline, ([3,4], (1,2))) ==
            cut(origline, [[3,4], [1,2]]))
    print(cut(origline, [[1,2], [3,4]]))  # prints 02456789
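One possible way to add the overlap detection mentioned above is sketched here. The name cut_checked and the choice to raise ValueError are assumptions made for illustration, not part of the accepted answer. Cuts are treated as half-open [start, end) ranges, and cuts that merely touch (one ends where the next starts) are allowed.

```python
def cut_checked(line, cuts):
    """Remove half-open [start, end) ranges from line, rejecting overlapping cuts.

    cut_checked is a hypothetical name; raising ValueError is an assumed policy.
    """
    sorted_cuts = sorted(cuts, key=lambda c: c[0])
    # Once sorted by start, two neighbours overlap exactly when the later
    # cut starts before the earlier one has ended.
    for (prev_start, prev_end), (next_start, next_end) in zip(sorted_cuts, sorted_cuts[1:]):
        if next_start < prev_end:
            raise ValueError(
                "overlapping cuts: [%d, %d) and [%d, %d)"
                % (prev_start, prev_end, next_start, next_end))
    keep_starts = [None] + [cut_end for cut_start, cut_end in sorted_cuts]
    keep_ends = [cut_start for cut_start, cut_end in sorted_cuts] + [None]
    return ''.join(line[slice(s, e)] for s, e in zip(keep_starts, keep_ends))


print(cut_checked("0123456789", [[3, 4], (1, 2)]))  # prints 02456789
```

Checking only neighbouring pairs is enough here because the list is sorted by start first.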
Here is an implementation of your hypothetical cut function.
    def cut(line, cuts):
        return ''.join(line[slice(keep_start, keep_end)]
                       for keep_start, keep_end in zip(
                           [None] + [cut_end for cut_start, cut_end in cuts],
                           [cut_start for cut_start, cut_end in cuts] + [None]))

    print(cut('abcdefghijklmnopqrstuvwxyz', [[1,3], [9,10]]))
gives:
    adefghiklmnopqrstuvwxyz
(bc and j were cut)
So:
The [None] + [cut_end for cut_start, cut_end in cuts] is the start of each slice to keep, in this example [None, 3, 10].
The [cut_start for cut_start, cut_end in cuts] + [None] is the end of each slice to keep, in this example [1, 9, None],
where None means start/end of string as used by the slice builtin.
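To make the pairing concrete, the two lists can be built and zipped outside the function. This is only an illustrative sketch: the variable names keep_starts, keep_ends, pairs and pieces are made up here and do not appear in the function itself.

```python
text = 'abcdefghijklmnopqrstuvwxyz'
cuts = [[1, 3], [9, 10]]

# Start of each slice to keep: the beginning of the string, then every cut end.
keep_starts = [None] + [cut_end for cut_start, cut_end in cuts]
# End of each slice to keep: every cut start, then the end of the string.
keep_ends = [cut_start for cut_start, cut_end in cuts] + [None]

pairs = list(zip(keep_starts, keep_ends))
print(pairs)   # [(None, 1), (3, 9), (10, None)]

# Each pair becomes one slice; None means "from the start" or "to the end".
pieces = [text[slice(start, end)] for start, end in pairs]
print(pieces)  # ['a', 'defghi', 'klmnopqrstuvwxyz']
print(''.join(pieces))  # adefghiklmnopqrstuvwxyz
```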
Note: to implement the cuts given in your example, you would supply the arguments to this cut function as:
    cut(line, [[84, 89], [1519, 1526]])
where the second element of each 2-element list is the index after the end of the cut, in keeping with normal Python indexing conventions.
If you really want not to have to do this (in order to get exactly the cut function that you describe above), then in the above code you would replace:
    [cut_end for cut_start, cut_end in cuts]
with:
    [cut_end + 1 for cut_start, cut_end in cuts]
For convenience, here is the full code of the function in that case, and the calling code that you would use in your example:
    def cut(line, cuts):
        return ''.join(line[slice(keep_start, keep_end)]
                       for keep_start, keep_end in zip(
                           [None] + [cut_end + 1 for cut_start, cut_end in cuts],
                           [cut_start for cut_start, cut_end in cuts] + [None]))

    print(cut(line, [[84, 88], [1519, 1525]]))
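Since the goal is a copy of a whole file, a rough sketch of applying a cut function to every record might look like the following. It uses the half-open version of cut (end index is one past the cut). The helpers cut_records and copy_without_columns are hypothetical names, and the sketch assumes newline-terminated text records with the cuts already sorted and non-overlapping.

```python
def cut(line, cuts):
    """Remove half-open [start, end) ranges from line (cuts sorted, non-overlapping)."""
    keep_starts = [None] + [cut_end for cut_start, cut_end in cuts]
    keep_ends = [cut_start for cut_start, cut_end in cuts] + [None]
    return ''.join(line[slice(s, e)] for s, e in zip(keep_starts, keep_ends))


def cut_records(lines, cuts):
    """Yield each record with the cut ranges removed, keeping its trailing newline.

    Hypothetical helper: lines can be any iterable of strings, e.g. an open file.
    """
    for line in lines:
        body = line.rstrip('\n')
        newline = line[len(body):]  # '' or the stripped newline
        yield cut(body, cuts) + newline


def copy_without_columns(src_path, dst_path, cuts):
    """Hypothetical helper: write a copy of src_path with the cut columns removed."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        dst.writelines(cut_records(src, cuts))


print(list(cut_records(["0123456789\n", "abcdefghij\n"], [[1, 3], [5, 6]])))
# ['0346789\n', 'adeghij\n']
```

Separating the newline from the record body means a cut that runs to the end of the record cannot remove the line terminator by accident.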