Search code examples
sortingsequentialtaocp

How do I extract an algorithm from these instructions?


I've been reading through The Art of Computer Programming, and though it has its moments of higher maths that I just can't get, some exercises have been fun to do.

After I've done one of them I go over to the answer, see if I did better or worse than what the book suggests (usually worse), But I don't get what the answer for the current one I'm on is trying to convey at all.

The book's question and proposed solution can be found here

What I've understood is that t may be the number of 'missing' elements or may be a general constant, but what I really don't understand is the seemingly arbitrary instruction to sort them based on their components, which to me looks like spinning your wheels in place since at first glance it doesn't get you closer to the original order. And the decision (among others) to replace one part of the paired names with a number ( file G contains all pairs (i,xi) for n−t < i ≤ n).

So my question is, simply, How do I extract an algorithm from this answer?

Bit of a clarification:

I understand what it aims to do, and how I would go into translating it into C++. What I do not understand is why I am supposed to sort the copies of the input file, and if so which criteria do I sort by, as well as the reasons to changing one side of the pairs to a number.


Solution

  • It's assumed that names are sortable, and that there are a sufficient number of tape drives to solve the problem. Define a pair as (name, next_name), where next_name is the name of person to the west. A copy of the file of pairs is made to another tape. The first file is sorted by name, the second file is sorted by next_name. Tape sorts are bottom up merge sort or a more complex variation called polyphase merge sort, but for this problem, standard bottom up merge sort is good enough. For C++, you could use std::stable_sort() to emulate a tape sort, using a lambda function for the compare, sorting by name for the first file and sorting by next_name for the second file.

    The terminology for indexing uses name[1] to represent the eastern most name, and name[n] to represent the western most name.

    After the initial sorting of the two files of pairs, the solution states that "passing over the files" is done to identify the next to last name, name[n-1], but doesn't specify how. In the process, I assume name[n] is also identified. The files are compared in sequence, comparing name from first file with next_name from second file. A mismatch indicates either the first name, name[1], or the last name, name[n], or in a vary rare circumstance, both, and the next pairs from each file have to be checked to determine what the mismatch indicates. At the time that the last name, name[n] is identified, then name from the second file pair will be the next to last name, name[n-1].

    Once name[n-1] and name[n] are known, a merge like operation using both files is performed, skipping name[n-1] and name[n] to create F with pairs (name[i], name[i+2]) for i = 1 to n-2 (in name order), and G with two pairs (n-1, x[n-1]), and (n, x[n]), also in name order (G and G' are in name order until the last step).

    F is copied to H, and an iterative process is performed as described in the algorithm, with t doubling each time, 2, 4, 8, ... . After each pass, F' contains pairs (x[i], x[i+t]) for i = 1 to n-t, then G' is sorted and merged with G back into G', resulting in a G' that contains the pairs (i, x[i]) for i = n-t to n, in name order. Eventually all the pairs end up in G (i, x[i]) for i = 1 to n, in name order, and then G is sorted by index (left part of pair), resulting in the names in sorted order.