Search code examples
carraystext-processingkernighan-and-ritchie

Folding input lines every nth column (K&R 1-22) in C


Write a program to "fold" long input lines into two or more shorter lines after the last non-blank character that occurs before the n-th column of input. Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column.

The algorithm I decided to follow for this was as follows:

  1. If length of input line < maxcol (the column after which one would have to fold), then print the line as it is.
  2. If not, from maxcol, I check towards it's left, and it's right to find the closest non-space character, and save them as 'first' and 'last'. I then print the character array from line[0] to line[first] and then the rest of the array, from line[last] to line[len] becomes the new line array.

Here's my code:

#include <stdio.h>

#define MAXCOL 5

int getline1(char line[]);

int main()
{
    char line[1000];
    int len, i, j, first, last;

    len = getline1(line);

    while (len > 0) {
        if (len < MAXCOL) {
            printf("%s\n", line);
            break;
        }
        else {
            for (i = MAXCOL - 1; i >= 0; i--) {
                if (line[i] != ' ') {
                    first = i; 
                    break;
                }
            }
            for (j = MAXCOL - 1; j <= len; j++) {
                if (line[j] != ' ') {
                    last = j; 
                    break;
                }
            }
            //printf("first %d last %d\n", first, last);
            for (i = 0; i <= first; i++) 
                putchar(line[i]);
            putchar('\n');
            for (i = 0; i < len - last; i++) {
                line[i] = line[last + i];
            }
            len -= last;
            first = last = 0;
        }
    }
    return 0;
}

int getline1(char line[])
{
    int c, i = 0;

    while ((c = getchar()) != EOF && c != '\n') 
        line[i++] = c;

    if (c == '\n')
        line[i++] = '\n';

    line[i] = '\0';

    return i;
}

Here are the problems:

  • It does not do something intelligent with very long lines (this is fine, as I can add it as an edge case).
  • It does not do anything for tabs.
  • I cannot understand a part of the output.

For example, with the input:

asd        de             def          deffff

I get the output:

asd
de
def
defff //Expected until here
//Unexpected lines below
ff
fff
      deffff
        deffff
    deffff

Question 1 - Why do the unexpected lines print? How do I make my program/algorithm better?

Eventually, after spending quite some time with this question, I gave up and decided to check the clc-wiki for solutions. Every program here did NOT work, save one (The others didn't work because they did not cover certain edge cases). The one that worked was the largest one, and it did not make any sense to me. It did not have any comments, and neither could I properly understand the variable names, and what they represented. But it was the ONLY program in the wiki that worked.

#include <stdio.h>

#define YES 1
#define NO 0

int main(void)
{
  int TCOL = 8, ch, co[3], i, COL = 19, tabs[COL - 1];
  char bls[COL - 1], bonly = YES;

  co[0] = co[1] = co[2] = 0;

  while ((ch = getchar()) != EOF)
  {
      if (ch != '\t') {
          ++co[0];
          ++co[2];
      }

      else {
          co[0] = co[0] + (TCOL * (1 + (co[2] / TCOL)) - co[2]);
          i = co[2];
          co[2] = TCOL + (co[2] / TCOL) * TCOL;
      }

      if (ch != '\n' && ch != ' ' && ch != '\t')
      {
          if (co[0] >= COL) {
              putchar('\n');

              co[0] = 1;
              co[1] = 0;
          }

          else
              for (i = co[1]; co[1] > 0; --co[1])
              {
                  if (bls[i - co[1]] == ' ')
                      putchar(bls[i - co[1]]);

                  else
                      for (; tabs[i - co[1]] != 0;)

                          if (tabs[i - co[1]] > 0) {
                              putchar(' ');
                              --tabs[i - co[1]];
                          }

                          else {
                              tabs[i - co[1]] = 0;
                              putchar(bls[i - co[1]]);
                          }
              }

          putchar(ch);

          if (bonly == YES)
              bonly = NO;
      }

      else if (ch != '\n')
      {
          if (co[0] >= COL)
          {
              if (bonly == NO) {
                  putchar('\n');

                  bonly = YES;
              }

              co[0] = co[1] = 0;
          }

          else if (bonly == NO) {
              bls[co[1]] = ch;

              if (ch == '\t') {

                  if (TCOL * (1 + ((co[0] - (co[2] - i)) / TCOL)) -
                    (co[0] - (co[2] - i)) == co[2] - i)
                      tabs[co[1]] = -1;

                  else
                      tabs[co[1]] = co[2] - i;
              }

              ++co[1];
          }

          else
              co[0] = co[1] = 0;
      }

      else {
          putchar(ch);

          if (bonly == NO)
              bonly = YES;

          co[0] = co[1] = co[2] = 0;
      }
  }

  return 0;
}

Question 2 - Can you help me make sense of this code and how it works?

It fixes all the problems with my solution, and also works by reading character to character, and therefore seems more efficient.


Solution

  • Question 1 - Why do the unexpected lines print? How do I make my program/algorithm better?

    You are getting the unexpected lines in the output because after printing the array, you are not terminating the new line array with null character \0 -

    Here you are copying character from starting from last till len - last, creating a new line array:

            for (i = 0; i < len - last; i++) {
                line[i] = line[last + i];
            }
    

    You have copied the characters but the null terminating character is still at its original position. Assume the input string is:

    asd        de             def          deffff
    

    So, initially the content of line array will be:

    "asd        de             def          deffff\n"
                                                    ^
                                                    |
                                      null character is here
    

    Now after printing asd, you are copying characters from last index of line till len - last index to line array itself starting from 0 index. So, after copying the content of line array will be:

    "de             def          deffff\n    deffff\n"
                                         |____  _____|
                                              \/
                                    This is causing the unexpected output 
                                (null character is still at the previous location)
    

    So, after for loop you should add the null character just after the last character copied, like this:

    line [len - last] = '\0';
    

    With this the content of line array that will be processed in the next iteration of while loop will be:

    "de             def          deffff\n"
    

    One more thing, in the line array you can see the \n (newline) character at the end. May you want to remove it before processing the input, you can do:

    line[strcspn(line, "\n")] = 0;
    

    Improvements that you can do in your program:
    1. One very obvious improvement that you can do is to use pointer to the input string while processing it. With the help of pointer you don't need to copy the rest of the array, apart from processed part, again to the same array till the program process the whole input. Initialize the pointer to the start of the input string and in every iteration just move the pointer to appropriate location and start processing from that location where pointer is pointing to.
    2. Since you are taking the whole input first in a buffer and then processing it. You may consider fgets() for taking input. It will give better control over the input from user.
    3. Add a check for line array overflow, in case of very long input. With fgets() you can specify the maximum number of character to be copied to line array from input stream.

    Question 2 - Can you help me make sense of this code and how it works?

    The program is very simple, try to understand it at least once by yourself. Either use a debugger or take a pen and paper, dry run it once for small size input and check the output. Increase the input size and add some variations like multiple space characters and check the program code path and output. This way you can understand it very easily.