
sed -i touching files that it doesn't change


Someone on our server ran sed -i 's/$var >> $var2/$var > $var2/' * to change appends to overwrites in some bash scripts in a common directory. No big deal: it was tested first with grep, which returned the expected result that only his files would be touched.

He ran the command, and now 1200 of the 1400 files in the folder have a new modification date, yet as far as we can tell, only his small handful of files actually changed.

  1. Why would sed 'touch' a file that it's not changing?
  2. Why would it only 'touch' a portion of the files and not all of them?
  3. Did it actually change something (maybe some trailing white space or something totally unexpected because of the $'s in the sed regex)?

Solution

  • When GNU sed successfully edits a file "in-place," its timestamp is updated, whether or not the contents changed. To understand why, let's review how an "in-place" edit is done:

    1. A temporary file is created to hold the output.

    2. sed processes the input file, sending output to the temporary file.

    3. If a backup file extension was specified, the input file is renamed to the backup file.

    4. Whether a backup is created or not, the temporary output file is moved (via rename) over the input file.

    GNU sed does not track whether any changes were made to the file. Whatever is in the temporary output file is moved to the input file via rename.

    There is a nice benefit to this procedure: POSIX requires that rename be atomic. Consequently, the input file is never in a mangled state: it is either the original file or the modified file and never part way in-between.

    As a result of this procedure, any file that sed successfully processes will have its timestamp changed.
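
    The effect is easy to observe from the shell. The following is a minimal sketch, assuming GNU sed and GNU stat (for stat -c); the temporary directory and variable names are made up for illustration. It backdates a file, runs a substitution that cannot match, and compares inode, modification time and checksum before and after:

```shell
#!/bin/sh
# Sketch only: assumes GNU sed and GNU stat (stat -c). All names are illustrative.
workdir=$(mktemp -d)
file="$workdir/inputfile"
printf 'this is\na test.\n' > "$file"
touch -t 200101010000 "$file"      # backdate the file so a new mtime is obvious

inode_before=$(stat -c %i "$file")
mtime_before=$(stat -c %Y "$file")
sum_before=$(cksum < "$file")

sed -i 's/XXX/YYY/' "$file"        # XXX never matches, so the content is unchanged

inode_after=$(stat -c %i "$file")
mtime_after=$(stat -c %Y "$file")
sum_after=$(cksum < "$file")

echo "inode:    $inode_before -> $inode_after"
echo "mtime:    $mtime_before -> $mtime_after"
echo "checksum: $sum_before -> $sum_after"
```

    The checksum should stay the same while the inode and the modification time differ, because the name now points at the former temporary file.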

    Example

    Let's consider this inputfile:

    $ cat inputfile
    this is
    a test.
    

    Now, under the supervision of strace, let's run sed -i on it in a way guaranteed to cause no changes:

    $ strace sed -i 's/XXX/YYY/' inputfile
    

    The strace output, trimmed for brevity, looks like:

    execve("/bin/sed", ["sed", "-i", "s/XXX/YYY/", "inputfile"], [/* 55 vars */]) = 0
    [...snip...]
    open("inputfile", O_RDONLY)             = 4
    [...snip...]
    open("./sediWWqLI", O_RDWR|O_CREAT|O_EXCL, 0600) = 6
    [...snip...]
    read(4, "this is\na test.\n", 4096)     = 16
    write(6, "this is\n", 8)                = 8
    write(6, "a test.\n", 8)                = 8
    read(4, "", 4096)                       = 0
    [...snip...]
    close(4)                                = 0
    [...snip...]
    close(6)                                = 0
    [...snip...]
    rename("./sediWWqLI", "inputfile")      = 0
    

    As you can see, sed opens the input file, inputfile, on file descriptor 4. It then creates a temporary file, ./sediWWqLI, on file descriptor 6 to hold the output. It reads from the input file and writes each line unchanged to the temporary file. When this is done, a call to rename replaces inputfile with the temporary file, so the name inputfile now refers to a newly written file with a new timestamp.

    GNU sed source code

    The relevant source code is in the execute.c file of the sed directory of the source. From version 4.2.1:

      ck_fclose (input->fp);
      ck_fclose (output_file.fp);
      if (strcmp(in_place_extension, "*") != 0)
        {
          char *backup_file_name = get_backup_file_name(target_name);
          ck_rename (target_name, backup_file_name, input->out_file_name);
          free (backup_file_name);
        }
    
      ck_rename (input->out_file_name, target_name, input->out_file_name);
      free (input->out_file_name);
    

    ck_rename is a wrapper around the stdio function rename. The source for ck_rename is in sed/utils.c.

    As you can see, no flag is kept to determine whether the file actually changed or not. rename is called regardless.
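
    Since sed keeps no such flag, one possible workaround is to pass sed -i only the files that actually contain the pattern. This is a sketch, not part of the original command: it assumes GNU grep (-Z), GNU xargs (-0, -r) and GNU stat, and the file names are invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: assumes GNU grep, xargs, sed and stat. File names are illustrative.
workdir=$(mktemp -d)
printf '%s\n' 'echo $var >> $var2' > "$workdir/mine.sh"   # contains the pattern
printf '%s\n' 'echo unrelated'     > "$workdir/other.sh"  # does not
touch -t 200101010000 "$workdir/mine.sh" "$workdir/other.sh"
other_before=$(stat -c %Y "$workdir/other.sh")

# grep -l lists matching files, -F treats the pattern as a fixed string,
# -Z / -0 keep odd file names intact, -r skips sed if nothing matched.
grep -lZF -- '$var >> $var2' "$workdir"/* |
  xargs -0 -r sed -i 's/\$var >> \$var2/$var > $var2/'

other_after=$(stat -c %Y "$workdir/other.sh")
mine_content=$(cat "$workdir/mine.sh")
echo "other.sh mtime: $other_before -> $other_after"
echo "mine.sh now contains: $mine_content"
```

    Files that grep does not list are never handed to sed, so they are never renamed over and keep their timestamps.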

    Files whose timestamps were not updated

    As for the 200 of the 1400 files whose timestamps did not change, that would mean that sed did not finish processing those files. One possibility would be a permissions issue.
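
    One way to look for such candidates is to list the directory entries that sed -i would be unlikely to process. This is only a sketch under assumptions: it uses GNU find (-readable is a GNU extension), treats subdirectories and unreadable entries as the suspects, and uses an invented directory layout. It does not prove why any particular file was skipped.

```shell
#!/bin/sh
# Sketch only: assumes GNU find. The directory layout is illustrative.
workdir=$(mktemp -d)
printf 'echo hello\n' > "$workdir/a.sh"   # ordinary readable script
mkdir "$workdir/sub"                      # a directory: sed -i cannot edit it

# List top-level entries that are directories or that the current user
# cannot read; GNU sed -i reports an error for these instead of editing them.
suspects=$(find "$workdir" -mindepth 1 -maxdepth 1 \( -type d -o ! -readable \) -print)
echo "$suspects"
```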

    sed -i and Symbolic Links

    As noted by mklement0, applying sed -i to a symbolic link leads to a surprising result. sed -i does not update the file pointed to by the symbolic link. Instead, sed -i overwrites the symbolic link with a new regular file.

    This is a result of the call that sed makes to rename. As documented by man 2 rename:

    If newpath refers to a symbolic link, the link will be overwritten.

    mklement0 reports that this is also true of the (BSD) sed on OS X 10.10.
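
    The symbolic link behavior can be reproduced with a short sketch. It assumes GNU sed, which also offers a --follow-symlinks option that edits the link's target instead of replacing the link; BSD sed has no such option as far as I know. The file and link names are invented.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. File and link names are illustrative.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/target"
ln -s "$workdir/target" "$workdir/link1"
ln -s "$workdir/target" "$workdir/link2"

# Default behavior: the link itself is replaced by a new regular file,
# and the target is left untouched.
sed -i 's/hello/bye/' "$workdir/link1"
target_after_plain=$(cat "$workdir/target")

# With --follow-symlinks, sed edits the file the link points to,
# and the link stays a link.
sed -i --follow-symlinks 's/hello/bye/' "$workdir/link2"
target_after_follow=$(cat "$workdir/target")

ls -l "$workdir"
echo "target after plain -i:     $target_after_plain"
echo "target after follow-links: $target_after_follow"
```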