
How does removing a file in the ext2 file system work?


I am learning about the EXT2 file system, and I am confused about how file removal works in it. My understanding is that deletion doesn't actually erase the inode; instead, it marks some metadata as unused. My question is: which metadata is modified on deletion, and how does the file system know that the file has been deleted? Thanks.


Solution

  • In Linux this is implemented in the ext2_delete_inode function in fs/ext2/inode.c (kernel 2.6.32 shown here): http://lxr.free-electrons.com/source/fs/ext2/inode.c?v=2.6.32#L59

     56 /*
     57  * Called at the last iput() if i_nlink is zero.
     58  */
     59 void ext2_delete_inode (struct inode * inode)
     60 {
     61         truncate_inode_pages(&inode->i_data, 0);
      ..
     65         EXT2_I(inode)->i_dtime  = get_seconds();
     66         mark_inode_dirty(inode);
     67         ext2_write_inode(inode, inode_needs_sync(inode));
     68 
     69         inode->i_size = 0;
     70         if (inode->i_blocks)
     71                 ext2_truncate (inode);
     72         ext2_free_inode (inode);
     73 
     74         return;
      ..
     77 }
    

    So it removes the file's pages from the page cache with truncate_inode_pages(), sets dtime (the deletion time), and marks the inode as dirty. mark_inode_dirty() sets I_DIRTY, which is the combination (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES):

    1601  * I_DIRTY_SYNC         Inode is dirty, but doesn't have to be written on
    1602  *                      fdatasync().  i_atime is the usual cause.
    1603  * I_DIRTY_DATASYNC     Data-related inode changes pending. We keep track of
    1604  *                      these changes separately from I_DIRTY_SYNC so that we
    1605  *                      don't have to write inode on fdatasync() when only
    1606  *                      mtime has changed in it.
    1607  * I_DIRTY_PAGES        Inode has dirty pages.  Inode itself may be clean.
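
    As a rough illustration of how the three flags combine, here is a minimal user-space sketch. The struct demo_inode type and the demo_* helpers are hypothetical stand-ins, not kernel APIs, and the numeric flag values are the ones I recall from 2.6.32-era headers, so treat them as illustrative.

```c
#include <stdbool.h>

/* Flag bits; the numeric values are illustrative (believed to match
 * include/linux/fs.h in 2.6.32-era kernels). */
enum {
    I_DIRTY_SYNC     = 1,
    I_DIRTY_DATASYNC = 2,
    I_DIRTY_PAGES    = 4,
    I_DIRTY          = I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES
};

/* Hypothetical stand-in for the kernel's struct inode. */
struct demo_inode {
    unsigned long i_state;
};

/* Mimics the effect of mark_inode_dirty() on i_state: set all three bits. */
static void demo_mark_inode_dirty(struct demo_inode *inode)
{
    inode->i_state |= I_DIRTY;
}

/* True if any of the dirty bits is set, i.e. something must be written back. */
static bool demo_needs_writeback(const struct demo_inode *inode)
{
    return (inode->i_state & I_DIRTY) != 0;
}
```

    Starting from a zeroed demo_inode, calling demo_mark_inode_dirty() sets all three bits at once, so the later write of the inode picks up the new dtime.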
    

    Then it writes the modified inode, sets its size to zero, and truncates all blocks linked from the inode with ext2_truncate() (this is where the data blocks are actually marked as free): http://lxr.free-electrons.com/source/fs/ext2/inode.c?v=2.6.32#L1025

    1025 void ext2_truncate(struct inode *inode)
    1026 {
    ..
    1059         n = ext2_block_to_path(inode, iblock, offsets, NULL);
     99 /*      ext2_block_to_path - parse the block number into array of offsets
    105  *      To store the locations of file's data ext2 uses a data structure common
    106  *      for UNIX filesystems - tree of pointers anchored in the inode, with
    107  *      data blocks at leaves and indirect blocks in intermediate nodes.
    108  *      This function translates the block number into path in that tree -
    109  *      return value is the path length and @offsets[n] is the offset of
    110  *      pointer to (n+1)th node in the nth one. If @block is out of range
    111  *      (negative or too large) warning is printed and zero returned. */
    1069         if (n == 1) {
    1070                 ext2_free_data(inode, i_data+offsets[0],
    1071                                         i_data + EXT2_NDIR_BLOCKS);
    1072                 goto do_indirects;
    1073         }
    ..
    1082                 ext2_free_branches(inode, &nr, &nr+1, (chain+n-1) - partial);
    ..
    1084         /* Clear the ends of indirect blocks on the shared branch */
    1085         while (partial > chain) {
    1086                 ext2_free_branches(inode,
    1087                                    partial->p + 1,
    1088                                    (__le32*)partial->bh->b_data+addr_per_block,
    1089                                    (chain+n-1) - partial);
    ..
    1094 do_indirects:
    1095         /* Kill the remaining (whole) subtrees */
    1096         switch (offsets[0]) {
    1097                 default:
    1098                         nr = i_data[EXT2_IND_BLOCK];
    1099                         if (nr) {
    1100                                 i_data[EXT2_IND_BLOCK] = 0;
    1101                                 mark_inode_dirty(inode);
    1102                                 ext2_free_branches(inode, &nr, &nr+1, 1);
    1103                         }
    1104                 case EXT2_IND_BLOCK:
    1105                         nr = i_data[EXT2_DIND_BLOCK];
    1106                         if (nr) {
    1107                                 i_data[EXT2_DIND_BLOCK] = 0;
    1108                                 mark_inode_dirty(inode);
    1109                                 ext2_free_branches(inode, &nr, &nr+1, 2);
    1110                         }
    1111                 case EXT2_DIND_BLOCK:
    1112                         nr = i_data[EXT2_TIND_BLOCK];
    1113                         if (nr) {
    1114                                 i_data[EXT2_TIND_BLOCK] = 0;
    1115                                 mark_inode_dirty(inode);
    1116                                 ext2_free_branches(inode, &nr, &nr+1, 3);
    1117                         }
    1118                 case EXT2_TIND_BLOCK:
    1119                         ;
    1120         }
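
    The ext2_block_to_path comment quoted in the excerpt describes how a logical block number is turned into a path through the pointer tree. The following is a simplified user-space sketch of that mapping, not the kernel function itself: demo_block_to_path is a hypothetical name, and ptrs (the number of 32-bit block pointers that fit in one block) is passed in explicitly instead of being read from the superblock.

```c
#define EXT2_NDIR_BLOCKS 12
#define EXT2_IND_BLOCK   EXT2_NDIR_BLOCKS
#define EXT2_DIND_BLOCK  (EXT2_IND_BLOCK + 1)
#define EXT2_TIND_BLOCK  (EXT2_DIND_BLOCK + 1)

/*
 * Translate a logical block number into a path of offsets:
 * offsets[0] is a slot in the inode's i_data[], later entries are
 * indexes into successive indirect blocks.
 * Returns the path length, or 0 if the block is out of range.
 */
static int demo_block_to_path(unsigned long block, unsigned long ptrs,
                              int offsets[4])
{
    int n = 0;

    if (block < EXT2_NDIR_BLOCKS) {
        offsets[n++] = (int)block;                    /* direct block */
    } else if ((block -= EXT2_NDIR_BLOCKS) < ptrs) {
        offsets[n++] = EXT2_IND_BLOCK;                /* single indirect */
        offsets[n++] = (int)block;
    } else if ((block -= ptrs) < ptrs * ptrs) {
        offsets[n++] = EXT2_DIND_BLOCK;               /* double indirect */
        offsets[n++] = (int)(block / ptrs);
        offsets[n++] = (int)(block % ptrs);
    } else if ((block -= ptrs * ptrs) / (ptrs * ptrs) < ptrs) {
        offsets[n++] = EXT2_TIND_BLOCK;               /* triple indirect */
        offsets[n++] = (int)(block / (ptrs * ptrs));
        offsets[n++] = (int)((block / ptrs) % ptrs);
        offsets[n++] = (int)(block % ptrs);
    }
    return n;
}
```

    On deletion i_size is already zero, so the block passed in is 0, the path has length 1 with offsets[0] == 0, and ext2_truncate() takes the n == 1 branch that frees all direct blocks before jumping to do_indirects.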
    

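    (The empty case EXT2_TIND_BLOCK label does not mean the triple-indirect pointer is left alone. The cases in this switch fall through, so i_data[EXT2_TIND_BLOCK] is zeroed by the code under case EXT2_DIND_BLOCK. Each label names the slot where truncation starts, and that slot's partial branch has already been handled above; only the slots after it are killed as whole subtrees.)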
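
    The fall-through in the do_indirects switch is easy to check in isolation. Below is a minimal user-space sketch with the same control flow; demo_kill_indirects is a hypothetical helper that only zeroes the slots instead of freeing subtrees, and first_slot plays the role of offsets[0].

```c
/* Slot indexes in i_data[], as in the ext2 headers. */
#define EXT2_NDIR_BLOCKS 12
#define EXT2_IND_BLOCK   EXT2_NDIR_BLOCKS
#define EXT2_DIND_BLOCK  (EXT2_IND_BLOCK + 1)
#define EXT2_TIND_BLOCK  (EXT2_DIND_BLOCK + 1)
#define EXT2_N_BLOCKS    (EXT2_TIND_BLOCK + 1)

/* Zero every indirect slot that lies entirely past the truncation point. */
static void demo_kill_indirects(unsigned int i_data[EXT2_N_BLOCKS],
                                int first_slot)
{
    switch (first_slot) {
    default:                    /* truncation point is in a direct block */
        i_data[EXT2_IND_BLOCK] = 0;
        /* fall through */
    case EXT2_IND_BLOCK:        /* single-indirect slot is partly kept */
        i_data[EXT2_DIND_BLOCK] = 0;
        /* fall through */
    case EXT2_DIND_BLOCK:       /* double-indirect slot is partly kept */
        i_data[EXT2_TIND_BLOCK] = 0;
        /* fall through */
    case EXT2_TIND_BLOCK:       /* nothing lies past the last slot */
        ;
    }
}
```

    For a deleted file the truncation point is block 0, so first_slot is 0, the default label is taken, and all three indirect slots end up zeroed, including i_data[EXT2_TIND_BLOCK].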

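    Finally, ext2_free_inode() releases the inode itself: it marks the inode as free in the inode bitmap and updates the free-inode counts, after which the inode structure in kernel memory can be freed.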
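
    As far as I understand it, the on-disk side of ext2_free_inode() comes down to clearing one bit in the block group's inode bitmap and bumping the group's free-inode count. Here is a minimal sketch of that bookkeeping. struct demo_group, demo_free_inode and DEMO_INODES_PER_GROUP are hypothetical and greatly simplified; the real code works on buffer heads and group descriptors.

```c
#include <stdint.h>

/* Illustrative value; the real one comes from the superblock. */
#define DEMO_INODES_PER_GROUP 64u

/* Hypothetical, simplified per-group bookkeeping. */
struct demo_group {
    uint8_t  inode_bitmap[DEMO_INODES_PER_GROUP / 8]; /* 1 bit per inode, 1 = in use */
    uint32_t free_inodes_count;
};

/* Returns 0 on success, -1 if the inode was already marked free. */
static int demo_free_inode(struct demo_group *groups, uint32_t ino)
{
    uint32_t group = (ino - 1) / DEMO_INODES_PER_GROUP; /* inode numbers start at 1 */
    uint32_t bit   = (ino - 1) % DEMO_INODES_PER_GROUP;
    uint8_t  mask  = (uint8_t)(1u << (bit % 8));
    uint8_t *byte  = &groups[group].inode_bitmap[bit / 8];

    if (!(*byte & mask))
        return -1;              /* freeing a free inode is treated as an error */

    *byte &= (uint8_t)~mask;    /* mark the inode as free in the bitmap */
    groups[group].free_inodes_count++;
    return 0;
}
```

    Note that nothing here touches the inode's own contents: fields such as dtime stay on disk, which is what makes the deleted-inode check further down possible.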

    how does the file system know that the file is deleted?

    The check is in the ext2_iget function: http://lxr.free-electrons.com/source/fs/ext2/inode.c?v=2.6.32#L1251

    1251         /* We now have enough fields to check if the inode was active or not.
    1252          * This is needed because nfsd might try to access dead inodes
    1253          * the test is that same one that e2fsck uses
    1254          * NeilBrown 1999oct15
    1255          */
    1256         if (inode->i_nlink == 0 && (inode->i_mode == 0 || ei->i_dtime)) {
    1257                 /* this inode is deleted */
    1258                 brelse (bh);
    1259                 ret = -ESTALE;
    1260                 goto bad_inode;
    1261         }
    

    So a deleted inode is one that has no incoming links (i_nlink is zero, meaning no directory entry refers to it) and has either a zero mode or a non-zero deletion time.
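
    The same test can be written as a small stand-alone predicate. This is only a sketch: struct demo_ext2_inode is a hypothetical subset of the on-disk inode holding just the three fields involved, and the little-endian conversion the kernel performs when reading them is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the on-disk ext2 inode. */
struct demo_ext2_inode {
    uint16_t i_mode;        /* file type and permission bits */
    uint16_t i_links_count; /* number of directory entries pointing here */
    uint32_t i_dtime;       /* deletion time, 0 if never deleted */
};

/* Same condition as the check in ext2_iget(). */
static bool demo_inode_is_deleted(const struct demo_ext2_inode *raw)
{
    return raw->i_links_count == 0 &&
           (raw->i_mode == 0 || raw->i_dtime != 0);
}
```

    An inode with a zero link count but a non-zero mode and a zero dtime does not match; that is the state of a file that has been unlinked but is still held open.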