
An algorithm to convert a normal filename to DOS 8.3 format


As the title suggests, I'm looking for a way to convert a "normal" filename into the short 8.3 format. I need to do it without relying on any external tool, so I have to know the algorithm itself.

Let me explain why: I'm working with an SD card module on Arduino, and I found that this module, for some reason, saves files using the 8.3 format, even though the same card inserted into a normal computer can hold files with normal filenames.

The problem is that when I want to save a file with this module, I have to choose a filename that conforms to the 8.3 format, or the file cannot be saved. A similar problem exists when reading a file: if I give the module a normal filename, it does not convert the name to 8.3 automatically, and I cannot read the file.

Because I'm coding for Arduino, I can't use an external tool such as WinAPI's GetShortPathName function. I have to know the actual algorithm.


Solution

  • The algorithm is described in the Microsoft Extensible Firmware Initiative FAT32 File System Specification (fatgen103.doc). The relevant section is quoted here:

    The technique chosen to auto-generate short names from long names is modeled after Windows NT. Auto-generated short names are composed of the basis-name and an optional numeric-tail.

    The Basis-Name Generation Algorithm

    The basis-name generation algorithm is outlined below. This is a sample algorithm and serves to illustrate how short names can be auto-generated from long names. An implementation should follow this basic sequence of steps.

    1. The UNICODE name passed to the file system is converted to upper case.
    2. The upper cased UNICODE name is converted to OEM.
      If (the uppercased UNICODE glyph does not exist as an OEM glyph in the OEM code page) or (the OEM glyph is invalid in an 8.3 name)
      {
        Replace the glyph to an OEM '_' (underscore) character.
        Set a "lossy conversion" flag.
      }
    3. Strip all leading and embedded spaces from the long name.
    4. Strip all leading periods from the long name.
    5. While (not at end of the long name) and (char is not a period) and (total chars copied < 8)
      {
        Copy characters into primary portion of the basis name
      }
    6. Insert a dot at the end of the primary components of the basis-name iff the basis name has an extension after the last period in the name.
    7. Scan for the last embedded period in the long name.
      If (the last embedded period was found)
      {
        While (not at end of the long name) and (total chars copied < 3)
        {
          Copy characters into extension portion of the basis name
        }
      }
    8. Proceed to numeric-tail generation.
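The following is not part of the specification. It is a minimal sketch of steps 1 to 7 in C++ that should also build on Arduino, since it only uses the C library. It rests on several assumptions: the input is plain ASCII (there is no real UNICODE-to-OEM conversion, so any non-ASCII byte simply becomes '_'), the set of valid characters is the one the FAT specification lists for short names, and the function names `makeBasisName`, `copyComponent`, `toShortChar` and `isValidShortChar` are hypothetical. The return value combines the "lossy conversion" flag with "the long name did not fit 8.3", because both cases lead to a numeric tail.

```cpp
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// True if c may appear in an 8.3 name component (ASCII only; an assumption).
static bool isValidShortChar(unsigned char c) {
  if (c >= 'A' && c <= 'Z') return true;
  if (c >= '0' && c <= '9') return true;
  return c != '\0' && strchr("$%'-_@~`!(){}^#&", c) != NULL;
}

// Steps 1-2: upper-case one character, replacing anything invalid with '_'.
static char toShortChar(char ch, bool *lossy) {
  unsigned char c = (unsigned char)toupper((unsigned char)ch);
  if (!isValidShortChar(c)) {
    *lossy = true;
    return '_';
  }
  return (char)c;
}

// Copies at most maxLen converted characters from [src, end) into dst,
// skipping embedded spaces (step 3). Returns the number of characters written
// and sets *lossy when something was dropped, replaced, or truncated.
static size_t copyComponent(const char *src, const char *end, char *dst,
                            size_t maxLen, bool *lossy) {
  size_t n = 0;
  for (; src < end; ++src) {
    if (*src == ' ') { *lossy = true; continue; }
    if (n == maxLen) { *lossy = true; break; }
    dst[n++] = toShortChar(*src, lossy);
  }
  return n;
}

// Builds the basis name of longName in out (at least 13 bytes:
// 8 + '.' + 3 + NUL). Returns true if a numeric tail is needed.
bool makeBasisName(const char *longName, char out[13]) {
  bool lossy = false;
  const char *p = longName;
  // Steps 3-4: strip leading spaces and leading periods.
  while (*p == ' ' || *p == '.') { ++p; lossy = true; }

  const char *end = p + strlen(p);
  const char *firstDot = strchr(p, '.');
  const char *lastDot = strrchr(p, '.');
  // Text between the first and the last period is dropped entirely.
  if (firstDot != lastDot) lossy = true;

  // Step 5: primary part, up to the first period, at most 8 characters.
  size_t n = copyComponent(p, firstDot ? firstDot : end, out, 8, &lossy);

  // Steps 6-7: extension after the last period, at most 3 characters.
  if (lastDot != NULL) {
    size_t extLen = copyComponent(lastDot + 1, end, out + n + 1, 3, &lossy);
    if (extLen > 0) {
      out[n] = '.';
      n += 1 + extLen;
    }
  }
  out[n] = '\0';
  return lossy;
}
```

With this sketch, "readme.txt" becomes "README.TXT" and returns false, while "My Long File Name.html" becomes "MYLONGFI.HTM" and returns true, meaning the numeric-tail step still has to run.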

    The Numeric-Tail Generation Algorithm

    If (a "lossy conversion" was not flagged) and (the long name fits within the 8.3 naming conventions) and (the basis-name does not collide with any existing short name)
    {
      The short name is only the basis-name without the numeric tail.
    }
    else
    {
      Insert a numeric-tail "~n" to the end of the primary name such that the value of the "~n" is chosen so that the name thus formed does not collide with any existing short name and that the primary name does not exceed eight characters in length.
    }

    The "~n" string can range from "~1" to "~999999". The number "n" is chosen so that it is the next number in a sequence of files with similar basis-names. For example, assume the following short names existed: LETTER~1.DOC and LETTER~2.DOC. As expected, the next auto-generated name of this type would be LETTER~3.DOC. Assume the following short names existed: LETTER~1.DOC, LETTER~3.DOC. Again, the next auto-generated name of this type would be LETTER~2.DOC. However, one absolutely cannot count on this behavior. In a directory with a very large mix of names of this type, the selection algorithm is optimized for speed and may select another "n" based on the characteristics of short names that end in "~n" and have similar leading name patterns.
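Again outside the quoted specification, the numeric-tail step could be sketched as below. The names `applyNumericTail`, `makeUniqueShortName` and the `NameExistsFn` callback are hypothetical. The callback stands in for whatever check the SD library offers for "does this short name already exist in the directory". The search is a plain linear scan that picks the lowest free "n", which matches the LETTER~2.DOC example but, as the text above warns, is not necessarily what a real implementation does.

```cpp
#include <stdio.h>
#include <string.h>

// Hypothetical callback: returns true if shortName is already in use.
typedef bool (*NameExistsFn)(const char *shortName);

// Writes basis with "~n" inserted at the end of the primary part, truncating
// the primary part so that it stays within 8 characters. out needs 13 bytes.
// Returns false if n is outside 1..999999.
bool applyNumericTail(const char *basis, unsigned long n, char out[13]) {
  if (n < 1 || n > 999999UL) return false;

  char tail[8];  // '~' + up to 6 digits + NUL
  size_t tailLen = (size_t)snprintf(tail, sizeof tail, "~%lu", n);

  const char *dot = strchr(basis, '.');
  size_t primaryLen = dot ? (size_t)(dot - basis) : strlen(basis);
  size_t keep = 8 - tailLen;  // room left for the primary part
  if (primaryLen < keep) keep = primaryLen;

  memcpy(out, basis, keep);
  memcpy(out + keep, tail, tailLen);
  size_t len = keep + tailLen;

  if (dot != NULL) {
    size_t extLen = strlen(dot);  // includes the '.'
    if (extLen > 4) extLen = 4;   // never copy more than ".EXT"
    memcpy(out + len, dot, extLen);
    len += extLen;
  }
  out[len] = '\0';
  return true;
}

// Tries ~1, ~2, ... until nameExists reports a free name.
// Returns false if every candidate up to ~999999 is taken.
bool makeUniqueShortName(const char *basis, NameExistsFn nameExists,
                         char out[13]) {
  for (unsigned long n = 1; n <= 999999UL; ++n) {
    applyNumericTail(basis, n, out);
    if (!nameExists(out)) return true;
  }
  out[0] = '\0';
  return false;
}
```

For the basis name "MYLONGFI.HTM" and n = 1 this yields "MYLONG~1.HTM". A caller would only run this step when the basis-name step reported a lossy or truncated conversion, or when the basis name itself collides with an existing file.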