How do I split the final component of a &Path at a specific character?


I have a &Path and I need to split its final component (the file name) into two parts at the first colon.

I can get the final component as an &OsStr (path.file_name()) - but then I'm a bit stuck on actually doing anything with the contents. The documentation gives me a few options:

  • to_str() or to_string_lossy(), which respectively fail or return a lossy string (invalid sequences replaced with U+FFFD) if the name isn't valid UTF-8 (which isn't guaranteed!)
  • to_bytes() or to_cstring(), but they're marked as deprecated since Rust 1.6.
  • Right at the bottom I see impl OsStrExt with an as_bytes() method. OsStrExt is std::os::unix::ffi::OsStrExt which is described as "Unix-specific extensions to OsStr". However std::os::unix is apparently "Experimental extensions to std for Unix platforms."

Have I missed anything more standard?

As it happens, I'm happy to limit this application to Unix, so OsStrExt::as_bytes looks like the best option for now; but is it really still experimental, or is the documentation out of date?


Solution

  • There's no standard way to deal with file system paths because not all platforms have the same rules regarding the representation and validity of paths.

    On Unix-based systems (Linux, Mac OS X, etc.), paths are a sequence of bytes (u8) that cannot contain null bytes. The std::os::unix module is available on those platforms. Although the module's description says "experimental", most of it is stable, so the stable features are guaranteed to remain available in future Rust 1.x releases.

    NOTE: Since the question and this answer were written, the module's description was amended and it's no longer described as experimental.
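For the Unix case, the split asked about in the question might look something like the sketch below. It assumes a Unix target and uses only the stable OsStrExt::as_bytes and OsStrExt::from_bytes methods; the function name split_file_name_at_colon is made up for illustration. Because the work is done on raw bytes, file names that aren't valid UTF-8 pass through unchanged.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: byte-level access to OsStr
use std::path::Path;

/// Hypothetical helper: splits the final component of `path` at the first ':'.
///
/// Returns `None` if the path has no final component (e.g. "/" or "..").
/// The second element is `None` if the file name contains no colon.
fn split_file_name_at_colon(path: &Path) -> Option<(&OsStr, Option<&OsStr>)> {
    let name = path.file_name()?;
    // On Unix an OsStr is just a byte sequence, so this is a free conversion.
    let bytes = name.as_bytes();
    match bytes.iter().position(|&b| b == b':') {
        Some(i) => Some((
            OsStr::from_bytes(&bytes[..i]),
            // Skip the colon itself; later colons stay in the second part.
            Some(OsStr::from_bytes(&bytes[i + 1..])),
        )),
        None => Some((name, None)),
    }
}
```

Called on Path::new("/tmp/foo:bar:baz"), this should yield "foo" and Some("bar:baz"). Both halves borrow from the original path, so nothing is allocated.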

    On Windows NT, paths are a sequence of 16-bit words (usually interpreted as UTF-16 code units), which may contain unpaired surrogates. Internally, Rust converts these paths to WTF-8 (a superset of UTF-8 that can also encode unpaired surrogates, U+D800–U+DFFF). The std::os::windows module is available on this platform. It provides different OsStrExt and OsStringExt traits that let you encode an OsStr to potentially ill-formed UTF-16 or decode a potentially ill-formed UTF-16 path to an OsString, but they don't provide access to the WTF-8 representation.
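On Windows, the same split could be sketched by going through the UTF-16 view instead. The code below is only an illustration: the helper names are invented, and the Windows-specific wrapper is gated behind #[cfg(windows)] so the file still compiles elsewhere. It relies on the encode_wide and from_wide methods from std::os::windows::ffi. Searching for the code unit 0x003A is safe because surrogate code units lie in 0xD800..=0xDFFF and so can never be mistaken for a colon.

```rust
/// Hypothetical platform-independent core: splits a slice of UTF-16 code
/// units at the first ':' (0x003A). Unpaired surrogates are left untouched.
fn split_wide_at_colon(units: &[u16]) -> (&[u16], Option<&[u16]>) {
    let colon = b':' as u16;
    match units.iter().position(|&u| u == colon) {
        Some(i) => (&units[..i], Some(&units[i + 1..])),
        None => (units, None),
    }
}

/// Hypothetical Windows-only wrapper around the helper above.
/// Unlike a byte-based Unix version, this allocates: the file name is
/// re-encoded to UTF-16 and each half is decoded back into an OsString.
#[cfg(windows)]
fn split_file_name_at_colon_windows(
    path: &std::path::Path,
) -> Option<(std::ffi::OsString, Option<std::ffi::OsString>)> {
    use std::ffi::OsString;
    use std::os::windows::ffi::{OsStrExt, OsStringExt};

    let wide: Vec<u16> = path.file_name()?.encode_wide().collect();
    let (head, tail) = split_wide_at_colon(&wide);
    Some((
        OsString::from_wide(head),
        tail.map(|t| OsString::from_wide(t)),
    ))
}
```

The round trip through Vec<u16> is the price of not having direct access to the WTF-8 representation.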