
Python raw string literal with slash-w -- r`\w`


With Python 2.7.8 and 3.4 on my machine, a backslash-W inside a raw string literal doesn't appear to be treated as raw. Is this really the expected behaviour?

import os
import sys
wspace = r'D:\Feb-19'
tile = '116o'
ex1 = os.path.join(wspace, r'Work_{}\scratch.gdb'.format(tile))
ex2 = os.path.join(wspace, r'\Work_{}\scratch.gdb'.format(tile))

print(sys.version)

print('''\n--- Expected ---
no-slash-W      D:\Feb-19Work_116o\scratch.gdb
yes-slash-W     D:\Feb-19\Work_116o\scratch.gdb
''')

print('''--- Actual Result ---
no-slash-W      {}
yes-slash-W     {}
'''.format(ex1, ex2))

This is the result I get from PyScripter and a remote Python interpreter. Note 9W vs 9\W and D:\Feb vs D:\Work.

2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]
--- Expected ---
no-slash-W      D:\Feb-19Work_116o\scratch.gdb
yes-slash-W     D:\Feb-19\Work_116o\scratch.gdb

--- Actual Result ---
no-slash-W      D:\Feb-19\Work_116o\scratch.gdb
yes-slash-W     D:\Work_116o\scratch.gdb

...and command shell Python 3:

D:\> python broken-raw-string-example.py
3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)]

--- Expected ---
no-slash-W      D:\Feb-19Work_116o\scratch.gdb
yes-slash-W     D:\Feb-19\Work_116o\scratch.gdb

--- Actual Result ---
no-slash-W      D:\Feb-19\Work_116o\scratch.gdb
yes-slash-W     D:\Work_116o\scratch.gdb

Gist here if you want to stick a fork in it: https://gist.github.com/maphew/9368fe16df751b016bbd


Solution

  • It's not raw strings tripping you up here; you're misunderstanding os.path.join. os.path.join is supposed to add a slash when the component doesn't start with one. If a component already starts with a slash, it's treated as the beginning of an absolute path, which discards the preceding components and begins again "from scratch" (except, on Windows, for the drive letter). From the docs:

    If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

    On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
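The drive-letter rules quoted above can be checked on any platform with ntpath, the Windows implementation behind os.path. This is only a minimal sketch; the paths and variable names are made up for illustration.

```python
import ntpath  # Windows flavour of os.path, importable on any OS

# A rooted component without a drive keeps the drive seen so far.
keeps_drive = ntpath.join(r'C:\a', r'\foo')

# A component with its own drive letter discards everything before it.
resets_drive = ntpath.join(r'C:\a', r'D:\b')

# A bare drive plus a name stays relative to that drive's current directory.
drive_relative = ntpath.join('c:', 'foo')

print(keeps_drive)     # C:\foo
print(resets_drive)    # D:\b
print(drive_relative)  # c:foo
```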

    So it is expected and normal for:

    os.path.join(r'C:\a', 'b')
    

    to produce the string C:\a\b on Windows (the repr of which would be "C:\\a\\b" due to the necessary escaping of the backslash), while in:

    os.path.join(r'C:\a', '\\b')
    

    the '\\b' means "start a new absolute path on the current drive", which throws away the \a and replaces it with \b, giving C:\b. Similarly,

    os.path.join(r'C:\a', 'b', '\\c', 'd')
    

    would, when it sees '\\c', throw away a and b and build up the path from there, producing C:\c\d.
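Applied to the paths from the question, the same rule explains the output. The sketch below uses ntpath so it runs on any OS, and stripping the leading separator with lstrip is just one possible fix, not the only one; the names sub, restarted and fixed are illustrative.

```python
import ntpath  # Windows flavour of os.path, importable on any OS

wspace = r'D:\Feb-19'
tile = '116o'

# The raw literal is intact: it really does begin with a single backslash.
sub = r'\Work_{}\scratch.gdb'.format(tile)
print(sub[0] == '\\')  # True

# That leading backslash makes the component rooted, so the join
# restarts at the root of the current drive and drops \Feb-19.
restarted = ntpath.join(wspace, sub)
print(restarted)  # D:\Work_116o\scratch.gdb

# Stripping leading separators turns it back into a relative component.
fixed = ntpath.join(wspace, sub.lstrip('\\/'))
print(fixed)  # D:\Feb-19\Work_116o\scratch.gdb
```

On a real Windows machine os.path.join behaves the same way, since os.path is ntpath there.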