Tags: string, text, julia, escaping

Is it possible to use a "plain" long string?


In Julia, you can't store a string like this:

str = "\mwe"

That fails because of the backslash: \m is not a valid escape sequence. Escaping the backslash fixes it:

str = "\\mwe"

The same happens with $, \n, " and many other symbols. My question is: given an extremely long string of thousands of characters, where handling all the different cases is not convenient even with search and replace (Ctrl+H), is there a way to assign it directly to a variable?

Maybe the following (which I tried) gives an idea of what I'd like:

str = """\$$$ \\\nn\nn\m this is a very long and complicated (\n^$" string"""

Here """ is not suitable. What should I use instead?


Solution

  • Quick answer: raw string literals like raw"\$$$ \\\nn..." will get you most of the way there.

    Raw string literals allow you to put nearly anything you like between quotes and Julia will keep the characters as typed with no replacements, expansions, or interpolations. That means you can do this sort of thing easily:

    a = raw"\mwe"
    @assert codepoint(a[1]) == 0x5c  # Unicode point for backslash
    
    b = raw"$(a)"
    @assert codepoint(b[1]) == 0x24  # Unicode point for dollar sign
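
    To make the contrast with an ordinary literal concrete, here is a minimal sketch. The names x, plain and verbatim are only illustrative; it compares how the same characters are read with and without the raw prefix:

    ```julia
    x = 42  # illustrative variable, only used to show interpolation

    plain = "\t$x"        # escape sequence and interpolation are processed
    verbatim = raw"\t$x"  # every character is kept as typed

    @assert plain == string('\t', 42)
    @assert length(plain) == 3       # tab, '4', '2'
    @assert length(verbatim) == 4    # backslash, 't', '$', 'x'
    @assert collect(verbatim) == ['\\', 't', '$', 'x']
    ```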
    

    The problem is always the delimiters that define where the string begins and ends. Julia needs some way to tell what belongs to the string literal and what does not, and it uses double quotes for that. So if you want a double quote inside your string literal, you still have to escape it:

    c = raw"\"quote"  # note the backslash
    @assert codepoint(c[1]) == 0x22  # Unicode point for double quote marks
    

    If this bothers you, you can combine triple quotes with raw, but then if you want to represent literal triple quotes in your string, you still have to escape those:

    d = raw""""quote"""  # the three quotes at the beginning and three at the end delimit the string, the fourth is read literally
    @assert codepoint(d[1]) == 0x22  # Unicode point for double quote marks
    
    e = raw"""\"\"\"""" # each quote is escaped with a single backslash; the backslashes themselves are not escaped
    @assert codeunits(e) == [0x22, 0x22, 0x22]  # Three Unicode double quote marks
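
    Applied to the string from the question, a sketch might look like the following. The name escaped is introduced here only to cross-check the result against a conventionally escaped literal. The single " inside the text needs no escaping, because only a run of three quotes closes a triple-quoted literal:

    ```julia
    str = raw"""\$$$ \\\nn\nn\m this is a very long and complicated (\n^$" string"""

    # The same text as an ordinary literal, with every special character escaped by hand
    escaped = "\\\$\$\$ \\\\\\nn\\nn\\m this is a very long and complicated (\\n^\$\" string"

    @assert str == escaped
    @assert count(==('\\'), str) == 7   # all seven backslashes are kept
    @assert count(==('$'), str) == 4    # no interpolation happened
    @assert occursin('"', str)          # the lone double quote is part of the text
    ```

    Keep in mind that triple-quoted literals spanning several lines also strip common indentation, so for multi-line text the result may differ from what was typed in that respect.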
    

    If this bothers you, you can try to write a macro that avoids these limitations. But you will always have to tell Julia where the string literal starts and where it ends, so you will always have to pick some delimiter and escape that delimiter within the string.
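
    As an illustration of why a custom macro does not remove this constraint, here is a minimal sketch of a hypothetical string macro, @keep_str (not part of Julia, the name is made up for this example). As far as I understand, a macro whose name ends in _str receives the text between the quotes unprocessed, much like raw does, so the closing quote still has to be escaped:

    ```julia
    # Hypothetical string macro: returns the literal text as the parser hands it over
    macro keep_str(s)
        return s
    end

    k = keep"\n $x"
    @assert k == raw"\n $x"        # same result as the built-in raw prefix
    @assert keep"\"q" == "\"q"     # the delimiter still needs a backslash
    ```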

    Edit: You don't need to escape backslashes in raw string literals in order to include quotation marks in the string; you only need to escape the quotes. But if you want a literal backslash followed by a literal quotation mark, you have to escape both:

    f = raw"\"quote"
    @assert codepoint(f[1]) == 0x22  # double quote marks
    
    g = raw"\\\"quote"  # note the three backslashes
    @assert codepoint(g[1]) == 0x5c  # backslash
    @assert codepoint(g[2]) == 0x22  # double quote marks
    

    If you escape the backslash but not the quote mark, the quote ends the string early and Julia cannot parse what follows:

    h = raw"\\"quote"
    # ERROR: syntax: cannot juxtapose string literal
    

    This is explained in the caveat in the documentation for raw string literals.
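
    The rule behind these examples, as I understand it, is that backslashes are only special when a run of them comes directly before a quote: 2n backslashes followed by a quote give n backslashes and end the string, while 2n+1 backslashes give n backslashes and a literal quote. A short sketch that checks each case against an ordinarily escaped literal:

    ```julia
    # Backslashes not followed by a quote are kept as typed
    @assert raw"\\n" == "\\\\n"           # two backslashes and an 'n'

    # Odd run before a quote: the quote is literal
    @assert raw"\"" == "\""               # 1 backslash   -> 0 backslashes + quote
    @assert raw"\\\"" == "\\\""           # 3 backslashes -> 1 backslash + quote

    # Even run before a quote: the quote closes the string
    @assert raw"\\" == "\\"               # 2 backslashes -> 1 backslash
    @assert raw"C:\dir\\" == "C:\\dir\\"  # handy for text ending in a backslash
    ```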