Does the ads.txt specification take a stand against spaces?


ads.txt is a file that every advertising-supported website is supposed to put in its root folder. The IAB ads.txt specification instructs that it look like:

<FIELD #1>, <FIELD #2>, <FIELD #3>, <FIELD #4>

Data should be liberal in accepting files with varying whitespace or field separation characters.

But in the same breath it also mentions:

The consumer systems should ignore any sequence of whitespace or tabs. No field should contain tabs, commas or whitespace, otherwise it should be escaped with URL encoding [13].

My question is: if "no field should contain whitespace", why does the specification use spaces in its own examples [update: I meant spaces between the fields]? Is it acceptable or not? What should be the default? Stack Overflow itself uses spaces, BTW, but that doesn't mean it's right.


Solution

  • You asked:

    if "no field should contain whitespace", why does the specification use spaces in its own examples?

    The source of confusion is merely that <FIELD #1>, <FIELD #2>, ... is not an example but a format or structure to follow, in other words a syntax. In a syntax, <FIELD #1> is really a symbol, a stand-in that marks a position. You replace it at that position with a value such as greenadexchange.com.

    The rule:

    No field should contain ... whitespace

    Could be better understood as:

    no field value should contain ... whitespace

    • field here is used in the sense of the content at a spot or position, in other words field value
    • so it would not be referring to the specific symbols the IAB writers decided to use such as <FIELD #1>.

    That said, the IAB writers could probably improve by writing symbols that didn't use spaces such as <FIELD#1> for the first field, <FIELD#2> for the second, and so on. It would help avoid potential risk of confusion while still conveying that there is a first field at this position, a second field at that position, and so on.

    You also wanted to clarify:

    But what about the surrounding spaces? Does the specification by default prefer <FIELD#1>,<FIELD#2>,<FIELD#3>,<FIELD#4> or <FIELD#1>, <FIELD#2>, <FIELD#3>, <FIELD#4>?

    The specification does not seem to have a preference. It only allows the surrounding space because consumers ignore it.

    Recommendation:

    • Have a space after the comma to help make it clear and more human-readable compared to just a long unbroken line.

    Detailed explanations below.

    Syntax

    Look at the text surrounding the <FIELD... line you originally quoted, from the IAB ads.txt specification, PDF page 7:

    ... The records consist of a set of
    lines of the form:
    
    <FIELD #1>, <FIELD #2>, <FIELD #3>, <FIELD #4>
    
    or
    
    <VARIABLE>=<VALUE>
    

    It says "of the form:", which suggests that what follows is a structure, in other words a syntax, rather than an example. The specification also doesn't use the word "example" until later pages, so <FIELD #1>, ... is better understood as illustrating the syntax than as a literal example.

    Also, angle brackets (<, >) around something, as in <FIELD #1> and <VARIABLE>, are a convention often used in technical manuals and documentation: the brackets and whatever is inside them are to be replaced by a value when you actually use the syntax.

    The field restrictions, such as "No field should contain tabs, commas or whitespace" on PDF page 8, are better understood as restrictions on the field value you eventually write. They have nothing to do with <FIELD #1>, which is merely a convention for telling you that there is a first field at that position.
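
    The restriction above says that a value containing tabs, commas or whitespace should be escaped with URL encoding. As a minimal sketch of what that could look like, assuming Python's standard urllib.parse is an acceptable way to percent-encode, and using a made-up field value that does not come from the specification:

```python
from urllib.parse import quote, unquote

# Hypothetical field value containing both spaces and a comma.
raw_value = "Example Authority, Inc"

# safe="" percent-encodes every reserved character, including ',' and '/'.
escaped = quote(raw_value, safe="")
print(escaped)           # Example%20Authority%2C%20Inc
print(unquote(escaped))  # Example Authority, Inc
```

    After escaping, the value contains neither whitespace nor commas, so it can sit between the comma delimiters without being split or trimmed.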

    Example 4.1

    Later on, in section 4. EXAMPLES, we can see how the syntax we saw earlier:

    <FIELD #1>, <FIELD #2>, <FIELD #3>, <FIELD #4>
    

    Is applied like this:

    greenadexchange.com, XF7342, DIRECT, 5jyxf8k54
    

    We can compare syntax with its implementation like so:

    | SYNTAX     | EXAMPLE             | WHAT                             |
    | ---        | ---                 | ---                              |
    | <FIELD #1> | greenadexchange.com | Domain value                     |
    | ,          | ,                   | Delimiter                        |
    |            |                     | Ignorable whitespace             |
    | <FIELD #2> | XF7342              | Publisher's Account ID value     |
    | ,          | ,                   | Delimiter                        |
    |            |                     | Ignorable whitespace             |
    | <FIELD #3> | DIRECT              | Type of account value            |
    | ,          | ,                   | Delimiter                        |
    |            |                     | Ignorable whitespace             |
    | <FIELD #4> | 5jyxf8k54           | Certification Authority ID value |
    
    • <FIELD #1> of the syntax is replaced by greenadexchange.com, and greenadexchange.com has no spaces in it
    • the comma (,) is a delimiter because the specification describes "a comma separated format" (PDF page 7)
    • the whitespace after each comma is ignored because all whitespace is generally meant to be ignored: "The consumer systems should ignore any sequence of whitespace or tabs."
    • this is likely why the specification insists that a field value contain no whitespace: since whitespace is ignored, whitespace inside a value could confuse things

    The same understanding applies to the rest of the fields in this example.
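
    To make the "ignorable whitespace" rows concrete, here is a rough sketch of how a consumer might read one record. The function name parse_record is made up for illustration, and this is not a full ads.txt parser: it only shows the comma split and the whitespace trimming discussed above.

```python
def parse_record(line):
    """Split one ads.txt data record into its field values."""
    # The comma is the delimiter; whitespace around each value is discarded.
    return [field.strip() for field in line.split(",")]

spaced = parse_record("greenadexchange.com, XF7342, DIRECT, 5jyxf8k54")
compact = parse_record("greenadexchange.com,XF7342,DIRECT,5jyxf8k54")

print(spaced)             # ['greenadexchange.com', 'XF7342', 'DIRECT', '5jyxf8k54']
print(spaced == compact)  # True
```

    Both spellings produce the same four values, which is why the space after the comma is a matter of readability rather than correctness.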

    These are the rules, but as for preferences, I wasn't able to find anything specifically preferring space or no space after the comma.

    But I highly recommend writing a comma followed by a space (", ") because it is clearer and more human-readable than an unbroken line of text. Clarity helps minimize the risk of misunderstandings and errors, and generally makes the file more maintainable in the long run.
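
    If you generate the file with a script, the recommendation above could be applied with something like the following sketch. The helper name format_record and its validation rule are assumptions for illustration, not anything the specification defines.

```python
import re

def format_record(fields):
    """Join field values into one record, using ', ' between fields."""
    for value in fields:
        # A value with whitespace or a comma would have to be URL-encoded first.
        if re.search(r"[\s,]", value):
            raise ValueError(f"field value needs escaping: {value!r}")
    return ", ".join(fields)

print(format_record(["greenadexchange.com", "XF7342", "DIRECT", "5jyxf8k54"]))
# greenadexchange.com, XF7342, DIRECT, 5jyxf8k54
```

    The check keeps whitespace out of the values themselves while the joiner adds the readable space between them.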

    Additional thoughts

    As you read more technical documentation, you might see other examples of the angle bracket (<...>) to mean something you replace. One interesting instance is in The Open Group Base Specifications Issue 7, 2018 edition, 12. Utility Conventions:

    4. Frequently, names of parameters that require substitution by actual values are shown with embedded <underscore> characters. Alternatively, parameters are shown as follows:
    
    <parameter name>
    
    • The <FIELD #1> that led to your confusion seems to be this alternative method

    If the IAB writers wanted to be more helpful and avoid the risk of confusion, they probably could have written the syntax with no spaces inside the symbols:

    <FIELD#1>, <FIELD#2>, <FIELD#3>, <FIELD#4>
    

    Or with "embedded underscore" characters, as suggested by the Open Group Base Specifications:

    <FIELD_1>, <FIELD_2>, <FIELD_3>, <FIELD_4>
    

    That probably would have been more helpful because it further emphasizes the point about no whitespace, as well as avoids potential confusion.