Tags: validation, parsing, url, rust, xss

How can I prevent Rust's Url::parse from auto-encoding and make it return an error instead?


I am using Url::parse from the url crate.

For example:

use url::Url;

fn main() {
    let input = "http://example.com:8080/?sort=custom&kind=comm        <    >   ents&scope=discover&time=6mo&page=2";

    match Url::parse(input) {
        Ok(u) => println!("Url: {}", u),
        Err(err) => println!("Error: {}", err),
    }
}

I would expect this to trigger the Err arm because the input has spaces and also < and >.

Instead, the parser percent-encodes those characters and prints:

http://example.com:8080/?sort=custom&kind=comm%20%20%20%20%20%20%20%20%3C%20%20%20%20%3E%20%20%20ents&scope=discover&time=6mo&page=2

How can I prevent this and trigger the Err arm instead?


Solution

  • Spaces and angle brackets in the query are validation errors, but the URL Standard states: "A validation error does not mean that the parser terminates." Url::parse follows the standard, so it percent-encodes these characters and keeps going rather than returning Err.

    You can use syntax_violation_callback to report validation errors:

    use std::cell::RefCell;
    use url::{SyntaxViolation, Url};
    
    fn main() {
        let violations = RefCell::new(Vec::new());
        let input =
            "http://example.com:8080/?sort=custom&kind=comm <ents&scope=discover&time=6mo&page=2";
        let url = Url::options()
            .syntax_violation_callback(Some(&|v| violations.borrow_mut().push(v)))
            .parse(input)
            .unwrap();
        assert_eq!(
            url.as_str(),
            "http://example.com:8080/?sort=custom&kind=comm%20%3Cents&scope=discover&time=6mo&page=2"
        );
        assert_eq!(
            violations.into_inner(),
            vec![
                SyntaxViolation::NonUrlCodePoint,
                SyntaxViolation::NonUrlCodePoint
            ]
        );
    }
    
