web-scraping, rust, mp3

Why does reqwest download a binary file as text?


I am trying to scrape a page that automatically downloads an MP3 file, https://dl251.filemate24.shop/?file=M3R4SUNiN3JsOHJ6WWRQNXNPUFN2cFdxRVJIOGhmSXBsY1l1d2hrdFN1QnRxNGc5M3UraGFPWkpLSzRNeEl1dVd1aGQ4VHZYVG9uZE93MlpwZFlyVWlESDhkOHh2QURmOHBvb0JJd2pjQS8zanZLMmxEUXoyeUg0Ym91SVI1NE9LQ1Zka1ZJNzN5eWh5L0wrbHpqSDZpdW1wVjJRSWlrYTRYME1PUHFOeEt3TzBISGJadVhoeDVrSXFIdk90Y29maTZYTDVWYWdoYUE3dnVOMlZrRjNlTTBNbjRuamd2VE8vQT09, but making a GET request and putting the output in a .txt file results in a 28,000-line file of random Unicode characters. Is there any way to get the actual MP3 file? And yes, I am trying to download a random Rammstein song as a test for a larger project.

I checked using inspect element, and all the site does is indeed one GET request that somehow results in a file. I'm pretty sure that the random Unicode is related to LAME MP3 encryption, but I'm not sure how. Is there any way to get an MP3 file from a GET request? Rust code that results in said file:

// Fetch the URL and return the response body decoded as text.
async fn get(url: &str, client: &reqwest::Client) -> String {
    client
        .get(url)
        .send()
        .await
        .expect("GET failed")
        // .text() decodes the body into a String; this is the call the question is about.
        .text()
        .await
        .expect("reading the body failed")
}

#[tokio::main]
async fn main() {
    let client = reqwest::Client::new();
    let url = "https://dl251.filemate24.shop/?file=M3R4SUNiN3JsOHJ6WWRQNXNPUFN2cFdxRVJIOGhmSXBsY1l1d2hrdFN1QnRxNGc5M3UraGFPWkpLSzRNeEl1dVd1aGQ4VHZYVG9uZE93MlpwZFlyVWlESDhkOHh2QURmOHBvb0JJd2pjQS8zanZLMmxEUXoyeUg0Ym91SVI1NE9LQ1Zka1ZJNzN5eWh5L0wrbHpqSDZpdW1wVjJRSWlrYTRYME1PUHFOeEt3TzBISGJadVhoeDVrSXFIdk90Y29maTZYTDVWYWdoYUE3dnVOMlZrRjNlTTBNbjRuamd2VE8vQT09";
    println!("{}", url);
    let mp3 = get(url, &client).await;
    // std::fs::write creates the file itself, so no separate File::create is needed.
    std::fs::write("a.txt", mp3).expect("writing a.txt failed");
}

Sorry if this is a stupid question; I'm pretty new to Rust and coding in general.


Solution

  • You're seeing text data because you're asking for text data when you use .text() on the response. According to the .text() documentation:

    This method decodes the response body with BOM sniffing and with malformed sequences replaced with the REPLACEMENT CHARACTER.

    In other words, this modifies the response to be valid text by replacing any bytes that don't represent valid text with a special Unicode character. This ensures the result is valid as a Rust String (which, by definition, must be UTF-8 encoded text). Effectively, this function call corrupts the MP3 data you're expecting to download.
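The effect can be reproduced without any network access. The sketch below uses the standard library's String::from_utf8_lossy as a stand-in for the lossy decoding that .text() performs (it is not reqwest's own code path, and the sample bytes are made up for illustration), and shows that pushing binary data through a String does not give the original bytes back:

```rust
/// Decode bytes lossily into a String, then turn that String back into bytes.
/// Invalid UTF-8 sequences are replaced with U+FFFD along the way.
fn lossy_round_trip(raw: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(raw).into_owned().into_bytes()
}

fn main() {
    // Hypothetical sample: bytes resembling the start of an MP3 frame,
    // followed by a few arbitrary bytes. Not taken from a real file.
    let raw: [u8; 6] = [0xFF, 0xFB, 0x90, 0x64, 0x00, 0x41];
    let mangled = lossy_round_trip(&raw);

    println!("original: {:02X?}", raw);
    println!("mangled:  {:02X?}", mangled);

    // The invalid bytes were swapped for U+FFFD (EF BF BD in UTF-8),
    // so the original data can no longer be recovered.
    assert_ne!(raw.to_vec(), mangled);
}
```

Bytes that happen to be valid UTF-8 (such as plain ASCII) survive the round trip unchanged, which is why the downloaded file still contains some readable fragments among the replacement characters.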

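    Use .bytes() instead, which returns the raw, undecoded body as a Bytes value. Either return that from get() directly, or call .to_vec() on it and change the return type of get() to Vec<u8>.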

    As a side note, you're using async (tokio) when it's not necessary. Consider using reqwest::blocking instead, which is available when the crate's blocking feature is enabled.