
Download a file using R


I have this URL: "https://www.kaggle.com/datasets/nelgiriyewithana/most-streamed-spotify-songs-2024/download"

I want to download the dataset from this URL, but my attempts have failed. I know it should be simple, but I am really stuck.

I am using the httr package. Here are my attempts:

First attempt:

url <- "https://www.kaggle.com/datasets/nelgiriyewithana/most-streamed-spotify-songs-2024/download"

download.file(url = url,
              destfile = "spotify_2024_data")

Second attempt:

url <- "https://www.kaggle.com/datasets/nelgiriyewithana/most-streamed-spotify-songs-2024/download"

download.file(url = url,
              destfile = "spotify_2024_data.zip")

How can I make this work? Any help is appreciated.


Solution

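  • The website's /download URL expects a logged-in browser session, so download.file() most likely saves a sign-in page rather than the archive. Request the dataset from the Kaggle API instead, using httr2 and your Kaggle API token saved as kaggle.json in your working directory: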

    library(httr2)
    
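    # kaggle.json holds the "username" and "key" fields of the API token
    credentials <- jsonlite::read_json("kaggle.json")
    # Authenticate with HTTP basic auth and write the response body to disk
    request(paste0("https://www.kaggle.com/api/v1/datasets/download/", 
                   "nelgiriyewithana/most-streamed-spotify-songs-2024")) |> 
      req_auth_basic(credentials$username, credentials$key) |>
      req_perform(path = "kaggle_tmp.zip")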
    #> <httr2_response>
    #> GET
    #> https://storage.googleapis.com:443/kaggle-data-sets/5218014/8700156/bundle/archive.zip?...
    #> Content-Type: application/zip
    #> Body: On disk 'kaggle_tmp.zip' (508134 bytes)
    
    # readr decompresses .zip files automatically
    readr::read_csv("kaggle_tmp.zip")
    #> Rows: 4600 Columns: 29
    #> ── Column specification ────────────────────────────────────────────────────────
    #> Delimiter: ","
    #> chr  (5): Track, Album Name, Artist, Release Date, ISRC
    #> dbl  (6): Track Score, Spotify Popularity, Apple Music Playlist Count, Deeze...
    #> num (17): All Time Rank, Spotify Streams, Spotify Playlist Count, Spotify Pl...
    #> lgl  (1): TIDAL Popularity
    #> 
    #> ℹ Use `spec()` to retrieve the full column specification for this data.
    #> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    #> # A tibble: 4,600 × 29
    #>    Track  `Album Name` Artist `Release Date` ISRC  `All Time Rank` `Track Score`
    #>    <chr>  <chr>        <chr>  <chr>          <chr>           <dbl>         <dbl>
    #>  1 MILLI… "Million Do… Tommy… 4/26/2024      QM24…               1          725.
    #>  2 Not L… "Not Like U… Kendr… 5/4/2024       USUG…               2          546.
    #>  3 i lik… "I like the… Artem… 3/19/2024      QZJ8…               3          538.
    #>  4 Flowe… "Flowers - … Miley… 1/12/2023      USSM…               4          445.
    #>  5 Houdi… "Houdini"    Eminem 5/31/2024      USUG…               5          423.
    #>  6 Lovin… "Lovin On M… Jack … 11/10/2023     USAT…               6          410.
    #>  7 Beaut… "Beautiful … Benso… 1/18/2024      USWB…               7          407.
    #>  8 Gata … "Gata Only"  Floyy… 2/2/2024       QZL3…               8          376.
    #>  9 Danza… "\xfd\xfd\x… MUSIC… 6/9/2024       TCJP…               9          356.
    #> 10 BAND4… "BAND4BAND … Centr… 5/23/2024      USSM…              10          331.
    #> # ℹ 4,590 more rows
    #> # ℹ 22 more variables: `Spotify Streams` <dbl>, `Spotify Playlist Count` <dbl>,
    #> #   `Spotify Playlist Reach` <dbl>, `Spotify Popularity` <dbl>,
    #> #   `YouTube Views` <dbl>, `YouTube Likes` <dbl>, `TikTok Posts` <dbl>,
    #> #   `TikTok Likes` <dbl>, `TikTok Views` <dbl>, `YouTube Playlist Reach` <dbl>,
    #> #   `Apple Music Playlist Count` <dbl>, `AirPlay Spins` <dbl>,
    #> #   `SiriusXM Spins` <dbl>, `Deezer Playlist Count` <dbl>, …
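
Reading the zip directly works here because readr decompresses it transparently. If you would rather see what the archive contains and pick the file yourself, something like the following base R sketch may help. The helper name read_csv_from_zip and its pattern argument are made up for illustration; the function reads the first CSV it finds.

```r
# Hypothetical helper: list the members of a zip archive and read one CSV.
read_csv_from_zip <- function(zip_path, pattern = "\\.csv$") {
  # unzip(list = TRUE) returns a data frame with columns Name, Length, Date
  listing <- utils::unzip(zip_path, list = TRUE)
  csv_names <- grep(pattern, listing$Name, value = TRUE, ignore.case = TRUE)
  if (length(csv_names) == 0) {
    stop("No CSV file found in ", zip_path)
  }
  # unz() opens a connection to a single member of the archive;
  # check.names = FALSE keeps headers such as "Album Name" unchanged
  utils::read.csv(unz(zip_path, csv_names[1]), check.names = FALSE)
}
```

With the file from the answer, the call would be read_csv_from_zip("kaggle_tmp.zip"). Running utils::unzip("kaggle_tmp.zip", list = TRUE) on its own shows the member names.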
    

    Created on 2024-06-25 with reprex v2.1.0
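
To check whether a download actually produced an archive (for example, to see why the attempts in the question failed), you can look at the first bytes of the saved file. This is a minimal sketch: is_zip_file is a hypothetical helper, and it only recognises the signature that non-empty zip archives normally start with.

```r
# Hypothetical helper: TRUE if the file starts with the zip local file
# header signature "PK\x03\x04", FALSE otherwise (e.g. for an HTML page).
is_zip_file <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 4)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}
```

Calling is_zip_file("kaggle_tmp.zip") on the file saved by req_perform() is a quick sanity check. If it returns FALSE, opening the file in a text editor usually shows what the server sent instead.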