Search code examples
sqlrr-markdowndata-wranglingrsqlite

Use RSQLite to manipulate data frame in r directly using SQL


I have a data set of the form enter image description here

that I would like to change to this form below in R using SQL. enter image description here

I know that I could do this daily simply with dplyr but the point here is to learn to use SQL to create and manipulate a small relational database.

  • Price needs to be turned into a numeric value. Removing the "R" and spaces in between.

  • coordinates needs to be turned into 2 coordinates Long and Lat

  • floor size needs to be turned into a numeric from a string removing the space and "m^2" at the end.

Minimum working example

# Data to copy into sheet

       Price                            coordinates floor.size surburb       date
 R 1 750 000 -33.93082074573843, 18.857342125467635      68 m²     Jhb 2021-06-24
 R 1 250 000 -33.930077157927855, 18.85420954236195      56 m²     Jhb 2021-06-17
 R 2 520 000 -33.92954929205658, 18.857504799977896      62 m²     Jhb 2021-06-24

Code to manipulate in R markdown

```{r}
#install.packages("RSQLite", repos = "http://cran.us.r-project.org")

library(readxl)
library(dplyr)
library(RSQLite)
library(DBI)
library(knitr)

db <- dbConnect(RSQLite::SQLite(), ":memory:")

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(connection = "db")


# Import data
dataH <- read_excel("C:/Users/Dell/Desktop/exampledata.xlsx")

``` 

```{sql, connection = db}
# SQL code passed directly
```

Edit 1:

The answer by @Onyambu works almost. It is producing an error with the coordinates. For example in the image below the last two coordinates are supposed to have a Long that starts with '18.85' instead of '.85' when the coordinate was "-33.930989501123, 18.857270308516927". How would I fix this?

enter image description here


Solution

  • Using the basic sql functions, you could do:

    ```{r setup, include=FALSE}
    knitr::opts_chunk$set(echo = TRUE,connection = "db")
    ```
    
    ```{r}
    db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    
    txt <- "Price coordinates floor.size surburb date\n
         'R 1 750 000' '-33.93082074573843, 18.857342125467635' '68 m²' Jhb 2021-06-24\n
         'R 1 250 000' '-33.930077157927855, 18.85420954236195' '56 m²' Jhb 2021-06-17\n
         'R 2 520 000' '-33.92954929205658, 18.857504799977896' '62 m²' Jhb 2021-06-24"
    
    dataH <- read.table(text = txt, header = TRUE) 
    DBI::dbWriteTable(db, 'dataH', dataH)
    ```
    
    
    ```{sql}
    SELECT REPLACE(SUBSTRING(price, 3, 100), ' ', '') price,
           replace(SUBSTRING(coordinates, 1, 20), ',', '') Lat,
           SUBSTRING(coordinates, 21, 255) Long,
           SUBSTRING(`floor.size`, 1, 2) floor_size,
           surburb,
           date
    FROM dataH
    ```