store_id | store_location | store_mngr | cust_id | cust_address | cust_name | date | total |
---|---|---|---|---|---|---|---|
1 | Baker St. | John | 34 | Downing Rd. | Kelly | 2019-02-12 | 14 |
1 | Baker St. | John | 34 | Downing Rd. | Kelly | 2019-02-06 | 12 |
1 | Baker St. | John | 108 | Berwick St. | Aleta | 2019-02-21 | 20 |
1 | Baker St. | John | 17 | Carnaby St. | Peter | 2019-01-14 | 18 |
2 | Abbey Road | George | 108 | Berwick St. | Aleta | 2019-02-23 | 18 |
2 | Abbey Road | George | 17 | Carnaby St. | Peter | 2019-01-25 | 18 |
2 | Abbey Road | George | 17 | Carnaby St. | Peter | 2019-01-26 | 12 |
2 | Abbey Road | George | 17 | Carnaby St. | Peter | 2019-01-27 | 17 |
store_id | cust_id | date | total |
---|---|---|---|
1 | 34 | 2019-02-12 | 14 |
1 | 34 | 2019-02-06 | 12 |
1 | 108 | 2019-02-21 | 20 |
1 | 17 | 2019-01-14 | 18 |
2 | 108 | 2019-02-23 | 18 |
2 | 17 | 2019-01-25 | 18 |
2 | 17 | 2019-01-26 | 12 |
2 | 17 | 2019-01-27 | 17 |
store_id | store_location | store_mngr |
---|---|---|
1 | Baker St. | John |
2 | Abbey Road | George |
cust_id | cust_names | cust_address |
---|---|---|
17 | Peter | Carnaby St. |
34 | Kelly | Downing Rd. |
108 | Aleta | Berwick St. |
id
columns (called keys).dplyr
also works with remote on-disk data stored in relational databases.For this to work you need
install.packages("dbplyr")
You’ll also need to install a DBI
backend package. The DBI
package provides a common interface that allows dplyr
to work with many different databases using the same code.
Some commonly used backends are:
RMariaDB
connects to MySQL and MariaDB.RPostgreSQL
connects to Postgres and Redshift.RSQLite
embeds a SQLite database.bigrquery
connects to Google’s BigQuery.There is a remote MySQL database hosted on the web which we have access to:
library(dplyr)library(dplyr)con <- DBI::dbConnect( drv = RMariaDB::MariaDB(), # MariaDB driver works for MySQL as well username = "sql7312029", dbname = "sql7312029", host = "sql7.freesqldatabase.com", password = rstudioapi::askForPassword("Database password"), port = "3306" )
con
to communicate with the database.The function DBI::dbListTables()
tells us which tables are in the database.
DBI::dbListTables(con)
## [1] "country_codes" "gapminder"
With dplyr::tbl()
we can create an object which is a reference to a table in the database. Most dplyr
functions work on this object as it was as a data frame (or tibble
).
gap_db <- tbl(con, "gapminder")class(gap_db)
## [1] "tbl_MariaDBConnection" "tbl_dbi" "tbl_sql" ## [4] "tbl_lazy" "tbl"
dplyr
Functions on the Databasegap_avg_db <- gap_db %>% group_by(continent) %>% select_if(is.numeric) %>% select(-year) %>% summarise_all(mean)
gap_avg_db
is a data frame it still is only a set of instructions to be performed by the data base. show_query()
tells us which instructions are sent to the database.
show_query(gap_avg_db)
## <SQL>## SELECT `continent`, AVG(`lifeExp`) AS `lifeExp`, AVG(`pop`) AS `pop`, AVG(`gdpPercap`) AS `gdpPercap`## FROM (SELECT `continent`, `lifeExp`, `pop`, `gdpPercap`## FROM `gapminder`) `dbplyr_001`## GROUP BY `continent`
dplyr
functions on the DatabaseWhen working with databases, dplyr
tries to be as lazy as possible:
collect()
actually pulls the data into R.
gap_avg <- gap_avg_db %>% collect()print(gap_avg)
## # A tibble: 5 x 4## continent lifeExp pop gdpPercap## <chr> <dbl> <dbl> <dbl>## 1 Africa 48.9 9916003. 2194.## 2 Americas 64.7 24504795. 7136.## 3 Asia 60.1 77038722. 7902.## 4 Europe 71.9 17169765. 14469.## 5 Oceania 74.3 8874672. 18622.
If the connection to the database is no longer needed you should disconnect to free system ressources.
DBI::dbDisconnect(con)rm(con)
RSQLite
packages. DBI::dbConnect()
with argument dbname = "path_to_database"
where "path_to_database"
is a path to
a non-existing file. In case "path_to_database"
is an existing SQLite database it
connects to that database.
library(dplyr)con <- DBI::dbConnect( drv = RSQLite::SQLite(), dbname = here::here("databases","mtcars.sqlite3"), )
With
DBI::dbWriteTable(con, name = "mtcars_db", value = mtcars)
we add the data frame mtcars
as a new table with name mtcars_db
to the database.
To add data to an existing table we use the additional argument append = TRUE
.
new_car <- data.frame(mpg = 1, cyl = 16, disp = 800, hp = 210, drat = 5, wt = 6, qsec = 2, vs = 1, am = 0)rownames(new_car) <- "Super Car 500"DBI::dbWriteTable(con, name = "mtcars_db", value = new_car, append = TRUE)DBI::dbDisconnect(con)
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |