Loading in our Libraries
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(here)
library(janitor)
library(plotly)
library(rnaturalearth)
library(leaflet)
library(sf)
library(vembedr)
Read in 2023 Sustainable Development Data with read_csv() and here()
sdr_data <- read_csv(here("data/SDR-2023-Data.csv"))
Clean column names
sdr_data <- sdr_data %>%
  clean_names()
Let’s start with the histogram you created last lesson with ggplot() and geom_histogram()
ggplot(sdr_data, aes(x = goal_4_score, fill = regions_used_for_the_sdr)) +
  geom_histogram()
Looks nice, but let’s improve it!
We start with the same exact code and add to it with +
ggplot(sdr_data, aes(x = goal_4_score, fill = regions_used_for_the_sdr)) +
  geom_histogram() +
  theme_minimal() +
  scale_fill_viridis_d() +
  labs(title = "Distributions of SDG 4 Scores",
       x = "SDG 4 Score",
       y = "Number of Countries",
       fill = "Region")
Awesome, looking much better!
Interactive visualizations are a really exciting part of data science! Interactivity engages viewers by letting them explore the data in ways that static visualizations cannot offer.
The great part is that the ggplotly() function from the plotly package makes creating interactive visualizations very simple.
First we create the plot with the exact same code
The only difference is that we assign the plot a name with <-, the same way we do with dataframes or lists
Next we pass the name we gave the plot to the ggplotly() function
goal_4_histogram <- ggplot(sdr_data, aes(x = goal_4_score, fill = regions_used_for_the_sdr)) +
  geom_histogram() +
  theme_minimal() +
  scale_fill_viridis_d() +
  labs(title = "Distributions of SDG 4 Scores",
       x = "SDG 4 Score",
       y = "Number of Countries",
       fill = "Region")
ggplotly(goal_4_histogram)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 27 rows containing non-finite values (`stat_bin()`).
Epic! Now we can hover over the histogram and get some info
We can also double-click the boxes in the legend to only view the regions we’re interested in
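The two messages printed with the interactive histogram come from geom_histogram(): it falls back to 30 bins when no bin size is given, and it drops rows whose score is missing. Here is a minimal sketch of how both could be handled, using made-up scores (toy_scores and its values are hypothetical, and a binwidth of 10 is only an illustrative choice):

```r
library(ggplot2)

# Made-up scores for illustration only; the NA mimics a country with no score
toy_scores <- data.frame(
  score  = c(35, 42, 58, 61, 67, 73, 78, 84, 91, 96, NA),
  region = rep(c("Region A", "Region B"), length.out = 11)
)

# binwidth = 10 makes every bar span 10 score points, so ggplot2 no longer
# falls back to 30 bins; na.rm = TRUE drops missing scores without a warning
toy_histogram <- ggplot(toy_scores, aes(x = score, fill = region)) +
  geom_histogram(binwidth = 10, na.rm = TRUE)

toy_histogram
```

The same two arguments could be added to geom_histogram() in the goal_4_histogram code. Which binwidth reads best depends on the data, so it is worth trying a few values.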
Here is the code for a scatter plot you made in the last lesson:
ggplot(data = sdr_data, aes(x = goal_3_score, y = goal_4_score, color = regions_used_for_the_sdr)) +
  geom_point()
Edit the code chunk above that generates the plot to:
AND
make it interactive with ggplotly()
leaflet
This takes a few fun steps to do
Lucky for us, the rnaturalearth package has this information for us
world <- ne_countries(scale = "medium", returnclass = "sf")
Now we have a dataframe named world in our environment
It has 241 locations/countries (rows) and 64 columns describing each location/country
We are only interested in 3 columns: name_long, iso_a3, and geometry
Let’s select the three columns we are interested in, making the dataframe much smaller
world <- world %>%
  select(name_long, iso_a3, geometry)
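One detail worth knowing, sketched here on a two-row made-up sf object (toy_world and all its values are hypothetical): when the sf package is loaded, select() keeps the geometry column even if it is not listed. Naming geometry explicitly, as in the code above, is therefore optional but harmless.

```r
library(sf)
library(dplyr)

# Made-up stand-in for the world dataframe: two points instead of country shapes
toy_world <- st_sf(
  name_long = c("Country A", "Country B"),
  iso_a3    = c("AAA", "BBB"),
  pop_est   = c(100, 200),
  geometry  = st_sfc(st_point(c(0, 0)), st_point(c(10, 10)), crs = 4326)
)

# geometry is not listed here, but sf keeps it attached ("sticky" geometry)
toy_small <- toy_world %>%
  select(name_long, iso_a3)

names(toy_small)
```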
We’re almost ready to join these dataframes
We have ISO3 codes in both of our dataframes (sdr_data and world) but the columns that contain the ISO3 codes have different names
Let’s change the column in sdr_data called country_code_iso3 to iso_a3 to match the world dataframe
# Rename a column in a data frame or matrix
colnames(sdr_data)[which(colnames(sdr_data) == "country_code_iso3")] <- "iso_a3"
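The base R line above works. As an alternative sketch, the rename() function from dplyr (loaded with the tidyverse) does the same job and reads a little more plainly. The toy_sdr dataframe below is made up for illustration:

```r
library(dplyr)

# Made-up stand-in for sdr_data with the original column name
toy_sdr <- data.frame(
  country           = c("Country A", "Country B"),
  country_code_iso3 = c("AAA", "BBB")
)

# rename() takes new_name = old_name
toy_sdr <- toy_sdr %>%
  rename(iso_a3 = country_code_iso3)

colnames(toy_sdr)
```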
Perfect! Now we can join
There are many different ways to join data
Let’s use the left_join() function
We use left join because we want the 3 columns we are interested in from world to be joined/added to sdr_data
There are 35 countries/places in the world dataframe that are not in our sdr_data. Using full_join() would include these countries/places in the new joined dataframe, but all of their SDG info would be NA. To avoid this, we’ll use left_join()
We’ll name the new joined dataframe sdr_data_world_joined and we will join by the column the 2 dataframes share: iso_a3
sdr_data_world_joined <- left_join(sdr_data, world, by = "iso_a3")
Nice!
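To see the difference between the two joins on a small scale, here is a sketch with made-up dataframes (toy_sdr, toy_world, and their values are hypothetical). It also uses anti_join(), which is one way to list the rows of one dataframe that have no match in the other:

```r
library(dplyr)

# Made-up stand-ins: toy_world has one extra place ("CCC") with no SDG data
toy_sdr <- data.frame(
  iso_a3       = c("AAA", "BBB"),
  goal_7_score = c(55.2, 81.9)
)
toy_world <- data.frame(
  iso_a3    = c("AAA", "BBB", "CCC"),
  name_long = c("Country A", "Country B", "Country C")
)

# left_join() keeps only the rows of the first (left) dataframe
left_joined <- left_join(toy_sdr, toy_world, by = "iso_a3")

# full_join() keeps rows from both, so "CCC" appears with an NA score
full_joined <- full_join(toy_sdr, toy_world, by = "iso_a3")

# anti_join() returns the rows of toy_world that have no match in toy_sdr
unmatched <- anti_join(toy_world, toy_sdr, by = "iso_a3")

nrow(left_joined)
nrow(full_joined)
unmatched
```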
The next 4 code chunks are slightly technical: they deal with the class of sdr_data_world_joined. We can check the class with the class() function
class(sdr_data_world_joined)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
Right now, sdr_data_world_joined is a regular dataframe. To use the leaflet package that generates the map, we need sdr_data_world_joined to be an sf dataframe. sf stands for simple features. Converting sdr_data_world_joined to an sf dataframe is easy with the st_as_sf() function, and it allows leaflet to interpret the geometry column as something it can map
sdr_data_world_joined <- st_as_sf(sdr_data_world_joined)
Now let’s check the class again to make sure sdr_data_world_joined is an sf dataframe
class(sdr_data_world_joined)
## [1] "sf" "spec_tbl_df" "tbl_df" "tbl" "data.frame"
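The same before-and-after check can be tried on a tiny example. This is only a sketch with made-up points (toy_df and toy_sf are hypothetical names): it starts from a plain dataframe that happens to hold a geometry column, then converts it with st_as_sf().

```r
library(sf)

# Build a small sf object, then strip the sf class with as.data.frame() so we
# start from a plain dataframe that still contains a geometry column
toy_df <- as.data.frame(
  st_sf(
    name     = c("Point A", "Point B"),
    geometry = st_sfc(st_point(c(0, 0)), st_point(c(10, 10)), crs = 4326)
  )
)
class(toy_df)

# st_as_sf() finds the geometry column and adds the sf class back
toy_sf <- st_as_sf(toy_df)
class(toy_sf)
```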
The last step before mapping with leaflet is specifying a coordinate reference system (CRS). There are many different coordinate reference systems, and here we choose WGS84
sdr_data_world_joined <- st_transform(sdr_data_world_joined, "+proj=longlat +datum=WGS84")
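If you want to confirm which coordinate reference system an sf object carries, st_crs() and st_is_longlat() from the sf package can help. Here is a small sketch on made-up points (toy_points is a hypothetical name), using the same transformation string as above:

```r
library(sf)

# Made-up sf object with two points, stored as longitude/latitude
toy_points <- st_sf(
  name     = c("Point A", "Point B"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(10, 10)), crs = 4326)
)

# Same call as in the lesson: transform to longitude/latitude on the WGS84 datum
toy_points <- st_transform(toy_points, "+proj=longlat +datum=WGS84")

st_crs(toy_points)        # prints the CRS attached to the object
st_is_longlat(toy_points) # TRUE when coordinates are longitude/latitude
```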
Awesome! We’re ready to make a map
Let’s map SDG 7 Scores
The mytext part of the code determines what we see when we hover over a country
The leaflet() function generates the map from the sdr_data_world_joined dataframe
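The map code that follows colors each country with colorQuantile() and builds its hover labels with paste(). Here is a minimal sketch of both pieces on made-up values (toy_scores, toy_countries, and the other toy_ names are hypothetical). Note that colorQuantile() returns a function, which is why the map code calls it twice in a row, and that with the default n = 4 it splits the scores into four equally sized groups rather than coloring by the raw score.

```r
library(leaflet)

# Made-up scores for illustration; the NA mimics a country with no score
toy_countries <- c("A", "B", "C", "D", "E", "F")
toy_scores    <- c(12.5, 40.1, 63.8, 77.2, 95.0, NA)

# Step 1: colorQuantile() builds a palette function from the scores
quantile_palette <- colorQuantile("YlOrRd", domain = toy_scores, n = 4)

# Step 2: calling that function on the scores returns one hex color per value;
# missing scores get the default na.color, a gray
toy_colors <- quantile_palette(toy_scores)
toy_colors

# Labels: paste0() works element by element, and htmltools::HTML() marks each
# string as HTML so that <br/> becomes a line break in the hover label
toy_labels <- lapply(
  paste0("Country: ", toy_countries, "<br/>",
         "Goal 7 Score: ", round(toy_scores, 2)),
  htmltools::HTML
)
toy_labels[[1]]
```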
mytext <- paste(
  "Country: ", sdr_data_world_joined$country, "<br/>",
  "Goal 7 Score: ", round(sdr_data_world_joined$goal_7_score, 2),
  sep = "") %>%
  lapply(htmltools::HTML)
leaflet(sdr_data_world_joined) %>%
  addTiles() %>%
  setView(lat = 10, lng = 0, zoom = 2) %>%
  addPolygons(stroke = FALSE, fillOpacity = 0.5, smoothFactor = 0.5,
              color = ~colorQuantile("YlOrRd", goal_7_score)(goal_7_score),
              label = mytext)
Can you edit the code chunk above that generates the map to: