library(virgo)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(gapminder)

This example is inspired by the Many Models example in R for Data Science by Grolemund and Wickham.

To begin we create a drop down menu with each continent using input_select() and then we create a line plot for each country’s life expectancy over time. Here the selection modifies the opacity of each line, so we can compare countries within a continent to each other:

selection <- select_bind(
  continent = input_select(
    name = "Select Continent:",
    choices = c(NA, levels(gapminder$continent)))
)
  
lex <- gapminder %>% 
  vega(enc(x = year, y = lifeExp, group = country)) %>% 
  mark_line(
    enc(opacity = encode_if(selection, 0.5, 0.1))
  )
lex 

We see that for most countries life expectancy is generally increasing, however several countries from Africa and Asia have dips in their life expectancies.

Next for each country we can estimate a simple linear model to summarise the relationship of how life expectancy has changed over time. We will use \(R^2\) to assess the quality of the fit and link that back to our raw data

# we could do this with broom but this seems
# ok for now
gapminder_augmented <- gapminder %>% 
    group_nest(continent, country) %>% 
    mutate(
      model = lapply(data, function(x) lm(lifeExp ~ year, data = x)),
      r2 = vapply(model, function(x) summary(x)$r.squared, numeric(1)),
    ) %>% 
  unnest(data) %>% 
  select(-model)
  
gapminder_augmented
#> # A tibble: 1,704 x 7
#>    continent country  year lifeExp      pop gdpPercap    r2
#>    <fct>     <fct>   <int>   <dbl>    <int>     <dbl> <dbl>
#>  1 Africa    Algeria  1952    43.1  9279525     2449. 0.985
#>  2 Africa    Algeria  1957    45.7 10270856     3014. 0.985
#>  3 Africa    Algeria  1962    48.3 11000948     2551. 0.985
#>  4 Africa    Algeria  1967    51.4 12760499     3247. 0.985
#>  5 Africa    Algeria  1972    54.5 14760787     4183. 0.985
#>  6 Africa    Algeria  1977    58.0 17152804     4910. 0.985
#>  7 Africa    Algeria  1982    61.4 20033753     5745. 0.985
#>  8 Africa    Algeria  1987    65.8 23254956     5681. 0.985
#>  9 Africa    Algeria  1992    67.7 26298373     5023. 0.985
#> 10 Africa    Algeria  1997    69.2 29072015     4797. 0.985
#> # … with 1,694 more rows

Now we can create a plot driven selection, that will display the \(R^2\) values within each continent:

select_r2 <- select_interval("y")

tick_plot <- gapminder_augmented %>% 
    vega(enc(x = continent, y = r2)) %>% 
    mark_tick(
      enc(opacity = encode_if(select_r2, 1, 0.1))
    )
tick_plot

New we can overlay the real data alongside the model \(R^2\) values to identify which countries have poor fits. First, we modify the color scale to match the provided gapminder::country_colors and then produce a line plot.

palette <- country_colors[sort(names(country_colors))]

country_fits <- gapminder_augmented %>% 
  vega(enc(x = year, y = lifeExp, color = country)) %>% 
  mark_line(selection = select_r2) %>% 
  scale_color(range = palette ,guide = FALSE)

country_fits

Next we combine them together to see which countries have poor fits.

hconcat(tick_plot, country_fits)

The countries with \(R^2\) below 0.2 have strong non-linear trends in life expectancies hovering over the lines identifies them to be countries affected by genocide (Rwanda) or the HIV/AIDS epidemic (Botswana, Lesotho, Swaziland, Zambia, Zimbabwe).