How to make a plot using vegabrite

This vignette covers the basics of how to make a chart using vegabrite. For more about the design philosophy of the package, see the design vignette. For example charts see the examples vignette. For much more information about what is possible with Vega-Lite, see the Vega-Lite documentation.

Installation

vegabrite is not yet on cran but can be installed from github:

remotes::install_github('vegawidget/vegabrite')

The basics: data + mark + encoding

All plots require data, a mark and encodings. To intialize a chart you use

library(vegabrite)
vl_chart()

Without the other components it will be empty! But this output can be piped into commands to add the other components. Some components can also be added directly via that function call.

Data

vl_chart() %>%
  vl_add_data(values = mtcars)

Still empty! Need to set mark and encoding

We could also have passed the data directly to vl_chart:

vl_chart(mtcars)

Data can also be passed as a url using vl_add_data_url or the url argument to vl_add_data. vl_chart should also be able to recognize if the argument to data is a url and treat it as such.

Mark

The ‘mark’ is the type of shape being plotted, e.g. ‘point’, ‘line’, ‘bar’.

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point()

Without encodings, all the points show up on top of each other!

It can be passed without arguments besides the input spec (which often will be piped in), but also can take in arguments to change its properties.

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point(color = 'red')

Encoding

The encoding tells the plot what to use along the x or y axis, as well as what to use for color, tooltips, links, and other aspects of the plot that might be controlled by a column of your data. Encodings can also take constant values (generally with the value argument).

Functions to add an encoding are of the format vl_encode_* where the * is the name of the encoding, e.g. x:

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x(field = 'wt', type = 'quantitative')

Multiple encodings can be added; it does not matter what order they are added.

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x(field = 'wt', type = 'quantitative') %>%
  vl_encode_y(field = 'mpg', type = 'quantitative')

Type shorthands

A shorthand can be used to specify the type. The field name can be joined using a colon with a single letter abbreviation. The abbreviations are:

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x('wt:Q') %>%
  vl_encode_y('mpg:Q')

Type inference

When the data has been added to the spec already and is given directly by an R data.frame (rather than a URL or a transformation), the type can also be inferred and does not need to be given.

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg')

`vl_encode`

Each encoding can be passed by a separate function, or a single call to vl_encode can be used:

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point() %>%
  vl_encode(x = 'wt', y = "mpg")

When not providing additional arguments for x or y encoding this form can be more succinct, but specifying each encoding separately can be easier when passing lots of arguments.

For help constructing a full encoding object to vl_encode one can use functions within vl list of functions.

vl_chart() %>%
  vl_add_data(values = mtcars) %>%
  vl_mark_point() %>%
  vl_encode(
    x = vl$X(field = 'wt', title = "weight"), 
     y = "mpg"
    )

View Composition

Sub-plots or ‘views’ can be combined in a number of ways. Layering assembles views into the same plot. Facetting splits views based on a column of data with the same encodings per view. Repeating splits views based on what field to use as one or two of the encodings. Concatenation combines arbitrary views together.

Layering

Layering assembles views into the same plot. A single view can only have one ‘mark’, so layering is how a plot can be made with multiple marks. Those marks can have different encodings or the same encodings.

vl_layer

Two charts can be layered via vl_layer. Additional modifications can then be made to the resulting layered chart, although there are some limitations to the kinds of modifications that can be made to layered charts versus single view charts.

points_plot <- vl_chart() %>%
  vl_encode_x(field = "IMDB_Rating", type = "quantitative") %>%
  vl_encode_y(type = "quantitative", aggregate = "count") %>%
  vl_bin_x(maxbins=10) %>%
  vl_mark_bar()
  
avg_plot <- vl_chart() %>%
  vl_encode_x(field = "IMDB_Rating", type = "quantitative", aggregate = "mean") %>%
  vl_encode_color(value = "red") %>%
  vl_encode_size(value = 5) %>%
  vl_mark_rule()

vl_layer(points_plot, avg_plot) %>%
  vl_add_data_url("https://vega.github.io/vega-editor/app/data/movies.json")

In this case the data is added afterwards. That can help make sure the data isn’t duplicated, and helps keep the size of the spec smaller. Note that layers can have different data, too, in which case the data would get added to each chart first before layering.

The `+` operator

Layering can also be performed by using the + operator.

(points_plot + avg_plot) %>%
  vl_add_data_url("https://vega.github.io/vega-editor/app/data/movies.json")

Facetting splits up your data into different subplots based on a field in the data. This can be done across rows and/or columns. Facets can get added in two ways: via vl_facet_ functions or as encodings, eg vl_encode_row.

In terms of which option to use, they have slightly different options in terms of the parameters that can be given, but often are interchangeable.

This example splits the data between rows based on the country of origin.

vl_chart() %>%
  vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
  vl_mark_bar() %>%
  vl_encode_x(field = "Horsepower", type = "quantitative") %>%
  vl_bin_x(maxbins = 15) %>%
  vl_encode_y(aggregate = "count", type = "quantitative") %>%
  vl_encode_row(field = "Origin", type = "nominal")

The following does the same as above, except via the vl_facet_row function.

vl_chart() %>%
  vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
  vl_mark_bar() %>%
  vl_encode_x(field = "Horsepower", type = "quantitative") %>%
  vl_bin_x(maxbins = 15) %>%
  vl_encode_y(aggregate = "count", type = "quantitative") %>%
  vl_facet_row(field = "Origin", type = "nominal")

Repeating

Adding a ‘repeat’ is similar in some ways to facetting, except the same data gets shown in each subplot. Instead, what changes between subplots is what field gets used in the encoding.

To use ‘repeat’ one needs to both call a vl_repeat_* function (where * is wrap, row, or column) with the field names to use, and specify the field for an encoding as repeat:* with the matching value of *.For example:

vl_chart() %>%
  vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
  vl_mark_bar() %>%
  vl_encode_x(field = "repeat:wrap", type = "quantitative", bin = TRUE) %>%
  vl_encode_y(aggregate = "count", type = "quantitative") %>%
  vl_encode_color(field = "Origin", type = "nominal") %>%
  vl_repeat_wrap("Horsepower", "Miles_per_Gallon", "Acceleration", "Displacement")

Alternatively a longer specification for the field can be used:

vl_chart() %>%
  vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
  vl_mark_bar() %>%
  vl_encode_x(field = list(`repeat` = "repeat"), type = "quantitative", bin = TRUE) %>%
  vl_encode_y(aggregate = "count", type = "quantitative") %>%
  vl_encode_color(field = "Origin", type = "nominal") %>%
  vl_repeat_wrap("Horsepower", "Miles_per_Gallon", "Acceleration", "Displacement")

Concatenation

Different charts can be concatenated together via vl_concat, vl_hconcat, or vl_vconcat to concatenate in wrapping fashion (with number of columns set by columns arguments), horizontally, or vertically. Data can be set for each individual view (and can then be different) or can be set for the concatenated version.

bar_chart <- vl_chart() %>%
  vl_mark_bar() %>%
  vl_encode_x(field = "date:O", timeUnit = "month") %>%
  vl_encode_y(field = "precipitation:Q", aggregate = "mean")

point_chart <- vl_chart() %>%
  vl_mark_point() %>%
  vl_encode_x(field = "temp_min:Q", bin = TRUE) %>%
  vl_encode_y(field = "temp_max:Q", bin = TRUE) %>%
  vl_encode_size(aggregate = "count", type = "quantitative")
  
vl_hconcat(bar_chart, point_chart) %>% 
  vl_add_data(url = "https://vega.github.io/vega-lite/data/weather.csv") %>%
  vl_filter("datum.location === 'Seattle'")

Concatenation can also be done via the | or & operators, to do horizontal or vertical concatenation respectively.

(bar_chart | point_chart) %>% 
  vl_add_data(url = "https://vega.github.io/vega-lite/data/weather.csv") %>%
  vl_filter("datum.location === 'Seattle'")

(bar_chart & point_chart) %>% 
  vl_add_data(url = "https://vega.github.io/vega-lite/data/weather.csv") %>%
  vl_filter("datum.location === 'Seattle'")

Interactivity

Tooltips

There are three ways to get tooltips:

All the encodings

To get a tooltip that shows the values for fields set in the encoding, use tooltip = TRUE in the call to add a mark to the plot.

vl_chart(mtcars) %>%
  vl_mark_point(tooltip = TRUE) %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg')

All data associated with point

To get all the data to show up (not just what is used by encoding) pass list(content = "data") to the tooltip argument.

vl_chart(mtcars) %>%
  vl_mark_point(tooltip = list(content = "data")) %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg')

vegabrite also has a shortcut, whereby if you pass “data” or “encoding” to the tooltip argument, it will transform it into the above list format in the spec. So the following will do the same thing:

vl_chart(mtcars) %>%
  vl_mark_point(tooltip = "data") %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg')

To some constant value

You can also set the tooltip to a constant value – e.g. to get each point to say “Hi!”:

vl_chart(mtcars) %>%
  vl_mark_point(tooltip = "Hi!") %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg')

Via an encoding channel

We can also use a column of the data by passing the tooltip as an encoding. If we want to use the row.names of the data.frame we can use '_row' as the name:

vl_chart(mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg') %>% 
  vl_encode_tooltip('_row')

If you want to pass multiple entries to the toolip, use vl_encode_tooltip_array instead.

vl_chart(mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg') %>% 
  vl_encode_tooltip_array(c('_row','wt','mpg'))

Parameters

Parameters enable interactive manipulation of charts. There are two main types of parameters – variable parameters and selection parameters. Variable parameters are added using vl_add_parameter. For selection parameters, see next section of this doc.
A parameter can be a simple value or expression that can be used in other parts of the spec, or it can be something that is mapped to user input via a binding.

An example of adding a parameter that is fixed:

vl_chart(mtcars) %>%
  vl_add_parameter("radius", value = 10) %>% 
  vl_mark_bar(cornerRadius = list(expr = 'radius')) %>%
  vl_encode_x('wt', bin = TRUE) %>%
  vl_encode_y( aggregate = 'count')

The radius of the bar corners was set to a fixed variable of 15.

To bind a parameter to a user input, use the vl_bind_* family of functions, for example * vl_bind_range_input for binding to a slider * vl_bind_checkbox_input for binding to checkbox selections * vl_bind_radio_input for binding to radio selections * vl_bind_select_input for binding to dropdown selections * vl_bind_input for binding to any hmtl input * vl_bind_direct_input for binding to an externally defined element

An example of adding a parameter controlled by a range input:

vl_chart(mtcars) %>%
  vl_add_parameter("radius", value = 10) %>% 
  vl_bind_range_input("radius", min = 0, max = 15, step = 1) %>% 
  vl_mark_bar(cornerRadius = list(expr = 'radius')) %>%
  vl_encode_x('wt', bin = TRUE) %>%
  vl_encode_y( aggregate = 'count')

Selections

More complex parameters are “selections” – these map data queries to a parameter. There are two types of selections – interval and point, which can be added using vl_add_interval_selection or vl_add_point_selection, respectively.

The vl_condition_* family of functions can be used to condition an encoding on the selection.

Interval Selections

Interval selections enable selecting an interval on the chart. Below shows an example of adding an interval selection and conditioning the color on that selection. If you select an area of the plot, that area will remain colored by the hp field, and the remainder of the color will be ‘gray’.

vl_chart(mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg') %>% 
  vl_add_interval_selection(name = 'brush') %>%
  vl_encode_color(value = 'gray') %>% 
  vl_condition_color(param = 'brush', field = 'cyl')

Binding interval to scales

Interval selections can be controlled via the scales, using bind = 'scales'. This enables using the view to pan and zoom on the data. An example – drag the plot panel to shift the area shown.

vl_chart(mtcars) %>%
  vl_mark_point() %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg') %>% 
  vl_add_interval_selection(name = 'grid', bind = 'scales')

Point Selections

A point selection enables directly interacting with points (or lines, areas) on the chart. In the below example, clicking on a point will cause it to change color to red.

vl_chart(mtcars) %>%
  vl_mark_circle(size = 50) %>%
  vl_encode_x('wt') %>%
  vl_encode_y('mpg') %>% 
  vl_add_point_selection(name = 'pt', value = list()) %>%
  vl_encode_color(value = 'gray') %>% 
  vl_condition_color(param = 'pt', value = 'red', empty = FALSE)

These selections can be customized in wheter they respond to hover instead of click, whether they select the nearest point, how multiple items can be clicked, and how selections can be cleared. See (Vega-Lite documentation)[https://vega.github.io/vega-lite/docs/selection.html] for more that is possible with selections!

Binding legend to selection

A point selection can be controlled via the legend, using bind = 'legend'. For example:

vl_chart(data = 'https://vega.github.io/vega-editor/app/data/unemployment-across-industries.json') %>%
  vl_mark_area() %>%
  vl_encode_x(field = 'date', timeUnit = 'yearmonth') %>%
  vl_axis_x(domain = FALSE, format = '%Y', tickSize = 0) %>% 
  vl_encode_y(field = 'count', aggregate = 'sum', stack = 'center', axis = NULL) %>%
  vl_encode_color(field = 'series') %>%
  vl_scale_color(scheme = 'category20b') %>%
  vl_encode_opacity(value = 0.2) %>%
  vl_condition_opacity(param = 'industry', value = 1) %>%
  vl_add_point_selection(name = 'industry', fields = list('series'), bind = 'legend')

Installation

The basics: data + mark + encoding

Data

Mark

Encoding

Type shorthands

Type inference

`vl_encode`

View Composition

Layering

vl_layer

The `+` operator

Facet

Facets as encoding

Adding facets with facet functions

Repeating

Concatenation

Interactivity

Tooltips

All the encodings

All data associated with point

To some constant value

Via an encoding channel

Parameters

Selections

Interval Selections

Binding interval to scales

Point Selections

Binding legend to selection

How to make a plot using vegabrite

Installation

The basics: data + mark + encoding

Data

Mark

Encoding

Type shorthands

Type inference

vl_encode

View Composition

Layering

vl_layer

The + operator

Facet

Facets as encoding

Adding facets with facet functions

Repeating

Concatenation

Interactivity

Tooltips

All the encodings

All data associated with point

To some constant value

Via an encoding channel

Parameters

Selections

Interval Selections

Binding interval to scales

Point Selections

Binding legend to selection

`vl_encode`

The `+` operator