How to make a plot using vegabrite
introduction.Rmd
This vignette covers the basics of how to make a chart using vegabrite
. For more about the design philosophy of the package, see the design vignette. For example charts see the examples vignette. For much more information about what is possible with Vega-Lite, see the Vega-Lite documentation.
Installation
vegabrite
is not yet on cran but can be installed from github:
remotes::install_github('vegawidget/vegabrite')
The basics: data + mark + encoding
All plots require data, a mark and encodings. To intialize a chart you use
Without the other components it will be empty! But this output can be piped into commands to add the other components. Some components can also be added directly via that function call.
Data
vl_chart() %>%
vl_add_data(values = mtcars)
Still empty! Need to set mark and encoding
We could also have passed the data directly to vl_chart
:
vl_chart(mtcars)
Data can also be passed as a url using vl_add_data_url
or the url
argument to vl_add_data
. vl_chart
should also be able to recognize if the argument to data
is a url and treat it as such.
Mark
The ‘mark’ is the type of shape being plotted, e.g. ‘point’, ‘line’, ‘bar’.
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point()
Without encodings, all the points show up on top of each other!
It can be passed without arguments besides the input spec (which often will be piped in), but also can take in arguments to change its properties.
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point(color = 'red')
Encoding
The encoding tells the plot what to use along the x or y axis, as well as what to use for color, tooltips, links, and other aspects of the plot that might be controlled by a column of your data. Encodings can also take constant values (generally with the value
argument).
Functions to add an encoding are of the format vl_encode_*
where the *
is the name of the encoding, e.g. x
:
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point() %>%
vl_encode_x(field = 'wt', type = 'quantitative')
Multiple encodings can be added; it does not matter what order they are added.
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point() %>%
vl_encode_x(field = 'wt', type = 'quantitative') %>%
vl_encode_y(field = 'mpg', type = 'quantitative')
Type shorthands
A shorthand can be used to specify the type. The field name can be joined using a colon with a single letter abbreviation. The abbreviations are:
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point() %>%
vl_encode_x('wt:Q') %>%
vl_encode_y('mpg:Q')
Type inference
When the data has been added to the spec already and is given directly by an R data.frame (rather than a URL or a transformation), the type can also be inferred and does not need to be given.
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point() %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg')
vl_encode
Each encoding can be passed by a separate function, or a single call to vl_encode
can be used:
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point() %>%
vl_encode(x = 'wt', y = "mpg")
When not providing additional arguments for x or y encoding this form can be more succinct, but specifying each encoding separately can be easier when passing lots of arguments.
For help constructing a full encoding object to vl_encode
one can use functions within vl
list of functions.
vl_chart() %>%
vl_add_data(values = mtcars) %>%
vl_mark_point() %>%
vl_encode(
x = vl$X(field = 'wt', title = "weight"),
y = "mpg"
)
View Composition
Sub-plots or ‘views’ can be combined in a number of ways. Layering assembles views into the same plot. Facetting splits views based on a column of data with the same encodings per view. Repeating splits views based on what field to use as one or two of the encodings. Concatenation combines arbitrary views together.
Layering
Layering assembles views into the same plot. A single view can only have one ‘mark’, so layering is how a plot can be made with multiple marks. Those marks can have different encodings or the same encodings.
vl_layer
Two charts can be layered via vl_layer
. Additional modifications can then be made to the resulting layered chart, although there are some limitations to the kinds of modifications that can be made to layered charts versus single view charts.
points_plot <- vl_chart() %>%
vl_encode_x(field = "IMDB_Rating", type = "quantitative") %>%
vl_encode_y(type = "quantitative", aggregate = "count") %>%
vl_bin_x(maxbins=10) %>%
vl_mark_bar()
avg_plot <- vl_chart() %>%
vl_encode_x(field = "IMDB_Rating", type = "quantitative", aggregate = "mean") %>%
vl_encode_color(value = "red") %>%
vl_encode_size(value = 5) %>%
vl_mark_rule()
vl_layer(points_plot, avg_plot) %>%
vl_add_data_url("https://vega.github.io/vega-editor/app/data/movies.json")
In this case the data is added afterwards. That can help make sure the data isn’t duplicated, and helps keep the size of the spec smaller. Note that layers can have different data, too, in which case the data would get added to each chart first before layering.
The +
operator
Layering can also be performed by using the +
operator.
(points_plot + avg_plot) %>%
vl_add_data_url("https://vega.github.io/vega-editor/app/data/movies.json")
Facet
Facetting splits up your data into different subplots based on a field in the data. This can be done across rows and/or columns. Facets can get added in two ways: via vl_facet_
functions or as encodings, eg vl_encode_row
.
In terms of which option to use, they have slightly different options in terms of the parameters that can be given, but often are interchangeable.
Facets as encoding
This example splits the data between rows based on the country of origin.
vl_chart() %>%
vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
vl_mark_bar() %>%
vl_encode_x(field = "Horsepower", type = "quantitative") %>%
vl_bin_x(maxbins = 15) %>%
vl_encode_y(aggregate = "count", type = "quantitative") %>%
vl_encode_row(field = "Origin", type = "nominal")
Adding facets with facet functions
The following does the same as above, except via the vl_facet_row
function.
vl_chart() %>%
vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
vl_mark_bar() %>%
vl_encode_x(field = "Horsepower", type = "quantitative") %>%
vl_bin_x(maxbins = 15) %>%
vl_encode_y(aggregate = "count", type = "quantitative") %>%
vl_facet_row(field = "Origin", type = "nominal")
Repeating
Adding a ‘repeat’ is similar in some ways to facetting, except the same data gets shown in each subplot. Instead, what changes between subplots is what field gets used in the encoding.
To use ‘repeat’ one needs to both call a vl_repeat_*
function (where *
is wrap
, row
, or column
) with the field names to use, and specify the field
for an encoding as repeat:*
with the matching value of *
.For example:
vl_chart() %>%
vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
vl_mark_bar() %>%
vl_encode_x(field = "repeat:wrap", type = "quantitative", bin = TRUE) %>%
vl_encode_y(aggregate = "count", type = "quantitative") %>%
vl_encode_color(field = "Origin", type = "nominal") %>%
vl_repeat_wrap("Horsepower", "Miles_per_Gallon", "Acceleration", "Displacement")
Alternatively a longer specification for the field can be used:
vl_chart() %>%
vl_add_data(url = "https://vega.github.io/vega-editor/app/data/cars.json") %>%
vl_mark_bar() %>%
vl_encode_x(field = list(`repeat` = "repeat"), type = "quantitative", bin = TRUE) %>%
vl_encode_y(aggregate = "count", type = "quantitative") %>%
vl_encode_color(field = "Origin", type = "nominal") %>%
vl_repeat_wrap("Horsepower", "Miles_per_Gallon", "Acceleration", "Displacement")
Concatenation
Different charts can be concatenated together via vl_concat
, vl_hconcat
, or vl_vconcat
to concatenate in wrapping fashion (with number of columns set by columns
arguments), horizontally, or vertically. Data can be set for each individual view (and can then be different) or can be set for the concatenated version.
bar_chart <- vl_chart() %>%
vl_mark_bar() %>%
vl_encode_x(field = "date:O", timeUnit = "month") %>%
vl_encode_y(field = "precipitation:Q", aggregate = "mean")
point_chart <- vl_chart() %>%
vl_mark_point() %>%
vl_encode_x(field = "temp_min:Q", bin = TRUE) %>%
vl_encode_y(field = "temp_max:Q", bin = TRUE) %>%
vl_encode_size(aggregate = "count", type = "quantitative")
vl_hconcat(bar_chart, point_chart) %>%
vl_add_data(url = "https://vega.github.io/vega-lite/data/weather.csv") %>%
vl_filter("datum.location === 'Seattle'")
Concatenation can also be done via the |
or &
operators, to do horizontal or vertical concatenation respectively.
(bar_chart | point_chart) %>%
vl_add_data(url = "https://vega.github.io/vega-lite/data/weather.csv") %>%
vl_filter("datum.location === 'Seattle'")
(bar_chart & point_chart) %>%
vl_add_data(url = "https://vega.github.io/vega-lite/data/weather.csv") %>%
vl_filter("datum.location === 'Seattle'")
Interactivity
Tooltips
There are three ways to get tooltips:
All the encodings
To get a tooltip that shows the values for fields set in the encoding, use tooltip = TRUE
in the call to add a mark to the plot.
vl_chart(mtcars) %>%
vl_mark_point(tooltip = TRUE) %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg')
All data associated with point
To get all the data to show up (not just what is used by encoding) pass list(content = "data")
to the tooltip argument.
vl_chart(mtcars) %>%
vl_mark_point(tooltip = list(content = "data")) %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg')
vegabrite
also has a shortcut, whereby if you pass “data” or “encoding” to the tooltip argument, it will transform it into the above list format in the spec. So the following will do the same thing:
vl_chart(mtcars) %>%
vl_mark_point(tooltip = "data") %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg')
To some constant value
You can also set the tooltip to a constant value – e.g. to get each point to say “Hi!”:
vl_chart(mtcars) %>%
vl_mark_point(tooltip = "Hi!") %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg')
Via an encoding channel
We can also use a column of the data by passing the tooltip as an encoding. If we want to use the row.names
of the data.frame
we can use '_row'
as the name:
vl_chart(mtcars) %>%
vl_mark_point() %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg') %>%
vl_encode_tooltip('_row')
If you want to pass multiple entries to the toolip, use vl_encode_tooltip_array
instead.
vl_chart(mtcars) %>%
vl_mark_point() %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg') %>%
vl_encode_tooltip_array(c('_row','wt','mpg'))
Parameters
Parameters enable interactive manipulation of charts. There are two main types of parameters – variable parameters and selection parameters. Variable parameters are added using vl_add_parameter
. For selection parameters, see next section of this doc.
A parameter can be a simple value or expression that can be used in other parts of the spec, or it can be something that is mapped to user input via a binding.
An example of adding a parameter that is fixed:
vl_chart(mtcars) %>%
vl_add_parameter("radius", value = 10) %>%
vl_mark_bar(cornerRadius = list(expr = 'radius')) %>%
vl_encode_x('wt', bin = TRUE) %>%
vl_encode_y( aggregate = 'count')
The radius of the bar corners was set to a fixed variable of 15.
To bind a parameter to a user input, use the vl_bind_*
family of functions, for example * vl_bind_range_input
for binding to a slider * vl_bind_checkbox_input
for binding to checkbox selections * vl_bind_radio_input
for binding to radio selections * vl_bind_select_input
for binding to dropdown selections * vl_bind_input
for binding to any hmtl input * vl_bind_direct_input
for binding to an externally defined element
An example of adding a parameter controlled by a range input:
vl_chart(mtcars) %>%
vl_add_parameter("radius", value = 10) %>%
vl_bind_range_input("radius", min = 0, max = 15, step = 1) %>%
vl_mark_bar(cornerRadius = list(expr = 'radius')) %>%
vl_encode_x('wt', bin = TRUE) %>%
vl_encode_y( aggregate = 'count')
Selections
More complex parameters are “selections” – these map data queries to a parameter. There are two types of selections – interval and point, which can be added using vl_add_interval_selection
or vl_add_point_selection
, respectively.
The vl_condition_*
family of functions can be used to condition an encoding on the selection.
Interval Selections
Interval selections enable selecting an interval on the chart. Below shows an example of adding an interval selection and conditioning the color on that selection. If you select an area of the plot, that area will remain colored by the hp
field, and the remainder of the color will be ‘gray’.
vl_chart(mtcars) %>%
vl_mark_point() %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg') %>%
vl_add_interval_selection(name = 'brush') %>%
vl_encode_color(value = 'gray') %>%
vl_condition_color(param = 'brush', field = 'cyl')
Binding interval to scales
Interval selections can be controlled via the scales, using bind = 'scales'
. This enables using the view to pan and zoom on the data. An example – drag the plot panel to shift the area shown.
vl_chart(mtcars) %>%
vl_mark_point() %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg') %>%
vl_add_interval_selection(name = 'grid', bind = 'scales')
Point Selections
A point selection enables directly interacting with points (or lines, areas) on the chart. In the below example, clicking on a point will cause it to change color to red.
vl_chart(mtcars) %>%
vl_mark_circle(size = 50) %>%
vl_encode_x('wt') %>%
vl_encode_y('mpg') %>%
vl_add_point_selection(name = 'pt', value = list()) %>%
vl_encode_color(value = 'gray') %>%
vl_condition_color(param = 'pt', value = 'red', empty = FALSE)
These selections can be customized in wheter they respond to hover instead of click, whether they select the nearest point, how multiple items can be clicked, and how selections can be cleared. See (Vega-Lite documentation)[https://vega.github.io/vega-lite/docs/selection.html] for more that is possible with selections!
Binding legend to selection
A point selection can be controlled via the legend, using bind = 'legend'
. For example:
vl_chart(data = 'https://vega.github.io/vega-editor/app/data/unemployment-across-industries.json') %>%
vl_mark_area() %>%
vl_encode_x(field = 'date', timeUnit = 'yearmonth') %>%
vl_axis_x(domain = FALSE, format = '%Y', tickSize = 0) %>%
vl_encode_y(field = 'count', aggregate = 'sum', stack = 'center', axis = NULL) %>%
vl_encode_color(field = 'series') %>%
vl_scale_color(scheme = 'category20b') %>%
vl_encode_opacity(value = 0.2) %>%
vl_condition_opacity(param = 'industry', value = 1) %>%
vl_add_point_selection(name = 'industry', fields = list('series'), bind = 'legend')