There are four foundations upon which this package rests:

• the Altair Python package, to build chart specifications
• the reticulate R package, to provide inter-operability with Python
• the Vega-Lite JavaScript framework, to render chart specifications in an HTML file
• the htmlwidgets R package, to provide inter-operability with HTML and JavaScript

This article deals with the first two items; the Field Guide to Rendering deals with the other two.

The purpose of this document is to try to collect in one place, in a semi-organized fashion, all the fiddly-bits we have found dealing with Python stuff. If you get a cryptic Python error, check here. If you find a workaround for something that isn’t here, please let us know!

## Overview

The Altair documentation is the best resource for learning how to create charts. In the course of building and documenting this package, we have noted a few “gotchas” and their workarounds. If you find another, please let us know!

Here’s the short version:

• Where you see a ., use a $ instead.
• Altair methods return a copy of the object. Assignment of a Python object returns a reference, not a copy.
• To get a copy of a “bare” Altair object, use the $copy() method.

• If you have a dataset that has variables with dots in their names, e.g. Sepal.Width, you have to make some accommodation when referring to such names in Altair. As a workaround, you can wrap the name in square brackets: “[Sepal.Width]”.

• There is an Altair Chart method called repeat(); repeat is a reserved word in R, so it needs to be enclosed in backticks: $`repeat`().
• Where you see an inversion operator, ~, like ~highlight, in Altair examples, call the method explicitly from R: highlight$`__invert__`(). Alternatively, you may be able to rearrange the code so as to avoid using the inversion.

• Where you see a hyphen in the name of a Python object, use an underscore in R: vega_data$sf_temps().
• Where you see a Python list, ["foo", "bar"], in Altair examples, use an unnamed list in R: list("foo", "bar").
• Where you see a Python dictionary, {'a': "foo", 'b': "bar"}, in Altair examples, use a named list in R: list(a = "foo", b = "bar").
• Where you see a None in Altair examples, use a NULL in R.
• You may see a function call with **, baz(a = 1, **{'foo': 'bar'}), in an Altair example. In R, interpolate the dictionary into the rest of the arguments, baz(a = 1, foo = "bar").

## Data frames

Consider this Python example:

# Python
import altair as alt
from vega_datasets import data

cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()

In this case, we are supplying a Data Frame to the Chart() method. The R equivalent:

library("altair")

vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <-
  alt$Chart(cars)$
    mark_point()$
    encode(
      x = "Horsepower:Q",
      y = "Miles_per_Gallon:Q",
      color = "Origin:N"
    )

## Method chaining

When reticulate returns a Python object with a custom class, it appears in R as an S3 object that behaves like a reference class. This means that if you see this sort of notation in Python:

# Python
foo.bar()

You would use this notation in R:

foo$bar()

In essence, wherever you see a . in Python, use a $ in R.

vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <-
  alt$Chart(cars)$
    mark_point()$
    encode(
      x = "Horsepower:Q",
      y = "Miles_per_Gallon:Q",
      color = "Origin:N"
    )

## Altair method returns copy

In Python, Altair methods return a copy of the object. To verify this, let’s use pryr::address() to compare memory addresses:

library("pryr")
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp

chart_new <- chart$mark_point()

address(chart_new) == address(chart)
#> [1] FALSE

Although this looks like a reference-class method, the Altair method acts like an S3 method.

## Python assignment returns reference

The object returned by an Altair method is a modified copy of the calling object, much as we are accustomed to in R. However, it is important to note that using an R assignment operator (<-, =, ->) on a Python object returns a reference to the object rather than a copy.

This becomes apparent when assigning a “bare” object:

chart_new <- chart

address(chart_new) == address(chart)
#> [1] TRUE
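The same distinction can be seen on the Python side, without involving R at all. The following is a minimal sketch that uses a hypothetical Spec class (an assumption for illustration, not an Altair class) whose mark_point() method returns a modified copy, the way Altair methods are described above. Plain assignment, by contrast, binds a second name to the same object.

```python
import copy


class Spec:
    """Hypothetical stand-in for a chart object; not part of Altair."""

    def __init__(self, mark=None):
        self.mark = mark

    def mark_point(self):
        # Return a modified copy and leave the calling object untouched.
        new = copy.copy(self)
        new.mark = "point"
        return new


chart = Spec()

chart_from_method = chart.mark_point()  # method call: a new object
chart_alias = chart                     # plain assignment: the same object

print(chart_from_method is chart)  # False
print(chart_alias is chart)        # True
print(chart.mark)                  # None: the original was not modified
```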

To return a copy of the object, use a copy method.

chart_new <- chart$copy()

address(chart_new) == address(chart)
#> [1] FALSE

## Dots in variable names

In Python, dots can refer to a nested structure within a Data Frame variable. Vega-Lite supports such nesting, so it assumes that a dot in a variable name refers to a nested variable. This means that we can run into trouble using R’s iris dataset:

# does not render properly
chart_iris_r <-
  alt$Chart(iris)$
    encode(
      x = "Sepal.Width:Q",
      y = "Sepal.Length:Q",
      color = "Species:N"
    )$
    mark_point()

chart_iris_r

The problem here is that there are variables whose names have dots in them, e.g. Sepal.Width. One workaround is to use square brackets when referring to such variable names; another is to use backslashes, \\:

chart_iris_r <-
  alt$Chart(iris)$
    encode(
      x = "[Sepal.Width]:Q",
      y = "Sepal\\.Length:Q",
      color = "Species:N"
    )$
    mark_point()

chart_iris_r

As you can see, this has the side-effect of showing the brackets and backslashes in the axis titles. To fix the fix, you can set the title for each axis:

chart_iris_r$
  encode(
    x = alt$X(
      field = "[Sepal.Width]",
      type = "quantitative",
      axis = alt$Axis(title = "Sepal.Width")
    ),
    y = alt$Y(
      field = "Sepal\\.Length",
      type = "quantitative",
      axis = alt$Axis(title = "Sepal.Length")
    ),
    color = "Species:N"
  )
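
To see why an unescaped dot causes trouble, here is a rough Python sketch of the idea behind a nested field lookup. The helper nested_lookup() is hypothetical and greatly simplified: it is not Vega-Lite's parser, and it does not handle the bracket or backslash escapes shown above.

```python
def nested_lookup(datum, field):
    """Hypothetical helper: treat each dot in `field` as one level of nesting."""
    value = datum
    for key in field.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # the path does not exist in this record
        value = value[key]
    return value


# A row of iris as a flat record whose key happens to contain a dot.
flat_row = {"Sepal.Width": 3.5, "Species": "setosa"}

# What a dotted field name is assumed to describe: a nested record.
nested_row = {"Sepal": {"Width": 3.5}, "Species": "setosa"}

print(nested_lookup(nested_row, "Sepal.Width"))  # 3.5
print(nested_lookup(flat_row, "Sepal.Width"))    # None: there is no "Sepal" key
print(flat_row["Sepal.Width"])                   # 3.5: the literal key works
```

Escaping the dot, with brackets or a backslash, is a way of asking for the literal key instead of the nested path.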

## Repeat

As shown in the View Composition article, you can use the repeat() method to compose one or more charts such that the only thing different among them is an encoding.

However, as the article notes, there is a catch: repeat is a reserved word in R, so we have to enclose it in backticks, e.g. $`repeat`().

chart_repeat <-
  alt$Chart(vega_data$iris())$
    encode(
      x = alt$X(alt$`repeat`("column"), type = "quantitative"),
      y = alt$Y(alt$`repeat`("row"), type = "quantitative"),
      color = "species:N"
    )$
    mark_point()$
    properties(
      width = 200,
      height = 200
    )$
    `repeat`(
      row = list("petalLength", "petalWidth"),
      column = list("sepalLength", "sepalWidth")
    )

chart_repeat

## Inversion: ~

This is another case where an operator has a completely different meaning in Python than it has in R. In R, as you know, the ~ operator is used to construct a formula. In Python, it is the bitwise-inversion operator.

You might come across this in an Altair example where the operator is used to invert a selection.

# Python
highlight = alt.selection(type='single', on='mouseover',
                          fields=['symbol'], nearest=True)

alt.condition(~highlight, alt.value(1), alt.value(3))

There are a couple of alternatives available here. The first is to invoke the $`__invert__`() method explicitly.

# R
highlight <-
  alt$selection(
    type = "single",
    on = "mouseover",
    fields = list("symbol"),
    nearest = TRUE
  )

alt$condition(highlight$`__invert__`(), alt$value(1), alt$value(3))

The second alternative is to swap the order of the if_true and if_false arguments in alt$condition().

# R
highlight <-
  alt$selection(
    type = "single",
    on = "mouseover",
    fields = list("symbol"),
    nearest = TRUE
  )

alt$condition(highlight, alt$value(3), alt$value(1))
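
On the Python side, ~x is shorthand for a call to x.__invert__(), which is why calling the method explicitly from R has the same effect. A minimal sketch, using a hypothetical Selection class (an assumption for illustration, not Altair's implementation):

```python
class Selection:
    """Hypothetical stand-in for a selection; not Altair's implementation."""

    def __init__(self, name, negated=False):
        self.name = name
        self.negated = negated

    def __invert__(self):
        # Python calls this method when it evaluates `~selection`.
        return Selection(self.name, negated=not self.negated)


highlight = Selection("highlight")

via_operator = ~highlight
via_method = highlight.__invert__()

print(via_operator.negated, via_method.negated)  # True True
print(highlight.negated)                         # False: original unchanged
```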

## Hyphens in Python Names

This comes up in Vega datasets. Let’s use the $list_datasets() method to get the names of the datasets that contain a hyphen.

vega_data$list_datasets() %>% stringr::str_subset("-")
#>  [1] "co2-concentration"              "flare-dependencies"
#>  [3] "flights-10k"                    "flights-200k"
#>  [5] "flights-20k"                    "flights-2k"
#>  [7] "flights-3m"                     "flights-5k"
#>  [9] "flights-airport"                "gapminder-health-income"
#> [11] "iowa-electricity"               "la-riots"
#> [13] "normal-2d"                      "seattle-temps"
#> [15] "seattle-weather"                "sf-temps"
#> [17] "unemployment-across-industries" "us-10m"
#> [19] "us-employment"                  "us-state-capitals"
#> [21] "world-110m"
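
The hyphens matter because a hyphen is not allowed in a Python identifier, so a name like sf-temps cannot be used as an attribute name. A small sketch using only the Python standard library; the replace() call imitates the hyphen-to-underscore substitution and is an illustration, not code taken from the vega_datasets package:

```python
names = ["sf-temps", "flights-2k", "us-10m"]

# Illustrative substitution: swap each hyphen for an underscore.
attributes = [name.replace("-", "_") for name in names]

for name, attribute in zip(names, attributes):
    # str.isidentifier() reports whether a string is a valid Python name.
    print(name, name.isidentifier(), attribute, attribute.isidentifier())
```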

To refer to one of these datasets in R, substitute the hyphen with an underscore:

vega_data$sf_temps() %>% head()
#>   temp                date
#> 1 47.8 2010-01-01 00:00:00
#> 2 47.4 2010-01-01 01:00:00
#> 3 46.9 2010-01-01 02:00:00
#> 4 46.5 2010-01-01 03:00:00
#> 5 46.0 2010-01-01 04:00:00
#> 6 45.8 2010-01-01 05:00:00

## Lists: [] and Dictionaries: {}

A Python list corresponds to an atomic vector in R; a Python dictionary corresponds to a named list in R.

# Python
example_list = [1, 2, 3]
example_dictionary = {'a': 1, 'b': 2, 'c': 3}

In practice, we find that reticulate does the right thing if we provide an R unnamed list where Altair expects a list, and an R named list where Altair expects a dictionary.

example_list <- list(1, 2, 3)
example_dictionary <- list(a = 1, b = 2, c = 3)

Consider this Altair example that uses lists and dictionaries. Here are some of the Python bits:

# Python
import altair as alt
from vega_datasets import data

flights = alt.UrlData(data.flights_2k.url,
                      format={'parse': {'date': 'date'}})

brush = alt.selection(type='interval', encodings=['x'])

Here’s an R translation of the complete example, which demonstrates interactive cross-filtering.

flights <-
  alt$UrlData(
    vega_data$flights_2k$url,
    format = list(parse = list(date = "date"))
  )

brush <- alt$selection(type = "interval", encodings = list("x"))

# Define the base chart, with the common parts of the
# background and highlights
base <-
  alt$Chart()$
    mark_bar()$
    encode(
      x = alt$X(
        alt$`repeat`("column"),
        type = "quantitative",
        bin = alt$Bin(maxbins = 20)
      ),
      y = "count()"
    )$
    properties(width = 180, height = 130)

# blue background with selection
background <- base$properties(selection = brush)

# yellow highlights on the transformed data
highlight <-
  base$
    encode(color = alt$value("goldenrod"))$
    transform_filter(brush$ref())

# layer the two charts & repeat
alt$
  layer(background, highlight, data = flights)$
  transform_calculate("time", "hours(datum.date)")$
  `repeat`(column = list("distance", "delay", "time"))
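
Part of the reason the list/dictionary distinction matters is that a chart specification ends up as JSON, where a Python list becomes an array and a dictionary becomes an object. A sketch using only Python's standard json module, with the values from the example above:

```python
import json

encodings = ["x"]                          # Python list
data_format = {"parse": {"date": "date"}}  # nested Python dictionaries

print(json.dumps(encodings))    # ["x"]
print(json.dumps(data_format))  # {"parse": {"date": "date"}}

# A bare string is not the same thing as a one-element list.
print(json.dumps("x"))          # "x"
```

The last line hints at why list("x") rather than "x" is used on the R side: reticulate converts a length-one character vector to a Python string, not to a list.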

## None and **{}

These concepts are not related, other than that they are found in the same example:

import altair as alt
import pandas as pd

activities = pd.DataFrame({
    'Activity': ['Sleeping', 'Eating', 'TV', 'Work', 'Exercise'],
    'Time': [8, 2, 4, 8, 2]
})

alt.Chart(activities).mark_bar().encode(
    alt.X('PercentOfTotal:Q', axis=alt.Axis(format='.0%')),
    y='Activity:N'
).transform_window(
    window=[alt.WindowFieldDef(op='sum', field='Time', **{'as': 'TotalTime'})],
    frame=[None, None]
).transform_calculate(
    PercentOfTotal="datum.Time / datum.TotalTime"
)

In this example, we have a list containing None, which reticulate associates with R’s NULL.

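We also have the syntax **{'as': 'TotalTime'}. In Python, ** unpacks a dictionary into keyword arguments, somewhat like passing extra named arguments through ... in R. It is needed here because as is a reserved word in Python, so it cannot be written as an ordinary keyword argument. In R, as is not reserved, so we can supply it as a regular named argument: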

library("tibble")

activities <-
  tibble(
    Activity = c("Sleeping", "Eating", "TV", "Work", "Exercise"),
    Time = c(8, 2, 4, 8, 2)
  )

chart <-
  alt$Chart(activities)$
    mark_bar()$
    encode(
      x = alt$X("PercentOfTotal:Q", axis = alt$Axis(format = ".0%")),
      y = "Activity:N"
    )$
    transform_window(
      window = list(
        alt$WindowFieldDef(op = "sum", field = "Time", as = "TotalTime")
      ),
      frame = list(NULL, NULL)
    )$
    transform_calculate(
      PercentOfTotal = JS("datum.Time / datum.TotalTime")
    )

chart
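
For reference, here is what the two constructs do on the Python side. Because as is a reserved word in Python, it cannot be written as an ordinary keyword argument; unpacking a dictionary with ** gets around that. The function window_field_def() below is a hypothetical stand-in that just returns its arguments; it is not Altair's WindowFieldDef.

```python
import json
import keyword


def window_field_def(**kwargs):
    """Hypothetical stand-in that simply returns its keyword arguments."""
    return kwargs


print(keyword.iskeyword("as"))  # True: `f(as="TotalTime")` is a syntax error

# `**` unpacks the dictionary into keyword arguments.
unpacked = window_field_def(op="sum", field="Time", **{"as": "TotalTime"})
print(unpacked)  # {'op': 'sum', 'field': 'Time', 'as': 'TotalTime'}

# None is serialized as JSON null.
print(json.dumps([None, None]))  # [null, null]
```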