There are four foundations upon which this package rests:

• the Altair Python package, to build chart specifications
• the reticulate R package, to provide inter-operability with Python
• the Vega-Lite JavaScript framework, to render chart specifications in an HTML file
• the htmlwidgets R package, to provide inter-operability with HTML and JavaScript

This article deals with the first two items; the Field Guide to Rendering deals with the other two.

The purpose of this document is to try to collect in one place, in a semi-organized fashion, all the fiddly-bits we have found dealing with Python stuff. If you get a cryptic Python error, check here. If you find a workaround for something that isn’t here, please let us know!

## Overview

The Altair documentation is the best resource for learning how to create charts. In the course of building and documenting this package, we have noted a few “gotchas” and their workarounds. If you find another, please let us know!

Here’s the short version:

• Where you see a ., use a $ instead.
• Altair methods return a copy of the object. Assignment of a Python object returns a reference, not a copy.
• To get a copy of a “bare” Altair object, use the $copy() method.
• If you have a dataset that has variables with dots in their names, e.g. Sepal.Width, you have to make some accommodation when referring to such names in Altair. As a workaround, you can use square brackets to refer to “[Sepal.Width]”.
• There is an Altair Chart method called repeat(), which in R is a reserved word, so it needs to be enclosed in backticks: $`repeat`().
• Where you see an inversion operator, ~, like ~highlight, in Altair examples, call the method explicitly from R: highlight$`__invert__`(). Alternatively, you may be able to rearrange the code so as to avoid using the inversion.

• Where you see a hyphen in the name of a Python object, use an underscore in R: vega_data$sf_temps().
• Where you see a Python list, ["foo", "bar"], in Altair examples, use an unnamed list in R: list("foo", "bar").
• Where you see a Python dictionary, {'a': "foo", 'b': "bar"}, in Altair examples, use a named list in R: list(a = "foo", b = "bar").
• Where you see a None in Altair examples, use a NULL in R.
• You may see a function call with **, baz(a = 1, **{'foo': 'bar'}), in an Altair example. In R, interpolate the dictionary into the rest of the arguments, baz(a = 1, foo = "bar").

## Data frames

Consider this Python example:

# Python
import altair as alt
from vega_datasets import data

cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()

In this case, we are supplying a Data Frame to a Chart() method. Here is an R translation:

library("altair")

vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <-
  alt$Chart(cars)$
  mark_point()$
  encode(
    x = "Horsepower:Q",
    y = "Miles_per_Gallon:Q",
    color = "Origin:N"
  )

## Method chaining

When reticulate returns a Python object with a custom class, it appears in R as an S3 object that behaves like a reference class. This means that if you see this sort of notation in Python:

# Python
foo.bar()

You would use this notation in R:

foo$bar()

In essence, wherever you see a . in Python, use a $ in R.

vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <-
  alt$Chart(cars)$
  mark_point()$
  encode(
    x = "Horsepower:Q",
    y = "Miles_per_Gallon:Q",
    color = "Origin:N"
  )

## Altair method returns copy

In Python, Altair methods return a copy of the object. To verify this, let’s use pryr::address():

library("pryr")
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp

chart_new <- chart$mark_point()
address(chart_new) == address(chart)
#> [1] FALSE

Although this looks like a reference-class method, the Altair method acts like an S3 method.
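To illustrate the copy-on-method-call behaviour from the Python side, here is a minimal sketch in plain Python. The TinyChart class is hypothetical and is not part of Altair; it only mimics the pattern of methods that return a modified copy rather than changing the object they are called on.

```python
import copy


class TinyChart:
    """Hypothetical stand-in for a chart object; not part of Altair."""

    def __init__(self, mark=None, encoding=None):
        self.mark = mark
        self.encoding = dict(encoding or {})

    def mark_point(self):
        # Work on a copy so the calling object is left untouched.
        new = copy.deepcopy(self)
        new.mark = "point"
        return new

    def encode(self, **channels):
        # Same pattern: copy first, then modify the copy.
        new = copy.deepcopy(self)
        new.encoding.update(channels)
        return new


base = TinyChart()
chart = base.mark_point().encode(x="Horsepower:Q")

same_object = chart is base         # each method call returned a new object
base_unchanged = base.mark is None  # the original was not modified
```

Because every method hands back a fresh object, calls can be chained freely without side effects on the starting chart.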

## Python assignment returns reference

The object returned by an Altair method is a modified copy of the calling object, much as we are accustomed to in R. However, it is important to note that using an R assignment operator (<-, =, ->) on a Python object returns a reference to the object rather than a copy.

This becomes apparent when assigning a “bare” object:

chart_new <- chart
address(chart_new) == address(chart)
#> [1] TRUE
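The same reference semantics can be seen in plain Python, independent of Altair. The following sketch uses an ordinary dictionary as a stand-in for a chart object (an assumption made purely for illustration) and the standard-library copy module:

```python
import copy

# A plain dictionary standing in for a chart object (illustrative only).
spec = {"mark": "point", "encoding": {"x": "Horsepower"}}

alias = spec                 # plain assignment: a second name for the same object
clone = copy.deepcopy(spec)  # an independent copy

alias["mark"] = "bar"        # the change is visible through both names

alias_is_same = alias is spec  # both names refer to one object
clone_is_same = clone is spec  # the copy is a different object
```

After the modification, spec["mark"] is "bar" while clone["mark"] is still "point".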

To return a copy of the object, use a copy method.

chart_new <- chart$copy()
address(chart_new) == address(chart)
#> [1] FALSE

## Dots in variable names

In Python, dots can refer to a nested structure within a Data Frame variable. Vega-Lite supports such nesting, so it assumes that a dot in a variable name will refer to a nested variable. This means that we can run into trouble using R’s freeny dataset:

# does not render properly
chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X("income.level:Q", scale = alt$Scale(zero = FALSE)),
    y = alt$Y("market.potential:Q", scale = alt$Scale(zero = FALSE))
  )$
  mark_point()

chart_freeny_r
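To see why a dotted name can go missing, consider this small sketch of a nested-field lookup. The lookup() function is hypothetical and is not how Vega-Lite is implemented; it only illustrates the idea that each dot is treated as a step into a nested record.

```python
def lookup(datum, field):
    """Resolve a field name, treating each dot as a step into a nested record.

    Illustrative only; this is not Vega-Lite's implementation.
    """
    value = datum
    for part in field.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


nested = {"income": {"level": 6.2}}  # a genuinely nested record
flat = {"income.level": 6.2}         # a flat record whose key contains a dot

nested_value = lookup(nested, "income.level")  # found by stepping into "income"
flat_value = lookup(flat, "income.level")      # not found: there is no "income" key
```

A data frame such as freeny corresponds to the flat case, so a lookup that splits on dots comes back empty.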

The problem here is that there are variables whose names have dots in them, e.g. income.level. One workaround is to use square brackets when referring to such variable names; another is to use backslashes, \\:

chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X("[income.level]:Q", scale = alt$Scale(zero = FALSE)),
    y = alt$Y("market\\.potential:Q", scale = alt$Scale(zero = FALSE))
  )$
  mark_point()

chart_freeny_r

As you can see, this has the side-effect of showing the brackets and slashes in the axis titles. To fix the fix, you can set the title for each axis:

chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X(
      "[income.level]:Q",
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "income.level")
    ),
    y = alt$Y(
      "market\\.potential:Q",
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "market.potential")
    )
  )$
  mark_point()

chart_freeny_r

## Repeat

As shown in the View Composition article, you can use the repeat() method to compose one or more charts such that the only thing different among them is an encoding.

However, as the article notes, there is a catch: repeat is a reserved word in R, so we have to enclose it in backticks, e.g. $`repeat`().

chart_repeat <-
  alt$Chart(freeny)$
  encode(
    x = alt$X(
      "[income.level]:Q",
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "income.level")
    ),
    y = alt$Y(
      alt$`repeat`("column"),
      type = "quantitative",
      scale = alt$Scale(zero = FALSE)
    )
  )$
  mark_point()$
  properties(width = 200, height = 200)$
  `repeat`(
    column = list("[market.potential]", "[price.index]")
  )

chart_repeat

As you can see, the repeat operator does not give us a way to customize the axis titles.

## Inversion: ~

This is another case where an operator has a completely different meaning in Python than it has in R. In R, the ~ operator is used to construct a formula; in Python, it is the bitwise inversion operator.

You might come across this in an Altair example where the operator is used to invert a selection.

# Python
highlight = alt.selection(type='single', on='mouseover',
fields=['symbol'], nearest=True)

alt.condition(~highlight, alt.value(1), alt.value(3))
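In Python, writing ~x is shorthand for calling the object's __invert__() method. The sketch below shows the equivalence with a hypothetical Selection class; it is not Altair's own class, only a stand-in that defines the same special method.

```python
class Selection:
    """Hypothetical stand-in for a selection; not Altair's class."""

    def __init__(self, name, negated=False):
        self.name = name
        self.negated = negated

    def __invert__(self):
        # Python calls this method when the object appears after the ~ operator.
        return Selection(self.name, negated=not self.negated)


highlight = Selection("highlight")

via_operator = ~highlight            # operator form, as in the Altair examples
via_method = highlight.__invert__()  # explicit method call, the form available from R
```

Both forms produce an equivalent, negated object and leave the original untouched.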

There are a couple of alternatives available here. The first is to invoke the $`__invert__`() method explicitly.

# R
highlight <- alt$selection(
  type = "single",
  on = "mouseover",
  fields = list("symbol"),
  nearest = TRUE
)

alt$condition(highlight$`__invert__`(), alt$value(1), alt$value(3))

The second alternative is to swap the order of the if_true and if_false arguments in alt$condition().

# R
highlight <- alt$selection(
  type = "single",
  on = "mouseover",
  fields = list("symbol"),
  nearest = TRUE
)

alt$condition(highlight, alt$value(3), alt$value(1))

## Hyphens in Python Names

This comes up in Vega datasets. Let’s use the $list_datasets() method to get the names of the datasets that contain a hyphen.

vega_data$list_datasets() %>% stringr::str_subset("-")
#>  [1] "annual-precip"                  "co2-concentration"
#>  [3] "flare-dependencies"             "flights-10k"
#>  [5] "flights-200k"                   "flights-20k"
#>  [7] "flights-2k"                     "flights-3m"
#>  [9] "flights-5k"                     "flights-airport"
#> [11] "gapminder-health-income"        "iowa-electricity"
#> [13] "la-riots"                       "normal-2d"
#> [15] "seattle-temps"                  "seattle-weather"
#> [17] "sf-temps"                       "unemployment-across-industries"
#> [19] "uniform-2d"                     "us-10m"
#> [21] "us-employment"                  "us-state-capitals"
#> [23] "world-110m"

To refer to one of these datasets in R, substitute the hyphen with an underscore:

vega_data$sf_temps() %>% head()
#>   temp                date
#> 1 47.8 2010-01-01 00:00:00
#> 2 47.4 2010-01-01 01:00:00
#> 3 46.9 2010-01-01 02:00:00
#> 4 46.5 2010-01-01 03:00:00
#> 5 46.0 2010-01-01 04:00:00
#> 6 45.8 2010-01-01 05:00:00
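The underscore substitution is not specific to R: a hyphen cannot appear in a Python attribute name either, which is why the Python examples also write names such as flights_2k. A minimal sketch of the renaming, using only string methods (the short list of names is chosen here for illustration):

```python
names = ["sf-temps", "flights-2k", "cars"]

# A hyphen would be parsed as subtraction, so it cannot appear in an attribute name.
valid_as_written = {name: name.isidentifier() for name in names}

# Replacing hyphens with underscores gives names usable after a "." (or, in R, a "$").
attribute_names = [name.replace("-", "_") for name in names]
```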

## Lists: [] and Dictionaries: {}

A Python list corresponds to an atomic vector in R; a Python dictionary corresponds to a named list in R.

# Python

example_list = [1, 2, 3]
example_dictionary = {'a': 1, 'b': 2, 'c': 3}

In practice, we find that reticulate does the right thing if we provide an R unnamed list where Altair expects a list, and an R named list where Altair expects a dictionary.

example_list <- list(1, 2, 3)
example_dictionary <- list(a = 1, b = 2, c = 3)
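It may help to remember where these structures end up: a Vega-Lite chart specification is JSON, where a Python list becomes an array and a Python dictionary becomes an object. The sketch below uses only the standard-library json module to show the two shapes; it does not involve Altair or reticulate.

```python
import json

example_list = [1, 2, 3]
example_dictionary = {"a": 1, "b": 2, "c": 3}

list_json = json.dumps(example_list)        # '[1, 2, 3]'
dict_json = json.dumps(example_dictionary)  # '{"a": 1, "b": 2, "c": 3}'
```

An unnamed R list plays the role of the array, and a named R list plays the role of the object.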

Consider this Altair example that uses lists and dictionaries. Here are some of the Python bits:

import altair as alt
from vega_datasets import data

flights = alt.UrlData(data.flights_2k.url,
format={'parse': {'date': 'date'}})

brush = alt.selection(type='interval', encodings=['x'])

Here’s an R translation of the complete example, which demonstrates interactive cross-filtering.

flights <-
  alt$UrlData(
    vega_data$flights_2k$url,
    format = list(parse = list(date = "date"))
  )

brush <- alt$selection(type = "interval", encodings = list("x"))

# Define the base chart, with the common parts of the
# background and highlights
base <-
  alt$Chart()$
  mark_bar()$
  encode(
    x = alt$X(
      alt$`repeat`("column"),
      type = "quantitative",
      bin = alt$Bin(maxbins = 20)
    ),
    y = "count()"
  )$
  properties(width = 180, height = 130)

# blue background with selection
background <- base$properties(selection = brush)

# yellow highlights on the transformed data
highlight <-
  base$
  encode(color = alt$value("goldenrod"))$
  transform_filter(brush$ref())

# layer the two charts & repeat
alt$layer(background, highlight, data = flights)$
  transform_calculate("time", "hours(datum.date)")$
  `repeat`(column = list("distance", "delay", "time"))

## None and **{}

These concepts are not related, other than that they are found in the same example:

import altair as alt
import pandas as pd

activities = pd.DataFrame({'Activity': ['Sleeping', 'Eating', 'TV', 'Work', 'Exercise'],
                           'Time': [8, 2, 4, 8, 2]})

alt.Chart(activities).mark_bar().encode(
    alt.X('PercentOfTotal:Q', axis=alt.Axis(format='.0%')),
    y='Activity:N'
).transform_window(
    window=[alt.WindowFieldDef(op='sum', field='Time', **{'as': 'TotalTime'})],
    frame=[None, None]
).transform_calculate(
    PercentOfTotal="datum.Time / datum.TotalTime"
)

In this example, we have a list containing None, which reticulate associates with R’s NULL. We also have some syntax, **{'as': 'TotalTime'}. This is a mechanism to pass additional arguments to a Python function, perhaps similar to ... in R. It is passing a dictionary, so we can supply its entries as additional named arguments in R:

library("tibble")

activities <- tibble(
  Activity = c("Sleeping", "Eating", "TV", "Work", "Exercise"),
  Time = c(8, 2, 4, 8, 2)
)

chart <-
  alt$Chart(activities)$
  mark_bar()$
  encode(
    x = alt$X("PercentOfTotal:Q", axis = alt$Axis(format = ".0%")),
    y = "Activity:N"
  )$
  transform_window(
    window = list(
      alt$WindowFieldDef(op = "sum", field = "Time", as = "TotalTime")
    ),
    frame = list(NULL, NULL)
  )$
  transform_calculate(
    PercentOfTotal = JS("datum.Time / datum.TotalTime")
  )

chart
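One likely reason the Python example reaches for the ** form is that as is a reserved word in Python, so it cannot be written directly as a keyword argument. The sketch below demonstrates this with a hypothetical function, window_field_def(), which simply returns the keyword arguments it receives; it is not Altair's WindowFieldDef.

```python
import keyword


def window_field_def(**kwargs):
    """Hypothetical function that just returns the keyword arguments it received."""
    return kwargs


as_is_reserved = keyword.iskeyword("as")

# Writing window_field_def(op="sum", as="TotalTime") would be a SyntaxError,
# so the reserved name is passed through an unpacked dictionary instead.
args = window_field_def(op="sum", field="Time", **{"as": "TotalTime"})
```

In R, as is not a reserved word, which is why the translation above can pass as = "TotalTime" like any other named argument.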