Field Guide to Python Issues

There are four foundations upon which this package rests:

the Altair Python package, to build chart specifications
the reticulate R package, to provide inter-operability with Python
the Vega-Lite JavaScript framework, to render chart specifications in an HTML file
the htmlwidgets R package, to provide inter-operability with HTML and JavaScript

This article deals with the first two items; the Field Guide to Rendering deals with the other two.

The purpose of this document is to try to collect in one place, in a semi-organized fashion, all the fiddly-bits we have found dealing with Python stuff. If you get a cryptic Python error, check here. If you find a workaround for something that isn’t here, please let us know!

Overview

The Altair documentation is the best resource for learning how to create charts. In the course of building and documenting this package, we have noted a few “gotchas” and their workarounds. If you find another, please let us know!

Here’s the short version:

Where you see a ., use a $ instead.
Altair methods return a copy of the object. Assignment of a Python object returns a reference, not a copy.
To get a copy of a “bare” Altair object, use a $copy() method.
If you have a dataset that has variables with dots in their names, e.g. Sepal.Width, you have to make some accommodation when referring to such names in Altair. As a workaround, you can use square brackets to refer to “[Sepal.Width]”.
There is an Altair Chart method called repeat(), which in R is a reserved word, so it needs to be enclosed in backticks: $`repeat`().
Where you see an inversion operator, ~, like ~highlight, in Altair examples, call the method explicitly from R: highlight$`__invert__`(). Alternatively, you may be able to rearrange the code so as to avoid using the inversion.
Where you see a hyphen in the name of a Python object, use an underscore in R: vega_data$sf_temps()
Where you see a Python list, ["foo", "bar"], in Altair examples, use an unnamed list in R: list("foo", "bar").
Where you see a Python dictionary, {'a': "foo", 'b': "bar"}, in Altair examples, use a named list in R: list(a = "foo", b = "bar").
Where you see a None in Altair examples, use a NULL in R.
You may see a function call with **, baz(a = 1, **{'foo': 'bar'}), in an Altair example. In R, interpolate the dictionary into the rest of the arguments, baz(a = 1, foo = "bar").

Data frames

Consider this Python example:

# Python

import altair as alt
from vega_datasets import data
cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()

In this case, we are supplying a data frame to the Chart() constructor. In R, we can do the same using an R data frame:

library("altair")

vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <- 
  alt$Chart(cars)$
  mark_point()$
  encode(
    x = "Horsepower:Q",
    y = "Miles_per_Gallon:Q",
    color = "Origin:N"
  )

Method chaining

When reticulate returns a Python object with a custom class, it appears in R as an S3 object that behaves like a reference class. This means that if you see this sort of notation in Python:

# Python
foo.bar()

You would use this notation in R:

foo$bar()

In essence, wherever you see a . in Python, use a $ in R.

vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <- 
  alt$Chart(cars)$
  mark_point()$
  encode(
    x = "Horsepower:Q",
    y = "Miles_per_Gallon:Q",
    color = "Origin:N"
  )

Altair method returns copy

In Python, Altair methods return a copy of the object. To verify this, let’s use `pryr::address()` to compare the memory addresses of the two objects:

library("pryr")

chart_new <- chart$mark_point()

address(chart_new) == address(chart)
#> [1] FALSE

Although this looks like a reference-class method, the Altair method acts like an S3 method.

Python assignment returns reference

The object returned by an Altair method is a modified copy of the calling object, much as we are accustomed to in R. However, it is important to note that using an R assignment operator (<-, =, ->) on a Python object creates a reference to the object rather than a copy.

This becomes apparent when assigning a “bare” object:

chart_new <- chart

address(chart_new) == address(chart)
#> [1] TRUE

To return a copy of the object, use a copy method.

chart_new <- chart$copy()

address(chart_new) == address(chart)
#> [1] FALSE
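
The same distinction can be seen from the Python side. The sketch below uses a hypothetical ToyChart class (a stand-in, not part of Altair) whose mark_point() method mimics the copy-returning behavior described above. Python's `is` operator plays roughly the role that address() plays in the R code.

```python
# Python
import copy


class ToyChart:
    """Hypothetical stand-in for an Altair chart (not the real API)."""

    def __init__(self, mark=None):
        self.mark = mark

    def copy(self):
        # Return a new, independent object with the same contents.
        return copy.deepcopy(self)

    def mark_point(self):
        # Like the Altair methods described above: modify a copy, not self.
        new = self.copy()
        new.mark = "point"
        return new


chart = ToyChart()

chart_method = chart.mark_point()  # method call: a modified copy
chart_alias = chart                # plain assignment: the same object
chart_copy = chart.copy()          # explicit copy: a new object

print(chart_method is chart)  # False
print(chart_alias is chart)   # True
print(chart_copy is chart)    # False

# Changing the alias changes the original, because they are one object.
chart_alias.mark = "bar"
print(chart.mark)       # bar
print(chart_copy.mark)  # None
```

The last two lines show why the distinction matters: a change made through a reference is visible in the original, while an explicit copy is unaffected.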

Dots in variable names

In Python, dots can refer to a nested structure within a Data Frame variable. Vega-Lite supports such nesting, so it assumes that a dot in a variable-name will refer to a nested variable.

This means that we can run into trouble using R’s freeny dataset:

# does not render properly

chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X("income.level:Q", scale = alt$Scale(zero = FALSE)),
    y = alt$Y("market.potential:Q", scale = alt$Scale(zero = FALSE))
  )$
  mark_point()

chart_freeny_r

The problem here is that there are variables whose names have dots in them, e.g. income.level. One workaround is to use square brackets when referring to such variable names; another is to use backslashes, \\:

chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X("[income.level]:Q", scale = alt$Scale(zero = FALSE)),
    y = alt$Y("market\\.potential:Q", scale = alt$Scale(zero = FALSE))
  )$
  mark_point()

chart_freeny_r

As you can see, this has the side-effect of showing the brackets and backslashes in the axis titles.

To fix the fix, you can set the title for each axis:

chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X(
      "[income.level]:Q", 
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "income.level")
    ),
    y = alt$Y(
      "market\\.potential:Q", 
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "market.potential")
    )
  )$
  mark_point()

chart_freeny_r
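
To see why a dot causes trouble, here is a toy illustration in Python of nested-field lookup. The lookup_nested() helper is hypothetical and far simpler than whatever Vega-Lite actually does. It only shows that a field name split on dots looks for a nested structure, which a flat column named income.level does not have. The values are made up for illustration.

```python
# Python
def lookup_nested(record, field):
    """Toy lookup: treat each dot in `field` as a step into a nested dict."""
    value = record
    for key in field.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # the nested path does not exist
        value = value[key]
    return value


# A row with a genuinely nested variable (illustrative values only).
nested_row = {"income": {"level": 6.0}}

# A row as R would supply it: one flat column whose name contains a dot.
flat_row = {"income.level": 6.0}

print(lookup_nested(nested_row, "income.level"))  # 6.0
print(lookup_nested(flat_row, "income.level"))    # None

# Looking the name up literally, which is what the bracket and backslash
# workarounds ask for, finds the flat column.
print(flat_row["income.level"])  # 6.0
```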

Repeat

As shown in the View Composition article, you can use the repeat() method to compose one or more charts such that the only thing different among them is an encoding.

However, the article notes, there is a catch: repeat is a reserved word in R, so we have to enclose it in backticks, e.g. $`repeat`().

chart_repeat <- 
  alt$Chart(freeny)$
  encode(
    x = alt$X(
      "[income.level]:Q",
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "income.level")      
    ),
    y = alt$Y(
      alt$`repeat`("column"), 
      type = "quantitative",
      scale = alt$Scale(zero = FALSE)
    )
  )$
  mark_point()$
  properties(
    width = 200,
    height = 200
  )$
  `repeat`(
    column = list("[market.potential]", "[price.index]")
  )

chart_repeat

As you can see, the repeat operator does not give us a way to customize the axis titles.

Inversion: `~`

This is another case where an operator has a completely different meaning in Python than it has in R. In R, the ~ operator is used to construct a formula; in Python, it is the bitwise-inversion operator.

You might come across this in an Altair example where the operator is used to invert a selection.

# Python
highlight = alt.selection(type='single', on='mouseover',
                          fields=['symbol'], nearest=True)

alt.condition(~highlight, alt.value(1), alt.value(3))

There are a couple of alternatives available here. The first is to invoke the $`__invert__`() method explicitly.

# R
highlight <-
  alt$selection(
    type = "single", 
    on = "mouseover",
    fields = list("symbol"), 
    nearest = TRUE
  )

alt$condition(highlight$`__invert__`(), alt$value(1), alt$value(3))

The second alternative is to swap the order of the if_true and if_false arguments in alt$condition().

# R
highlight <-
  alt$selection(
    type = "single", 
    on = "mouseover",
    fields = list("symbol"), 
    nearest = TRUE
  )

alt$condition(highlight, alt$value(3), alt$value(1))
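
For background, Python evaluates the expression ~x by calling x.__invert__(), which is why calling that method by name from R has the same effect as the operator. Here is a minimal sketch using a hypothetical ToySelection class (a stand-in, not Altair's selection object):

```python
# Python
class ToySelection:
    """Hypothetical stand-in for a selection; not Altair's implementation."""

    def __init__(self, name, negated=False):
        self.name = name
        self.negated = negated

    def __invert__(self):
        # Python calls this method when it evaluates `~selection`.
        return ToySelection(self.name, negated=not self.negated)

    def __repr__(self):
        prefix = "not " if self.negated else ""
        return f"<{prefix}{self.name}>"


highlight = ToySelection("highlight")

via_operator = ~highlight
via_method = highlight.__invert__()

print(via_operator)  # <not highlight>
print(via_method)    # <not highlight>
```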

Hyphens in Python Names

This comes up in Vega datasets. Let’s use the $list_datasets() method to get the names of the datasets that contain a hyphen.

vega_data$list_datasets() %>% stringr::str_subset("-")
#>  [1] "annual-precip"                  "co2-concentration"             
#>  [3] "flare-dependencies"             "flights-10k"                   
#>  [5] "flights-200k"                   "flights-20k"                   
#>  [7] "flights-2k"                     "flights-3m"                    
#>  [9] "flights-5k"                     "flights-airport"               
#> [11] "gapminder-health-income"        "iowa-electricity"              
#> [13] "la-riots"                       "normal-2d"                     
#> [15] "seattle-temps"                  "seattle-weather"               
#> [17] "sf-temps"                       "unemployment-across-industries"
#> [19] "uniform-2d"                     "us-10m"                        
#> [21] "us-employment"                  "us-state-capitals"             
#> [23] "world-110m"

To refer to one of these datasets in R, substitute the hyphen with an underscore:

vega_data$sf_temps() %>% head()
#>   temp                date
#> 1 47.8 2010-01-01 00:00:00
#> 2 47.4 2010-01-01 01:00:00
#> 3 46.9 2010-01-01 02:00:00
#> 4 46.5 2010-01-01 03:00:00
#> 5 46.0 2010-01-01 04:00:00
#> 6 45.8 2010-01-01 05:00:00
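
The underlying reason is that a hyphen is not allowed in a Python identifier, so a dataset named sf-temps cannot be reached as an attribute under that name. The standard-library check below illustrates this. The to_identifier() helper is hypothetical and only mirrors the hyphen-to-underscore rule described above.

```python
# Python
def to_identifier(dataset_name):
    """Hypothetical helper: apply the hyphen-to-underscore rule."""
    return dataset_name.replace("-", "_")


names = ["sf-temps", "us-10m", "cars"]

for name in names:
    converted = to_identifier(name)
    # str.isidentifier() reports whether a string is a valid Python name.
    print(name, name.isidentifier(), converted, converted.isidentifier())
```

Names without a hyphen, such as cars, pass through unchanged.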

Lists: `[]` and Dictionaries: `{}`

A Python list corresponds to an atomic vector in R; a Python dictionary corresponds to a named list in R.

# Python

example_list = [1, 2, 3]
example_dictionary = {'a': 1, 'b': 2, 'c': 3}

In practice, we find that reticulate does the right thing if we provide an R unnamed list where Altair expects a list, and an R named list where Altair expects a dictionary.

example_list <- list(1, 2, 3)
example_dictionary <- list(a = 1, b = 2, c = 3)
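
One way to keep the two apart is to remember where they end up: a chart specification is ultimately JSON, where a Python list becomes an array and a Python dictionary becomes an object. The sketch below uses only Python's standard json module, with the example values from above. The nested value at the end is made up for illustration.

```python
# Python
import json

example_list = [1, 2, 3]
example_dictionary = {"a": 1, "b": 2, "c": 3}

# A list serializes to a JSON array (ordered, unnamed elements).
print(json.dumps(example_list))        # [1, 2, 3]

# A dictionary serializes to a JSON object (named elements).
print(json.dumps(example_dictionary))  # {"a": 1, "b": 2, "c": 3}

# Nesting works the same way: a dictionary holding a list.
nested = {"fields": ["foo", "bar"]}
print(json.dumps(nested))              # {"fields": ["foo", "bar"]}
```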

Consider this Altair example that uses lists and dictionaries. Here are some of the Python bits:

# Python
import altair as alt
from vega_datasets import data

flights = alt.UrlData(data.flights_2k.url,
                      format={'parse': {'date': 'date'}})

brush = alt.selection(type='interval', encodings=['x'])

Here’s an R translation of the complete example, which demonstrates interactive cross-filtering.

flights <-
  alt$UrlData(
    vega_data$flights_2k$url,
    format = list(parse = list(date = "date"))
  )

brush <- alt$selection(type = "interval", encodings = list("x"))

# Define the base chart, with the common parts of the
# background and highlights
base <- 
  alt$Chart()$
  mark_bar()$
  encode(
    x = alt$X(
      alt$`repeat`("column"), 
      type = "quantitative", 
      bin = alt$Bin(maxbins = 20)
    ),
    y = "count()"
  )$
  properties(width = 180, height = 130)

# blue background with selection
background <- base$properties(selection = brush)

# yellow highlights on the transformed data
highlight <-
  base$
  encode(color = alt$value("goldenrod"))$
  transform_filter(brush$ref())

# layer the two charts & repeat
alt$
  layer(background, highlight, data = flights)$
  transform_calculate("time", "hours(datum.date)")$
  `repeat`(column = list("distance", "delay", "time"))

None and `**{}`

These concepts are not related, other than that they are found in the same example:

# Python
import altair as alt
import pandas as pd

activities = pd.DataFrame({'Activity': ['Sleeping', 'Eating', 'TV', 'Work', 'Exercise'],
                           'Time': [8, 2, 4, 8, 2]})

alt.Chart(activities).mark_bar().encode(
    alt.X('PercentOfTotal:Q', axis=alt.Axis(format='.0%')),
    y='Activity:N'
).transform_window(
    window=[alt.WindowFieldDef(op='sum', field='Time', **{'as': 'TotalTime'})],
    frame=[None, None]
).transform_calculate(
    PercentOfTotal="datum.Time / datum.TotalTime"
)

In this example, we have a list containing None, which reticulate associates with R’s NULL.

We also have the syntax **{'as': 'TotalTime'}. This unpacks a dictionary into keyword arguments of a Python function, perhaps similar to passing additional arguments through ... in R. Since it is passing a dictionary, we can supply its entry as an additional named argument in R:

library("tibble")

activities <- 
  tibble(
    Activity = c("Sleeping", "Eating", "TV", "Work", "Exercise"),
    Time = c(8, 2, 4, 8, 2)
  )

chart <- 
  alt$Chart(activities)$
  mark_bar()$
  encode(
    x = alt$X("PercentOfTotal:Q", axis = alt$Axis(format = ".0%")),
    y = "Activity:N"
  )$
  transform_window(
    window = list(
      alt$WindowFieldDef(op = "sum", field = "Time", as = "TotalTime")
    ),
    frame = list(NULL, NULL)
  )$transform_calculate(
    PercentOfTotal = JS("datum.Time / datum.TotalTime")
  )

chart
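
To see why the Python example resorts to **{'as': 'TotalTime'}, note that as is a reserved word in Python and so cannot be typed as a keyword argument. The sketch below uses a hypothetical field_def() function, standing in for alt.WindowFieldDef, to show that unpacking a dictionary delivers the same named argument.

```python
# Python
import keyword


def field_def(**kwargs):
    """Hypothetical stand-in for alt.WindowFieldDef: return its arguments."""
    return kwargs


# `as` is a Python keyword, so `field_def(as="TotalTime")` is a syntax error.
print(keyword.iskeyword("as"))  # True

# Unpacking a dictionary with ** supplies the argument by name instead.
result = field_def(op="sum", field="Time", **{"as": "TotalTime"})
print(result)  # {'op': 'sum', 'field': 'Time', 'as': 'TotalTime'}

# Ordinary names can be passed either way, with the same effect.
print(field_def(a=1, **{"foo": "bar"}) == field_def(a=1, foo="bar"))  # True
```

R has no such restriction on as, which is why the R translation can pass as = "TotalTime" directly.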

2023-09-04