vignettes/field-guide-python.Rmd
There are four foundations upon which this package rests:
This article deals with the first two items; the Field Guide to Rendering deals with the other two.
The purpose of this document is to try to collect in one place, in a semi-organized fashion, all the fiddly-bits we have found dealing with Python stuff. If you get a cryptic Python error, check here. If you find a workaround for something that isn’t here, please let us know!
The Altair documentation is the best resource for learning how to create charts. In the course of building and documenting this package, we have noted a few “gotchas” and their workarounds. If you find another, please let us know!
Here’s the short version:
Where you see a . in Python, use a $ in R instead.
Altair methods return a copy of the object. Assignment of a Python object returns a reference, not a copy.
To get a copy of a “bare” Altair object, use the $copy() method.
If you have a dataset that has variables with dots in their names, e.g. Sepal.Width, you have to make some accommodation when referring to such names in Altair. As a workaround, you can use square brackets, referring to the variable as “[Sepal.Width]”.
There is an Altair Chart method called repeat(), which in R is a reserved word, so it needs to be enclosed in backticks: $`repeat`().
Where you see an inversion operator, ~, like ~highlight, in Altair examples, call the method explicitly from R: highlight$`__invert__`(). Alternatively, you may be able to rearrange the code so as to avoid using the inversion.
Where you see a hyphen in the name of a Vega dataset, e.g. sf-temps, use an underscore in R: vega_data$sf_temps().
Where you see a Python list, ["foo", "bar"], in Altair examples, use an unnamed list in R: list("foo", "bar").
Where you see a Python dictionary, {'a': "foo", 'b': "bar"}, in Altair examples, use a named list in R: list(a = "foo", b = "bar").
Where you see a None in Altair examples, use a NULL in R.
You may see a function call with **, such as baz(a = 1, **{'foo': 'bar'}), in an Altair example. In R, interpolate the dictionary into the rest of the arguments: baz(a = 1, foo = "bar").
Consider this Python example:
# Python
import altair as alt
from vega_datasets import data

cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()
In this case, we are supplying a data frame to the Chart() method. Here is a similar chart built from R:
library("altair")

vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <-
  alt$Chart(cars)$
  mark_point()$
  encode(
    x = "Horsepower:Q",
    y = "Miles_per_Gallon:Q",
    color = "Origin:N"
  )
When reticulate returns a Python object with a custom class, it appears in R as an S3 object that behaves like a reference class. This means that if you see this sort of notation in Python:
# Python
foo.bar()
You would use this notation in R:
foo$bar()
In essence, wherever you see a . in Python, use a $ in R.
vega_data <- import_vega_data()
cars <- vega_data$cars()

chart <-
  alt$Chart(cars)$
  mark_point()$
  encode(
    x = "Horsepower:Q",
    y = "Miles_per_Gallon:Q",
    color = "Origin:N"
  )
In Python, Altair methods return a copy of the object. To verify this, let’s use `pryr::
Although this looks like a reference-class method, the Altair method acts like an S3 method.
The object returned by an Altair method is a modified copy of the calling object, much as we are accustomed to in R. However, it is important to note that using an R assignment operator (<-, =, ->) on a Python object returns a reference to the object rather than a copy.
This becomes apparent when assigning a “bare” object:
To return a copy of the object, use a copy method.
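The difference between a method call, a plain assignment, and an explicit copy can be sketched on the Python side. This is a minimal illustration only: TinyChart is a hypothetical stand-in class, not part of Altair, and it merely imitates the copy-on-call behavior described above.

```python
import copy


class TinyChart:
    """Hypothetical stand-in for a chart object; not part of Altair."""

    def __init__(self, mark=None):
        self.mark = mark

    def mark_point(self):
        # Imitate the copy-on-call pattern: modify a copy, leave self alone.
        new = copy.deepcopy(self)
        new.mark = "point"
        return new


base = TinyChart()
points = base.mark_point()   # method call: a modified copy comes back
alias = base                 # plain assignment: a second name, same object
clone = copy.deepcopy(base)  # explicit copy: an independent object

alias.mark = "bar"           # mutating through the alias changes base too

print(base.mark, points.mark, clone.mark)
print(alias is base, clone is base)
```

Here the change made through alias shows up in base, while points and clone are unaffected. Assigning a Python object to a second name in R behaves like alias in this sketch, which is why an explicit copy is needed.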
In Python, dots can refer to a nested structure within a Data Frame variable. Vega-Lite supports such nesting, so it assumes that a dot in a variable-name will refer to a nested variable.
This means that we can run into trouble using R’s freeny dataset:
# does not render properly
chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X("income.level:Q", scale = alt$Scale(zero = FALSE)),
    y = alt$Y("market.potential:Q", scale = alt$Scale(zero = FALSE))
  )$
  mark_point()

chart_freeny_r
The problem here is that there are variables whose names have dots in them, e.g. income.level. One workaround is to use square brackets when referring to such variable names; another is to use backslashes, \\:
chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X("[income.level]:Q", scale = alt$Scale(zero = FALSE)),
    y = alt$Y("market\\.potential:Q", scale = alt$Scale(zero = FALSE))
  )$
  mark_point()

chart_freeny_r
As you can see, this has the side effect of showing the brackets and backslashes in the axis titles.
To fix the fix, you can set the title for each axis:
chart_freeny_r <-
  alt$Chart(freeny)$
  encode(
    x = alt$X(
      "[income.level]:Q",
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "income.level")
    ),
    y = alt$Y(
      "market\\.potential:Q",
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "market.potential")
    )
  )$
  mark_point()

chart_freeny_r
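If many variables have dots in their names, the two workarounds can be wrapped in small string helpers. The following is a sketch in Python; bracket_field() and escape_field() are hypothetical names introduced for illustration and are not part of Altair. They only build the shorthand strings; they do not create a chart.

```python
def bracket_field(name, type_code):
    """Build an encoding shorthand, bracketing names that contain dots.

    Hypothetical helper for illustration; not part of Altair.
    """
    field = f"[{name}]" if "." in name else name
    return f"{field}:{type_code}"


def escape_field(name, type_code):
    """Same idea, but escape each dot with a backslash instead."""
    escaped = name.replace(".", "\\.")
    return f"{escaped}:{type_code}"


print(bracket_field("income.level", "Q"))     # bracket workaround
print(escape_field("market.potential", "Q"))  # backslash workaround
print(bracket_field("y", "Q"))                # names without dots pass through
```

Because the original name is still on hand when the shorthand is built, the same place in the code is a natural spot to set the axis title back to the unmodified name.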
As shown in the View Composition article, you can use the repeat() method to compose one or more charts such that the only thing different among them is an encoding.
However, as the article notes, there is a catch: repeat is a reserved word in R, so we have to enclose it in backticks, e.g. $`repeat`().
chart_repeat <-
  alt$Chart(freeny)$
  encode(
    x = alt$X(
      "[income.level]:Q",
      scale = alt$Scale(zero = FALSE),
      axis = alt$Axis(title = "income.level")
    ),
    y = alt$Y(
      alt$`repeat`("column"),
      type = "quantitative",
      scale = alt$Scale(zero = FALSE)
    )
  )$
  mark_point()$
  properties(
    width = 200,
    height = 200
  )$
  `repeat`(
    column = list("[market.potential]", "[price.index]")
  )

chart_repeat
As you can see, the repeat operator does not give us a way to customize the axis titles.
Inversion operator: ~
This is another case where an operator has a completely different meaning in Python than it has in R. In R, the ~ operator is used to construct a formula; in Python, it is the bitwise inversion operator.
You might come across this in an Altair example where the operator is used to invert a selection.
# Python
highlight = alt.selection(type='single', on='mouseover',
                          fields=['symbol'], nearest=True)

alt.condition(~highlight, alt.value(1), alt.value(3))
There are a couple of alternatives available here. The first is to invoke the $`__invert__`() method explicitly.
# R
highlight <-
  alt$selection(
    type = "single",
    on = "mouseover",
    fields = list("symbol"),
    nearest = TRUE
  )

alt$condition(highlight$`__invert__`(), alt$value(1), alt$value(3))
The second alternative is to swap the order of the if_true and if_false arguments in alt$condition().
# R
highlight <-
  alt$selection(
    type = "single",
    on = "mouseover",
    fields = list("symbol"),
    nearest = TRUE
  )

alt$condition(highlight, alt$value(3), alt$value(1))
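Both alternatives rest on how Python evaluates ~: it calls the object's __invert__() method. The sketch below shows this with a toy model. ToySelection and toy_condition() are hypothetical stand-ins written for this illustration; they are not Altair's implementation and only mimic the logic of choosing between two values.

```python
class ToySelection:
    """Hypothetical stand-in for a selection; not Altair's implementation."""

    def __init__(self, name, negated=False):
        self.name = name
        self.negated = negated

    def __invert__(self):
        # Python calls this method when it evaluates `~obj`.
        return ToySelection(self.name, not self.negated)


def toy_condition(selection, if_true, if_false, is_selected):
    """Pick a value the way a condition would, for one data point."""
    test = is_selected != selection.negated
    return if_true if test else if_false


highlight = ToySelection("highlight")

# The operator form and the explicit method call give equivalent objects.
via_operator = ~highlight
via_method = highlight.__invert__()
print(via_operator.negated, via_method.negated)

# Inverting the selection is the same as swapping the two branches.
for is_selected in (True, False):
    inverted = toy_condition(~highlight, 1, 3, is_selected)
    swapped = toy_condition(highlight, 3, 1, is_selected)
    print(is_selected, inverted, swapped)
```

In this toy model the inverted and swapped forms agree for every data point, which is the reasoning behind swapping if_true and if_false instead of calling the method explicitly.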
This comes up in Vega datasets. Let’s use the $list_datasets() method to get the names of the datasets that contain a hyphen.
vega_data$list_datasets() %>% stringr::str_subset("-")
#> [1] "annual-precip" "co2-concentration"
#> [3] "flare-dependencies" "flights-10k"
#> [5] "flights-200k" "flights-20k"
#> [7] "flights-2k" "flights-3m"
#> [9] "flights-5k" "flights-airport"
#> [11] "gapminder-health-income" "iowa-electricity"
#> [13] "la-riots" "normal-2d"
#> [15] "seattle-temps" "seattle-weather"
#> [17] "sf-temps" "unemployment-across-industries"
#> [19] "uniform-2d" "us-10m"
#> [21] "us-employment" "us-state-capitals"
#> [23] "world-110m"
To refer to one of these datasets in R, substitute an underscore for the hyphen, e.g. vega_data$sf_temps().
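The renaming rule itself is a one-line string substitution. One likely reason for it is that a hyphen cannot appear in a Python identifier: an expression such as data.sf-temps would be read as a subtraction. The sketch below uses a hypothetical helper, attribute_name(), which is not part of any package and only mirrors the hyphen-to-underscore rule.

```python
def attribute_name(dataset_name):
    """Turn a dataset name into a valid attribute name.

    Hypothetical helper; it only mirrors the hyphen-to-underscore rule.
    """
    return dataset_name.replace("-", "_")


names = ["sf-temps", "flights-2k", "us-10m", "cars"]
for name in names:
    attr = attribute_name(name)
    # A hyphenated name is not a valid identifier; the converted one is.
    print(name, name.isidentifier(), attr, attr.isidentifier())
```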
Lists: [] and Dictionaries: {}
A Python list corresponds to an atomic vector in R; a Python dictionary corresponds to a named list in R.
# Python
example_list = [1, 2, 3]
example_dictionary = {'a': 1, 'b': 2, 'c': 3}
In practice, we find that reticulate does the right thing if we provide an R unnamed list where Altair expects a list, and an R named list where Altair expects a dictionary.
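The distinction matters because, on the Python side, lists and dictionaries serialize to different JSON structures, and a chart specification is ultimately JSON. The sketch below uses only Python's standard json module, with values borrowed from the example that follows; it does not involve Altair or reticulate.

```python
import json

encodings = ["x"]                   # a Python list
fmt = {"parse": {"date": "date"}}   # nested Python dictionaries

print(json.dumps(encodings))  # a JSON array
print(json.dumps(fmt))        # a JSON object
print(json.dumps("x"))        # a bare string, not an array
```

The last line hints at why a one-element list is written as list("x") in R rather than "x": a bare string and a list holding one string are different things once they reach Python.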
Consider this Altair example that uses lists and dictionaries. Here are some of the Python bits:
import altair as alt
from vega_datasets import data

flights = alt.UrlData(data.flights_2k.url,
                      format={'parse': {'date': 'date'}})

brush = alt.selection(type='interval', encodings=['x'])
Here’s an R translation of the complete example, which demonstrates interactive cross-filtering.
flights <-
  alt$UrlData(
    vega_data$flights_2k$url,
    format = list(parse = list(date = "date"))
  )

brush <- alt$selection(type = "interval", encodings = list("x"))

# Define the base chart, with the common parts of the
# background and highlights
base <-
  alt$Chart()$
  mark_bar()$
  encode(
    x = alt$X(
      alt$`repeat`("column"),
      type = "quantitative",
      bin = alt$Bin(maxbins = 20)
    ),
    y = "count()"
  )$
  properties(width = 180, height = 130)

# blue background with selection
background <- base$properties(selection = brush)

# yellow highlights on the transformed data
highlight <-
  base$
  encode(color = alt$value("goldenrod"))$
  transform_filter(brush$ref())

# layer the two charts & repeat
alt$
  layer(background, highlight, data = flights)$
  transform_calculate("time", "hours(datum.date)")$
  `repeat`(column = list("distance", "delay", "time"))
None and **{}
These concepts are not related, other than that they are found in the same example:
import altair as alt
import pandas as pd

activities = pd.DataFrame({'Activity': ['Sleeping', 'Eating', 'TV', 'Work', 'Exercise'],
                           'Time': [8, 2, 4, 8, 2]})

alt.Chart(activities).mark_bar().encode(
    alt.X('PercentOfTotal:Q', axis=alt.Axis(format='.0%')),
    y='Activity:N'
).transform_window(
    window=[alt.WindowFieldDef(op='sum', field='Time', **{'as': 'TotalTime'})],
    frame=[None, None]
).transform_calculate(
    PercentOfTotal="datum.Time / datum.TotalTime"
)
In this example, we have a list containing None, which reticulate associates with R’s NULL.
We also have some syntax, **{'as': 'TotalTime'}. This is a mechanism to pass additional arguments to a Python function, perhaps similar to ... in R. It is passing a dictionary, so perhaps we can add the additional named argument in R:
library("tibble")

activities <-
  tibble(
    Activity = c("Sleeping", "Eating", "TV", "Work", "Exercise"),
    Time = c(8, 2, 4, 8, 2)
  )

chart <-
  alt$Chart(activities)$
  mark_bar()$
  encode(
    x = alt$X("PercentOfTotal:Q", axis = alt$Axis(format = ".0%")),
    y = "Activity:N"
  )$
  transform_window(
    window = list(
      alt$WindowFieldDef(op = "sum", field = "Time", as = "TotalTime")
    ),
    frame = list(NULL, NULL)
  )$
  transform_calculate(
    PercentOfTotal = JS("datum.Time / datum.TotalTime")
  )

chart
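The ** mechanism can be tried in isolation on the Python side. In the sketch below, window_field() is a hypothetical stand-in that simply returns the keyword arguments it receives; it is not Altair's WindowFieldDef. One plausible reason the Python example uses ** at all is that as is a reserved word in Python, so it cannot be written directly as a keyword argument.

```python
import keyword


def window_field(**kwargs):
    """Hypothetical stand-in that just returns the keyword arguments it got."""
    return kwargs


# `**` unpacks a dictionary into keyword arguments.
unpacked = window_field(op="sum", **{"field": "Time"})
spelled_out = window_field(op="sum", field="Time")
print(unpacked == spelled_out)

# `as` is a reserved word in Python, so `window_field(as="TotalTime")`
# would be a syntax error; unpacking a dictionary sidesteps that.
print(keyword.iskeyword("as"))
with_as = window_field(op="sum", field="Time", **{"as": "TotalTime"})
print(with_as)
```

R has no such restriction on as, which is why the R translation can pass it as an ordinary named argument.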