
We will cover how to “(c) manually set some scales” for a plot, focusing on scales and their guides (e.g., axes and legends).
The contents for (c) are based on Chapters 5 and 6 of the book “ggplot2: Elegant Graphics for Data Analysis” by Hadley Wickham.
Scales
A scale is needed for each aesthetic used in a plot, and ggplot2 will add a default scale when none is specified by the user.
# set ranges for both x-axis and y-axis
lims(...)
# set range for x-axis
xlim(...)
# set range for y-axis
ylim(...)
By default, the limits of position scales extend (or expand) a little past the range of data. This ensures that data do not overlap axes.
One can control the amount of expansion with the expand argument. This parameter should be a numeric vector of length two. The first element gives the multiplicative expansion, and the second the additive expansion.
If no expansion is needed, use
scale_x_continuous(expand=c(0,0))
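The expansion rule described above can be checked with a little arithmetic. The following is a minimal sketch, not ggplot2's internal code: `expanded_limits` is a hypothetical helper written only for illustration, and it assumes the rule as stated above (a multiplicative factor applied to the data range plus an additive constant, on each side of the range).

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): applies a length-two `expand`
# vector c(mult, add) to the range of x, padding both ends equally.
expanded_limits <- function(x, expand) {
  r <- range(x, na.rm = TRUE)
  pad <- expand[1] * diff(r) + expand[2]
  c(r[1] - pad, r[2] + pad)
}

# With expand = c(0, 0) the limits equal the data range of cty
expanded_limits(mpg$cty, expand = c(0, 0))

# 10% multiplicative padding plus one extra unit on each side
expanded_limits(mpg$cty, expand = c(0.1, 1))

# The same setting passed to a position scale
p_expand <- ggplot(mpg, aes(cty, hwy)) + geom_point() +
  scale_x_continuous(expand = c(0.1, 1))
```

Comparing the two printed ranges with the x-axis of `p_expand` gives a quick sanity check on how much room a given `expand` value adds.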
Plot cty (mpg in city) vs hwy (mpg on highway):
> library(ggplot2)
> p = ggplot(mpg) + geom_point(aes(x = cty, y = hwy, color = class)); p

> p + xlim(c(4,20))

> # use xlim(c(NA,20)) to set an automatic lower limit
> # hwy ranges from 12 to 44, so the y limits must overlap that range
> p + lims(x = c(10, 20), y = c(20, 40))
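One consequence of setting limits through a scale is worth illustrating: observations outside the limits are removed before drawing (ggplot2 warns about the removed rows) rather than merely hidden. The sketch below counts how many rows `xlim(c(4, 20))` would discard and, as an alternative that is not covered in the text above, uses `coord_cartesian()` to zoom in without discarding data. The object names are ours.

```r
library(ggplot2)

p_lim <- ggplot(mpg) + geom_point(aes(x = cty, y = hwy, color = class))

# Number of rows whose cty falls outside the limits c(4, 20)
n_outside <- sum(mpg$cty < 4 | mpg$cty > 20)
n_outside

# Scale limits: points outside the range are removed before drawing
p_scale <- p_lim + xlim(c(4, 20))

# Coordinate limits: same visible window, but no data are removed
p_zoom <- p_lim + coord_cartesian(xlim = c(4, 20))
```

The distinction matters most for layers that compute statistics (smoothers, box plots, histograms), since those are computed only from the data that survive the scale limits.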

scale_*_continuous
scale_x_continuous (or scale_y_continuous) controls the x (or y) axis for continuous variables, and often sets breaks, labels, na.value, and/or trans:
breaks: a numeric vector of tick positions
labels: a character vector giving labels (must be same length as breaks)
na.value=value: missing values are set as value
trans: a transformation such as "log10", "sqrt" or "reverse"; the shortcuts scale_*_log10(), scale_*_sqrt() and scale_*_reverse() apply the same transformations
> # Plot `displ` vs `hwy`:
> p1 = ggplot(mpg, aes(displ,hwy)) + geom_point(); p1

breaks
> # choose where the x-axis ticks appear
> p1 + scale_x_continuous(breaks = c(2, 4, 6))

labels
> # personalized labels for ticks at specified positions
> p1 + scale_x_continuous(breaks = c(2, 4, 6),
+ labels = c("two", "four", "six"))

trans
> # y-axis on natural logarithmic scale via `trans=log`
> p1 + scale_y_continuous(trans = "log")
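As a small illustration of `trans`, the sketch below puts the y-axis on a base-10 logarithmic scale in two ways: by naming the transformation and by using the shortcut `scale_y_log10()`. Only the axis is transformed; the values in `mpg` are unchanged. The object names are ours, and the break positions are arbitrary values chosen for the example.

```r
library(ggplot2)

p_log <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Transformation given by name
p_log_a <- p_log + scale_y_continuous(trans = "log10")

# Shortcut for the same transformation
p_log_b <- p_log + scale_y_log10()

# Breaks are given on the original data scale, not on the log scale
p_log_c <- p_log + scale_y_log10(breaks = c(15, 20, 30, 40))
```

Because the breaks are stated in data units, the tick labels remain readable as miles per gallon even though the spacing between them is logarithmic.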

trans: options
Table 6.2 from book “ggplot2”

scale_*_discrete
scale_x_discrete (or scale_y_discrete):
controls the x (or y) axis for discrete variables
often sets breaks, labels, and/or na.value (trans does not apply to discrete scales)
is used in much the same way as scale_x_continuous (and scale_y_continuous)
Base layer: bar plot for drv:
> p = ggplot(mpg, aes(x = drv)) + geom_bar(); p

> # re-label x-axis ticks
> p + scale_x_discrete(labels =
+ c("4 wheel drive", "front drive", "rear drive"))
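The relabelling above relies on the labels being listed in the same order as the levels of drv. A slightly more defensive sketch uses a named vector, so that each label is matched to its level by name rather than by position. The object names `drv_labels` and `p_drv` are ours.

```r
library(ggplot2)

# Names are the data values of drv; values are the labels to display
drv_labels <- c("4" = "4 wheel drive", f = "front drive", r = "rear drive")

p_drv <- ggplot(mpg, aes(x = drv)) + geom_bar() +
  scale_x_discrete(labels = drv_labels)
```

With a named vector the labels stay attached to the right bars even if the order of the levels changes later.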

After position, probably the most commonly used aesthetic is colour. For this aesthetic and continuous variables, there are three methods, based on their gradient schemes:
scale_*_gradient()
scale_*_gradient2()
scale_*_gradientn()
Note: colour is interchangeable with color
scale_colour_gradient() and scale_fill_gradient():
low (for “low end”) and high (for “high end”) control the colours at the low end and high end of the gradient, respectively
scale_colour_gradient2() and scale_fill_gradient2():
mid sets the colour of the midpoint
midpoint defaults to \(0\) but can be set to any value
These two functions are particularly useful for creating diverging colour schemes
n-colour gradient
scale_colour_gradientn() and scale_fill_gradientn():
a custom n-colour gradient
colours are supplied via the colours argument; by default, these colours will be evenly spaced along the range of the data
> # Plot `cty` vs `hwy` with default scheme `color=displ`
> p = ggplot(mpg)+geom_point(aes(cty,hwy,color=displ))
> p # note legend title "displ"

> p2a1=p + scale_colour_gradient2("Displacement",low="gray",
+ mid="blue",high="red",midpoint=mean(mpg$displ))
> p2a1 # note legend title "Displacement" and `midpoint`

midpoint
> p2a2 = p+scale_colour_gradient2("Displacement",low="gray",
+ mid="blue",high="red",midpoint=2*mean(mpg$displ))
> library(gridExtra); grid.arrange(p2a1,p2a2,nrow=2)
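The n-colour gradient described earlier has no example in this section, so here is a minimal sketch. It reuses the three colours from the `scale_colour_gradient2()` calls above. The first plot spaces them evenly along the range of displ, and the second uses the `values` argument to place the middle colour at the mean of displ. The object names are ours.

```r
library(ggplot2)

p_n <- ggplot(mpg) + geom_point(aes(cty, hwy, color = displ))

# Three colours, evenly spaced along the range of displ (the default)
cols <- c("gray", "blue", "red")
p_even <- p_n + scale_colour_gradientn(name = "Displacement", colours = cols)

# Optional: place the middle colour at the mean of displ instead.
# `values` gives each colour's position after rescaling the data to [0, 1].
rng <- range(mpg$displ)
pos <- (c(rng[1], mean(mpg$displ), rng[2]) - rng[1]) / diff(rng)
p_mean <- p_n + scale_colour_gradientn(name = "Displacement", colours = cols,
                                       values = pos)
```

Printing `p_even` and `p_mean` side by side shows how moving the middle colour shifts which displacements appear blue.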

The faithful dataset (in the datasets package that ships with R) records waiting times between eruptions (waiting) and eruption durations (eruptions), both in minutes, for the Old Faithful geyser in Yellowstone National Park:
> library(MASS); head(faithful)
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
Density function \(z=f(x,y)\) for \((x,y)\)=(eruptions,waiting) can be visualized via 2D contours
Obtain density estimate for (eruptions,waiting):
Load MASS; apply kde2d, i.e., 2D kernel density estimation, to the faithful data set
> f2d <- with(faithful, MASS::kde2d(eruptions, waiting,
+ h = c(1, 10), n = 50))
> df <- with(f2d, cbind(expand.grid(x, y), as.vector(z)))
> names(df) <- c("eruptions", "waiting", "density")
> head(df)
eruptions waiting density
1 1.600000 43 0.003216159
2 1.671429 43 0.004146406
3 1.742857 43 0.004987802
4 1.814286 43 0.005611508
5 1.885714 43 0.005921813
6 1.957143 43 0.005882327
> erupt <- ggplot(df,aes(waiting,eruptions,fill = density))+
+ geom_tile()+
+ scale_x_continuous(expand = c(0, 0))+
+ scale_y_continuous(expand = c(0, 0))
Note the use of
geom_tile() and fill = density
scale_*_continuous(expand = c(0, 0))
> erupt # max(df$density)=0.037, min(df$density)=10^(-24)

> # `limits = c(0, 0.04)` sets range for values in legend
> erupt + scale_fill_gradient(limits = c(0,0.04),
+ low = "white", high = "black")

> erupt + scale_fill_gradient2(limits = c(0, 0.04),
+ midpoint = mean(df$density))

Pay attention to how the range for density in the legend is controlled by limits in scale_fill_gradient or scale_fill_gradient2
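To see what `limits` does to values that fall outside it, the sketch below rebuilds the density grid and sets a deliberately narrow upper limit. The value `upper = 0.02` is arbitrary and chosen only for illustration, and the object names are ours. With the default settings of a continuous colour scale, cells outside the limits are treated as missing and drawn in the `na.value` colour.

```r
library(ggplot2)

# Rebuild the density grid used above
f2d <- with(faithful, MASS::kde2d(eruptions, waiting, h = c(1, 10), n = 50))
df <- with(f2d, cbind(expand.grid(x, y), as.vector(z)))
names(df) <- c("eruptions", "waiting", "density")

# Share of grid cells whose density exceeds the narrow upper limit
upper <- 0.02
mean(df$density > upper)

# Cells outside `limits` are treated as missing and drawn in `na.value`
p_narrow <- ggplot(df, aes(waiting, eruptions, fill = density)) +
  geom_tile() +
  scale_fill_gradient(limits = c(0, upper), low = "white", high = "black",
                      na.value = "red")
```

The red cells in `p_narrow` mark the high-density regions that the narrow limits exclude, which is why limits such as `c(0, 0.04)` that cover the full range of density are used above.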
There are palettes available for color scales
Two methods for colour scales for discrete data:
the default scheme (scale_colour_hue()); scale_colour_hue() works well for up to about eight colours
ColorBrewer palettes (scale_colour_brewer(), based on package RColorBrewer)
Popular palettes of RColorBrewer are “Set1” and “Dark2” for points and “Set2”, “Pastel1”, “Pastel2” and “Accent” for areas. RColorBrewer::display.brewer.all() lists all palettes.
Part of msleep data set (from library ggplot2):
# A tibble: 6 x 3
brainwt bodywt vore
<dbl> <dbl> <chr>
1 NA 50 carni
2 0.0155 0.48 omni
3 NA 1.35 herbi
4 0.00029 0.019 omni
5 0.423 600 herbi
6 NA 3.85 herbi
brainwt (brain weight in kilograms); bodywt (body weight in kilograms); vore (carnivore, omnivore, herbivore or insectivore)
Plot brainwt vs bodywt and colour points by vore:
> p4 = ggplot(msleep)+
+ geom_point(aes(brainwt,bodywt,colour = vore))+
+ scale_x_continuous(trans="log")+
+ scale_y_continuous(trans="log")
Note: both axes are on the natural logarithmic scale
> p4

brewer
> p4 + scale_colour_brewer(palette = "Set1")
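To inspect the colours behind a palette name before using it, one can query RColorBrewer directly. The sketch below, with object names of our choosing, extracts four colours from “Dark2” (one per non-missing category of vore) and then applies the same palette to the msleep plot.

```r
library(ggplot2)
library(RColorBrewer)

# Four colours from the "Dark2" palette, returned as hex strings
dark2_cols <- brewer.pal(4, "Dark2")
dark2_cols

# Same plot as above, coloured with "Dark2" instead of "Set1"
p_brewer <- ggplot(msleep) +
  geom_point(aes(brainwt, bodywt, colour = vore)) +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  scale_colour_brewer(palette = "Dark2")
```

Rows of msleep with a missing brainwt are dropped with a warning when the plot is drawn.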

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] MASS_7.3-49 knitr_1.21
loaded via a namespace (and not attached):
[1] compiler_3.5.0 magrittr_1.5 tools_3.5.0
[4] htmltools_0.3.6 revealjs_0.9 yaml_2.2.0
[7] Rcpp_1.0.0 stringi_1.2.4 rmarkdown_1.11
[10] stringr_1.3.1 xfun_0.4 digest_0.6.18
[13] evaluate_0.12