# Data Visualization in R (Part-2)

## Introduction

In this report, I will plot some more advanced charts using the package `ggplot2`. If you want to learn more about basic plots, you can refer to my earlier article, Data Visualization in R (Part 1). You can also view my other posts related to visualization here.

```r
library(ggplot2)
library(RColorBrewer)
```

## Data Smoothing in plots

Smoothing means using an algorithm to remove noise from a data set so that the important patterns stand out. To add a smoothing line we use the geom `geom_smooth()`. For small data sets (fewer than 1,000 observations, as here) it defaults to `LOESS` smoothing, which stands for locally estimated scatterplot smoothing; the closely related `LOWESS` is the locally weighted variant.

```r
data("mtcars")
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth()
```

If we want the previous plot to use an ordinary linear model instead, we can pass the argument `method = "lm"`.

```r
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")
```

The shaded band in the above plots is the 95% confidence interval around the fitted line. It is computed from the standard error of the fit, which is why the argument that controls it is called `se`. We can remove the band with `se = FALSE`.
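To make the link between the band and the standard error concrete, here is a minimal base-R sketch that rebuilds a 95% confidence band for the linear fit by hand. It assumes the band is the usual pointwise interval, fit ± t × standard error, which is what `predict()` reports for an `lm` with `interval = "confidence"`. The names `fit`, `grid`, and `band` are only illustrative.

```r
# Fit the same straight line that method = "lm" draws
fit <- lm(mpg ~ wt, data = mtcars)

# A few weights at which to evaluate the line (illustrative grid)
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5))

# Fitted values plus the standard error of each fitted value
pred <- predict(fit, newdata = grid, se.fit = TRUE)

# 95% interval: fit +/- t quantile * standard error
t_crit <- qt(0.975, df = fit$df.residual)
band <- data.frame(wt    = grid$wt,
                   fit   = as.numeric(pred$fit),
                   lower = as.numeric(pred$fit - t_crit * pred$se.fit),
                   upper = as.numeric(pred$fit + t_crit * pred$se.fit))

# Cross-check against predict()'s own confidence interval
check <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
print(band)
```

The standard error is one ingredient of the band, not a synonym for it: a different confidence level would change the width of the band without changing the standard error.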

```r
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

## Grouping variables in plots

Sometimes we want to see patterns within subgroups of the data, defined by a categorical variable. We can show these by mapping the variable to the `col` aesthetic as follows:

```r
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
```

In the above ggplot command the smooth is calculated separately for each subgroup, because the invisible `group` aesthetic defaults to the discrete variables mapped in the plot, here the one mapped to `col`. The following plot also adds a smoothing line for the complete data alongside the subgroup lines, by overriding the grouping with `aes(group = 1)`:

```r
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  # one line per cylinder group
  stat_smooth(method = "lm", se = FALSE) +
  # one long-dashed line for all cars together
  stat_smooth(method = "lm", se = FALSE, aes(group = 1), linetype = 5)
```

## Mapping different models in plots

In the plot below we add two different models, `lm` and `loess`, to the same plot. `lm` fits a linear model by ordinary least squares (OLS), while `loess` is a non-parametric form of regression that uses a weighted, sliding-window average to calculate a line of best fit. We can control the size of this window with the `span` argument, which only affects `loess`.
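As a rough illustration of the two models and of what `span` does, the base-R sketch below fits both to `mtcars` and evaluates them on a small grid of weights. It calls `lm()` and `loess()` from the stats package directly rather than going through ggplot2. The two span values (0.5 and 0.9) and the object names are arbitrary choices for this example.

```r
# Straight-line fit by ordinary least squares
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Two loess fits that differ only in the size of the sliding window
fit_wide   <- loess(mpg ~ wt, data = mtcars, span = 0.9)
fit_narrow <- loess(mpg ~ wt, data = mtcars, span = 0.5)

# Evaluate all three on the same weights (kept inside the observed range,
# since loess does not extrapolate by default)
grid <- data.frame(wt = seq(2, 5, by = 0.5))
comparison <- data.frame(
  wt           = grid$wt,
  linear       = as.numeric(predict(fit_lm, grid)),
  loess_wide   = as.numeric(predict(fit_wide, grid)),
  loess_narrow = as.numeric(predict(fit_narrow, grid))
)
print(round(comparison, 1))
```

The `linear` column changes by the same amount between consecutive rows, as a straight line must, while the two loess columns are free to bend. A smaller span uses fewer neighbouring points for each local fit and so follows the data more closely.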

```r
myColors <- c(brewer.pal(3, "Dark2"), "black")
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
  geom_point() +
  # one straight line per cylinder group (span is a loess-only setting)
  stat_smooth(method = "lm", se = FALSE) +
  # one loess curve for all cars, labeled "All" in the legend
  stat_smooth(method = "loess",
              aes(group = 1, col = "All"),
              se = FALSE, span = 0.7) +
  scale_color_manual("Cylinders", values = myColors)
```

`ColorBrewer` palettes hold a limited number of colors (at most 9 for the sequential palettes), so `brewer.pal()` complains when a factor or categorical variable has more subgroups than that. When the variable is really numeric, we can solve this problem by mapping it to a continuous color scale with the function `scale_color_gradientn()`.

```r
library(car)
data(Vocab)
```

```r
ggplot(Vocab, aes(x = education, y = vocabulary, col = year, group = factor(year))) +
  geom_jitter(alpha = 0.6) +
  stat_smooth(method = "lm", se = FALSE, alpha = 0.2, size = 2) +
  # the final layer was cut off in the original; this palette is an assumed choice
  scale_color_gradientn(colors = brewer.pal(9, "YlOrRd"))
```

We can write explicit functions to calculate statistics and then use those statistics in our plots. For example, below is a function that returns the range of a variable in a form that a plot layer can use:

```r
plot_range <- function(x) {
  data.frame(ymin = min(x),
             ymax = max(x))
}
```

The function below calculates the median, 1st quartile, and 3rd quartile:

```r
plot_IQR <- function(x) {
  data.frame(y = median(x),                             # median
             ymin = quantile(x, 0.25, names = FALSE),   # 1st quartile
             ymax = quantile(x, 0.75, names = FALSE))   # 3rd quartile
}
```
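A function passed as `fun.data` is applied to the y values of each group and should return a one-row data frame with columns such as `y`, `ymin`, and `ymax`. The base-R sketch below mimics that split-and-apply step for car weight by cylinder count. `summarise_iqr` is a hypothetical helper written for this illustration, not part of ggplot2.

```r
# Hypothetical helper: median with the 1st and 3rd quartiles as limits
summarise_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

# Split weights by number of cylinders and summarise each group
by_cyl <- split(mtcars$wt, mtcars$cyl)
summaries <- do.call(rbind, lapply(by_cyl, summarise_iqr))
print(summaries)
```

Each row of the result corresponds to one line range that the plot would draw for that group.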
```r
data("mtcars")
posn.d <- position_dodge(width = 0.1)

base_plot <- ggplot(mtcars, aes(x = factor(cyl), y = wt, col = factor(am),
                                fill = factor(am), group = factor(am)))

base_plot +
  # thick bar: interquartile range
  stat_summary(geom = "linerange", fun.data = plot_IQR,
               position = posn.d, size = 3) +
  # faded bar: full range
  stat_summary(geom = "linerange", fun.data = plot_range,
               position = posn.d, size = 3,
               alpha = 0.4) +
  # "X" marks the median (fun replaces fun.y, deprecated since ggplot2 3.3.0)
  stat_summary(geom = "point", fun = median,
               position = posn.d, size = 3,
               col = "black", shape = "X")
```

## Creating pie charts

A pie chart can be thought of as a modification of a stacked bar chart. Imagine a stacked bar plot whose y-axis is bent until it loops back on itself: the result is a pie chart. The code below first creates a stacked bar plot and then converts it to a pie chart using polar coordinates.
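The bending can be made concrete with a little arithmetic: each stacked segment keeps its share of the total count, and that share becomes its share of the 360 degrees. Here is a small base-R sketch for the `cyl` counts in `mtcars` (the variable names are illustrative):

```r
# Heights of the stacked bar segments: number of cars per cylinder count
counts <- table(mtcars$cyl)

# Share of the whole bar, and the matching slice angle on the pie
shares <- counts / sum(counts)
angles <- 360 * shares

print(data.frame(cyl    = names(counts),
                 count  = as.vector(counts),
                 share  = round(as.vector(shares), 3),
                 degree = as.vector(angles)))
```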

```r
ggplot(mtcars, aes(x = 1, fill = factor(cyl))) +
  geom_bar() +
  coord_polar(theta = "y")
```

## Use of faceting in plots

Facets are a way of splitting a plot into panels by categorical variables, which also lets us show more variables in one chart. The following plot represents a total of 7 variables. We use a trick to map two variables onto two color dimensions, hue and lightness, by combining `cyl` and `am` into a single variable `cyl_am`. To accommodate this we also make a new color palette of alternating blue and red with increasing darkness.

```r
mtcars$cyl_am <- paste(mtcars$cyl, mtcars$am, sep = "_")
# rows: blues for am = 0, reds for am = 1; columns: darker shade per cyl level
myCol <- rbind(brewer.pal(9, "Blues")[c(3, 6, 8)],
               brewer.pal(9, "Reds")[c(3, 6, 8)])

ggplot(mtcars, aes(x = wt, y = mpg, col = cyl_am, size = disp)) +
  geom_point() +
  scale_color_manual(values = myCol) +
  facet_grid(vs ~ gear)
```
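The palette trick depends on how R flattens a matrix: column by column. The sketch below uses placeholder color labels instead of real hex codes (an assumption made so the ordering is easy to read) to show which label ends up on which `cyl_am` level when the matrix is used as a vector of manual scale values.

```r
# Placeholder labels standing in for the hex codes from brewer.pal()
blues <- c("blue_light", "blue_mid", "blue_dark")
reds  <- c("red_light", "red_mid", "red_dark")
palette_matrix <- rbind(blues, reds)

# Levels of the combined variable, in the sorted order a discrete scale uses
cyl_am <- paste(mtcars$cyl, mtcars$am, sep = "_")
level_names <- sort(unique(cyl_am))

# as.vector() reads the matrix column by column, interleaving blue and red
mapping <- setNames(as.vector(palette_matrix), level_names)
print(mapping)
```

Hue therefore separates the two `am` values, and lightness increases with `cyl`.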