## Data Visualisation – R (Part-1)

## Introduction

In this report, I will plot several datasets with the `ggplot2` package to gain some meaningful insights. A separate post explains how to visualise maps in R using the `ggmap` package; you can read more about it there. This post covers the basics of data visualisation in R.

## Some basic plots

First load the `mtcars` dataset:

```
library(ggplot2)  # needed for ggplot() and all geoms used in this post
data('mtcars')
ggplot(mtcars, aes(x=mpg, y=0)) + geom_jitter() + scale_y_continuous(limits = c(-2,2))
```

The above plot is known as a `stripchart`, which is a univariate plot.

`ggplot(mtcars, aes(x = cyl, y = mpg)) + geom_point()`

If we inspect the `mtcars` dataset, we see that the variable `cyl` is categorical in nature but is stored as numeric. So we need to tell `ggplot2` that `cyl` is a categorical variable.

`ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point()`

Now we can see that `ggplot2` treats `cyl` as a factor. This time the x-axis does not contain values like 5 or 7; it contains only the values that are present in the dataset.
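As a quick check of why the axis changes, the sketch below inspects `cyl` with base R only. The variable names (`cyl_class`, `cyl_values`, `cyl_levels`) are illustrative and not part of the original post.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.
data("mtcars")

# cyl is stored as a numeric column...
cyl_class <- class(mtcars$cyl)

# ...but it only ever takes a handful of distinct values.
cyl_values <- sort(unique(mtcars$cyl))

# factor() turns those distinct values into discrete levels,
# which is what ggplot2 uses to build a discrete x-axis.
cyl_levels <- levels(factor(mtcars$cyl))

print(cyl_class)
print(cyl_values)
print(cyl_levels)
```

Because the factor has one level per distinct value, the discrete axis shows only those levels and nothing in between.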

`ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) + geom_point()`

The above plot shows the relationship between `mpg` and `wt` of the car, with the engine displacement `disp` mapped to a continuous colour gradient.

`ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) + geom_point()`

This plot is the same as the one above, but this time the displacement of the car engine is shown by varying the point size.

```
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(am), fill = factor(cyl))) +
geom_point(shape = 21, size = 4, alpha = 0.6)
```

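The above plot is useful whenever we need to distinguish the data points based on two categorical variables: with `shape = 21`, the outline colour (`col`) encodes `am` and the interior (`fill`) encodes `cyl`.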

```
ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_smooth(aes(group = 1), method = "lm", se = FALSE, linetype = 2)
```

The above plot shows a linear model fitted to each subgroup of the `cyl` variable, plus a dashed line (`aes(group = 1)`) fitted to all the data at once.
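To make the per-group fits concrete, here is a minimal sketch that fits the same kind of regressions directly with `lm()`. It assumes the default formula `y ~ x` that `geom_smooth(method = "lm")` uses; the names `fits_by_cyl` and `fit_all` are illustrative.

```r
data("mtcars")

# One simple linear regression of mpg on wt per cylinder group
# (roughly what geom_smooth(method = "lm") fits for each colour).
fits_by_cyl <- lapply(
  split(mtcars, mtcars$cyl),
  function(d) coef(lm(mpg ~ wt, data = d))
)

# A single regression over all cars, corresponding to aes(group = 1).
fit_all <- coef(lm(mpg ~ wt, data = mtcars))

print(fits_by_cyl)
print(fit_all)
```

Comparing the slopes in `fits_by_cyl` with the slope in `fit_all` shows how much the relationship differs between subgroups and the pooled data.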

```
val = c("#E41A1C", "#377EB8")
# In mtcars, am is coded 0 = automatic, 1 = manual,
# so the labels must follow the factor level order "0", "1".
lab = c("Automatic", "Manual")
ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge") +
  scale_x_discrete("Cylinders") +
  scale_y_continuous("Number") +
  scale_fill_manual("Transmission",
                    values = val,
                    labels = lab)
```
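The bar heights in a plot like this are row counts per group, which is what `geom_bar()` computes by default. A small sketch to check them with base R (the name `counts` is illustrative; per the `mtcars` documentation, `am` is coded 0 = automatic, 1 = manual):

```r
data("mtcars")

# Cross-tabulate cylinders against transmission type.
# Rows are cyl levels, columns are am (0 = automatic, 1 = manual).
counts <- table(cyl = mtcars$cyl, am = mtcars$am)

print(counts)
```

Each cell of the table should correspond to one dodged bar in the chart.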

Plotting several distributions in the same panel

`ggplot(mtcars, aes(x=mpg, col=factor(cyl))) + geom_histogram(binwidth = 1, position = "identity") + geom_freqpoly(binwidth = 1)`

In the above plot we can see three distributions of `mpg`, one for each level of the `cyl` variable, displayed on the same panel. The lines drawn by `geom_freqpoly()` form what is known as a frequency polygon.
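A frequency polygon connects per-bin counts with a line. The sketch below computes comparable counts by hand with base R, assuming bins of width 1 that start at whole numbers; `ggplot2` may place its bin edges differently, so the exact counts per bin can differ from the plot.

```r
data("mtcars")

# Bin edges of width 1 covering the observed mpg range (an assumed alignment).
breaks <- seq(floor(min(mtcars$mpg)), ceiling(max(mtcars$mpg)), by = 1)

# Count cars per bin separately for each cyl level.
counts_by_cyl <- lapply(
  split(mtcars$mpg, mtcars$cyl),
  function(x) table(cut(x, breaks = breaks, include.lowest = TRUE))
)

# Total number of cars in each cyl group.
group_sizes <- sapply(counts_by_cyl, sum)
print(group_sizes)
```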

## Diamonds Dataset

Reducing the `overplotting` problem

```
ggplot(diamonds, aes(x=clarity, y=carat, color=price)) + geom_point()
ggplot(diamonds, aes(x=clarity, y=carat, color=price)) + geom_point(alpha=0.5, position = "jitter")
```
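To get a feel for how severe the overplotting is before jittering, this rough sketch counts how many diamonds share an identical (`clarity`, `carat`) position. The variable names are illustrative, and it assumes `ggplot2` is installed, since that package provides `diamonds`.

```r
library(ggplot2)  # provides the diamonds dataset

# clarity is discrete, so every point in a clarity level shares one x position.
points_per_clarity <- table(diamonds$clarity)

# Many diamonds also share the exact same (clarity, carat) pair,
# so their points are drawn directly on top of each other.
n_overlapping <- sum(duplicated(diamonds[, c("clarity", "carat")]))

print(points_per_clarity)
print(n_overlapping)
```

With this many coincident points, `alpha` and `position = "jitter"` make the density within each clarity level visible.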

Adding a smoothing line

`ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth()`

`ggplot(diamonds, aes(x = carat, y = price, color = clarity)) + geom_point(alpha = 0.2)`

The `alpha` argument inside the `geom_point()` function makes the data points transparent.

`ggplot(diamonds, aes(x = carat, y = price)) + geom_point(alpha = 0.2) + geom_smooth(aes(col = clarity), se = FALSE)`

## Tidying the data to make a Plot

In this problem we will use the `iris` dataset, rearranging it to make simple but meaningful plots.

```
library(tidyr)
data("iris")
```

Now we will tidy our `iris` dataset so that it is ready for plotting

`iris_tidy <- iris %>% gather(Key, Value, -Species) %>% separate(Key, c("Part", "Measure"), "\\.") `

Now our dataset is ready for plotting

`ggplot(iris_tidy, aes(x = Species, y = Value, col = Part)) + geom_jitter() + facet_grid(. ~ Measure)`
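In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. The sketch below shows one way to get the same long layout in a single step, assuming a recent tidyr. The name `iris_tidy2` is illustrative, and the row order differs from the `gather()` result.

```r
library(tidyr)

data("iris")

# Reshape the four measurement columns into long form and split
# each column name on the dot, e.g. "Sepal.Length" becomes
# Part = "Sepal" and Measure = "Length".
iris_tidy2 <- pivot_longer(
  iris,
  cols = -Species,
  names_to = c("Part", "Measure"),
  names_sep = "\\.",
  values_to = "Value"
)

print(head(iris_tidy2))
```

The result has one row per flower, part, and measure, which is the shape the faceted plot needs.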

## Bar Plots with Color Ramp

We will use the `Vocab` dataset from the `car` package.

`library(car)`


```
library(RColorBrewer)
data("Vocab")
blues <- brewer.pal(9, "Blues") # from the RColorBrewer package
blue_range <- colorRampPalette(blues)
ggplot(Vocab, aes(x = factor(education), fill = factor(vocabulary))) +
geom_bar(position = "fill") +
scale_fill_manual(values = blue_range(11))
```
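The "Blues" palette in RColorBrewer offers at most 9 colours, while the `vocabulary` score takes 11 values (0 to 10), which is why the palette is interpolated. A minimal sketch of that step on its own (the name `cols` is illustrative):

```r
library(RColorBrewer)

# The 9 colours that brewer.pal() provides for the "Blues" palette.
blues <- brewer.pal(9, "Blues")

# colorRampPalette() returns a function that interpolates
# between the given colours to produce any number of shades.
blue_range <- colorRampPalette(blues)

# 11 interpolated shades, one per vocabulary score.
cols <- blue_range(11)
print(cols)
```

Passing `cols` (or `blue_range(11)` directly) to `scale_fill_manual(values = ...)` gives each factor level its own shade.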