Data Visualization-R (Part-1)

Introduction

In this post, I will plot several datasets with the ggplot2 package to gain some meaningful insights. A separate post explains how to visualize maps in R using the ggmap package. This post covers the basics of data visualization in R.

Some basic plots

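First, load the ggplot2 package and the mtcars dataset.

library(ggplot2)  # provides ggplot(), the geoms, and the diamonds dataset
data('mtcars')
ggplot(mtcars, aes(x = mpg, y = 0)) + geom_jitter() + scale_y_continuous(limits = c(-2, 2))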

The plot above is known as a strip chart, a univariate plot that shows the distribution of a single variable (here, mpg).

ggplot(mtcars, aes(x = cyl, y = mpg)) + geom_point()

If we inspect the mtcars dataset, we see that the variable cyl is categorical in nature but is stored as numeric. So we need to tell ggplot2 that cyl is a categorical variable.
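A quick way to check this before plotting, as a minimal sketch in base R (the names cyl_class, cyl_values, and cyl_levels are only illustrative and are not part of the original post):

```r
# mtcars ships with base R (datasets package).
data("mtcars")

cyl_class  <- class(mtcars$cyl)           # how R stores the column
cyl_values <- sort(unique(mtcars$cyl))    # the distinct values actually present
cyl_levels <- levels(factor(mtcars$cyl))  # what factor() turns them into

print(cyl_class)
print(cyl_values)
print(cyl_levels)
```

The column is numeric but takes only three distinct values (4, 6, and 8), which is why wrapping it in factor() is appropriate.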

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point()

Now we can see that ggplot2 treats cyl as a factor. This time the x-axis does not show values such as 5 or 7; it shows only the values that are present in the dataset (4, 6, and 8).

ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) + geom_point()

The plot above shows the relationship between mpg and wt, with the engine displacement (disp) mapped to color. Because disp is continuous, ggplot2 draws it as a color gradient.

ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) + geom_point()

This plot is the same as the one above, but this time the displacement of the car engine is shown by point size.

ggplot(mtcars, aes(x = wt, y = mpg, col = factor(am), fill = factor(cyl))) + 
  geom_point(shape = 21, size = 4, alpha = 0.6)

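This kind of plot is useful whenever we need to distinguish the data points by two categorical variables. Here the outline color (col) encodes am and the fill color encodes cyl; shape 21 is a circle with both an outline and a fill, which makes this possible.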

ggplot(mtcars, aes(x = wt, y = mpg, col = factor(cyl))) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, linetype = 2)

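The plot above shows a separate linear model for each subgroup of the cyl variable (solid lines), plus one linear model fitted to all cars together (dashed line, from aes(group = 1)).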
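To see the numbers behind those lines, here is a rough sketch in base R that fits mpg against wt within each cyl group and once for the whole dataset. It assumes the default formula y ~ x that geom_smooth(method = "lm") uses; the names fits_by_cyl and fit_all are illustrative only.

```r
# Fit mpg ~ wt separately within each cyl subgroup (base R only).
fits_by_cyl <- sapply(
  split(mtcars, mtcars$cyl),
  function(d) coef(lm(mpg ~ wt, data = d))
)

# One fit across all cars, corresponding to the line drawn with aes(group = 1).
fit_all <- coef(lm(mpg ~ wt, data = mtcars))

print(fits_by_cyl)  # columns: cyl 4, 6, 8; rows: intercept and wt slope
print(fit_all)
```

Comparing the wt row across columns shows how the slope differs between subgroups, which is what the separate colored lines convey visually.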

val = c("#E41A1C", "#377EB8")
# In mtcars, am = 0 is automatic and am = 1 is manual, so labels follow the factor level order
lab = c("Automatic", "Manual")
ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge") +
  scale_x_discrete("Cylinders") + 
  scale_y_continuous("Number") +
  scale_fill_manual("Transmission", 
                    values = val,
                    labels = lab) 

Plotting several distributions in the same panel

ggplot(mtcars, aes(x=mpg, col=factor(cyl))) + geom_histogram(binwidth = 1, position = "identity") + geom_freqpoly(binwidth = 1)

In the plot above, we can see the mpg distributions for the three cyl groups displayed in the same panel. The lines drawn by geom_freqpoly() form what is known as a frequency polygon.

Diamonds Dataset

Reducing the overplotting problem

ggplot(diamonds, aes(x=clarity, y=carat, color=price)) + geom_point()
ggplot(diamonds, aes(x=clarity, y=carat, color=price)) + geom_point(alpha=0.5, position = "jitter")

Adding a smoothing line

ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth()

ggplot(diamonds, aes(x = carat, y = price, color = clarity)) + geom_point(alpha = 0.2)

The alpha argument inside the geom_point() function makes the data points semi-transparent. With alpha = 0.2, regions where many points overlap appear darker, so dense areas stand out.

ggplot(diamonds, aes(x = carat, y = price)) + geom_point(alpha = 0.2) + geom_smooth(aes(col = clarity), se = FALSE)

Tidying the data to make a Plot

In this section, we will use the iris dataset and rearrange it to make simple but meaningful plots.

library(tidyr)
data("iris")

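Now we will tidy the iris dataset so that it is ready for plotting. gather() stacks the four measurement columns into Key/Value pairs, and separate() splits each key, such as Sepal.Length, into a Part (Sepal) and a Measure (Length).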

iris_tidy <- iris %>% gather(Key, Value, -Species) %>% separate(Key, c("Part", "Measure"), "\\.")  
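The same reshaping can also be sketched with pivot_longer(), which newer versions of tidyr offer as the successor to gather(). This is an alternative under the assumption that tidyr 1.0.0 or later is installed, not the method used in this post; the name iris_long is illustrative, and the row order may differ from the gather() result.

```r
library(tidyr)

# Assumption: tidyr >= 1.0.0, where pivot_longer() is available.
# Split column names such as "Sepal.Length" at the dot into Part and Measure.
iris_long <- pivot_longer(
  iris,
  cols      = -Species,
  names_to  = c("Part", "Measure"),
  names_sep = "\\.",
  values_to = "Value"
)

print(head(iris_long))
print(dim(iris_long))
```

Either way, each row holds one measurement: 150 flowers times 4 measurements gives 600 rows.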

Now our dataset is ready for plotting. We put Species on the x-axis, color the points by Part, and facet by Measure.

ggplot(iris_tidy, aes(x = Species, y = Value, col = Part)) + geom_jitter() + facet_grid(. ~ Measure)

Bar Plots with Color Ramp

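We will use the Vocab dataset from the car package. In recent versions of car, the data itself ships in the companion carData package, which library(car) attaches.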

library(car)
library(RColorBrewer)
data("Vocab")
blues <- brewer.pal(9, "Blues") # from the RColorBrewer package
blue_range <- colorRampPalette(blues)
ggplot(Vocab, aes(x = factor(education), fill = factor(vocabulary))) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = blue_range(11))
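The color ramp is needed because the fill variable has more levels than the palette provides. A small sketch of that reasoning, assuming the fill variable has 11 levels (vocabulary scores 0 to 10); the names max_blues and eleven_blues are illustrative:

```r
library(RColorBrewer)

# The sequential "Blues" palette offers at most 9 colours.
max_blues <- brewer.pal.info["Blues", "maxcolors"]

blues      <- brewer.pal(9, "Blues")
blue_range <- colorRampPalette(blues)  # returns a function of n

# Interpolate between the 9 base colours to get 11, one per fill level.
eleven_blues <- blue_range(11)

print(max_blues)
print(eleven_blues)
```

colorRampPalette() comes from the grDevices package that ships with R, so only RColorBrewer needs to be installed separately.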
