R, Text Mining

Text Analytics: Mining Enron Emails

Mining Enron Emails
You might have heard about the Enron scandal, which came to light in 2001 and eventually led to the bankruptcy of the Enron Corporation. At the time, it was one of the largest corporate frauds ever uncovered. Enron's top executives used what is called mark-to-market accounting to dress up their financial statements. They used these accounting and financial shenanigans to…


Basics, Unsupervised Learning

Implementing Principal Component Analysis using Python

This article continues my previous article, Mathematics of Principal Component Analysis (PCA), and I recommend reading that article first. In this post, I will explain how to implement PCA using Python. I have taken the Wholesale customers dataset from the UCI Machine Learning Repository. This dataset refers to clients of…

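The implementation post is truncated here, so what follows is only a minimal sketch of the usual PCA steps in Python, assuming NumPy is available. The original article may well use scikit-learn's `PCA` class instead, the function name `pca()` is hypothetical, and the random matrix merely stands in for the wholesale customer columns. The sketch standardizes each column, eigendecomposes the covariance matrix, and projects the data onto the leading eigenvectors.

```python
import numpy as np


def pca(X, n_components):
    """Hypothetical PCA sketch: standardize, eigendecompose, project."""
    X = np.asarray(X, dtype=float)
    # Standardize each column (zero mean, unit sample variance) so that
    # variables measured on different scales contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of the standardized data (i.e. the correlation matrix of X).
    cov = np.cov(Z, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]       # principal axes as columns
    scores = Z @ components                      # data projected onto the axes
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return scores, components, explained_ratio


# Usage with synthetic data (a stand-in for a real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] += 2 * X[:, 0]  # make two columns strongly correlated
scores, components, ratio = pca(X, n_components=2)
print(ratio)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because a covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.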

Basics, Fun with Statistics, Unsupervised Learning

Mathematics of Principal Component Analysis (PCA)

Understanding Principal Component Analysis
In this part of the article, I will explain the mathematics and intuition behind Principal Component Analysis (PCA), and in the next part, I will show how to implement PCA using Python. PCA is an unsupervised machine learning technique that creates a low-dimensional representation of a dataset. PCA is used to…

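To make the intuition concrete, here is a small pure-Python sketch for two variables, where the eigenvalues of the 2×2 sample covariance matrix [[a, b], [b, c]] have a closed form: (a + c)/2 ± sqrt(((a − c)/2)² + b²). The first principal component is the eigenvector of the larger eigenvalue lam1, and its share of the total variance is lam1 / (lam1 + lam2). The function name `pca_2d()` and the sample points are illustrative assumptions, not taken from the article.

```python
import math


def pca_2d(xs, ys):
    """Hypothetical helper: PCA of two variables via the 2x2 covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix entries [[a, b], [b, c]] (divide by n - 1)
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix from its characteristic polynomial
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Eigenvector for lam1 is (b, lam1 - a); if b == 0 the axes are already
    # uncorrelated and PC1 is whichever axis has the larger variance.
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    explained = lam1 / (lam1 + lam2)
    return (lam1, lam2), pc1, explained


# Points lying on the line y = x: PC1 points along the diagonal
(eig1, eig2), pc1, explained = pca_2d([1, 2, 3, 4], [1, 2, 3, 4])
print(pc1, explained)
```

For collinear points like these, all the variance lies along one direction, so the second eigenvalue is (numerically) zero and PC1 explains essentially all of it.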

Basics, Fun with Statistics, Python

Post-Hoc Analysis of ANOVA Test

Finding Relationship Between Variables – ANOVA (Part 2)
Post-Hoc Analysis of ANOVA Test in Python
This article continues my earlier post, Finding Relationship Between Variables – ANOVA Test. In that post, I explained the basics of the ANOVA test: when to use it and how to implement it in Python. We took the Auto MPG dataset and tried to…

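The post-hoc method used in the article is not visible in this excerpt. Assuming it is Tukey's HSD (statsmodels offers this as `statsmodels.stats.multicomp.pairwise_tukeyhsd`), the sketch below computes the Tukey-Kramer q statistic for every pair of groups using only the standard library. A pair differs significantly when its q exceeds the studentized range critical value for k groups and n − k degrees of freedom, which can be read from a table or obtained from `scipy.stats.studentized_range.ppf` in SciPy 1.7 or later. The function name and toy groups are hypothetical.

```python
import math
from itertools import combinations


def tukey_kramer_q(groups):
    """Hypothetical helper: Tukey-Kramer q for each pair of groups.

    groups maps a group label to its list of observations.
    Returns ({(label1, label2): q}, within-group degrees of freedom).
    """
    labels = list(groups)
    n = sum(len(v) for v in groups.values())
    k = len(labels)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Mean square within groups (the ANOVA error term)
    ss_within = sum(sum((x - means[g]) ** 2 for x in v) for g, v in groups.items())
    df_within = n - k
    ms_within = ss_within / df_within
    q_stats = {}
    for g1, g2 in combinations(labels, 2):
        # Standard error allowing unequal group sizes (Tukey-Kramer)
        se = math.sqrt(ms_within / 2 * (1 / len(groups[g1]) + 1 / len(groups[g2])))
        q_stats[(g1, g2)] = abs(means[g1] - means[g2]) / se
    return q_stats, df_within


q, df = tukey_kramer_q({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
for pair, value in q.items():
    print(pair, round(value, 3))
```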

Basics, Fun with Statistics, Python

Finding Relationship Between Variables – ANOVA Test

Finding Relationship Between Variables – ANOVA (Part 1)
ANOVA Test in Python
Finding the relationships between variables is an important step in any statistical modeling. For example, if you are working with a dataset that contains hundreds of variables but very few observations, you cannot simply include all of those variables in your model. Otherwise, you will be violating…

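As a minimal sketch of what a one-way ANOVA test computes, the hypothetical function below splits the total variation into between-group and within-group sums of squares and returns the F statistic with its degrees of freedom. It uses only the standard library; in practice, `scipy.stats.f_oneway` returns the same statistic together with a p-value. The toy groups are made up for illustration.

```python
def one_way_anova_f(*groups):
    """Hypothetical helper: one-way ANOVA F statistic and degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat, df_b, df_w)  # F = 27.0 with (2, 6) degrees of freedom
```

A large F means the group means differ by much more than the variation inside each group would suggest, which is the evidence of a relationship between the categorical and the continuous variable.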

Basics, R, Visualization Tutorials

Data Visualisation in R (Part-3)

Introduction
In this report, I will plot some more advanced charts using the ggplot2 package. If you want to learn more about basic plots, you can refer to my earlier articles Data Visualization in R (Part 1) and Data Visualization in R (Part 2).
library(Hmisc)
library(dplyr)
library(ggplot2)
library(ggplot2movies)
library(RColorBrewer)
library(PerformanceAnalytics)
library(GGally)
Boxplots and Variable Transformation…


Basics, R, Visualization Tutorials

Data Visualization in R (Part-2)

Introduction
In this report, I will plot some more advanced charts using the ggplot2 package. If you want to learn more about basic plots, you can refer to my earlier article Data Visualization in R (Part 1). You can also browse my other posts on visualization.
library(ggplot2)
library(RColorBrewer)
Data Smoothing in plots
Smoothing means using algorithms to remove noise…

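In ggplot2, smoothing is usually added with `geom_smooth()`, which by default fits a loess curve when there are fewer than 1,000 observations and a GAM otherwise. The underlying idea, replacing each point with a local average of its neighbours, is easiest to see in the simplest smoother, a centered moving average. It is sketched here in Python purely as a language-neutral illustration; it is not what `geom_smooth()` computes, and the function name `moving_average()` is hypothetical.

```python
def moving_average(values, window=3):
    """Hypothetical helper: centered moving average; windows shrink at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Neighbours within `half` positions on each side, clipped at the ends
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


noisy = [1, 5, 1, 5, 1]
print(moving_average(noisy, window=3))
```

A wider window removes more noise but also flattens genuine features of the data, the same bias-variance trade-off that loess controls through its `span` parameter.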

Data Envelopment Analysis, Optimization, R

Introduction to Data Envelopment Analysis in R

Introduction to Data Envelopment Analysis
Data Envelopment Analysis (DEA) is a performance measurement technique used to compare the performance of similar units of an organization. The units being analyzed are called Decision Making Units (DMUs). For example, we can compare all the McDonald’s outlets operating in the Delhi NCR region to find out…

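In the simplest possible setting, one input and one output per DMU under constant returns to scale (the CCR model), DEA efficiency reduces to each unit's output/input ratio divided by the best ratio in the group, so the best-performing units score 1. The sketch below assumes exactly that single-input, single-output case; the function name and the outlet figures (staff as the input, sales as the output) are invented for illustration. With several inputs and outputs, DEA instead solves one linear program per DMU.

```python
def ccr_efficiency_single_io(inputs, outputs):
    """Hypothetical helper: CCR (constant returns to scale) efficiency
    for DMUs that each have exactly one input and one output."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    # Each DMU is scored relative to the best observed productivity
    return [r / best for r in ratios]


# Made-up outlet data: staff counts (input) and daily sales (output)
staff = [10, 20, 15]
sales = [100, 150, 150]
efficiency = ccr_efficiency_single_io(staff, sales)
print(efficiency)  # [1.0, 0.75, 1.0]
```

Here the second outlet produces 7.5 units of sales per staff member against a best of 10, so it is rated 75% efficient relative to its peers.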