R, Text Mining

Mining the Enron Emails

You might have heard about the Enron scandal, which came to light in 2001 and eventually led to the bankruptcy of the Enron Corporation. It was the largest corporate fraud to have happened up to that point. Enron’s top honchos used what is called mark-to-market accounting to draw up their financial statements, and they used these accounting and financial shenanigans to trick investors and…

Neural Networks

Activation Functions in ANNs (Conclusion)

In my last article, Activation Functions in ANNs, we discussed a few activation functions; now let’s explore some of the other available ones. Tanh Function: The tanh function is a scaled and shifted version of the sigmoid function, so the two are similar in shape. It is nonlinear, so we can stack more than one layer of neurons as required. Its range is (-1, 1).…
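The relationship between tanh and sigmoid can be checked numerically using the identity tanh(x) = 2·sigmoid(2x) − 1. The minimal sketch below (the helper names `sigmoid` and `tanh_via_sigmoid` are my own, not from the article) compares that identity against Python’s built-in `math.tanh` and shows the outputs staying inside (-1, 1).

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid with range (0, 1), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tanh_via_sigmoid(x: float) -> float:
    """tanh as a scaled and shifted sigmoid: 2 * sigmoid(2x) - 1, range (-1, 1)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  math.tanh={math.tanh(x):+.4f}  2*sigmoid(2x)-1={tanh_via_sigmoid(x):+.4f}")
```

Both columns agree to floating-point precision; the only difference between the two functions is that tanh is centred on zero while the sigmoid is centred on 0.5.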

Basics, Supervised Learning

All you need to know about Decision Tree (Part-1)

Introduction: As the title suggests, I’ll try to cover the necessary information on decision trees in this article. However, putting all of it in one post would be difficult and could leave you lost, so I’ve split the article into two parts. Part 1 (this post): introduction and definitions. Part 2 (to be published soon): algorithms behind decision trees…

Logistic Regression, R, Regression

Implementing Logistic Regression using Titanic dataset in R

Introduction: In my last post, “Understanding mathematics behind Logistic Regression”, I explained the basic maths behind logistic regression. In this post, I implement a logistic regression model in R using the Titanic dataset, where the target variable is ‘Survived’, which takes two values, 0 and 1. Data Dictionary: Variable, Definition, Key…
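The post itself works in R, and its code is not reproduced here. As a rough illustration of what fitting such a model involves, here is a minimal NumPy sketch that fits a logistic regression by batch gradient descent on a made-up 0/1 target standing in for ‘Survived’. The data, the helper names `fit_logistic` and `predict_proba`, and the learning rate and iteration count are all hypothetical; in R the usual route is `glm()` with `family = binomial` rather than hand-rolled gradient descent.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted P(y = 1)
        grad = X.T @ (p - y) / len(y)           # gradient of the mean log-loss
        w -= lr * grad
    return w

def predict_proba(w, X):
    """Predicted probabilities for new rows, using the same intercept convention."""
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ w))

# Hypothetical data: one numeric feature and a binary (0/1) target like 'Survived'.
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 1))
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x[:, 0])))
y = (rng.random(300) < true_p).astype(float)

w = fit_logistic(x, y)
pred = (predict_proba(w, x) >= 0.5).astype(float)   # classify at a 0.5 cut-off
accuracy = float((pred == y).mean())
print("coefficients (intercept, slope):", w.round(2))
print("training accuracy:", round(accuracy, 3))
```

With real data you would also hold out a test set and encode categorical variables before fitting; those steps are omitted to keep the sketch short.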

Basics, Neural Networks

Activation Functions in ANNs (Part-1)

Introduction: In an ANN, the activation function of a node determines the output of that node given an input or set of inputs; in its simplest form it acts as a threshold beyond which the node produces an output. Activation functions can be linear or nonlinear, but nonlinear functions are used in most ANNs. This choice is very important to the way a network learns because in light…
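To make the linear versus nonlinear distinction concrete, here is a small NumPy sketch (the function names are my own) of a threshold (binary step), a linear and a sigmoid activation. It also checks a well-known consequence: two stacked layers with a linear activation collapse into a single linear layer, which is one common reason nonlinear activations are preferred. The random weights are purely illustrative.

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """Outputs 1 once the input reaches the threshold, else 0."""
    return np.where(x >= threshold, 1.0, 0.0)

def linear(x, slope=1.0):
    """Linear activation: output proportional to input."""
    return slope * x

def sigmoid(x):
    """Nonlinear activation squashing input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # layer 1: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(2, 4))   # layer 2: 4 hidden units -> 2 outputs
x = rng.normal(size=3)

# Two layers with a linear (identity) activation in between...
two_linear_layers = W2 @ linear(W1 @ x)
# ...equal one linear layer whose weight matrix is W2 @ W1.
one_linear_layer = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# With a sigmoid in between, the composition is no longer a single matrix product.
two_sigmoid_layers = W2 @ sigmoid(W1 @ x)
print(binary_step(np.array([-1.0, 0.0, 2.0])))
```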

Basics, Unsupervised Learning

Implementing Principal Component Analysis using Python

This article continues my previous article, Mathematics of Principal Component Analysis (PCA), which I advise you to read first. In this post, I will explain how to implement PCA using Python. I have used the wholesale customer distribution dataset from the UCI Machine Learning Repository. This dataset refers to clients of…
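A minimal from-scratch version of the idea, assuming PCA via an eigendecomposition of the covariance matrix, might look like the sketch below. It uses synthetic data rather than the UCI wholesale dataset, and the function name `pca` and its return values are my own choices, not the article’s code. If your features are on very different scales, standardizing them first is a common choice; the sketch only centres them.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA via eigendecomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)          # centre each feature at zero
    cov = np.cov(X_centered, rowvar=False)   # (n_features, n_features) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: cov is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]   # principal axes as columns
    scores = X_centered @ components         # data projected onto those axes
    explained_ratio = eigvals / eigvals.sum()
    return scores, components, explained_ratio[:n_components]

# Hypothetical data standing in for the wholesale customers table:
# 200 rows, 4 correlated features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = np.array([[3.0, 1.0, 0.5, 0.0],
                   [0.0, 0.5, 1.0, 2.0]])
X = latent @ mixing + 0.1 * rng.normal(size=(200, 4))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)   # (200, 2)
print(ratio.round(3))
```

Because the synthetic data is driven by two hidden factors, the first two components should capture nearly all of the variance.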

Basics, Regression

Understanding mathematics behind Logistic Regression

Introduction to Logistic Regression: Logistic regression is a type of regression that returns the probability of occurrence of an event by fitting the data to a logistic curve, that is, by modelling the ‘logit’ (log-odds) of that probability as a linear function of the predictors. It is essentially a classification algorithm and is used mostly when the dependent variable is categorical; the independent variables can be discrete or continuous. Generalized Linear Models: Before starting with…
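The link between the logit and the predicted probability can be shown in a few lines. In this sketch the coefficients `beta0` and `beta1` and the input `x` are made up for illustration: the linear predictor gives the log-odds, the logistic (inverse-logit) function turns it into a probability, and applying the logit to that probability recovers the linear predictor.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a model with a single predictor x.
beta0, beta1 = -1.5, 0.8
x = 2.0
z = beta0 + beta1 * x      # linear predictor, interpreted as the log-odds
p = inverse_logit(z)       # predicted probability of the event
print(round(z, 2), round(p, 3))
print(math.isclose(logit(p), z))  # True: logit and inverse_logit undo each other
```

Classifying is then a matter of comparing `p` with a cut-off such as 0.5.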

Basics, Fun with Statistics, Unsupervised Learning

Mathematics of Principal Component Analysis (PCA)

Understanding Principal Component Analysis: In this part of the article, I will explain the mathematics and intuition behind Principal Component Analysis (PCA); in the next part, I will show how to implement PCA using Python. PCA is an unsupervised machine learning technique that creates a low-dimensional representation of a dataset. PCA is used to…
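The core computation behind that low-dimensional representation can be summarised as follows, assuming a centred data matrix $X$ with $n$ rows (observations) and $p$ columns (features). The notation here is mine and may differ from the article’s.

```latex
% Sample covariance of the centred data, and its eigen-decomposition
\Sigma = \frac{1}{n-1} X^\top X, \qquad
\Sigma v_j = \lambda_j v_j, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% Keep the first k eigenvectors and project the data onto them
V_k = [\, v_1, \dots, v_k \,], \qquad Z = X V_k \quad (n \times k)

% Share of total variance captured by the j-th principal component
\text{PVE}_j = \frac{\lambda_j}{\sum_{i=1}^{p} \lambda_i}
```

Choosing $k$ much smaller than $p$ while keeping most of the variance is what makes the representation low-dimensional.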