Machine Learning Tutorial

Rumman Ansari   Software Engineer   2023-02-01

Welcome to Machine Learning Fundamentals!

In this course, you will be exposed to the different concepts of Machine learning with a brief overview on each. We will be adding detailed courses covering each of these concepts in-depth.

Supervised vs. Unsupervised Learning

Take the example of face recognition.

In Supervised learning, the algorithm learns from many labeled examples what a face looks like: its structure, color, shape, the position of the eyes, nose, and so on. After several iterations, the algorithm learns to recognize a face.

In Unsupervised learning, no desired output is provided. Instead, the algorithm groups similar data on its own, so that it can, for example, separate the faces of horses, cows, and humans into distinct clusters (clustering of data).

Machine Learning Techniques

Now that you have a fair understanding of Supervised & Unsupervised learning and Features & Labels, let's focus on the different techniques used in Machine Learning.

Decision tree learning is commonly used in data mining. A decision tree is a tree-like model of decisions and possible consequences, chance event outcomes, resource costs, and utility. It is a way to display an algorithm.

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.

Decision Tree - Types

There are 2 types of Decision trees:

  • Classification Tree - The predicted outcome is the class to which the data belongs. This corresponds to the Tree models where the target variable can take a finite set of values.

  • Regression Tree - The predicted outcome can be considered a real number. This corresponds to the Tree models where the target variable can take continuous values.

(We will discuss these in detail in a separate course.)
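The "tree-like model of decisions" can be sketched directly as nested conditionals. The feature names, thresholds, and classes below are purely illustrative, not from a trained model:

```python
# A hand-written classification tree, sketched as nested conditionals.
# Each `if` is a split node; each `return` is a leaf with a class label.
def classify_fruit(weight_g, color):
    """Toy decision tree: predict a fruit class from two features."""
    if weight_g > 150:          # first split: a numerical feature
        return "melon"
    else:
        if color == "red":      # second split: a categorical feature
            return "apple"
        else:
            return "banana"

print(classify_fruit(200, "green"))  # melon
print(classify_fruit(120, "red"))    # apple
```

Note how the same structure handles both numerical (`weight_g`) and categorical (`color`) data, which is one of the pros listed below.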

Decision Tree - Pros & Cons

Pros:

  • Easy and simple to understand & interpret.
  • Can analyze both numerical and categorical data.

Cons:

  • Small variations in the data might generate a completely different tree.

Naïve Bayes

Naive Bayes, a supervised learning methodology, is a family of algorithms based on a common principle:

All Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.

Naïve Bayes Usage
  • Naive Bayes comes in handy since you can train it quickly.
  • You can use it when you have limited resources in terms of CPU and Memory.
  • It is usually used for Real-time predictions, Multi-class Predictions, Text classification, Spam filtering, and Sentiment Analysis.
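The spam-filtering use case above can be sketched with a tiny word-count classifier. The corpus and word probabilities here are made up for illustration; the key point is the naive independence assumption, which lets us simply sum per-word log-probabilities:

```python
from collections import Counter
import math

# Tiny labeled corpus (hypothetical data, for illustration only).
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

# Count word occurrences per class.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

def predict(text):
    """Pick the class maximizing log P(class) + sum of log P(word|class),
    using the naive independence assumption and Laplace smoothing."""
    vocab = set(w for c in word_counts.values() for w in c)
    best, best_score = None, -math.inf
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

print(predict("free money"))   # spam
print(predict("noon meeting")) # ham
```

Training is just counting, which is why Naive Bayes is so cheap in CPU and memory.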

Gradient Descent

  • Gradient descent is an optimization algorithm. It is normally used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function.
  • In Gradient descent, the algorithm runs through ALL the samples in the given training set to update a parameter in a particular iteration.
  • Hence, if the number of training samples is large, or in fact very large, using gradient descent can be time-consuming: every time you update the parameter values, you run through the complete training set.
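A minimal sketch of full-batch gradient descent, fitting a one-variable linear model to toy data (the data, learning rate, and iteration count are all illustrative assumptions):

```python
# Full-batch gradient descent for a 1-D linear model y = w*x + b,
# minimizing mean squared error over the whole training set.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated by y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    # Each update runs through ALL training samples, which is the
    # "batch" behaviour described above.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # ≈ 2.0 1.0, recovering y = 2x + 1
```

With millions of samples, those two `sum(...)` passes per update are exactly what makes batch gradient descent slow, motivating stochastic variants.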

Linear Regression

  • Linear regression (a step up from correlation) predicts the value of a dependent variable based on the value of an independent variable.
  • Simple linear regression has only 1 independent variable whereas multiple linear regression has (>1) independent variables.
  • It is very sensitive to outliers, which can badly distort the regression line and, in turn, the forecasted values. Hence it is good practice to keep a check on outliers.
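Simple linear regression (one independent variable) has a closed-form least-squares solution: the slope is the covariance of x and y divided by the variance of x. A sketch on made-up data:

```python
# Closed-form simple linear regression (one independent variable).
# Data are illustrative, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 1.97 0.15
```

A single extreme outlier in `ys` would shift both `mean_y` and the covariance term, dragging the whole line toward it: that is the sensitivity the bullet above warns about.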

Logistic Regression

Logistic regression, or the logit model, is used to model dichotomous (two-valued) outcome variables.

This is used with data where there is a binary (success-failure) outcome variable. It is also used when the outcome takes the form of a binomial proportion.
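What makes the logit model fit binary outcomes is the logistic (sigmoid) function, which squashes any real-valued score into a probability in (0, 1). The coefficients below are illustrative, not fitted:

```python
import math

# Toy one-feature logit model: P(success | x) = sigmoid(w*x + b).
# Coefficients w and b are made-up values for illustration.
def predict_proba(x, w=1.5, b=-3.0):
    """Probability of the 'success' outcome under the toy model."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

print(round(predict_proba(0), 3))  # 0.047 -> very likely failure
print(round(predict_proba(2), 3))  # 0.5   -> on the decision boundary
print(round(predict_proba(4), 3))  # 0.953 -> very likely success
```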

Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm. It is used for classification or regression type of problems.

  • SVM is all about identifying the right hyperplane. To choose it, we maximize the distance between the hyperplane and the nearest data point of either class (the margin).

  • SVM works well when there is a clear margin of separation, and in high-dimensional spaces.
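The margin idea can be made concrete by computing point-to-hyperplane distances. The hyperplane and points below are hand-picked for illustration, not the result of training an SVM:

```python
import math

# A hyperplane w·x + b = 0 in 2-D, and a few points (two per class).
# The margin an SVM maximizes is the distance to the NEAREST point.
w = (1.0, 1.0)
b = -3.0
points = [(1, 1), (0, 1), (3, 2), (4, 3)]

def distance(p):
    """Perpendicular distance from point p to the hyperplane."""
    return abs(w[0] * p[0] + w[1] * p[1] + b) / math.hypot(*w)

margin = min(distance(p) for p in points)
print(round(margin, 3))  # 0.707 -> set by the nearest point (1, 1)
```

Training an SVM amounts to searching over `w` and `b` for the separating hyperplane that makes this minimum distance as large as possible.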

Kernel Methods

  • Kernel methods provide ways to manipulate data as though it were projected into a higher-dimensional space, while operating on it in its original space.
  • The number of operations required is not necessarily proportional to the number of features.
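This "as though projected" idea can be checked numerically with a degree-2 polynomial kernel: evaluating the kernel in the original 2-D space gives exactly the dot product in an explicit 6-D feature space, without ever building that space. The vectors are arbitrary illustrative values:

```python
import math

# Degree-2 polynomial kernel, computed in the ORIGINAL 2-D space.
def poly_kernel(x, y):
    return (x[0] * y[0] + x[1] * y[1] + 1) ** 2

# The matching explicit 6-D feature map (never needed in practice).
def expand(x):
    a, b = x
    s = math.sqrt(2)
    return [a * a, b * b, s * a * b, s * a, s * b, 1.0]

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, y)                                  # cheap
rhs = sum(u * v for u, v in zip(expand(x), expand(y)))   # explicit
print(lhs, round(rhs, 6))  # both 25.0
```

The kernel costs a handful of operations regardless of how large the implicit feature space is, which is the point of the second bullet above.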

Neural Networks

A neural network is a powerful computational model that captures and represents complex input/output relationships.

This model is motivated by the desire to develop an artificial system which would be able to perform "intelligent tasks" similar to the human brain.

Neural networks try to resemble the human brain in the following two ways:

  1. Acquires knowledge through learning.
  2. Stores knowledge within inter-neuron connection strengths known as synaptic weights.
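Both points above can be seen in a single artificial neuron: it acquires knowledge by repeatedly adjusting its weights on examples, and afterwards that knowledge lives entirely in the weight values. This toy perceptron (illustrative learning rate and data) learns the logical OR function:

```python
# A single artificial neuron trained with the perceptron rule.
# Knowledge ends up stored in the "synaptic weights" w and bias.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR

w = [0.0, 0.0]
bias = 0.0
lr = 0.1
for _ in range(20):                 # learning: repeated weight updates
    for (x1, x2), target in samples:
        out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
        err = target - out
        w[0] += lr * err * x1       # strengthen/weaken each "synapse"
        w[1] += lr * err * x2
        bias += lr * err

preds = [1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
         for (x1, x2), _ in samples]
print(preds)  # [0, 1, 1, 1] -> the OR truth table
```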

Neural Networks - Application

Neural Networks have a broad spectrum of data-intensive applications, such as:

  • Process Modeling and Control
  • Machine Diagnostics
  • Target Recognition
  • Medical Diagnosis
  • Credit Rating
  • Financial Forecasting and so on.

Clustering

Clustering is an unsupervised learning model that deals with finding a structure/cluster in a collection of unlabeled data.

The idea is to partition the examples into clusters or classes. Each class predicts feature values for the examples in the class.

There are 2 main types of clustering:

  • K-Means Clustering
  • Hierarchical Clustering

We will discuss the clustering types in the next set of cards.

K-means Clustering

In K-means clustering, you partition a group of data points into a small number (k) of clusters.

The algorithm tries to maximize the similarity within each group while keeping the groups as far apart as possible.

When to Stop Iterating in K-means

Iterate until stable (i.e., no object changes its group):

  • Determine the centroid coordinates.
  • Determine the distance of each object to the centroids.
  • Group the objects based on minimum distance (assign each to its closest centroid).
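The three steps above can be sketched as a minimal K-means loop. The data points, k, and the (deterministic) initialization are illustrative choices:

```python
import math

# Minimal K-means: assign points to the closest centroid, recompute
# centroids, and stop when no point changes group.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5)]
k = 2
centroids = [points[0], points[2]]   # simple deterministic init
assignment = [0] * len(points)

while True:
    # Steps 2-3: distance to each centroid, assign to the closest.
    new_assignment = [
        min(range(k), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]
    if new_assignment == assignment:   # stable: no object moved group
        break
    assignment = new_assignment
    # Step 1: recompute centroid coordinates from the current groups.
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        if members:
            centroids[c] = (sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members))

print(assignment)  # [0, 0, 1, 1, 0] -> two well-separated groups
```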

Hierarchical Clustering

Hierarchical clustering builds a hierarchy of clusters.

We do not partition the data into a particular cluster in a single step. Instead, there are a series of partitions which may contain either a single cluster with all the objects or n clusters, each containing a single object.

Hierarchical clustering can be implemented in 2 ways:

  • Top-Down Approach - All data points start in a single cluster, which recursively splits until each data point is assigned a separate cluster.

  • Bottom-Up Approach - Each data point starts in its own cluster, and clusters are recursively merged until they form a single large cluster.
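The bottom-up approach can be sketched with single-linkage agglomerative merging. The 1-D points and the choice of single linkage are illustrative assumptions:

```python
import math

# Bottom-up (agglomerative) clustering: start with one cluster per
# point and repeatedly merge the two closest clusters until one
# large cluster remains.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters = [[p] for p in points]

def linkage(a, b):
    """Single linkage: distance between the closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)

merges = []
while len(clusters) > 1:
    # Find the pair of clusters with the smallest linkage distance.
    i, j = min(
        ((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
        key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
    )
    merges.append(round(linkage(clusters[i], clusters[j]), 1))
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

print(merges)  # [1.0, 1.0, 4.0, 14.0]
```

The sequence of merge distances is exactly what a dendrogram visualizes: nearby points merge early (distance 1.0), and the outlier at 20 joins last.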