Introduction to Machine Learning

What is machine learning?

Definition

  • Literally, “machine” denotes “programmable computer” and “learning” denotes “learning from data”.
  • In a general sense, machine learning means that a computer can acquire some ability without being explicitly programmed for it.
  • From an engineering perspective, given a task $T$, corresponding experience (training data) $E$, and a performance measure $P$, a machine learning program learns from $E$ so that its performance on $T$, as measured by $P$, improves.

Machine learning is an interdisciplinary field that draws on computer science, statistics, mathematics, and other areas.

Basic elements

  • Data. Every instance is called a sample. The data used for training and testing are called the training set and the testing set, respectively. Since some algorithms have parameters that must be tuned, we hold out a subset of the training set, called the validation set (or evaluation set), and use it to judge how good or bad a parameter setting is.
  • Model. It can be viewed as a function $f$. Given an input $x$, it produces an output $y$. The model may depend on adjustable parameters $\theta$. The process of learning is to update $\theta$.
  • Performance measurement. It is used to evaluate the performance of the model. We can use a utility function (or fitness function) to measure how good a model is, or a cost function to measure how bad it is.
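
The three elements above can be illustrated with a minimal sketch. It is not a definitive implementation: the helper names (split_dataset, model, mean_squared_error), the split fractions, and the toy data are all assumptions made for illustration. It splits the data into training, validation, and testing sets, defines a model $f$ with parameters $\theta$, and uses a cost function on the validation set to compare two parameter settings.

```python
import random


def split_dataset(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into training, validation and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]
    return train_set, val_set, test_set


def model(x, theta):
    """A model f with adjustable parameters theta = (slope, intercept)."""
    slope, intercept = theta
    return slope * x + intercept


def mean_squared_error(samples, theta):
    """Cost function: the lower the value, the better the model."""
    return sum((model(x, theta) - y) ** 2 for x, y in samples) / len(samples)


# Toy data generated from y = 2x + 1 (made up for illustration).
data = [(x, 2 * x + 1) for x in range(10)]
train_set, val_set, test_set = split_dataset(data)

# Compare two candidate parameter settings on the validation set.
for theta in [(2.0, 1.0), (1.0, 0.0)]:
    print(theta, mean_squared_error(val_set, theta))
```

The testing set is left untouched here on purpose: it is only meant for the final assessment, after the parameters have been chosen.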

Procedure

  • Study the data;
  • Select a model;
  • Train the model on the training set;
  • Make predictions (inference) on new data.
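
The four steps can be sketched end to end as follows. This is only an illustration under stated assumptions: the measurements are hypothetical, the chosen model is a straight line, and "training" is a closed-form least-squares fit. The function names fit_line and predict are introduced here and do not come from any library.

```python
def fit_line(samples):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(theta, x):
    """Apply the trained model to a new input."""
    slope, intercept = theta
    return slope * x + intercept


# 1. Study the data: hypothetical measurements that look roughly linear.
training_set = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]

# 2. Select a model: a straight line with parameters (slope, intercept).
# 3. Train the model on the training set.
theta = fit_line(training_set)

# 4. Make a prediction on new data.
prediction = predict(theta, 5.0)
print(theta, prediction)
```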

Why use machine learning?

  • To do work that requires a lot of hand-tuning or long lists of rules;
  • To adapt to changes in the environment or data;
  • To solve problems that are difficult for humans;
  • To discover unknown rules (data mining).

Types of machine learning

Machine learning algorithms can be categorized in many ways. Generally, we can classify them from the following perspectives.

Training data

Supervised learning.

In supervised learning, each training sample $x\in \mathscr{X}$ has a label $y\in\mathscr{Y}$.

  • Classification. The label set $\mathscr{Y}$ consists of finitely many elements, such as $\{0,1\}$ or $\{\text{Yes}, \text{No}\}$. The classification task is to determine which class a given sample belongs to.
  • Regression. The label set $\mathscr{Y}$ is continuous, such as the interval $[0,1]$, or consists of even more complex elements. The regression task is to find a suitable map from $\mathscr{X}$ to $\mathscr{Y}$.
  • Ranking. The samples are split into groups, and the label set can be either discrete or continuous. This is a special task commonly used in recommender systems. It aims to rank the samples within each group.

Some common supervised learning algorithms are given below:

  • k-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVMs)
  • Decision Trees and Random Forests
  • Neural networks
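
As an illustration of supervised classification, here is a minimal sketch of the first algorithm in the list, k-Nearest Neighbors, written from scratch rather than with a library. The function name knn_predict, the choice of Euclidean distance, and the toy samples with labels from $\{\text{Yes}, \text{No}\}$ are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Sort the labeled samples by Euclidean distance to the query point.
    by_distance = sorted(training_set, key=lambda sample: math.dist(sample[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Each training sample x has a label y, here taken from {"Yes", "No"}.
training_set = [
    ((1.0, 1.0), "No"), ((1.5, 2.0), "No"), ((2.0, 1.0), "No"),
    ((6.0, 6.0), "Yes"), ((6.5, 7.0), "Yes"), ((7.0, 6.0), "Yes"),
]

print(knn_predict(training_set, (1.2, 1.4)))
print(knn_predict(training_set, (6.4, 6.2)))
```

Note that nothing is "trained" here in the usual sense: the algorithm keeps the training samples and compares new inputs against them directly.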

Unsupervised learning.

  • Clustering
    • K-Means
    • DBSCAN
    • Hierarchical Cluster Analysis (HCA)
  • Anomaly detection and novelty detection
    • One-class SVM
    • Isolation Forest
  • Visualization and dimensionality reduction
    • Principal Component Analysis (PCA)
    • Kernel PCA
    • Locally-Linear Embedding (LLE)
    • t-distributed Stochastic Neighbor Embedding (t-SNE)
  • Association rule learning
    • Apriori
    • Eclat
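
To make the idea of learning without labels concrete, the following is a minimal sketch of K-Means clustering. It rests on several assumptions: the function names assign and k_means are introduced here, the initial centroids are passed in by hand (practical implementations usually pick them randomly or with a smarter scheme), and the number of iterations is fixed instead of testing for convergence.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign(points, centroids)


# Unlabeled toy data forming two visually separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = k_means(points, initial_centroids=[(0.0, 0.0), (6.0, 6.0)])
print(centroids)
print(labels)
```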

Semisupervised learning.

Reinforcement learning.

Introduction Of Reinforcement Learning

Learning

  • Offline learning/batch learning.
  • Online learning.
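
The difference between the two modes can be sketched as follows: in batch learning the parameters are computed from the whole training set at once, while in online learning they are updated incrementally as each sample arrives. The example below is a hedged illustration only. It assumes a one-parameter model $y = wx$, a squared-error cost, and a hand-picked learning rate, and the names fit_batch and online_update are hypothetical.

```python
def fit_batch(samples):
    """Batch learning: compute w for y = w * x from the full data set in one go."""
    return sum(x * y for x, y in samples) / sum(x * x for x, _ in samples)


def online_update(w, sample, learning_rate=0.05):
    """Online learning: one gradient step on the squared error of a single sample."""
    x, y = sample
    return w + learning_rate * (y - w * x) * x


# Toy data generated from y = 3x (made up for illustration).
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

# Batch: all samples are available up front.
w_batch = fit_batch(samples)

# Online: samples arrive one at a time as a stream, and w is refined after each one.
w_online = 0.0
for sample in samples * 10:
    w_online = online_update(w_online, sample)

print(w_batch, w_online)
```

The online learner never needs the whole data set in memory, which is why this mode suits data that arrives continuously or does not fit on one machine.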

Generalizing

  • Instance-based learning.
  • Model-based learning.

Main challenges of machine learning

Data

  • Lack of training data;
  • Nonrepresentative training data;
  • Poor quality of training data;
  • Irrelevant features.

Model

  • Overfitting of models on training data;
  • Underfitting of models on training data.
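
Both problems show up when the error on the training set is compared with the error on held-out data. The sketch below is an illustration under stated assumptions: the data are made up (a linear trend with alternating noise in the training set), and the three models are stand-ins chosen for simplicity. A constant model is too simple and underfits, a straight line fits reasonably, and a "memorizer" that returns the label of the nearest training sample reproduces the training noise and overfits.

```python
def mse(predict, samples):
    """Mean squared error of a prediction function on a set of (x, y) samples."""
    return sum((predict(x) - y) ** 2 for x, y in samples) / len(samples)


def fit_constant(train):
    """Too simple: always predicts the mean training label."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Least-squares straight line."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(train):
    """Too flexible: returns the label of the nearest training sample."""
    return lambda x: min(train, key=lambda s: abs(s[0] - x))[1]


# Training labels follow y = 2x plus alternating noise; validation labels are noise-free.
train = [(0.0, 0.5), (1.0, 1.5), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5), (5.0, 9.5)]
validation = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]

models = {
    "constant": fit_constant(train),
    "line": fit_line(train),
    "memorizer": fit_memorizer(train),
}
train_error = {name: mse(f, train) for name, f in models.items()}
validation_error = {name: mse(f, validation) for name, f in models.items()}

for name in models:
    print(name, round(train_error[name], 3), round(validation_error[name], 3))
```

The constant model has a large error on both sets (underfitting). The memorizer has zero training error but a clearly larger validation error than the line (overfitting).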

References

  1. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition