
toto-haricot/machine-learning


Machine Learning 👩‍💻

In this repository we present from-scratch implementations of the classical machine learning algorithms, using only numpy and Python's most basic libraries. We will go through the following algorithms:

For each algorithm we create a dedicated folder in which you can find the Python implementation from scratch, along with a Jupyter notebook meant to train and test our code on some open source datasets. To provide a better understanding of the implemented algorithm, each folder also contains a readme.md file that goes through the mathematics behind it.

In the datasets/ folder you will find several classic open source datasets that we use to train and test our models. In addition, a Python module called utils.py provides some useful functions to work with the datasets.

Note : In this repository we will propose an implementation of a vanilla neural network in the mlp (multi-layer perceptron) folder, but we won't go further into deep learning. More deep learning code will be shared in other upcoming repositories.

Models 🗳️

In progress.

K-Means 🥝

K-Means is a clustering algorithm that is very simple to understand. We start by setting the parameter K, which represents the number of clusters we are looking for. Then we initialize K points at random as our cluster centroids. As its name suggests, a centroid is simply the center point of a cluster. Once our centroids are randomly chosen, we compute for each point its Euclidean distance to every centroid. We form the clusters by assigning each point to its closest centroid. This gives us K groups of data, and we compute the center of each group, which becomes the new centroid of that cluster. We then form the clusters again, recompute the centroids, and so on, until the assignments stop changing.
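The loop described above can be sketched in a few lines of numpy. This is only an illustrative sketch, not the code from this repository: the function name `k_means`, its parameters, and the choice to draw the initial centroids from the data points are all assumptions made for the example.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Hypothetical minimal K-Means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centroids with k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Euclidean distance from every point to every centroid, shape (n, k)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its closest centroid
        labels = distances.argmin(axis=1)
        # New centroid = mean of the assigned points (kept as-is if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = k_means(X, k=2)
```

Stopping when the centroids stop moving is one common convergence test; a fixed number of iterations would work as well.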

More details and illustrations on the K-Means algorithm will soon be available in the coming k_mean/readme.md file.

Linear regression is probably the most common machine learning algorithm. Most of us had already used it even before starting to learn data science or artificial intelligence. This algorithm deals with regression problems; it can also be applied to classification, although it is less relevant there. The assumption made is that the output $y$ of an input $x$ is a linear combination of that input with a set of parameters $\omega$. You'll find more details on Linear Regression in the linear_regression/readme.md file.

Our code is written so that the model can also be used for multiple linear regression, that is, with several input features.
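As a rough illustration of the linear assumption, the parameters $\omega$ can be estimated by least squares. The sketch below is not the repository's LinearRegression() class; the helper names `fit_linear` and `predict_linear` are hypothetical, and it relies on numpy's `np.linalg.lstsq` rather than a hand-written solver.

```python
import numpy as np

def fit_linear(X, y):
    """Hypothetical helper: least-squares weights, intercept first."""
    # Prepend a column of ones so the first weight acts as the intercept
    X_b = np.column_stack([np.ones(len(X)), X])
    # Least-squares solution of X_b @ w ≈ y
    w, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return w

def predict_linear(X, w):
    """Apply the learned linear combination to new inputs."""
    return np.column_stack([np.ones(len(X)), X]) @ w

# Two input features, generated from y = 1 + 2*x1 - 3*x2 without noise
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
w = fit_linear(X, y)
```

Because the data here is noise-free, the recovered weights match the generating coefficients up to floating-point error.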

A classic linear regression model can easily suffer from over-fitting, especially when we deal with polynomial regression, which is not yet presented in this machine_learning repository. To avoid such behaviour we can use regularization to keep the parameters $\omega$ from growing too large. Our LinearRegression() class implemented in linear_regression/ supports regularization. To enable it, pass the regularization_type and regularization_coef arguments to either the __init__() or the fit() method.
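To make the effect of regularization concrete, here is a sketch of one common choice, L2 (ridge) regularization, in closed form. It is an assumption that this matches one of the supported regularization types; the function `fit_ridge` and its `coef` parameter are hypothetical and do not reproduce the LinearRegression() interface.

```python
import numpy as np

def fit_ridge(X, y, coef=1.0):
    """Hypothetical helper: ridge weights via the regularized normal equations."""
    X_b = np.column_stack([np.ones(len(X)), X])
    # L2 penalty on every weight except the intercept (a common convention)
    penalty = coef * np.eye(X_b.shape[1])
    penalty[0, 0] = 0.0
    # Solve (X^T X + penalty) w = X^T y
    return np.linalg.solve(X_b.T @ X_b + penalty, X_b.T @ y)

X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

w_plain = fit_ridge(X, y, coef=0.0)    # no regularization
w_ridge = fit_ridge(X, y, coef=10.0)   # weights are shrunk toward zero
```

With `coef=0.0` the result coincides with ordinary least squares; increasing `coef` shrinks the non-intercept weights.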

Logistic regression is probably the most famous classification algorithm. It is quite similar to linear regression, except that we pass its result through a sigmoid function that squashes the output into the range between 0 and 1. We can then interpret this result as a probability and assign to any $x$ the class $C_k$ with the highest probability. Once again, please refer to the logistic_regression/readme.md file for the full mathematical overview.

Note : The current implementation only allows binary classification. This is quite a restriction so we will soon improve it to be multi-class compatible.
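A binary logistic regression of this kind can be sketched as follows. This is an illustrative example, not the repository's implementation: the function names, the plain gradient descent training loop, and the hyperparameters `lr` and `n_iters` are assumptions.

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Hypothetical helper: binary logistic regression by gradient descent."""
    X_b = np.column_stack([np.ones(len(X)), X])  # intercept column
    w = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X_b @ w)                 # predicted probability of class 1
        grad = X_b.T @ (p - y) / len(y)      # gradient of the mean log-loss
        w -= lr * grad
    return w

def predict_proba(X, w):
    """Probability that each row of X belongs to class 1."""
    return sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic(X, y, lr=0.5, n_iters=2000)
labels = (predict_proba(X, w) >= 0.5).astype(int)
```

Thresholding the probability at 0.5 is equivalent to picking the more probable of the two classes.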

Linear Discriminant Analysis is a fundamental discriminative model. It can be used for classification but also for dimension reduction. In this repository we only tackle classification for the moment. It is not a hard algorithm to understand, and it is well worth taking a bit of time to see how it is built. This is why we recommend that you look at the implementation in LDA.py, at the case study in the notebook LDANotebook.ipynb, which applies LDA to classification on the iris dataset, and at the readme.md for a global overview of how the algorithm works.
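For orientation, here is a sketch of the textbook formulation of LDA for classification, in which each class is modeled as a Gaussian with its own mean and a covariance matrix shared by all classes. It is not taken from LDA.py and may differ from it; the names `fit_lda` and `predict_lda` are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Hypothetical helper: class means, priors and inverse pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Subtract from each sample the mean of its own class
    centered = X - means[np.searchsorted(classes, y)]
    # Pooled within-class covariance, shared by all classes
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, means, priors, np.linalg.inv(cov)

def predict_lda(X, classes, means, priors, cov_inv):
    """Pick the class with the highest linear discriminant score."""
    # score_k(x) = x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(prior_k)
    quad = 0.5 * np.sum(means @ cov_inv * means, axis=1)
    scores = X @ cov_inv @ means.T - quad + np.log(priors)
    return classes[scores.argmax(axis=1)]

# Two classes, each a unit square of points, far apart from each other
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [5, 5], [6, 5], [5, 6], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
params = fit_lda(X, y)
pred = predict_lda(X, *params)
```

The score is linear in $x$ because the shared covariance cancels the quadratic term, which is what gives the method its name.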

First draft of code available and description coming very soon

Code available and description coming very soon

[Quadratic Discriminant Analysis] 🚧

Coming soon

First draft of code available and description coming very soon
