This post gives a quick review of how Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) work, and how they relate to Reinforcement Learning from Human Feedback (RLHF). RLHF in particular has become popular over the past two years for fine-tuning networks beyond simple supervised fine-tuning (SFT). This phase is often called alignment, where we train a model to follow human preferences across a wide variety of tasks.
Recently I came across a paper that makes extensive use of the Expectation-Maximization (EM) algorithm, and I felt I lacked a deeper understanding of its inner workings. As a result, I decided to review it here, mostly following the excellent Stanford machine learning class CS229. In this post I follow the structure of the class notes but change the notation slightly for clarity. We will first look at \(k\)-means clustering, then see the EM algorithm in the special case of mixtures of Gaussians, and finally arrive at the general version of EM.
This post contains my lecture notes from watching the first six episodes of the excellent DeepMind x UCL Deep Learning Lecture Series 2020. I recommend watching them in full if you have the time. They cover a broad range of topics:
This post is as applied as it gets for this blog. We will see how to manipulate multi-dimensional arrays as cleanly and efficiently as possible. This is an essential skill for any machine learning practitioner today: much of what is done in Python would not be possible without libraries such as NumPy, PyTorch and TensorFlow, which handle heavy workloads in the background. This is especially true in computer vision, where images are represented as multi-dimensional arrays that we frequently need to pre- and post-process efficiently in the ML pipeline. In what follows, we will see some of the tools needed for these tasks.
The goal of this post is to create the basic building blocks of a neural network from scratch. This means not using any PyTorch or TensorFlow functionality, while still ending up with code that looks as simple as if we had used them. This work is based on the fast.ai course Deep Learning from the Foundations, which gives a similar introduction over multiple lectures. I highly recommend watching the entire course.
When looking at a deep learning project or paper, I see four fundamental parts: data, network architecture, optimization method and loss function. As the title suggests, here we will focus on the last part. Loss functions are deeply tied to the task one is trying to solve, and are often used as measures of progress during training. In this post series we are going to see where they come from and why we use the ones we do. This first part covers losses for regression and classification.
If you are able to read this, it means I've finally managed to put my small blog online, and I will publish more interesting posts soon. The goal of this blog is to journal topics I have recently studied, using what is sometimes called the Feynman Technique: