Dan Saattrup Nielsen - Mathematician and Machine Learning Specialist
https://saattrupdan.github.io
Graph Convolutional Neural Networks
As more and more businesses strive toward becoming data-driven, the use of graph methods for storing relational data has been on the rise ([1], [2], [3]). Along with these graph databases come more opportunities for analysing the data, including the use of predictive machine learning models on graphs. The...
Sun, 30 May 2021 00:00:00 +0000
https://saattrupdan.github.io/2021-05-30-graph-convolutional-neural-networks/
Doubt - Bringing back uncertainty to ML
I have previously been exploring uncertainty measures that we can build into our machine learning models, making it easier to see whether a concrete prediction can be trusted or not. This involved confidence intervals for datasets and prediction intervals for models; see the previous posts in this series for a...
Sun, 04 Apr 2021 00:00:00 +0000
https://saattrupdan.github.io/2021-04-04-doubt/
DeepWalk
Deep learning has almost exclusively dealt with simple objects: images and text. By simple I am here referring to the graphical structure of these objects, where a word is a linear sequence of letters, a document is a linear sequence of words, and an image is a rectangular grid...
Mon, 24 Aug 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-08-24-deepwalk/
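The core trick in DeepWalk is to linearise a graph via truncated random walks, so that sequence models like word2vec can be applied to the resulting node "sentences". A minimal sketch of that walk step, on a made-up toy graph (not the post's own code):

```python
import random

def random_walk(graph: dict, start, length: int, seed: int = 0) -> list:
    """Generate one truncated random walk, as used in DeepWalk to turn
    graph neighbourhoods into 'sentences' for a word2vec-style model."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbours = graph[walk[-1]]
        if not neighbours:  # dead end: truncate the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

# A tiny illustrative graph as adjacency lists
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
walk = random_walk(graph, "a", length=5)
```

In the full algorithm one would generate many such walks per node and feed them to a skip-gram model to learn node embeddings.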
The PageRank Algorithm
I’ve recently started working with graph structures in the context of machine learning, and have found that I’ve opened what seems to be a reverse Pandora’s box, full of neat algorithms that can pull out a lot of insights from graph structures. As a way of cementing my knowledge and...
Fri, 07 Aug 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-08-07-pagerank/
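The PageRank algorithm itself can be sketched as a power iteration over an adjacency-list graph; this is a minimal illustration with a made-up three-node graph, not the post's own code:

```python
def pagerank(graph: dict, damping: float = 0.85, iters: int = 50) -> dict:
    """Power-iteration PageRank: repeatedly redistribute rank mass
    along edges, mixed with a uniform 'teleportation' term."""
    nodes = list(graph)
    n = len(nodes)
    ranks = {node: 1 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * ranks[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:  # dangling node: spread its rank evenly
                for target in nodes:
                    new[target] += damping * ranks[node] / n
        ranks = new
    return ranks

# Toy graph: "a" is linked to by both "b" and "c"
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(graph)
```

The ranks always sum to one, and the heavily linked-to node "a" ends up with more rank mass than "c".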
Quantile regression forests
A random forest is an incredibly useful and versatile tool in a data scientist’s toolkit, and is one of the more popular non-deep models that are being used in industry today. If we now want our random forests to also output their uncertainty, it would seem that we are forced...
Sun, 05 Apr 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-04-05-quantile-regression-forests/
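The key idea of a quantile regression forest is that, instead of averaging the leaf means across trees, we pool all training targets that land in the same leaves as the test point and take empirical quantiles of that pool. A minimal sketch of the aggregation step, with hypothetical per-tree leaf targets (not the actual estimator):

```python
def empirical_quantile(values: list, q: float) -> float:
    """Empirical quantile via linear interpolation of sorted values."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = int(pos), min(int(pos) + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac

# Hypothetical: the training targets that fell into the same leaf as
# our test point, one list per tree in the forest
leaf_targets_per_tree = [[1.0, 1.2, 0.9], [1.1, 3.0], [0.8, 1.0, 1.3]]
pooled = [y for leaf in leaf_targets_per_tree for y in leaf]

# A 90% prediction interval from the pooled leaf targets
interval = (empirical_quantile(pooled, 0.05), empirical_quantile(pooled, 0.95))
```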
Quantile regression
When we are performing regression analysis using complicated predictive models such as neural networks, knowing how certain the model is can be highly valuable, for instance when the applications are within the health sector. The bootstrap prediction intervals that we covered last time require us to train the...
Mon, 09 Mar 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-03-09-quantile-regression/
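Quantile regression boils down to swapping the squared error for the pinball (quantile) loss, which penalises under- and over-prediction asymmetrically so that the minimiser is the desired conditional quantile. A minimal sketch (illustrative values, not the post's code):

```python
def pinball_loss(y_true: float, y_pred: float, quantile: float) -> float:
    """Pinball (quantile) loss: minimised in expectation by the true
    conditional quantile of y given x."""
    error = y_true - y_pred
    return quantile * error if error >= 0 else (quantile - 1) * error

# At quantile 0.9, under-prediction costs 9x more than over-prediction,
# pushing the fitted value up towards the 90th percentile
loss_under = pinball_loss(10.0, 8.0, 0.9)   # error +2 -> 0.9 * 2
loss_over = pinball_loss(10.0, 12.0, 0.9)   # error -2 -> 0.1 * 2
```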
Bootstrapping prediction intervals
Continuing from where we left off, in this post I will discuss a general way of producing accurate prediction intervals for all machine learning models that are in use today. The algorithm for producing these intervals uses bootstrapping and was introduced in Kumar and Srivastava (2012). This post is part...
Sun, 01 Mar 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-03-01-bootstrap-prediction/
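As a taste of the bootstrap idea, here is a heavily simplified residual-bootstrap sketch, with made-up residuals; the full Kumar and Srivastava (2012) procedure described in the post is more involved:

```python
import random

def bootstrap_interval(residuals, y_pred, alpha=0.1, n_boot=2000, seed=42):
    """Crude residual-bootstrap prediction interval: add resampled
    residuals to the point prediction and take empirical quantiles."""
    rng = random.Random(seed)
    simulated = sorted(y_pred + rng.choice(residuals) for _ in range(n_boot))
    lo = simulated[int(alpha / 2 * n_boot)]
    hi = simulated[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical validation residuals of some fitted model
residuals = [-1.3, -0.4, 0.1, 0.2, 0.5, 1.1]
lo, hi = bootstrap_interval(residuals, y_pred=5.0)
```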
Parametric prediction intervals
One aspect of machine learning that does not seem to attract much attention is quantifying the uncertainty of our models’ predictions. In classification tasks we can partially remedy this by outputting conditional probabilities rather than boolean values, but what if the model is outputting 52%? Is that a clear-cut positive...
Wed, 26 Feb 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-02-26-parametric-prediction/
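The parametric approach amounts to assuming the model's errors are Gaussian and widening the point prediction by a multiple of the residual standard deviation. A minimal sketch with made-up residuals (an illustration of the assumption, not the post's code):

```python
import statistics

def parametric_interval(y_pred: float, residuals: list, z: float = 1.96):
    """95% prediction interval assuming Gaussian residuals:
    y_pred +/- z * sigma, with sigma estimated from residuals."""
    sigma = statistics.stdev(residuals)
    return y_pred - z * sigma, y_pred + z * sigma

# Hypothetical residuals with sample standard deviation 1.0
lo, hi = parametric_interval(5.0, [-1.0, 0.0, 1.0])
```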
Evaluating confidence
This is the first post in which I delve into quantifying the uncertainty of statistical models. We start with the classical confidence interval, used to estimate the uncertainty of statistics about the data that we are working with. Computing confidence intervals can be done using normal theory, which is the classical...
Thu, 20 Feb 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-02-20-confidence/
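The normal-theory construction mentioned above is the familiar mean plus or minus a z-multiple of the standard error. A minimal sketch on a made-up sample (illustrative, not the post's code):

```python
import math
import statistics

def confidence_interval(sample: list, z: float = 1.96):
    """Normal-theory 95% confidence interval for the population mean:
    sample mean +/- z * (sample std / sqrt(n))."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * sem, mean + z * sem

sample = [4.8, 5.1, 5.0, 4.9, 5.2]
lo, hi = confidence_interval(sample)
```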
Scholarly
Categorising scientific papers -
I recently finished Scholarly, a long-standing side project of mine, which consists of predicting the category of a given title and abstract of a scientific paper. More precisely, I am predicting the ~150 subject classification categories from the arXiv preprint server, and have trained the model on all papers on...
Tue, 21 Jan 2020 00:00:00 +0000
https://saattrupdan.github.io/2020-01-21-scholarly/
Syllabification with neural networks
As part of another project, I came across the problem of correctly counting the number of syllables in English words. After searching around and seeing mostly rule- and dictionary-based methods, I ended up building such a syllable counter from scratch, which ultimately led to the construction of a neural network...
Mon, 11 Nov 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-11-11-syllables/
Squared error and cross entropy
When introduced to machine learning, practically oriented textbooks and online courses focus on two major loss functions, the squared error for regression tasks and cross entropy for classification tasks, usually with no justification for why these two are important. I’ll here show that they’re both instances of the same concept:...
Sun, 27 Oct 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-10-27-squared-error-and-cross-entropy/
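The shared concept is maximum likelihood: cross entropy is exactly the Bernoulli negative log-likelihood, and the squared error is the Gaussian negative log-likelihood up to constants. A minimal numerical check of both identities (illustrative, not the post's derivation):

```python
import math

def cross_entropy(y: int, p: float) -> float:
    """Binary cross entropy for one label and predicted probability."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bernoulli_nll(y: int, p: float) -> float:
    """Negative log-likelihood of y under a Bernoulli(p) model."""
    likelihood = p if y == 1 else 1 - p
    return -math.log(likelihood)

def gaussian_nll(y: float, mu: float) -> float:
    """-log N(y | mu, 1) = 0.5 * (y - mu)^2 + constant, so minimising
    it is the same as minimising the squared error."""
    return 0.5 * (y - mu) ** 2 + 0.5 * math.log(2 * math.pi)
```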
NaturalSelection
A Python package to easily evolve neural networks -
In a deep learning project I am currently working on, I faced the inevitable problem of having to tune my hyperparameters. After trying a few dozen combinations it felt way more like guesswork than anything and I decided to be more systematic, which eventually led to the development of my...
Sat, 07 Sep 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-09-07-naturalselection/
Singular value decomposition
Whenever we are dealing with a complicated problem, it usually helps to break it into smaller pieces that are easier to handle. This is as true in mathematics and machine learning as it is when we’re cooking a meal or cleaning our home. This idea is the guiding principle...
Wed, 12 Jun 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-06-12-singular-value-decomposition/
Normal
Why standardise data? -
The normal distribution. Gaussian distribution. Bell curve. The ideal has many names. But what is so special about this distribution? Answering this question turns out to also give justification for Scikit-Learn’s StandardScaler! Let’s get crackin’. This post is part of my series on distributions: Poisson, Uniform, Geometric and Exponential, Normal...
Wed, 05 Jun 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-06-05-normal/
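The standardisation that StandardScaler performs is just a per-feature z-score: subtract the mean and divide by the (population) standard deviation. A minimal sketch (illustrative, not Scikit-Learn's implementation):

```python
import statistics

def standardise(xs: list) -> list:
    """Z-score standardisation, matching StandardScaler's default of
    using the population standard deviation."""
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]

scaled = standardise([2.0, 4.0, 6.0, 8.0])
```

The result has mean zero and unit variance, which is exactly the shape a standard normal assumption expects.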
Geometric and Exponential
Forgetful distributions -
This week we’ll deal with memory. More specifically, we’ll tackle the question of when a distribution does not have any memory whatsoever, meaning that it doesn’t depend on past experience in any way. It turns out that there is a unique continuous distribution with this property, the exponential distribution, and...
Tue, 28 May 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-05-28-geometric-exponential/
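Memorylessness means the survival function factorises: P(X > s + t) = P(X > s) P(X > t). For the exponential distribution this follows directly from its survival function exp(-rate * t), which a quick numerical check confirms (illustrative values):

```python
import math

def exponential_survival(t: float, rate: float = 1.0) -> float:
    """P(X > t) for an exponential random variable with the given rate."""
    return math.exp(-rate * t)

# Memorylessness: having already waited s, the chance of waiting a
# further t is the same as the unconditional chance of waiting t
s, t, rate = 2.0, 3.0, 0.5
lhs = exponential_survival(s + t, rate)
rhs = exponential_survival(s, rate) * exponential_survival(t, rate)
```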
Uniform
The universality of the uniform -
Today I’d like to talk about the uniform distribution. It might seem a bit weird to dedicate an entire post to such a thing as it’s arguably one of the simplest distributions there are. But while the definition isn’t that interesting, it has a very fundamental property in that it...
Wed, 22 May 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-05-22-uniform/
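The fundamental property hinted at is the universality of the uniform: pushing a Uniform(0, 1) draw through any inverse CDF yields a sample from that distribution. A minimal sketch using the exponential distribution as the target (illustrative, not the post's code):

```python
import math
import random

def sample_exponential(rate: float, rng: random.Random) -> float:
    """Inverse transform sampling: push a Uniform(0, 1) draw through
    the exponential inverse CDF, F^{-1}(u) = -ln(1 - u) / rate."""
    u = rng.random()
    return -math.log(1 - u) / rate

def exponential_cdf(x: float, rate: float) -> float:
    return 1 - math.exp(-rate * x)

rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(5)]
```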
Poisson
The law of small numbers -
In these first few posts I’ll cover a few common distributions and note their interesting properties. I’ll try to follow standard notation here, so that capital letters $X,Y,Z$ will denote random variables, which are represented as functions $X\colon\Omega\to\mathbb R$ for some sample probability space $\Omega$. $P(A)$ will be the probability...
Wed, 15 May 2019 00:00:00 +0000
https://saattrupdan.github.io/2019-05-15-poisson/
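The "law of small numbers" in the subtitle refers to the Poisson distribution arising as the limit of Binomial(n, lambda/n) for large n. A minimal numerical sketch of both the pmf and that approximation (illustrative values):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson(lam) random variable."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Many rare events: Binomial(10000, 3/10000) is close to Poisson(3)
approx = binomial_pmf(2, 10_000, 3 / 10_000)
exact = poisson_pmf(2, 3.0)
```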