Golang Neural Network
March 15, 2023
Have you ever wanted to understand Machine Learning (ML) from the ground up? Then it is a good exercise to write a neural network yourself! This article shows you how it can be done in Go (v1.20). For evaluation, we will use the MNIST dataset, which contains 70,000 images of hand-written digits. At the end of this article, you will have your own feed-forward neural network that can be used for image classification.
Network Initialization
First, we have to define a struct that represents the network. A typical neural network consists of multiple layers where the first one represents the input data (input layer) and the last one produces the output data (output layer). In between, we can add arbitrarily many layers that are hidden from the user's perspective (hidden layers). For simplicity, we only define one hidden layer. The number of units for each layer is stored in the integers Inodes, Hnodes, and Onodes.
During training and inference, input data is fed forward and modified at each layer. This modification is achieved with weights that are assigned to each layer's units. We define the weight matrix Wih for the weights between the input and hidden layers, as well as the weight matrix Who for the weights between the hidden and output layers. To visualize the role of weights: the element in the fifth row and third column of Wih connects the third unit of the input layer with the fifth unit of the hidden layer (rows correspond to hidden units and columns to input units, as we will see in the initialization code below).
Finally, we define a learning rate that describes the speed at which the neural network adjusts itself in the event of errors during training. The learning rate does not necessarily have to be a constant as there can be advantages in adapting the rate during training. However, we will restrict it to a constant value for the sake of simplicity.
The TrainingStep variable is an internal representation that keeps track of the overall training process and can be ignored for now.
type NeuralNetwork struct {
Inodes int
Hnodes int
Onodes int
Learningrate float64
TrainingStep uint64
Wih [][]float64
Who [][]float64
}
For good results, the network's weights should be initialized with certain values before the first training step. We use the normal distribution implemented in the distuv package of gonum.org (v0.12.0) for this purpose. In the code below, the CreateWeights helper function receives a normally distributed random number generator as a function and prepares a two-dimensional weight array with random values. We can later use the NewNeuralNetwork function to create neural networks with our desired layer dimensions and learning rate.
import (
"math"
"golang.org/x/exp/rand"
"gonum.org/v1/gonum/stat/distuv"
)
func CreateWeights(rows, columns int, fill func() float64) [][]float64 {
var data [][]float64 = make([][]float64, rows)
for i, _ := range data {
data[i] = make([]float64, columns)
if fill != nil {
for j, _ := range data[i] {
data[i][j] = fill()
}
}
}
return data
}
func NewNeuralNetwork(inodes, hnodes, onodes int, learningrate float64) *NeuralNetwork {
// Weights are drawn from a normal distribution with mean 0 and
// standard deviation 1/sqrt(number of units in the following layer).
wihNormal := distuv.Normal{Mu: 0, Sigma: math.Pow(float64(hnodes), -0.5), Src: rand.NewSource(0)}
whoNormal := distuv.Normal{Mu: 0, Sigma: math.Pow(float64(onodes), -0.5), Src: rand.NewSource(0)}
wih := CreateWeights(hnodes, inodes, wihNormal.Rand)
who := CreateWeights(onodes, hnodes, whoNormal.Rand)
return &NeuralNetwork{Inodes: inodes, Hnodes: hnodes, Onodes: onodes, Learningrate: learningrate, TrainingStep: 0, Wih: wih, Who: who}
}
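For illustration, a network for 28 by 28 pixel images (784 input units) and ten output classes could be created like this; these are also the exact hyperparameters we will use in the main function further below:
n := NewNeuralNetwork(784, 200, 10, 0.005)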
Activation Function
We apply a weight matrix between each pair of consecutive layers. In detail, we compute the dot product of the weight matrix with the transposed input vector. Afterwards, the resulting matrix is passed through an activation function before it is used as input to the next layer.
There are different activation functions for all kinds of purposes. They
can be linear, logistic, map to boolean values, etc. In our case, we use
the logistic function (also called expit) that maps to the range \((0,
1)\):
$$
\text{expit}(x) = \frac{1}{1+e^{-x}}
$$
This function has a very easily calculated derivative which will come in
handy for us later:
$$
\frac{d}{dx}\text{expit}(x) = \text{expit}(x)\cdot (1-\text{expit}(x))
$$
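As a quick sanity check: \(\text{expit}(0) = 0.5\), and the derivative at that point is \(0.5\cdot(1-0.5) = 0.25\), which is also its maximum value.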
In the code below, the ActivationFunction receives a matrix and applies the expit function on a copy of the received matrix.
func expit(x float64) float64 {
return 1 / (1 + math.Exp(-x))
}
func ActivationFunction(matrix [][]float64) (outMatrix [][]float64) {
outMatrix = make([][]float64, len(matrix))
for i, _ := range outMatrix {
outMatrix[i] = make([]float64, len(matrix[i]))
for j, _ := range outMatrix[i] {
outMatrix[i][j] = expit(matrix[i][j])
}
}
return
}
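A quick usage example (outputs rounded to three decimal places):
// Example Use Case for ActivationFunction:
matrix := [][]float64{{0, 1}, {-1, 2}}
activated := ActivationFunction(matrix)
// activated: [][]float64{{0.5, 0.731}, {0.269, 0.881}}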
Matrix Operations
All that is left before we can define a training procedure is the introduction of a couple of matrix operations. Although we could use libraries such as gonum, it can be a fun exercise to come up with a personal implementation.
As already mentioned, we will need a DotProduct function. We also need a way to transpose matrices and perform element-wise computations on matrix pairs. Here, you can find the function headers of the methods that need to be implemented. Try to find your own algorithms before you check the solutions below!
func TransposeArray(array []float64) [][]float64
func TransposeMatrix(matrix [][]float64) [][]float64
func DotProduct(matrixA, matrixB [][]float64) [][]float64
func MatrixOperation(matrixA, matrixB [][]float64, operator func(float64, float64) float64) [][]float64
// Example Use Case for DotProduct:
matrixA := [][]float64{{1,2}, {3,4}}
matrixB := [][]float64{{5,6}, {7,8}}
matrixC := DotProduct(matrixA, matrixB)
// matrixC: [][]float64{{19,22}, {43,50}}
// MatrixOperation:
addition := func(a, b float64) float64 { return a + b }
matrixC = MatrixOperation(matrixA, matrixB, addition)
// matrixC: [][]float64{{6,8}, {10,12}}
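Analogous examples for the two transpose functions:
// TransposeArray:
vector := []float64{1, 2, 3}
column := TransposeArray(vector)
// column: [][]float64{{1}, {2}, {3}}
// TransposeMatrix:
transposed := TransposeMatrix([][]float64{{1, 2}, {3, 4}, {5, 6}})
// transposed: [][]float64{{1, 3, 5}, {2, 4, 6}}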
func TransposeArray(array []float64) [][]float64 {
var arrayTransposed [][]float64 = make([][]float64, len(array))
for i, _ := range arrayTransposed {
arrayTransposed[i] = []float64{array[i]}
}
return arrayTransposed
}
func TransposeMatrix(matrix [][]float64) [][]float64 {
if len(matrix) < 1 {
return matrix
}
var matrixTransposed [][]float64 = make([][]float64, len(matrix[0]))
for i, _ := range matrixTransposed {
matrixTransposed[i] = make([]float64, len(matrix))
for j, _ := range matrixTransposed[i] {
matrixTransposed[i][j] = matrix[j][i]
}
}
return matrixTransposed
}
func DotProduct(matrixA, matrixB [][]float64) [][]float64 {
var outMatrix [][]float64 = make([][]float64, len(matrixA))
for i, _ := range matrixA {
outMatrix[i] = make([]float64, len(matrixB[0]))
for j, _ := range matrixB[0] {
var sum float64 = 0
for k, _ := range matrixB {
sum += matrixA[i][k] * matrixB[k][j]
}
outMatrix[i][j] = sum
}
}
return outMatrix
}
func MatrixOperation(matrixA, matrixB [][]float64, operator func(float64, float64) float64) [][]float64 {
var outMatrix [][]float64 = make([][]float64, len(matrixA))
for i, _ := range matrixA {
outMatrix[i] = make([]float64, len(matrixA[i]))
for j, _ := range matrixB[i] {
outMatrix[i][j] = operator(matrixA[i][j], matrixB[i][j])
}
}
return outMatrix
}
Training!
During training, the neural network receives data samples and corresponding labels. The produced output is compared to the provided labels. Based on this comparison, an error vector is computed that is propagated back to the weight matrices.
Let \(x\) be the input vector and \(y\) the one-hot encoded output
target. Then the hidden layer's values are computed as
$$
x' = \text{ActivationFunction}(w_{\text{ih}}\cdot x^\text{T})
$$
where \(x'\) corresponds to hiddenOutputs in the code below.
After passing \(x'\) forward, the values at the output layer are
obtained:
$$
x'' = \text{ActivationFunction}(w_{\text{ho}}\cdot x').
$$
Here, \(x''\) corresponds to finalOutputs.
If we were only interested in the network's output for certain input data, we would be done with the last step above, as \(x''\) resembles the output data. During training, however, we have to give the network feedback such that it can improve itself.
In the next step, we therefore compute an error vector \(e_\text{ho}\) (corresponding to outputErrors) that will be used for adjusting the weight matrix \(w_\text{ho}\):
$$
e_\text{ho} = y^\text{T} - x''.
$$
The error vector \(e_\text{ih}\) (hiddenErrors) can be computed by propagating the previously calculated vector:
$$
e_\text{ih} = w_\text{ho}^\text{T}\cdot e_\text{ho}
$$
Finally, we can compute new weight matrices based on the error vectors, the derivative of the activation function, and the learning rate \(\alpha\):
$$
w_\text{ih}' = w_\text{ih} + \alpha \cdot (e_\text{ih}\circ x'\circ (1-x'))\cdot x
$$
$$
w_\text{ho}' = w_\text{ho} + \alpha \cdot (e_\text{ho}\circ x''\circ (1-x''))\cdot x'^\text{T}
$$
The symbol \(\circ\) represents the Hadamard product, i.e., element-wise multiplication. As a dimension check: the error terms are column vectors, while \(x\) and \(x'^\text{T}\) are row vectors, so each product on the right is an outer product with the same shape as the weight matrix it updates.
The function below performs all computations described above and returns a new NeuralNetwork object that embodies the new weights \(w_\text{ih}'\) and \(w_\text{ho}'\).
func (n NeuralNetwork) Train(inputs []float64, targets []float64) *NeuralNetwork {
var inputsTransposed = TransposeArray(inputs)
var targetsTransposed = TransposeArray(targets)
var hiddenInputs = DotProduct(n.Wih, inputsTransposed)
var hiddenOutputs = ActivationFunction(hiddenInputs)
var finalInputs = DotProduct(n.Who, hiddenOutputs)
var finalOutputs = ActivationFunction(finalInputs)
var outputErrors [][]float64 = MatrixOperation(targetsTransposed, finalOutputs, func(a float64, b float64) float64 {
return a - b
})
var hiddenErrors = DotProduct(TransposeMatrix(n.Who), outputErrors)
// Computation below: who += learningrate * Dot(outputErrors*finalOutputs*(1.0-finalOutputs), Transpose(hiddenOutputs))
who := MatrixOperation(n.Who, DotProduct(MatrixOperation(outputErrors, finalOutputs, func(a float64, b float64) float64 {
return a * b * (1.0 - b)
}), TransposeMatrix(hiddenOutputs)),
func(a float64, b float64) float64 {
return a + (n.Learningrate)*b
})
wih := MatrixOperation(n.Wih, DotProduct(MatrixOperation(hiddenErrors, hiddenOutputs, func(a float64, b float64) float64 {
return a * b * (1.0 - b)
}), TransposeMatrix(inputsTransposed)),
func(a float64, b float64) float64 {
return a + n.Learningrate*b
})
return &NeuralNetwork{Inodes: n.Inodes, Hnodes: n.Hnodes, Onodes: n.Onodes, Learningrate: n.Learningrate, TrainingStep: n.TrainingStep + 1, Wih: wih, Who: who}
}
Since we have already implemented network querying functionality within the Train function, let's quickly add a function that is dedicated to only that. In short, the function below simply returns \(x''\) from above.
func (n NeuralNetwork) Query(inputs []float64) [][]float64 {
var inputsTransposed = TransposeArray(inputs)
var hiddenInputs = DotProduct(n.Wih, inputsTransposed)
var hiddenOutputs = ActivationFunction(hiddenInputs)
var finalInputs = DotProduct(n.Who, hiddenOutputs)
var finalOutputs = ActivationFunction(finalInputs)
return finalOutputs
}
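For illustration, querying the trained network with a single prepared sample could look like this; sample is a placeholder for one row of normalized input data (784 values):
outputs := n.Query(sample)
// outputs is a 10x1 column matrix; outputs[i][0] is the network's score for digit i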
And that's it! The functions above are sufficient to train and use feed-forward neural networks. Of course, it would be beneficial to evaluate our network on an actual dataset. Continue with the sections below to learn how the network can be applied to the MNIST dataset.
Working with Data
For simplicity, we will be working with the MNIST dataset in CSV format (find it here). Each CSV row consists of 785 integer entries where the first one represents the label. The remaining 784 values represent a flattened greyscale image with 28 by 28 pixels. Each image depicts a hand-written digit that matches the label. If you are interested, you can convert the CSV files to PNG (276MB in total) using the code below.
import (
"encoding/csv"
"fmt"
"image"
"image/png"
"io"
"os"
"path/filepath"
"strconv"
)
func CsvToPng(filePath, targetDir string) error {
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
reader := csv.NewReader(file)
counter := 0
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
return err
}
// The first entry is the label, the remaining 784 entries are the pixel values.
var img = image.NewGray(image.Rect(0, 0, 28, 28))
for i, s := range record[1:] {
value, err := strconv.Atoi(s)
if err != nil {
return err
}
img.Pix[i] = byte(value)
}
var label = record[0]
imgFile, err := os.Create(filepath.Join(targetDir, fmt.Sprintf("%v-image-%v.png", label, counter)))
if err != nil {
return err
}
counter++
if err := png.Encode(imgFile, img); err != nil {
imgFile.Close()
return err
}
imgFile.Close()
}
return nil
}
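A minimal usage sketch; the paths are placeholders for your local CSV file and an existing output directory:
err := CsvToPng("/path/to/mnist_train.csv", "/path/to/images")
if err != nil { panic(err) }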
Let's start by initializing a neural network and loading the training data:
import (
"encoding/csv"
"fmt" // used by the training loop and output further below
"os"

"golang.org/x/exp/rand" // provides rand.Intn for shuffling in the training loop below
)
func main() {
var inputNodes = 784
var hiddenNodes = 200
var outputNodes = 10
var learningRate float64 = 0.005
var n *NeuralNetwork = NewNeuralNetwork(inputNodes, hiddenNodes, outputNodes, learningRate)
var trainFile = "/path/to/mnist_train.csv"
file, err := os.Open(trainFile)
if err != nil {
panic(err)
}
reader := csv.NewReader(file)
records, err := reader.ReadAll()
if err != nil {
panic(err)
}
}
Next, we need to define functions that split the records object into input data and labels. Remember that each CSV row starts with the label. In the two functions below, we iterate over all rows and obtain either the data or the label, respectively. We expect pixel data to lie in the range \([0,255]\) and normalize it to \((0,1]\). The function PrepareTrainLabels not only reads the label value of each row but also performs one-hot encoding: for digits from 0 to 9, a zero-valued array of length 10 is created, and the index that matches the label value is set to \(0.999\).
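For example, the label 3 is encoded as [0, 0, 0, 0.999, 0, 0, 0, 0, 0, 0].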
import (
"errors"
"strconv"
"strings"
)
func PrepareDataset(rawData [][]string) ([][]float64, error) {
outputData := make([][]float64, len(rawData))
for i, _ := range rawData {
outputData[i] = make([]float64, len(rawData[i])-1)
for j, _ := range rawData[i] {
if j == 0 {
// skip the label
continue
} else {
value, err := strconv.Atoi(strings.Trim(rawData[i][j], " "))
if err != nil {
return nil, err
}
outputData[i][j-1] = ((float64(value) / 255.0) * 0.999) + 0.001
}
}
}
return outputData, nil
}
func PrepareTrainLabels(rawData [][]string, labelMax int) ([][]float64, error) {
labels := make([][]float64, len(rawData))
numLabels := labelMax + 1
for i, _ := range rawData {
labels[i] = make([]float64, numLabels)
for k := 0; k < numLabels; k++ {
labels[i][k] = 0
}
value, err := strconv.Atoi(strings.Trim(rawData[i][0], " "))
if err != nil {
return nil, err
}
if value < 0 || value > labelMax {
return nil, errors.New("Label is not in allowed space: " + strconv.Itoa(value))
}
labels[i][value] = 0.999
}
return labels, nil
}
We can then use these functions to extend the main method:
...
trainData, err := PrepareDataset(records)
if err != nil { panic(err) }
trainLabels, err := PrepareTrainLabels(records, 9)
if err != nil { panic(err) }
Training and Testing
Finally, we can define a training loop. In each iteration, we shuffle the dataset and then train with all of it once (= one epoch). For shuffling, we use the Fisher-Yates algorithm, which is easy to implement in Go:
...
var epochs = 30
for epoch := 0; epoch < epochs; epoch++ {
fmt.Printf("Epoch %v\n", epoch+1)
// Randomly Shuffle the dataset (Fisher–Yates algorithm):
for i := len(trainData) - 1; i > 0; i-- {
j := rand.Intn(i + 1)
trainData[i], trainData[j] = trainData[j], trainData[i]
trainLabels[i], trainLabels[j] = trainLabels[j], trainLabels[i]
}
for i, _ := range trainData {
n = n.Train(trainData[i], trainLabels[i])
}
}
Completing 30 epochs can take a while. Check out the code on GitHub to see how you can perform validation steps between epochs. We will omit that implementation here and instead skip to network testing after the training is completed.
Testing is similar to training, except that we use the Query function instead of Train. By querying the network with test data, we receive for each sample an array of 10 float64 numbers. Within that array, each number represents the network's assumed probability for the corresponding label to be the correct one. By finding the index with the highest number, we can derive the network's classification decision. For this task, we add a new function to the NeuralNetwork struct. The Validate function receives the input dataset and returns both the number of correctly classified samples and the total number of assessed samples. The helper function Argmax is used to derive the classified label from the network's output data. The Validate function is written such that you can easily extend it to also return the raw classification results; simply append the classifications array to the return statement.
func (n NeuralNetwork) Validate(inputs [][]float64, labels []int) (int, int) {
var classifications []float64
var correct int = 0
for i, _ := range inputs {
outputs := n.Query(inputs[i])
if labels[i] == Argmax(outputs) {
classifications = append(classifications, 1)
correct++
} else {
classifications = append(classifications, 0)
}
}
return correct, len(classifications)
}
func Argmax(values [][]float64) int {
curMax := values[0][0]
index := 0
for i, _ := range values {
localMax := values[i][0]
for j, _ := range values[i] {
if values[i][j] > localMax {
localMax = values[i][j]
}
}
if localMax > curMax {
curMax = localMax
index = i
}
}
return index
}
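Applied to a column vector as returned by Query, Argmax yields the row index of the largest value:
// Example Use Case for Argmax:
index := Argmax([][]float64{{0.05}, {0.9}, {0.05}})
// index: 1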
The Validate function expects labels to be represented as plain integer values, so let's add another function that reads such integer labels from CSV files:
func PrepareTestLabels(rawData [][]string, labelMax int) ([]int, error) {
labels := make([]int, len(rawData))
for i, _ := range rawData {
value, err := strconv.Atoi(strings.Trim(rawData[i][0], " "))
if err != nil {
return nil, err
}
if value < 0 || value > labelMax {
return nil, errors.New("Label is not in allowed space: " + strconv.Itoa(value))
}
labels[i] = value
}
return labels, nil
}
Now, we can add this code to the main method and obtain the accuracy of our network!
...
var testFile = "/path/to/mnist_test.csv"
file, err = os.Open(testFile)
if err != nil {
panic(err)
}
reader = csv.NewReader(file)
records, err = reader.ReadAll()
if err != nil { panic(err) }
testData, err := PrepareDataset(records)
if err != nil { panic(err) }
testLabels, err := PrepareTestLabels(records, 9)
if err != nil { panic(err) }
correct, length := n.Validate(testData, testLabels)
accuracy := float64(correct) / float64(length)
fmt.Println("Accuracy: ", accuracy)
Without any further hyperparameter tuning, I obtained a test accuracy of 97% after 18 epochs, but you can surely beat that!
Summary & Extensions
With just these lines of code, you now have your very own feed-forward neural network. You have learned how to deal with network layers, forward data along weight matrices, backpropagate errors, and evaluate the network with real data.
Still, there are many more aspects that you could find interesting. E.g., if you want to have functionality to store and load your network weights, head over to the GitHub repo where you also find a more advanced training loop that allows you to safely abort training via CTRL+C without losing the progress. The repo also offers a function to read 28 by 28 PNG images for training and classifying your own data.
As you may have noticed, the network has lots of room for optimization. If you are motivated, you can try to come up with your own solutions for the following topics:
- Batch Training
- Multiple Hidden Layers
- Parallelized Matrix Operations
- Improved Learning Rate