Golang Neural Network
March 15, 2023
Have you ever wanted to understand Machine Learning (ML) from the ground up? Then it is a good exercise to write a neural network yourself! This article shows you how it can be done in Go (v1.20). For evaluation, we will use the MNIST dataset, which contains 70,000 images of hand-written digits. At the end of this article, you will have your own feed-forward neural network that can be used for image classification.
Network Initialization
First, we have to define a struct that represents the network. A typical neural network consists of multiple layers where the first one represents the input data (input layer) and the last one produces the output data (output layer). In between, we can add arbitrarily many layers that are hidden from the user's perspective (hidden layers). For simplicity, we only define one hidden layer. The number of units for each layer is stored in the integers Inodes, Hnodes, and Onodes.
During training and inference, input data is fed forward and modified at each layer. This modification is achieved with weights that are assigned to each layer's units. We define the weight matrix Wih for the weights between the input and hidden layers, as well as the weight matrix Who for the weights between the hidden and output layers. To visualize the role of weights: the element in the fifth row and third column of Wih connects the third unit of the input layer with the fifth unit of the hidden layer (rows correspond to hidden units and columns to input units, as we will see in the initialization code below).
Finally, we define a learning rate that describes the speed at which the neural network adjusts itself in the event of errors during training. The learning rate does not necessarily have to be a constant as there can be advantages in adapting the rate during training. However, we will restrict it to a constant value for the sake of simplicity.
The TrainingStep variable is an internal representation that keeps track of the overall training process and can be ignored for now.
type NeuralNetwork struct {
Inodes int
Hnodes int
Onodes int
Learningrate float64
TrainingStep uint64
Wih [][]float64
Who [][]float64
}
For good results, the network's weights should be initialized with certain values before the first training step. We use the normal distribution implemented in the distuv package of gonum.org (v0.12.0) for this purpose. In the code below, the CreateWeights helper function receives a normally distributed random number generator as a function and prepares a two-dimensional weight array with random values. We can later use the NewNeuralNetwork function to create neural networks with our desired layer dimensions and learning rate.
import (
"math"
"golang.org/x/exp/rand"
"gonum.org/v1/gonum/stat/distuv"
)
func CreateWeights(rows, columns int, fill func() float64) [][]float64 {
var data [][]float64 = make([][]float64, rows)
for i, _ := range data {
data[i] = make([]float64, columns)
if fill != nil {
for j, _ := range data[i] {
data[i][j] = fill()
}
}
}
return data
}
func NewNeuralNetwork(inodes, hnodes, onodes int, learningrate float64) *NeuralNetwork {
// Weights are drawn from a normal distribution with mean 0 and
// standard deviation 1/sqrt(number of units in the following layer).
wihNormal := distuv.Normal{Mu: 0, Sigma: math.Pow(float64(hnodes), -0.5), Src: rand.NewSource(0)}
whoNormal := distuv.Normal{Mu: 0, Sigma: math.Pow(float64(onodes), -0.5), Src: rand.NewSource(0)}
wih := CreateWeights(hnodes, inodes, wihNormal.Rand)
who := CreateWeights(onodes, hnodes, whoNormal.Rand)
return &NeuralNetwork{Inodes: inodes, Hnodes: hnodes, Onodes: onodes, Learningrate: learningrate, TrainingStep: 0, Wih: wih, Who: who}
}
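For illustration, a network for 28 by 28 pixel images (784 input units) and ten output classes could be created like this; these are also the exact hyperparameters we will use in the main function further below:
n := NewNeuralNetwork(784, 200, 10, 0.005)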
Activation Function
We apply a weight matrix between each pair of consecutive layers. In detail, we compute the dot product of the weight matrix with the transposed input vector. Afterwards, the resulting matrix is passed through an activation function before it is used as input to the next layer.
There are different activation functions for all kinds of purposes. They
can be linear, logistic, map to boolean values, etc. In our case, we use
the logistic function (also called expit) that maps to the range \((0,
1)\):
$$
\text{expit}(x) = \frac{1}{1+e^{-x}}
$$
This function has a very easily calculated derivative which will come in
handy for us later:
$$
\frac{d}{dx}\text{expit}(x) = \text{expit}(x)\cdot (1-\text{expit}(x))
$$
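As a quick sanity check: \(\text{expit}(0) = 0.5\), and the derivative at that point is \(0.5\cdot(1-0.5) = 0.25\), which is also its maximum value.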
In the code below, the ActivationFunction receives a matrix and applies the expit function on a copy of the received matrix.
func expit(x float64) float64 {
return 1 / (1 + math.Exp(-x))
}
func ActivationFunction(matrix [][]float64) (outMatrix [][]float64) {
outMatrix = make([][]float64, len(matrix))
for i, _ := range outMatrix {
outMatrix[i] = make([]float64, len(matrix[i]))
for j, _ := range outMatrix[i] {
outMatrix[i][j] = expit(matrix[i][j])
}
}
return
}
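A quick usage example (outputs rounded to three decimal places):
// Example Use Case for ActivationFunction:
matrix := [][]float64{{0, 1}, {-1, 2}}
activated := ActivationFunction(matrix)
// activated: [][]float64{{0.5, 0.731}, {0.269, 0.881}}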
Matrix Operations
All that is left before we can define a training procedure is the introduction of a couple of matrix operations. Although we could use libraries such as gonum, it can be a fun exercise to come up with a personal implementation.
As already mentioned, we will need a DotProduct function. We also need a way to transpose matrices and perform element-wise computations on matrix pairs. Here, you can find the function headers of the methods that need to be implemented. Try to find your own algorithms before you check the solutions below!
func TransposeArray(array []float64) [][]float64
func TransposeMatrix(matrix [][]float64) [][]float64
func DotProduct(matrixA, matrixB [][]float64) [][]float64
func MatrixOperation(matrixA, matrixB [][]float64, operator func(float64, float64) float64) [][]float64
// Example Use Case for DotProduct:
matrixA := [][]float64{{1,2}, {3,4}}
matrixB := [][]float64{{5,6}, {7,8}}
matrixC := DotProduct(matrixA, matrixB)
// matrixC: [][]float64{{19,22}, {43,50}}
// MatrixOperation:
addition := func(a, b float64) float64 { return a + b }
matrixC = MatrixOperation(matrixA, matrixB, addition)
// matrixC: [][]float64{{6,8}, {10,12}}
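Analogous examples for the two transpose functions:
// TransposeArray:
vector := []float64{1, 2, 3}
column := TransposeArray(vector)
// column: [][]float64{{1}, {2}, {3}}
// TransposeMatrix:
transposed := TransposeMatrix([][]float64{{1, 2}, {3, 4}, {5, 6}})
// transposed: [][]float64{{1, 3, 5}, {2, 4, 6}}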
func TransposeArray(array []float64) [][]float64 {
var arrayTransposed [][]float64 = make([][]float64, len(array))
for i, _ := range arrayTransposed {
arrayTransposed[i] = []float64{array[i]}
}
return arrayTransposed
}
func TransposeMatrix(matrix [][]float64) [][]float64 {
if len(matrix) < 1 {
return matrix
}
var matrixTransposed [][]float64 = make([][]float64, len(matrix[0]))
for i, _ := range matrixTransposed {
matrixTransposed[i] = make([]float64, len(matrix))
for j, _ := range matrixTransposed[i] {
matrixTransposed[i][j] = matrix[j][i]
}
}
return matrixTransposed
}
func DotProduct(matrixA, matrixB [][]float64) [][]float64 {
var outMatrix [][]float64 = make([][]float64, len(matrixA))
for i, _ := range matrixA {
outMatrix[i] = make([]float64, len(matrixB[0]))
for j, _ := range matrixB[0] {
var sum float64 = 0
for k, _ := range matrixB {
sum += matrixA[i][k] * matrixB[k][j]
}
outMatrix[i][j] = sum
}
}
return outMatrix
}
func MatrixOperation(matrixA, matrixB [][]float64, operator func(float64, float64) float64) [][]float64 {
var outMatrix [][]float64 = make([][]float64, len(matrixA))
for i, _ := range matrixA {
outMatrix[i] = make([]float64, len(matrixA[i]))
for j, _ := range matrixB[i] {
outMatrix[i][j] = operator(matrixA[i][j], matrixB[i][j])
}
}
return outMatrix
}
Training!
During training, the neural network receives data samples and corresponding labels. The produced output is compared to the provided labels. Based on this comparison, an error vector is computed that is propagated back to the weight matrices.
Let \(x\) be the input vector and \(y\) the one-hot encoded output
target. Then the hidden layer's values are computed as
$$
x' = \text{ActivationFunction}(w_{\text{ih}}\cdot x^\text{T})
$$
where \(x'\) corresponds to hiddenOutputs in the code below.
After passing \(x'\) forward, the values at the output layer are
obtained:
$$
x'' = \text{ActivationFunction}(w_{\text{ho}}\cdot x').
$$
Here, \(x''\) corresponds to finalOutputs.
If we were only interested in the network's output for certain input data, we would be done with the last step above, as \(x''\) resembles the output data. During training, however, we have to give the network feedback such that it can improve itself.
In the next step, we therefore compute an error vector \(e_\text{ho}\) (corresponding to outputErrors) that will be used for adjusting the weight matrix \(w_\text{ho}\):
$$
e_\text{ho} = y^\text{T} - x''.
$$
The error vector \(e_\text{ih}\) (hiddenErrors) can be computed by propagating the previously calculated vector:
$$
e_\text{ih} = w_\text{ho}^\text{T}\cdot e_\text{ho}
$$
Finally, we can compute new weight matrices based on the error vectors, the derivative of the activation function, and the learning rate \(\alpha\):
$$
w_\text{ih}' = w_\text{ih} + \alpha \cdot (e_\text{ih}\circ x'\circ (1-x'))\cdot x
$$
$$
w_\text{ho}' = w_\text{ho} + \alpha \cdot (e_\text{ho}\circ x''\circ (1-x''))\cdot x'^\text{T}
$$
The symbol \(\circ\) represents the Hadamard product, i.e., element-wise multiplication. As a dimension check: the error terms are column vectors, while \(x\) and \(x'^\text{T}\) are row vectors, so each product on the right is an outer product with the same shape as the weight matrix it updates.
The function below performs all computations described above and returns a new NeuralNetwork object that embodies the new weights \(w_\text{ih}'\) and \(w_\text{ho}'\).
func (n NeuralNetwork) Train(inputs []float64, targets []float64) *NeuralNetwork {
var inputsTransposed = TransposeArray(inputs)
var targetsTransposed = TransposeArray(targets)
var hiddenInputs = DotProduct(n.Wih, inputsTransposed)
var hiddenOutputs = ActivationFunction(hiddenInputs)
var finalInputs = DotProduct(n.Who, hiddenOutputs)
var finalOutputs = ActivationFunction(finalInputs)
var outputErrors [][]float64 = MatrixOperation(targetsTransposed, finalOutputs, func(a float64, b float64) float64 {
return a - b
})
var hiddenErrors = DotProduct(TransposeMatrix(n.Who), outputErrors)
// Computation below: who += learningrate * Dot(outputErrors*finalOutputs*(1.0-finalOutputs), Transpose(hiddenOutputs))
who := MatrixOperation(n.Who, DotProduct(MatrixOperation(outputErrors, finalOutputs, func(a float64, b float64) float64 {
return a * b * (1.0 - b)
}), TransposeMatrix(hiddenOutputs)),
func(a float64, b float64) float64 {
return a + (n.Learningrate)*b
})
wih := MatrixOperation(n.Wih, DotProduct(MatrixOperation(hiddenErrors, hiddenOutputs, func(a float64, b float64) float64 {
return a * b * (1.0 - b)
}), TransposeMatrix(inputsTransposed)),
func(a float64, b float64) float64 {
return a + n.Learningrate*b
})
return &NeuralNetwork{Inodes: n.Inodes, Hnodes: n.Hnodes, Onodes: n.Onodes, Learningrate: n.Learningrate, TrainingStep: n.TrainingStep + 1, Wih: wih, Who: who}
}
Since we have already implemented network querying functionality within the Train function, let's quickly add a function that is dedicated to only that. In short, the function below simply returns \(x''\) from above.
func (n NeuralNetwork) Query(inputs []float64) [][]float64 {
var inputsTransposed = TransposeArray(inputs)
var hiddenInputs = DotProduct(n.Wih, inputsTransposed)
var hiddenOutputs = ActivationFunction(hiddenInputs)
var finalInputs = DotProduct(n.Who, hiddenOutputs)
var finalOutputs = ActivationFunction(finalInputs)
return finalOutputs
}
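For illustration, querying the trained network with a single prepared sample could look like this; sample is a placeholder for one row of normalized input data (784 values):
outputs := n.Query(sample)
// outputs is a 10x1 column matrix; outputs[i][0] is the network's score for digit i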
And that's it! The functions above are sufficient to train and use feed-forward neural networks. Of course, it would be beneficial to evaluate our network on an actual dataset. Continue with the sections below to learn how the network can be applied to the MNIST dataset.
Working with Data
For simplicity, we will be working with the MNIST dataset in CSV format (find it here). Each CSV row consists of 785 integer entries where the first one represents the label. The remaining 784 values represent a flattened greyscale image with 28 by 28 pixels. Each image depicts a hand-written digit that matches the label. If you are interested, you can convert the CSV files to PNG (276MB in total) using the code below.
import (
"encoding/csv"
"fmt"
"image"
"image/png"
"io"
"os"
"path/filepath"
"strconv"
)
func CsvToPng(filePath, targetDir string) error {
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
reader := csv.NewReader(file)
counter := 0
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
return err
}
// The first entry is the label, the remaining 784 entries are the pixel values.
var img = image.NewGray(image.Rect(0, 0, 28, 28))
for i, s := range record[1:] {
value, err := strconv.Atoi(s)
if err != nil {
return err
}
img.Pix[i] = byte(value)
}
var label = record[0]
imgFile, err := os.Create(filepath.Join(targetDir, fmt.Sprintf("%v-image-%v.png", label, counter)))
if err != nil {
return err
}
counter++
if err := png.Encode(imgFile, img); err != nil {
imgFile.Close()
return err
}
imgFile.Close()
}
return nil
}
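A minimal usage sketch; the paths are placeholders for your local CSV file and an existing output directory:
err := CsvToPng("/path/to/mnist_train.csv", "/path/to/images")
if err != nil { panic(err) }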
Let's start by initializing a neural network and loading the training data:
import (
"encoding/csv"
"fmt" // used by the training loop and output further below
"os"

"golang.org/x/exp/rand" // provides rand.Intn for shuffling in the training loop below
)
func main() {
var inputNodes = 784
var hiddenNodes = 200
var outputNodes = 10
var learningRate float64 = 0.005
var n *NeuralNetwork = NewNeuralNetwork(inputNodes, hiddenNodes, outputNodes, learningRate)
var trainFile = "/path/to/mnist_train.csv"
file, err := os.Open(trainFile)
if err != nil {
panic(err)
}
reader := csv.NewReader(file)
records, err := reader.ReadAll()
if err != nil {
panic(err)
}
}
Next, we need to define functions that split the records object into input data and labels. Remember that each CSV row starts with the label. In the two functions below, we iterate over all rows and obtain either the data or the label, respectively. We expect pixel data to lie in the range \([0,255]\) and normalize it to \((0,1]\). The function PrepareTrainLabels not only reads the label value of each row but also performs one-hot encoding: for digits from 0 to 9, a zero-valued array of length 10 is created, and the index that matches the label value is set to \(0.999\).
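For example, the label 3 is encoded as [0, 0, 0, 0.999, 0, 0, 0, 0, 0, 0].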
import (
"errors"
"strconv"
"strings"
)
func PrepareDataset(rawData [][]string) ([][]float64, error) {
outputData := make([][]float64, len(rawData))
for i, _ := range rawData {
outputData[i] = make([]float64, len(rawData[i])-1)
for j, _ := range rawData[i] {
if j == 0 {
// skip the label
continue
} else {
value, err := strconv.Atoi(strings.Trim(rawData[i][j], " "))
if err != nil {
return nil, err
}
outputData[i][j-1] = ((float64(value) / 255.0) * 0.999) + 0.001
}
}
}
return outputData, nil
}
func PrepareTrainLabels(rawData [][]string, labelMax int) ([][]float64, error) {
labels := make([][]float64, len(rawData))
numLabels := labelMax + 1
for i, _ := range rawData {
labels[i] = make([]float64, numLabels)
for k := 0; k < numLabels; k++ {
labels[i][k] = 0
}
value, err := strconv.Atoi(strings.Trim(rawData[i][0], " "))
if err != nil {
return nil, err
}
if value < 0 || value > labelMax {
return nil, errors.New("Label is not in allowed space: " + strconv.Itoa(value))
}
labels[i][value] = 0.999
}
return labels, nil
}
We can then use these functions to extend the main method:
...
trainData, err := PrepareDataset(records)
if err != nil { panic(err) }
trainLabels, err := PrepareTrainLabels(records, 9)
if err != nil { panic(err) }
Training and Testing
Finally, we can define a training loop. In each iteration, we shuffle the dataset and then train with all of it once (= one epoch). For shuffling, we use the Fisher-Yates algorithm, which is easy to implement in Go:
...
var epochs = 30
for epoch := 0; epoch < epochs; epoch++ {
fmt.Printf("Epoch %v\n", epoch+1)
// Randomly Shuffle the dataset (Fisher–Yates algorithm):
for i := len(trainData) - 1; i > 0; i-- {
j := rand.Intn(i + 1)
trainData[i], trainData[j] = trainData[j], trainData[i]
trainLabels[i], trainLabels[j] = trainLabels[j], trainLabels[i]
}
for i, _ := range trainData {
n = n.Train(trainData[i], trainLabels[i])
}
}
Completing 30 epochs can take a while. Check out the code on GitHub to see how you can perform validation steps between epochs. We will omit that implementation here and instead skip to network testing after the training is completed.
Testing is similar to training, except that we use the Query function instead of Train. By querying the network with test data, we receive for each sample an array of 10 float64 numbers. Within that array, each number represents the network's assumed probability for the corresponding label to be the correct one. By finding the index with the highest number, we can derive the network's classification decision. For this task, we add a new function to the NeuralNetwork struct. The Validate function receives the input dataset and returns both the number of correctly classified samples and the total number of assessed samples. The helper function Argmax is used to derive the classified label from the network's output data. The Validate function is written such that you can easily extend it to also return the raw classification results; simply append the classifications array to the return statement.
func (n NeuralNetwork) Validate(inputs [][]float64, labels []int) (int, int) {
var classifications []float64
var correct int = 0
for i, _ := range inputs {
outputs := n.Query(inputs[i])
if labels[i] == Argmax(outputs) {
classifications = append(classifications, 1)
correct++
} else {
classifications = append(classifications, 0)
}
}
return correct, len(classifications)
}
func Argmax(values [][]float64) int {
curMax := values[0][0]
index := 0
for i, _ := range values {
localMax := values[i][0]
for j, _ := range values[i] {
if values[i][j] > localMax {
localMax = values[i][j]
}
}
if localMax > curMax {
curMax = localMax
index = i
}
}
return index
}
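Applied to a column vector as returned by Query, Argmax yields the row index of the largest value:
// Example Use Case for Argmax:
index := Argmax([][]float64{{0.05}, {0.9}, {0.05}})
// index: 1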
The Validate function expects labels to be represented as plain integer values, so let's add another function that reads such integer labels from CSV files:
func PrepareTestLabels(rawData [][]string, labelMax int) ([]int, error) {
labels := make([]int, len(rawData))
for i, _ := range rawData {
value, err := strconv.Atoi(strings.Trim(rawData[i][0], " "))
if err != nil {
return nil, err
}
if value < 0 || value > labelMax {
return nil, errors.New("Label is not in allowed space: " + strconv.Itoa(value))
}
labels[i] = value
}
return labels, nil
}
Now, we can add this code to the main method and obtain the accuracy of our network!
...
var testFile = "/path/to/mnist_test.csv"
file, err = os.Open(testFile)
if err != nil {
panic(err)
}
reader = csv.NewReader(file)
records, err = reader.ReadAll()
if err != nil { panic(err) }
testData, err := PrepareDataset(records)
if err != nil { panic(err) }
testLabels, err := PrepareTestLabels(records, 9)
if err != nil { panic(err) }
correct, length := n.Validate(testData, testLabels)
accuracy := float64(correct) / float64(length)
fmt.Println("Accuracy: ", accuracy)
Without any further hyperparameter tuning, I obtained a test accuracy of 97% after 18 epochs, but you can surely beat that!
Summary & Extensions
With just these lines of code, you now have your very own feed-forward neural network. You have learned how to deal with network layers, forward data along weight matrices, backpropagate errors, and evaluate the network with real data.
Still, there are many more aspects that you could find interesting. E.g., if you want to have functionality to store and load your network weights, head over to the GitHub repo where you also find a more advanced training loop that allows you to safely abort training via CTRL+C without losing the progress. The repo also offers a function to read 28 by 28 PNG images for training and classifying your own data.
As you may have noticed, the network has lots of room for optimization. If you are motivated, you can try to come up with your own solutions for the following topics:
- Batch Training
- Multiple Hidden Layers
- Parallelized Matrix Operations
- Improved Learning Rate