Visualizing how a deep neural network learns

Last weekend I was working through some TensorFlow tutorials and thought it would be nice to visualize how the weights change over time in a multi-layer neural network. As an example I will use a pizza delivery company. Unfortunately I could not find a pizza delivery dataset online, so I decided to generate my own fake data instead (the aim here is to show the change in the network, not to solve a realistic problem, so the data I use is largely irrelevant).

I wrote a small Python module to generate a pandas DataFrame containing n data points with the following features:

  • distance to customer (in kilometers, 0 <= x <= 20)
  • order size/complexity (0 <= x <= 1)
  • delivery driver vehicle type (categorical: none, car, scooter, bicycle)
  • time of day (0 <= x < 24)
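As a rough illustration of what such a generator could look like, here is a minimal sketch. It is not the module from the post: the column names, the per-vehicle speeds and the formula for the delivery time target are all made up for this example, and only the feature ranges come from the list above.

```python
import numpy as np
import pandas as pd

VEHICLES = ["none", "car", "scooter", "bicycle"]

# Hypothetical average speeds in km/h ("none" = on foot); not from the original post.
SPEED_KMH = {"none": 5.0, "car": 40.0, "scooter": 30.0, "bicycle": 15.0}


def generate_deliveries(n, seed=0):
    """Return a DataFrame with n fake deliveries and a made-up target column."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "distance_km": rng.uniform(0.0, 20.0, size=n),  # 0 <= x < 20
        "order_size": rng.uniform(0.0, 1.0, size=n),    # 0 <= x < 1
        "vehicle": rng.choice(VEHICLES, size=n),        # categorical
        "hour": rng.uniform(0.0, 24.0, size=n),         # 0 <= x < 24
    })
    # Invented target: travel time + preparation time + noise, clipped at zero.
    travel_min = df["distance_km"] / df["vehicle"].map(SPEED_KMH) * 60.0
    prep_min = 20.0 * df["order_size"]
    noise = rng.normal(0.0, 5.0, size=n)
    df["delivery_min"] = (travel_min + prep_min + noise).clip(lower=0.0)
    return df


df = generate_deliveries(1000)
print(df.head())
```

Passing a fixed seed keeps the fake data reproducible between runs, which is handy when comparing plots.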

Here is a correlation matrix plot of the generated data. It is not great, but it will do (world's worst delivery service, with many deliveries taking more than 200 minutes!).

After importing the data and doing the preprocessing I started building the model in TensorFlow. After a bit of trial and error I ended up with a model that worked relatively well. The network has three hidden layers with 16, 16 and 8 neurons respectively. Three layers are probably not needed for obtaining good predictions, but my goal here is getting a nice visual, not building a good model. For visualizing the network I used code from Oli Blum’s answer to this Stack Overflow question and modified it so that I could change the way the weights between the neurons are drawn. The magnitude of each weight is shown by the thickness of the line representing it in the plot. Initially all weights are set to random values.
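To make the "thickness = magnitude" idea concrete, here is a small matplotlib sketch that draws the connections between two layers. It is not the Stack Overflow code mentioned above, just a simplified stand-in; the function name draw_layer_pair and the max_width parameter are invented for this example, and the weights are random.

```python
import numpy as np
import matplotlib.pyplot as plt


def draw_layer_pair(weights, max_width=4.0, ax=None):
    """Draw two layers; weights has shape (n_in, n_out)."""
    if ax is None:
        _, ax = plt.subplots()
    n_in, n_out = weights.shape
    y_in = np.linspace(0.0, 1.0, n_in)
    y_out = np.linspace(0.0, 1.0, n_out)
    # Normalize by the largest magnitude so the thickest line is max_width.
    scale = np.abs(weights).max() or 1.0
    for i in range(n_in):
        for j in range(n_out):
            width = max_width * abs(weights[i, j]) / scale
            ax.plot([0, 1], [y_in[i], y_out[j]],
                    color="gray", linewidth=width, zorder=1)
    # Neurons are drawn on top of the connection lines.
    ax.scatter(np.zeros(n_in), y_in, s=200, zorder=2)
    ax.scatter(np.ones(n_out), y_out, s=200, zorder=2)
    ax.axis("off")
    return ax


rng = np.random.default_rng(0)
weights = rng.normal(size=(5, 16))  # random stand-in for input -> first hidden layer
ax = draw_layer_pair(weights)
```

Calling plt.show() afterwards displays the figure. Normalizing by the largest weight means the plot shows relative rather than absolute magnitudes.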

You might have noticed that there are 5 inputs in the input layer even though the data has only 4 features. This is because I am using a 2-dimensional embedding feature column for the categorical ‘vehicle type’ feature, so the 3 numeric features plus the 2 embedding dimensions give 5 inputs (read about TensorFlow feature columns here). The model itself is very basic: I am using the TF DNNRegressor estimator with a dropout of 10% and the ELU activation function.
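The input count can be checked with a few lines of NumPy. This sketch only mimics what an embedding column does (a table lookup followed by concatenation); it does not use TensorFlow, and the helper build_inputs and the random embedding table are assumptions made for illustration. In the real model the table would be learned during training.

```python
import numpy as np

VEHICLES = ["none", "car", "scooter", "bicycle"]


def build_inputs(distance, order_size, hour, vehicle, embedding):
    """Concatenate 3 numeric features with a 2-d embedding of the vehicle."""
    idx = np.array([VEHICLES.index(v) for v in vehicle])
    numeric = np.column_stack([distance, order_size, hour])
    return np.hstack([numeric, embedding[idx]])


rng = np.random.default_rng(0)
# One 2-d vector per vehicle type; random here, trainable in a real network.
embedding = rng.normal(size=(len(VEHICLES), 2))

x = build_inputs(
    distance=[3.5, 12.0],
    order_size=[0.2, 0.9],
    hour=[18.5, 12.0],
    vehicle=["car", "bicycle"],
    embedding=embedding,
)
print(x.shape)  # (2, 5)
```

Each row has 3 numeric values followed by the 2 embedding values, which is where the 5 input neurons come from.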

I wanted this example to be usable as a live demo, so I needed to train it for a few steps, extract the weights and plot the network again, over and over. I also wanted to show how the average error over the test set decreases over time. Because the error decreases sharply at first and I wanted the gradual decrease to be visible, I initially trained the network 5 steps at a time with a batch size of 5. As the improvement in the model became more gradual, I increased the step count and batch size (up to 2000 steps and a batch size of 128) in order to speed up the training and get a less noisy gradient. I know, I know, “Friends don’t let friends use minibatches larger than 32”, but this is just for the sake of the live demo.
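The train-a-bit, evaluate, replot loop can be sketched as follows. Only the first stage (5 steps, batch size 5) and the last stage (2000 steps, batch size 128) are taken from the description above; the middle stage, the number of rounds per stage and the toy train_fn and eval_fn callbacks are placeholders invented for this example. In the real demo, the callbacks would wrap the estimator's training and evaluation calls.

```python
def run_schedule(schedule, train_fn, eval_fn):
    """Train in chunks and record (total_steps, error) after each chunk."""
    history = []
    total_steps = 0
    for steps, batch_size, rounds in schedule:
        for _ in range(rounds):
            train_fn(steps=steps, batch_size=batch_size)
            total_steps += steps
            history.append((total_steps, eval_fn()))
            # This is the point where the weights would be extracted
            # and the network plot redrawn.
    return history


# (steps per round, batch size, number of rounds); middle row is hypothetical.
SCHEDULE = [
    (5, 5, 20),
    (50, 32, 10),
    (2000, 128, 5),
]

# Toy stand-in for a model: the "error" simply decays with every step.
state = {"error": 100.0}


def fake_train(steps, batch_size):
    state["error"] *= 0.999 ** steps


def fake_eval():
    return state["error"]


history = run_schedule(SCHEDULE, fake_train, fake_eval)
print(len(history), history[-1][0])  # 35 10600
```

Small chunks early on give many plot updates while the error is falling fast; larger chunks later avoid redrawing a network that barely changes.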

Looks nice so far, but unfortunately the weight change is hard to see initially since it is very gradual when only training for a few steps at a time. To make this clearer I decided to show the change in weight between two plot updates by coloring the weight vectors depending on the magnitude of the change (biggest change = RED, smallest change = BLUE, and everything else somewhere in between). I implemented this with a different (also fake) dataset. This dataset is about the percentage of people wearing colorful clothes outside depending on weather type (sun, rain etc.), time of day, temperature and day of the week.
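The coloring step could look something like this: take the weights from two consecutive plot updates, compute the absolute change per weight, rescale it to the range 0 to 1 and interpolate between blue and red. This is a minimal sketch with an invented function name (change_colors) and a plain linear blue-to-red blend, which may differ from the colormap used in the actual plots.

```python
import numpy as np


def change_colors(old_weights, new_weights):
    """Map |new - old| to RGB: smallest change = blue, biggest = red."""
    delta = np.abs(np.asarray(new_weights) - np.asarray(old_weights))
    span = delta.max() - delta.min()
    if span > 0:
        t = (delta - delta.min()) / span
    else:
        t = np.zeros_like(delta)  # nothing changed more than anything else
    # Linear blend between blue (0, 0, 1) and red (1, 0, 0).
    return np.stack([t, np.zeros_like(t), 1.0 - t], axis=-1)


old = np.zeros((2, 2))
new = np.array([[0.0, 1.0],
                [0.5, 2.0]])
colors = change_colors(old, new)
print(colors[0, 0], colors[1, 1])  # blue for no change, red for the largest
```

The result has one RGB triple per weight, so it can be passed as the color of the corresponding line when redrawing the network.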

The result makes it a lot easier to observe how the network changes. If you want to play with the code you can find it on my GitHub. Please keep in mind that my focus here was the visual stuff, so the code is by no means good for actually training a model to do something useful. I have only recently started looking at TensorFlow, and this is my first blog post, so any constructive criticism or suggestions are greatly appreciated 🙂
