Transpose a CNN (Tensoflow)

Overview

This is a quick example of how to transpose a conv layer and as well how to use duse a dense layer, tf.layers.dense as a convolutional layer tf.layers.conv2d.

Transposed layers

Transposed Convolutions worked as backward strided convolution to help in upsampling the previous layer to a higher resolution or dimension. Upsampling is a classic signal processing technique which is often accompanied by interpolation. The term transpose mean transfer to a different place or context. We can use a transposed convolution to transfer patches of data onto a sparse matrix, then we can fill the sparse area of the matrix based on the transferred information. Helpful animations of convolutional operations, including transposed convolutions, can be found here.

As an example, suppose you have a 3x3 input and you wish to upsample that to the desired dimension of 6x6. The process involves multiplying each pixel of your input with a kernel or filter. If this filter was of size 5x5, the output of this operation will be a weighted kernel of size 5x5. This weighted kernel then defines your output layer. However, the upsampling part of the process is defined by the strides and the padding. In TensorFlow, using the tf.layers.conv2d_transpose, a stride of 2, and “SAME” padding would result in an output of dimensions 6x6. Let’s look at a simple representation of this. If we have a 2x2 input and a 3x3 kernel; with “SAME” padding, and a stride of 2 we can expect an output of dimension 4x4.

The following code example gives an idea of the process.

    import numpy as np
    import tensorflow as tf
    
    num_outputs = 2
    x = tf.placeholder(tf.float32, [1, 4, 4, 1])
    
    
    # custom init with the seed set to 0 by default
    def custom_init(shape, dtype=tf.float32, partition_info=None, seed=0):
        return tf.random_normal(shape, dtype=dtype, seed=seed)
    
    
    np.random.seed(1)
    random_array = np.random.rand(1, 4, 4, 1)
    print(random_array)
    
    dense_out = tf.layers.dense(x, num_outputs, kernel_initializer=custom_init)
    conv_out = tf.layers.conv2d(x, num_outputs, kernel_size=1, strides=1, kernel_initializer=custom_init)
    
    relu = tf.nn.relu(conv_out)
    fc0 = tf.contrib.layers.flatten(relu)
    
    # up-sample
    upsample_conv = tf.layers.conv2d_transpose(conv_out, 1, (1, 1), (1, 1), kernel_initializer=custom_init)
    upsample_dense = tf.layers.conv2d_transpose(dense_out, 1, (1, 1), (1, 1), kernel_initializer=custom_init)
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
    
        a = sess.run(dense_out, feed_dict={x: random_array})
        b = sess.run(conv_out, feed_dict={x: random_array})
        c = sess.run(relu, feed_dict={x: random_array})
        d = sess.run(fc0, feed_dict={x: random_array})
    
        print("Dense Output =", a)
        print("Conv 1x1 Output =", b)
        print("Same output? =", np.allclose(a, b, atol=1.e-5))
    
        result_conv = sess.run(upsample_conv, feed_dict={x: random_array})
        print('transpose of a conv layer: {} \n {}'.format(result_conv.shape, result_conv))
    
        result_dense = sess.run(upsample_dense, feed_dict={x: random_array})
        print('transpose of a dense layer: {} \n {}'.format(result_dense.shape, result_dense))
        print("Same output? =", np.allclose(result_conv, result_dense, atol=1.e-5))

results

np.random.rand(1, 4, 4, 1)	Transpose of a dense layer	Transpose of a conv layer
[[[[4.17022005e-01]	[[[[1.9132932e+00]	[[[[1.9132932e+00]
[7.20324493e-01]	[3.3048427e+00]	[3.3048427e+00]
[1.14374817e-04]	[5.2475062e-04]	[5.2475062e-04]
[3.02332573e-01]]	[1.3870993e+00]]	[1.3870993e+00]]

[[1.46755891e-01]	[[6.7331469e-01]	[[6.7331469e-01]
[9.23385948e-02]	[4.2364866e-01]	[4.2364866e-01]
[1.86260211e-01]	[8.5456026e-01]	[8.5456026e-01]
[3.45560727e-01]]	[1.5854297e+00]]	[1.5854297e+00]]

[[3.96767474e-01]	[[1.8203658e+00]	[[1.8203658e+00]
[5.38816734e-01]	[2.4720864e+00]	[2.4720864e+00]
[4.19194514e-01]	[1.9232609e+00]	[1.9232609e+00]
[6.85219500e-01]]	[3.1437812e+00]]	[3.1437812e+00]]

[[2.04452250e-01]	[[9.3802512e-01]	[[9.3802512e-01]
[8.78117436e-01]	[4.0287952e+00]	[4.0287952e+00]
[2.73875932e-02]	[1.2565404e-01]	[1.2565404e-01]
[6.70467510e-01]]]]	[3.0760992e+00]]]]	[3.0760992e+00]]]]

import tensorflow as tf
import numpy as np

x1 = tf.ones(shape=[64, 7, 7, 256])
y1 = tf.layers.conv2d_transpose(x1, 128, 3, strides=2, padding='SAME')

w = tf.ones([3, 3, 128, 256])
y2 = tf.nn.conv2d_transpose(x1, w, output_shape=[64, 14, 14, 128], strides=[1, 2, 2, 1], padding='SAME')

x2 = tf.nn.conv2d(y2, w, strides=[1, 2, 2, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y1_value, y2_value, x2_value=sess.run([y1, y2, x2])
    print('downsampleg example')
    print(y1_value.shape)
    print(y2_value.shape)
    print(x2_value.shape)

tf.reset_default_graph()
image = tf.ones(shape=[64, 14, 14, 128])
w = tf.ones([3, 3, 128, 256])

x = tf.nn.conv2d(image, w, strides=[1, 3, 3, 1], padding='VALID')

y1 = tf.layers.conv2d_transpose(x, 128, kernel_size=3, strides=3, padding='VALID')
y2 = tf.nn.conv2d_transpose(x, w, output_shape=[64, 14, 14, 128], strides=[1, 3, 3, 1], padding='VALID')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_value, y1_value, y2_value = sess.run([x, y1, y2])
    print('upsample example')
    print(x_value.shape)
    print(y1_value.shape)
    print(y2_value.shape)

downsample example
(64, 14, 14, 128)
(64, 14, 14, 128)
(64, 7, 7, 256)
upsample example
(64, 4, 4, 256)
(64, 12, 12, 128)
(64, 14, 14, 128)

As you see above, the underlying math will be the same for dense layer and conv layer, but the spatial information will be preserved allowing seamless use of future convolutional layers.

`tf.layers.conv2d_transpose(conv_out, 1, (1, 1), (1, 1), kernel_initializer=custom_init)`

The second argument 1 is the number of kernels/output channels.
The third argument is the kernel size, (1, 1). Note that the kernel size could also be (1, 1) and the output shape would be the same. However, if it were changed to (3, 3) note the shape would be (9, 9), at least with ‘VALID’ padding.
The fourth argument, the number of strides, is how we get from a height and width from (4, 4) to (8, 8). If this were a regular convolution the output height and width would be (2, 2).`

Would you like to see a simple example on how to use transpose layers for segmentation. Take a look at my road segmentation project

Reference

Concepts from this blog come from Udacity self-driving CarND program

Manuel Cuevas

Hello, I'm Manuel Cuevas a Software Engineer with background in machine learning and artificial intelligence.