U-Net
U-Net is a CNN architecture widely used for segmentation tasks. It consists of a contracting path and an expanding path, which together form the U shape that gives the network its name. The contracting path repeats convolution layers, each followed by a ReLU activation, and then a max-pooling layer. Along this path, features are extracted while the spatial resolution is reduced. Along the expanding path, a series of up-convolutions restores the resolution, and at each step the high-resolution features from the matching level of the contracting path are concatenated in.
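To make the effect of max-pooling on spatial resolution concrete, here is a minimal plain-Python sketch. The function `max_pool_2x2` and the sample `feature_map` are illustrative names and values, not part of the tutorial's model; a real network would use a Keras pooling layer instead.

```python
def max_pool_2x2(grid):
    """Apply 2x2 max pooling with stride 2 to a 2-D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# A hypothetical 4x4 single-channel feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each 2x2 window collapses to its largest value, so height and width are halved while the strongest activations survive.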
For this project, we will use the encoder of the VGG16 model, because it has already been trained on the ImageNet dataset and has learned general-purpose image features. An original U-Net encoder would have to learn everything from scratch, which takes more time.
Python3
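import tensorflow as tf
from tensorflow import keras

# `width` and `height` (the input image size) are assumed to be defined earlier.
# Load VGG16 without its classification head.
base_model = keras.applications.vgg16.VGG16(
    include_top=False, input_shape=(width, height, 3))

# The outputs of the five pooling layers serve as encoder features.
layer_names = [
    'block1_pool',
    'block2_pool',
    'block3_pool',
    'block4_pool',
    'block5_pool',
]
base_model_outputs = [base_model.get_layer(name).output
                      for name in layer_names]

# Freeze the pretrained weights.
base_model.trainable = False

VGG_16 = tf.keras.models.Model(base_model.input, base_model_outputs)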
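The five pooling layers selected above each halve the spatial size of their input. The sketch below tabulates the resulting feature-map shapes in plain Python. The 128x128 input size is an assumption chosen for illustration, and `vgg16_pool_shapes` is a hypothetical helper rather than a Keras function; the channel counts are those of the standard VGG16 blocks.

```python
# Output channels of the five VGG16 pooling layers, in order.
VGG16_POOL_CHANNELS = {
    "block1_pool": 64,
    "block2_pool": 128,
    "block3_pool": 256,
    "block4_pool": 512,
    "block5_pool": 512,
}


def vgg16_pool_shapes(height, width):
    """Return the (height, width, channels) shape after each pooling layer."""
    shapes = {}
    for name, channels in VGG16_POOL_CHANNELS.items():
        # Every 2x2 max-pool with stride 2 halves both spatial dimensions.
        height, width = height // 2, width // 2
        shapes[name] = (height, width, channels)
    return shapes


# Assumed example input size; the tutorial's actual size is set elsewhere.
shapes = vgg16_pool_shapes(128, 128)
for name, shape in shapes.items():
    print(name, shape)
# block1_pool (64, 64, 64)
# block2_pool (32, 32, 128)
# block3_pool (16, 16, 256)
# block4_pool (8, 8, 512)
# block5_pool (4, 4, 512)
```

The last feature map is 32 times smaller than the input in each dimension, so input sizes that are multiples of 32 keep every intermediate shape exact.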
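Now define the decoder. The function below follows the FCN-8 pattern: it upsamples with transposed convolutions and merges the pool4 and pool3 encoder features by element-wise addition rather than concatenation.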
Python3
def fcn8_decoder(convs, n_classes):
    f1, f2, f3, f4, p5 = convs

    n = 4096
    c6 = tf.keras.layers.Conv2D(
        n, (7, 7), activation='relu', padding='same', name="conv6")(p5)
    c7 = tf.keras.layers.Conv2D(
        n, (1, 1), activation='relu', padding='same', name="conv7")(c6)

    f5 = c7

    # upsample the output of the encoder
    # then crop extra pixels that were introduced
    o = tf.keras.layers.Conv2DTranspose(
        n_classes, kernel_size=(4, 4), strides=(2, 2), use_bias=False)(f5)
    o = tf.keras.layers.Cropping2D(cropping=(1, 1))(o)

    # load the pool 4 prediction and do a 1x1
    # convolution to reshape it to the same shape of `o` above
    o2 = f4
    o2 = tf.keras.layers.Conv2D(
        n_classes, (1, 1), activation='relu', padding='same')(o2)

    # add the results of the upsampling and pool 4 prediction
    o = tf.keras.layers.Add()([o, o2])

    # upsample the resulting tensor of the operation you just did
    o = tf.keras.layers.Conv2DTranspose(
        n_classes, kernel_size=(4, 4), strides=(2, 2), use_bias=False)(o)
    o = tf.keras.layers.Cropping2D(cropping=(1, 1))(o)

    # load the pool 3 prediction and do a 1x1
    # convolution to reshape it to the same shape of `o` above
    o2 = f3
    o2 = tf.keras.layers.Conv2D(
        n_classes, (1, 1), activation='relu', padding='same')(o2)

    # add the results of the upsampling and pool 3 prediction
    o = tf.keras.layers.Add()([o, o2])

    # upsample up to the size of the original image
    o = tf.keras.layers.Conv2DTranspose(
        n_classes, kernel_size=(8, 8), strides=(8, 8), use_bias=False)(o)

    # append a softmax to get the class probabilities
    o = tf.keras.layers.Activation('softmax')(o)
    return o
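The cropping steps in the decoder are easier to follow with the size arithmetic written out. The sketch below is plain Python, not Keras code: `conv_transpose_size` and `decoder_sizes` are hypothetical helpers, and the formula assumes the default 'valid' padding of a transposed convolution. The pool5 size of 4 corresponds to an assumed 128x128 input.

```python
def conv_transpose_size(size, kernel, stride):
    """Output length of a transposed convolution with 'valid' padding."""
    return size * stride + max(kernel - stride, 0)


def decoder_sizes(pool5_size):
    """Trace one spatial dimension through the three upsampling stages."""
    # 4x4 kernel, stride 2: doubles the size and adds 2 extra pixels,
    # which the Cropping2D layer removes (1 pixel from each side).
    up1 = conv_transpose_size(pool5_size, kernel=4, stride=2) - 2
    up2 = conv_transpose_size(up1, kernel=4, stride=2) - 2
    # 8x8 kernel, stride 8: exactly 8x, so no cropping is needed.
    out = conv_transpose_size(up2, kernel=8, stride=8)
    return up1, up2, out


# Assumed example: a 128x128 input gives a 4x4 pool5 feature map.
print(decoder_sizes(4))  # (8, 16, 128)
```

After the first stage the size matches pool4, after the second it matches pool3, and the final stage returns to the input resolution, which is why the two `Add` layers receive tensors of equal shape.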
Combine everything to create the final model, then compile it.
Python3
def segmentation_model():
    inputs = keras.layers.Input(shape=(width, height, 3))
    convs = VGG_16(inputs)
    # 3 output classes: pet, border, and everything else
    outputs = fcn8_decoder(convs, 3)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model


opt = keras.optimizers.Adam()

model = segmentation_model()
# The decoder already ends in a softmax, so the loss receives
# probabilities, not logits: from_logits must be False.
model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=False),
              metrics=['accuracy'])
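Because `fcn8_decoder` ends in a softmax layer, the model outputs probabilities, so the `from_logits` argument of the loss should be `False`; with `True`, the loss would treat those probabilities as raw scores and effectively apply softmax a second time. The plain-Python sketch below illustrates the effect for a single pixel. The helpers `softmax` and `sparse_cross_entropy` and the sample scores are illustrative assumptions, not the Keras implementation.

```python
import math


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(probs, label):
    """Cross-entropy for one sample with an integer class label."""
    return -math.log(probs[label])


# Hypothetical raw scores for one pixel; the true class is 0.
logits = [2.0, 0.5, -1.0]
label = 0

probs = softmax(logits)
loss_once = sparse_cross_entropy(probs, label)            # softmax applied once
loss_twice = sparse_cross_entropy(softmax(probs), label)  # softmax applied twice

print(loss_once < loss_twice)  # True
```

The second softmax flattens the distribution, so even a confident, correct prediction is scored as if it were uncertain, which weakens the training signal.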
Image Segmentation Using TensorFlow
Image segmentation is the task of assigning a class label to every pixel of an image, so that pixels belonging to the same object or region share a label. The input is an image, and the output is a mask that outlines the regions of interest in that image. Image segmentation has wide applications in domains such as medical image analysis, self-driving cars, and satellite image analysis. There are different types of image segmentation, such as semantic segmentation and instance segmentation. In short, the key goal of image segmentation is to recognize and understand what is in an image at the pixel level.
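As a small illustration of what "a mask" means in practice, the sketch below turns per-pixel class scores into a label mask by picking the highest-scoring class at each pixel. It is plain Python with made-up numbers; `to_mask` and `pixel_scores` are hypothetical names used only for this example.

```python
def to_mask(scores):
    """Turn an H x W x C nested list of class scores into an H x W label mask."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical scores for a 2x2 image with 3 classes per pixel.
pixel_scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]],
]

mask = to_mask(pixel_scores)
print(mask)  # [[0, 1], [2, 0]]
```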
For the image segmentation task, we will use “The Oxford-IIIT Pet Dataset”, which is free to use. It covers 37 pet categories with roughly 200 images per class. The images have large variations in scale, pose, and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation. Each pixel is classified into one of three categories:
- Pixel belonging to the pet
- Pixel bordering the pet
- Pixel belonging to neither class 1 nor class 2
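The list above numbers the classes starting from 1, whereas a sparse categorical loss expects integer labels starting from 0. Assuming the masks store the values 1, 2, and 3 in that order, a common preprocessing step is to subtract 1 from every pixel. The sketch below shows this in plain Python; `shift_labels` and the sample `trimap` are hypothetical and stand in for whatever tensor operation the data pipeline actually uses.

```python
def shift_labels(trimap):
    """Map 1-based trimap labels {1, 2, 3} to 0-based labels {0, 1, 2}."""
    return [[value - 1 for value in row] for row in trimap]


# Hypothetical 3x3 trimap: 1 = pet, 2 = border, 3 = neither (assumed encoding).
trimap = [
    [2, 2, 3],
    [2, 3, 1],
    [3, 1, 1],
]

shifted = shift_labels(trimap)
print(shifted)  # [[1, 1, 2], [1, 2, 0], [2, 0, 0]]
```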