Training and Loss Function
For each training region of interest (RoI), labeled with a ground-truth class u and a ground-truth bounding box v, we take the outputs of the softmax classifier and the bounding box regressor and apply a loss function to them. The loss accounts for both classification and bounding box localization, and is called the multi-task loss. Writing p for the predicted class probabilities and t^u for the predicted bounding box for class u, it is defined as follows:
$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda \, [u \geq 1] \, L_{loc}(t^u, v)$$
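where L_cls is the classification loss and L_loc is the localization loss. lambda is a balancing parameter, and [u ≥ 1] is an indicator that equals 1 when u ≥ 1 and 0 when u = 0 (the background class), so that the localization loss is only computed for RoIs that actually contain an object and therefore have a bounding box to regress. Here, L_cls is the log loss for the true class, L_cls(p, u) = -log p_u, where p_u is the predicted probability of class u. L_loc compares the predicted box coordinates t^u_i with the ground-truth coordinates v_i and is defined as
$$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i), \qquad \mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$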
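The multi-task loss described above can be sketched for a single RoI in plain Python. This is a minimal illustration, not the reference implementation: it assumes the smooth L1 form of the localization loss from the Fast R-CNN paper, treats class index 0 as background, and uses lam=1.0 as an assumed default for the balancing parameter. The function and argument names are hypothetical.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multi_task_loss(class_probs, true_class, pred_box, true_box, lam=1.0):
    """Loss for one RoI.

    class_probs: softmax output over all classes (index 0 = background, assumed)
    true_class:  ground-truth class u
    pred_box:    predicted (x, y, w, h) for the true class
    true_box:    ground-truth (x, y, w, h), i.e. v
    lam:         balancing parameter lambda (assumed default of 1.0)
    """
    # Classification term: log loss on the true class.
    l_cls = -math.log(class_probs[true_class])
    # Localization term: only counted for non-background RoIs (u >= 1).
    if true_class >= 1:
        l_loc = sum(smooth_l1(t - v) for t, v in zip(pred_box, true_box))
    else:
        l_loc = 0.0
    return l_cls + lam * l_loc

probs = [0.1, 0.7, 0.2]
# Foreground RoI: both terms contribute.
print(multi_task_loss(probs, 1, [0.5, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
# Background RoI: the box terms are ignored, only the log loss remains.
print(multi_task_loss(probs, 0, [5.0, 5.0, 5.0, 5.0], [0.0, 0.0, 0.0, 0.0]))
```

In the foreground call, the coordinate errors 0.5 and 2.0 fall on the quadratic and linear branches of smooth L1 respectively, which is what makes the loss less sensitive to outliers than a plain squared error.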
Fast R-CNN | ML
Before discussing Fast R-CNN, let’s look at the challenges faced by R-CNN.
- Training R-CNN is very slow because each part of the model (the CNN, the SVM classifier, and the bounding box regressor) must be trained separately, and these stages cannot be parallelized.
- R-CNN also has to forward-pass every region proposal through the deep convolutional network, which means up to ~2000 forward passes per image. This accounts for much of the time taken to train the model.
- Inference is also very slow: R-CNN takes 49 seconds to test a single image (including selective search region proposal generation).
Fast R-CNN works to solve these problems. Let’s look at the architecture of Fast R-CNN.
First, a selective search algorithm generates region proposals, up to approximately 2000 per image. The input image is passed through a CNN, which produces a convolutional feature map, and each region proposal is projected onto this feature map (the RoI projection). Then, for each object proposal, a Region of Interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Every feature vector is then passed into two sibling output layers: a softmax classifier, which classifies the region proposal, and a bounding box regressor, which refines the position of the bounding box of that object.
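To make the RoI pooling step concrete, here is a small sketch in plain Python of max pooling an RoI into a fixed-size grid. It is an illustration under simplifying assumptions: the feature map has a single channel, the RoI is already given in feature-map coordinates as inclusive (x1, y1, x2, y2), and the 2x2 output size is chosen only to keep the example small. The name roi_max_pool is hypothetical.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows (single channel, for simplicity)
    roi:         (x1, y1, x2, y2), inclusive, in feature-map coordinates
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    # Split the RoI into a grid of roughly equal bins using integer edges.
    ys = [y1 + (i * h) // out_h for i in range(out_h + 1)]
    xs = [x1 + (j * w) // out_w for j in range(out_w + 1)]
    pooled = []
    for i in range(out_h):
        # Make sure every bin covers at least one cell, even for tiny RoIs.
        y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
        row = []
        for j in range(out_w):
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            row.append(max(feature_map[y][x]
                           for y in range(y_lo, y_hi)
                           for x in range(x_lo, x_hi)))
        pooled.append(row)
    return pooled

# A 4x4 feature map with values 0..15, row by row.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]

# Two RoIs of different sizes both produce a 2x2 output.
print(roi_max_pool(feature_map, (0, 0, 3, 3)))  # [[5, 7], [13, 15]]
print(roi_max_pool(feature_map, (1, 1, 3, 2)))  # [[5, 7], [9, 11]]
```

The point of the example is that RoIs of different sizes map to outputs of the same shape, which is what allows every proposal to feed the same classifier and regressor layers while the CNN itself runs only once per image.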