Object Detection using TensorFlow
Setting Up TensorFlow
Begin by installing TensorFlow using pip:
!pip install tensorflow
Ensure that you have the necessary dependencies, and if you have a compatible GPU, consider installing TensorFlow with GPU support for faster training.
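To confirm whether TensorFlow can actually see a GPU after installation, a quick check like the following may help. This is a minimal sketch: the helper name `count_visible_gpus` is hypothetical (not part of TensorFlow), and it relies only on `tf.config.list_physical_devices`, which is available in TensorFlow 2.x.

```python
def count_visible_gpus():
    """Return how many GPUs TensorFlow can see, or None if TensorFlow is missing.

    Hypothetical helper for illustration; not part of the TensorFlow API.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))


gpu_count = count_visible_gpus()
if gpu_count is None:
    print("TensorFlow is not installed in this environment.")
elif gpu_count == 0:
    print("No GPU detected; TensorFlow will run on the CPU.")
else:
    print(f"TensorFlow can see {gpu_count} GPU(s).")
```

If the count is zero on a machine that has a GPU, the driver or CUDA libraries are a likely place to look first.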
Choosing a Pre-trained Model
TensorFlow provides models pre-trained on large datasets such as COCO (Common Objects in Context). These models can be used directly for inference or serve as a starting point for transfer learning. Common object detection architectures include Faster R-CNN, SSD (Single Shot MultiBox Detector), and YOLO (You Only Look Once). In this tutorial, we use the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 model.
Understanding the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 Model
- SSD (Single Shot Multibox Detector): SSD is a popular object detection algorithm known for its speed and accuracy. It’s designed to detect objects of different scales and aspect ratios in a single pass.
- MobileNetV2: MobileNetV2 is a lightweight neural network architecture optimized for mobile and edge devices. It strikes a balance between efficiency and performance, making it ideal for real-time applications.
- 640×640: This denotes the input image size the model expects. Larger input sizes often yield more accurate results but require more computational resources. Models built for 640×640 inputs also tend to be smaller and faster at inference than models built for larger inputs such as 1024×1024 (the backbone architecture contributes to the difference as well).
- Example: centernet_hg104_1024x1024_coco17_tpu-32 is about 1.33 GB,
- while ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 is about 19 MB,
- and efficientdet_d1_coco17_tpu-32 is about 50 MB (for 640×640 images).
- Inference times for these three models in Google Colab are around 42 s, 0 s, and 4 s, respectively, which shows how model size affects inference time.
- COCO (Common Objects in Context) Dataset: The COCO dataset is a large-scale dataset for object detection, segmentation, and captioning. It encompasses a diverse range of object categories and is widely used for training and evaluating computer vision models.
- TPU-8 (Tensor Processing Unit – 8): TPUs are Google’s custom hardware accelerators designed for machine learning workloads. The “8” in the model name refers to the number of TPU cores used to train it.
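Inference times like the ones quoted above depend heavily on the hardware and on how they are measured. One common way to measure them is sketched below, assuming a warm-up call followed by an average over several runs. The names `average_inference_time` and `dummy_model` are hypothetical; `dummy_model` is only a stand-in so the sketch runs without downloading a model.

```python
import time


def average_inference_time(predict_fn, model_input, runs=5):
    """Average wall-clock seconds per call, after one untimed warm-up call."""
    predict_fn(model_input)  # warm-up: the first call often includes one-time setup cost
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(model_input)
    return (time.perf_counter() - start) / runs


def dummy_model(values):
    """Stand-in for a real detector; it just sums its input."""
    return sum(values)


seconds = average_inference_time(dummy_model, list(range(1000)))
print(f"Average time per call: {seconds:.6f} s")
```

With a real detector, `predict_fn` would be the loaded model and `model_input` the batched image tensor.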
Now that we have everything needed, let’s begin with the code:
Step 1: Import Libraries
First let’s import the necessary libraries for TensorFlow, NumPy, OpenCV, Pillow, and Matplotlib.
Python3
import tensorflow as tf
import numpy as np
import cv2
from PIL import Image
from matplotlib import pyplot as plt
from random import randint
Step 2: Download, Extract and Load the Pre-trained Model
Now, download and extract the model archive, then load the pre-trained model using TensorFlow’s SavedModel format.
Python3
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
!tar -xzvf ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz

model = tf.saved_model.load("ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model")
Step 3: Load and Preprocess Image
In this step, we load an image, convert it to a NumPy array, and add a batch dimension. The model cannot work on an image file directly, so we convert the array into a uint8 tensor of shape [1, height, width, 3].
Python3
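# Force 3-channel RGB so grayscale or RGBA files also work
image = Image.open("detect.jpg").convert("RGB")
image_np = np.array(image)

# Add a leading batch dimension: (height, width, 3) -> (1, height, width, 3)
input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.uint8)

image  # Display the image inline in the notebook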
Output:
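The effect of `np.expand_dims` in this step can be checked with a small shape experiment. The sketch below uses an all-zero placeholder array instead of a real photo; the 480×640 size is an arbitrary assumption for illustration.

```python
import numpy as np

# Placeholder for a decoded RGB image: height 480, width 640, 3 channels.
fake_image = np.zeros((480, 640, 3), dtype=np.uint8)

# Insert a new axis at position 0 so the array looks like a batch of one image.
batched = np.expand_dims(fake_image, 0)

print(fake_image.shape)  # (480, 640, 3)
print(batched.shape)     # (1, 480, 640, 3)
```

The pixel data is unchanged; only the shape gains a leading dimension of size 1.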
Step 5: Perform Object Detection
Here we use the loaded model to perform object detection on the input image and extract bounding box coordinates, class IDs, and scores.
Python3
detection = model(input_tensor)

# Parse the detection results
boxes = detection['detection_boxes'].numpy()
classes = detection['detection_classes'].numpy().astype(int)
scores = detection['detection_scores'].numpy()
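To make the structure of these outputs concrete, the sketch below builds a small mock dictionary with the same keys and a batch size of 1, then keeps the highest-scoring detections. The values are invented for illustration, and `top_k_detections` is a hypothetical helper, not part of TensorFlow.

```python
import numpy as np

# Mock output with the same keys as above: a batch of 1 image and 3 detections.
mock_detection = {
    "detection_boxes": np.array([[[0.1, 0.2, 0.5, 0.6],
                                  [0.3, 0.3, 0.9, 0.8],
                                  [0.0, 0.0, 0.2, 0.2]]], dtype=np.float32),
    "detection_classes": np.array([[1.0, 18.0, 3.0]], dtype=np.float32),
    "detection_scores": np.array([[0.91, 0.40, 0.75]], dtype=np.float32),
}


def top_k_detections(detection, k):
    """Drop the batch dimension and return the k highest-scoring detections."""
    boxes = np.asarray(detection["detection_boxes"])[0]                  # shape (N, 4)
    classes = np.asarray(detection["detection_classes"])[0].astype(int)  # shape (N,)
    scores = np.asarray(detection["detection_scores"])[0]                # shape (N,)
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return boxes[order], classes[order], scores[order]


top_boxes, top_classes, top_scores = top_k_detections(mock_detection, k=2)
print(top_classes.tolist())  # [1, 3]
```

Each box is stored as [ymin, xmin, ymax, xmax], which is the order the visualization step unpacks.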
Step 6: Add the COCO Labels
These are the labels for the COCO dataset, which map class IDs to class names.
The model outputs only integer class IDs for the dataset it was trained on (COCO). To translate those IDs into meaningful class names, we need these labels.
Python3
# The list is indexed by COCO class ID. The TensorFlow detection models use the
# original COCO IDs (1-90), which skip some numbers, so unused IDs are 'N/A'.
labels = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',           # 0-5
    'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',          # 6-12
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',             # 13-19
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A',                    # 20-26
    'backpack', 'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase',               # 27-33
    'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',            # 34-39
    'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A',    # 40-45
    'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',         # 46-53
    'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',  # 54-61
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A', 'N/A',     # 62-69
    'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',     # 70-77
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',            # 78-84
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'             # 85-90
]
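Indexing a list with a class ID raises an IndexError if the ID falls outside the list. A defensive lookup such as the one below avoids that. This is a sketch only: `label_for` is a hypothetical helper, and `sample_labels` is a tiny stand-in for the full label list.

```python
# Tiny illustrative subset; the full list has one entry per class ID.
sample_labels = ["__background__", "person", "bicycle", "car"]


def label_for(class_id, label_list):
    """Return the class name for class_id, or a readable fallback if out of range."""
    if 0 <= class_id < len(label_list):
        return label_list[class_id]
    return f"unknown ({class_id})"


print(label_for(1, sample_labels))   # person
print(label_for(90, sample_labels))  # unknown (90)
```

An unexpected "unknown" label usually means the label list does not match the ID scheme the model was trained with.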
Before going further let’s learn about some concepts:
- Confidence
- Confidence in object detection represents how certain the model is about its predictions. It’s like a measure of how sure the model is that it correctly identified an object in an image. Confidence values range from 0 to 1, where 1 means the model is very confident in its prediction.
- Normalized Coordinates
- Normalized coordinates are a way to describe the location of an object in an image in a standardized manner. Instead of using pixel values, which can vary based on image size, normalization scales coordinates to a consistent range, usually between 0 and 1.
Let’s understand with an Analogy
Think of a treasure map. Instead of saying “walk 50 steps north,” which depends on the map’s size, you say “walk halfway up the map.” Normalized coordinates provide a universal language for pinpointing locations.
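Both ideas can be combined in a few lines: keep only the detections above a confidence threshold, then scale their normalized coordinates to pixels. The sketch below uses plain Python lists with invented values and assumes boxes are given as (ymin, xmin, ymax, xmax); `to_pixel_boxes` is a hypothetical helper name.

```python
def to_pixel_boxes(boxes, scores, image_width, image_height, threshold=0.5):
    """Keep boxes scoring above threshold and convert them to pixel coordinates.

    Each input box is (ymin, xmin, ymax, xmax) with values between 0 and 1.
    Each result is (xmin, ymin, xmax, ymax, score) in pixels.
    """
    kept = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score <= threshold:
            continue  # skip low-confidence detections
        kept.append((int(xmin * image_width), int(ymin * image_height),
                     int(xmax * image_width), int(ymax * image_height), score))
    return kept


example_boxes = [(0.25, 0.5, 0.75, 1.0), (0.1, 0.1, 0.2, 0.2)]
example_scores = [0.9, 0.3]
pixel_boxes = to_pixel_boxes(example_boxes, example_scores,
                             image_width=640, image_height=480)
print(pixel_boxes)  # [(320, 120, 640, 360, 0.9)]
```

The same normalized box maps to different pixel values on a different image size, which is why the model reports coordinates in the 0 to 1 range.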
Step 7: Visualize the detected objects
Now let’s look at the code
We iterate through the detected objects, filter out low-confidence detections, convert coordinates, get class names, and visualize the result with randomly colored boxes. Adjust the confidence threshold (0.5 in this case) and other parameters as needed.
Python3
h, w, _ = image_np.shape

for i in range(classes.shape[1]):
    class_id = int(classes[0, i])
    score = scores[0, i]

    if score > 0.5:  # Filter out low-confidence detections
        ymin, xmin, ymax, xmax = boxes[0, i]

        # Convert normalized coordinates to image coordinates
        xmin = int(xmin * w)
        xmax = int(xmax * w)
        ymin = int(ymin * h)
        ymax = int(ymax * h)

        # Get the class name from the labels list
        class_name = labels[class_id]

        # randint includes both endpoints, so 255 is the largest valid channel value
        random_color = (randint(0, 255), randint(0, 255), randint(0, 255))

        # Draw bounding box and label on the image
        cv2.rectangle(image_np, (xmin, ymin), (xmax, ymax), random_color, 2)
        label = f"Class: {class_name}, Score: {score:.2f}"
        cv2.putText(image_np, label, (xmin, ymin - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, random_color, 2)

# Display the result
plt.imshow(image_np)
plt.axis('off')
plt.show()
Output:
Applications of object detection:
Object detection finds applications in diverse fields, including:
- Autonomous Vehicles: Identifying pedestrians, other vehicles, and obstacles.
- Surveillance Systems: Monitoring and tracking objects in real-time.
- Medical Imaging: Detecting anomalies or specific structures in medical images.
- Retail Analytics: Tracking products and customer behavior in stores.
- Augmented Reality: Overlaying digital information on real-world objects.
Conclusion
Object detection with models like these opens doors to a myriad of applications. From autonomous vehicles and surveillance systems to retail analytics and augmented reality, the impact is profound. As technology advances, we can anticipate further developments in model architectures, dataset diversity, and real-time deployment, ushering in a new era of intelligent visual perception.
Object Detection using TensorFlow
Identifying and locating objects within images or videos is a key task in computer vision. It is critical in a variety of applications, ranging from autonomous vehicles and surveillance systems to augmented reality and medical imaging. TensorFlow, Google’s open-source machine learning framework, provides a robust collection of tools for developing and deploying object detection models.
In this article, we go over the fundamentals of using TensorFlow for object detection. Whether you’re working on a computer vision research project or building apps that require real-time object detection, TensorFlow provides a flexible and efficient framework for the task. Let’s get into the specifics of using TensorFlow to run an object detection model.