Quantcast
Viewing all articles
Browse latest Browse all 236

Grab a hands-on realtime-object-detection tool

Try to get a fast (what I mean is detecting in lesss than 1 second on mainstream CPU) object-detection tool from Github, I experiment with some repositories written by PyTorch (because I am familiar with it). Below are some conclusions:
1. detectron2
This the official tool from Facebook Corporation. I download and installed it successfully. The test python code is:

import detectron2
from detectron2.utils.logger import setup_logger
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo
setup_logger()

# import some common libraries
import numpy as np
import cv2
import sys
import time

cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"
predictor = DefaultPredictor(cfg)

img = cv2.imread(sys.argv[1])

begin = time.time()
outputs = predictor(img)
print("time:", time.time() - begin)
print(outputs)

Although can’t recognize all birds in below image, it will cost more than 5 seconds on CPU (my MackbookPro). Performance is not as good as my expectation.
Image may be NSFW.
Clik here to view.

2. efficientdet
From the paper, the EfficientDet should be fast and accurate. But after I wrote a test program, it totally couldn’t recognize the object at all. Then I gave up this solution.

3. EfficientDet.Pytorch
Couldn’t download models from it’s model_zoo.

4. ssd.pytorch
Finally, I came to my sweet ssd(Single Shot Detection). Since have studied it for more than half a year, I wrote below snippet quickly:

def base_transform(image, size, mean):
    x = cv2.resize(image, (size, size)).astype(np.float32)
    x -= mean
    x = x.astype(np.float32)
    return x


class BaseTransform:
    def __init__(self, size, mean):
        self.size = size
        self.mean = np.array(mean, dtype=np.float32)

    def __call__(self, image, boxes=None, labels=None):
        return base_transform(image, self.size, self.mean), boxes, labels


def detect(img, net, transform):
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]

    height, width = img.shape[:2]
    x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1)
    x = Variable(x.unsqueeze(0))
    y = net(x)  # forward pass
    detections = y.data[0]
    # scale each detection back up to the image
    scale = torch.Tensor([width, height, width, height])
    for index, loc in enumerate(detections[3]):
        score = loc.numpy()[0]
        if score >= 0.5:
            loc = loc[1:]
            pt = loc * scale
            print(score, pt)
            cv2.rectangle(
                img,
                (int(pt[0]), int(pt[1])),
                (int(pt[2]), int(pt[3])),
                COLORS[index % 3],
                2,
            )
            cv2.putText(
                img,
                str(score),
                (int(pt[0]), int(pt[1])),
                FONT,
                1,
                (255, 255, 255),
                1,
                cv2.LINE_AA,
            )
    return img


img = cv2.imread("bird_matrix.jpg")
net = build_ssd("test", 300, 21)  # initialize SSD
net.load_state_dict(torch.load("ssd300_mAP_77.43_v2.pth", map_location="cpu"))
transform = BaseTransform(net.size, (104 / 256.0, 117 / 256.0, 123 / 256.0))

img = detect(img, net, transform)

cv2.imwrite("result.jpg", img)

The result is not perfect but good enough for my current situation.
Image may be NSFW.
Clik here to view.


Viewing all articles
Browse latest Browse all 236

Trending Articles