Trying to find a fast object-detection tool on GitHub (by "fast" I mean inference in less than 1 second on a mainstream CPU), I experimented with several repositories written in PyTorch, since that is the framework I am familiar with. Below are my conclusions:
1. detectron2
This is the official detection library from Facebook. I downloaded and installed it successfully. The test Python code is:
import detectron2
from detectron2.utils.logger import setup_logger
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo

setup_logger()

# import some common libraries
import numpy as np
import cv2
import sys
import time

cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"
predictor = DefaultPredictor(cfg)

img = cv2.imread(sys.argv[1])
begin = time.time()
outputs = predictor(img)
print("time:", time.time() - begin)
print(outputs)
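For reference (this part is not in my original test script), the predictor returns a dict whose "instances" field holds the detected boxes, scores and class ids, so the raw detections can be inspected roughly like this:

# inspect the detections returned by DefaultPredictor
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes)    # detected boxes in (x1, y1, x2, y2) image coordinates
print(instances.scores)        # confidence score of each detection
print(instances.pred_classes)  # COCO class index of each detection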
Although it can't recognize all the birds in the image below, it takes more than 5 seconds per image on CPU (my MacBook Pro). The performance does not meet my expectation.
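To check which birds were actually found, detectron2's Visualizer can render the predictions onto the image. This is only a sketch following the official tutorial, not part of my original test script, and the output filename is arbitrary:

from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

# draw the predicted instances on top of the original image (Visualizer expects RGB)
v = Visualizer(img[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.0)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("detectron2_result.jpg", out.get_image()[:, :, ::-1])  # convert back to BGR for OpenCV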
2. efficientdet
According to the paper, EfficientDet should be both fast and accurate. But after I wrote a test program, it couldn't recognize the objects at all, so I gave up on this solution.
3. EfficientDet.Pytorch
I couldn't download the pretrained models from its model_zoo.
4. ssd.pytorch
Finally, I came back to my sweet SSD (Single Shot MultiBox Detector). Since I have studied it for more than half a year, I wrote the snippet below quickly:
import cv2
import numpy as np
import torch
from torch.autograd import Variable
from ssd import build_ssd  # from the ssd.pytorch repository


def base_transform(image, size, mean):
    x = cv2.resize(image, (size, size)).astype(np.float32)
    x -= mean
    x = x.astype(np.float32)
    return x


class BaseTransform:
    def __init__(self, size, mean):
        self.size = size
        self.mean = np.array(mean, dtype=np.float32)

    def __call__(self, image, boxes=None, labels=None):
        return base_transform(image, self.size, self.mean), boxes, labels


def detect(img, net, transform):
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    height, width = img.shape[:2]
    x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1)
    x = Variable(x.unsqueeze(0))
    y = net(x)  # forward pass
    detections = y.data[0]
    # scale each detection back up to the image
    scale = torch.Tensor([width, height, width, height])
    # detections[3] holds the detections for VOC class 3 ("bird");
    # each row is [score, x1, y1, x2, y2]
    for index, loc in enumerate(detections[3]):
        score = loc.numpy()[0]
        if score >= 0.5:
            loc = loc[1:]
            pt = loc * scale
            print(score, pt)
            cv2.rectangle(
                img,
                (int(pt[0]), int(pt[1])),
                (int(pt[2]), int(pt[3])),
                COLORS[index % 3],
                2,
            )
            cv2.putText(
                img,
                str(score),
                (int(pt[0]), int(pt[1])),
                FONT,
                1,
                (255, 255, 255),
                1,
                cv2.LINE_AA,
            )
    return img


img = cv2.imread("bird_matrix.jpg")
net = build_ssd("test", 300, 21)  # initialize SSD300 for the 21 VOC classes (including background)
net.load_state_dict(torch.load("ssd300_mAP_77.43_v2.pth", map_location="cpu"))
net.eval()  # evaluation mode for inference
transform = BaseTransform(net.size, (104 / 256.0, 117 / 256.0, 123 / 256.0))
img = detect(img, net, transform)
cv2.imwrite("result.jpg", img)
The result is not perfect but good enough for my current situation.
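Since the whole point was speed, here is a rough timing check of the SSD forward pass on CPU. It is only a sketch that reuses the net and transform objects from the snippet above and re-reads the same test image:

import time

# rough CPU timing of a single SSD forward pass
img = cv2.imread("bird_matrix.jpg")
x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1).unsqueeze(0)
begin = time.time()
with torch.no_grad():  # no gradients needed for inference
    y = net(x)
print("forward time: %.3f s" % (time.time() - begin))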