This project page introduces CDbin.



Description of CDbin

Abstract




As an important computer vision task, image matching requires efficient and discriminative local descriptors. Most existing descriptors, such as SIFT and ORB, are hand-crafted, so it is worthwhile to learn better-optimized descriptors end-to-end. This paper proposes compact binary descriptors learned with a lightweight Convolutional Neural Network (CNN) that is efficient for both training and testing. Specifically, we propose a CNN with no more than five layers for descriptor learning. The resulting descriptors, i.e., Compact Discriminative binary descriptors (CDbin), are optimized with four complementary loss functions:

1) a triplet loss to ensure discriminative power,
2) a quantization loss to decrease the quantization error,
3) a correlation loss to ensure feature compactness, and
4) an even-distribution loss to enrich the embedded information.
Extensive experiments on two image patch datasets and three image retrieval datasets show that CDbin achieves competitive performance compared with existing descriptors. For example, the 64-bit CDbin substantially outperforms the 256-bit ORB and the 1024-bit SIFT on the HPatches dataset. Although generated by a shallow CNN, CDbin also outperforms several recent deep descriptors.
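
For intuition, below is a minimal PyTorch sketch of the four loss terms listed above, assuming a real-valued descriptor batch of shape (N, D) produced before binarization. The exact formulations, margins, and loss weights used by CDbin are not reproduced here; everything in the snippet is an illustrative assumption.

```python
# A minimal sketch of the four loss terms (formulations assumed, not the
# authors' implementation), operating on real-valued descriptors of shape (N, D).
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull matching patches together and push non-matching ones apart."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

def quantization_loss(desc):
    """Penalize how far real-valued outputs are from the binary values {-1, +1}."""
    return (desc.abs() - 1.0).pow(2).mean()

def correlation_loss(desc):
    """Penalize correlation between descriptor dimensions to reduce redundancy."""
    centered = desc - desc.mean(dim=0, keepdim=True)
    cov = centered.t() @ centered / (desc.size(0) - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return off_diag.pow(2).mean()

def even_distribution_loss(desc):
    """Encourage each bit to take +1 and -1 with roughly equal frequency."""
    return desc.mean(dim=0).pow(2).mean()

# In training, the terms would be combined as a weighted sum, e.g.
# loss = triplet_loss(a, p, n) + w_q * quantization_loss(a) \
#        + w_c * correlation_loss(a) + w_e * even_distribution_loss(a)
```

The effect of such loss weights is studied in Fig. 3 below.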




Fig. 2. mAP of CDbin descriptors generated by different networks on the retrieval task defined by HPatches. CDbin(x-k) denotes a k-bit CDbin descriptor extracted by an x-layer convolutional network. The network forward time, i.e., the average time to extract one descriptor, is also compared.
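
To make the CDbin(x-k) notation concrete, the following is a hedged sketch of what a CDbin(5-256)-style network could look like: five convolutional layers mapping a grayscale patch to a 256-dimensional output that is binarized by its sign at test time. The layer widths, kernel sizes, and 32x32 input resolution are assumptions for illustration, not the published architecture.

```python
# Illustrative five-conv-layer patch descriptor network (architecture assumed).
import torch
import torch.nn as nn

class CDbinNet(nn.Module):
    def __init__(self, bits=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=1, padding=1), nn.BatchNorm2d(32), nn.ReLU(),    # 32x32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),   # 16x16
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(), # 8x8
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),# 4x4
            nn.Conv2d(128, bits, 4),  # collapse the 4x4 map into a bits-dim vector
            nn.Tanh(),                # keep outputs in [-1, 1] before binarization
        )

    def forward(self, patch):                    # patch: (N, 1, 32, 32)
        real = self.features(patch).flatten(1)   # real-valued descriptor used for training
        binary = torch.sign(real)                # binary descriptor used for matching
        return real, binary
```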




Fig. 3. mAP of 256-bit CDbin on the HPatches retrieval task under different loss weight settings.




Fig. 5. mAP of CDbin(4-256) learned with different training batch sizes on the retrieval task defined by HPatches. The solid and dashed lines represent real-valued and binary descriptors, respectively.




Fig. 6. Examples of patch retrieval results on the HPatches dataset. The results of SIFT, ORB, and CDbin(5-256) are compared. In each example, the top 5 patches most similar to the query are presented; true positives and false positives are annotated with green and red dots, respectively.




Fig. 7. Examples of patch verification results on the HPatches dataset. Each example shows the top 4 falsely matched pairs and their positions in the ranking list. A larger ranking position means the descriptor is more discriminative in identifying false matches.




Fig. 8. Sample image matching results on the HPatches dataset. CDbin(5-256) and SIFT use MSER as the keypoint detector; ORB uses the Harris detector.
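
For readers reproducing this matching setup, the snippet below shows the corresponding OpenCV detector calls: MSER keypoints (paired with SIFT or a learned patch descriptor such as CDbin) and ORB, whose keypoints are scored with a Harris-based response. The file name and parameters are placeholders, not the settings used in the experiments.

```python
# Illustrative OpenCV detector setup (parameters and pairing assumed).
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# MSER regions as keypoints; patches around them would be fed to the descriptor
mser = cv2.MSER_create()
mser_keypoints = mser.detect(img)

# ORB with Harris-based keypoint scoring
orb = cv2.ORB_create(scoreType=cv2.ORB_HARRIS_SCORE)
orb_keypoints, orb_descriptors = orb.detectAndCompute(img, None)
```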




Fig. 9. ROC curves of CDbin(5-256) and other binary descriptors on the Brown dataset. ND, YOS, and LIB denote Notredame, Yosemite, and Liberty, respectively. Best viewed in color.




Comparison of mAP with other descriptors on the three tasks defined by HPatches. † denotes methods using deeper CNNs than ours. SP denotes supervised methods, USP unsupervised methods, and HC hand-crafted methods.




Comparison of fpr95 with other descriptors on the Brown dataset. In each column, the underlined dataset name denotes the training set and the other denotes the test set; for supervised methods, all train/test combinations of Liberty (LIB), Notredame (ND), and Yosemite (YOS) are shown. † denotes methods using deeper CNNs than ours. SP denotes supervised methods, USP unsupervised methods, and HC hand-crafted methods.
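
As a reference for reading the table, the sketch below computes fpr95, i.e., the false positive rate at the distance threshold where 95% of the true matching pairs are accepted. Array names and the exact thresholding convention are illustrative assumptions.

```python
# A small NumPy sketch of the fpr95 metric (convention assumed).
import numpy as np

def fpr95(distances, labels):
    """distances: descriptor distances for patch pairs; labels: 1 = match, 0 = non-match."""
    pos = np.sort(distances[labels == 1])
    threshold = pos[int(0.95 * len(pos)) - 1]  # distance that recalls 95% of true matches
    neg = distances[labels == 0]
    return float(np.mean(neg <= threshold))    # fraction of non-matches wrongly accepted
```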