3D object detection in point clouds is a core component for modern robotics and autonomous driving systems. A key challenge in 3D object detection comes from the inherent sparse nature of point occupancy within the 3D scene. In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection, which can take full advantage of the sparsity of point clouds. Built upon the idea of window-based Transformers, SWFormer converts 3D points into sparse voxels and windows, and then processes these variable-length sparse windows efficiently using a bucketing scheme. In addition to self-attention within each spatial window, our SWFormer also captures cross-window correlation with multi-scale feature fusion and window shifting operations. To further address the unique challenge of detecting 3D objects accurately from sparse features, we propose a new voxel diffusion technique. Experimental results on the Waymo Open Dataset show our SWFormer achieves state-of-the-art 73.36 L2 mAPH on vehicle and pedestrian 3D object detection on the official test set, outperforming all previous single-stage and two-stage models, while being much more efficient.
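The bucketing scheme mentioned above can be illustrated with a minimal sketch: group variable-length sparse windows into a small set of fixed capacities and zero-pad each window to its bucket's capacity, so self-attention can run as a few dense batched calls instead of one ragged call per window. This is only an illustration under assumed details, not the paper's implementation; the `bucket_windows` helper, the power-of-two capacities, and the zero-padding are all assumptions for the example.

```python
from collections import defaultdict

def bucket_windows(windows, bucket_sizes=(16, 32, 64, 128)):
    """Group variable-length sparse windows into fixed-capacity buckets.

    `windows` is a list of per-window voxel-feature lists. Windows whose
    lengths fall into the same bucket are zero-padded to that bucket's
    capacity, so each bucket can be processed as one dense batch. The
    true length is kept alongside each window for attention masking.
    (Hypothetical helper; capacities and padding are assumptions.)
    """
    buckets = defaultdict(list)
    for idx, win in enumerate(windows):
        # Smallest bucket capacity that fits this window.
        cap = next(s for s in bucket_sizes if len(win) <= s)
        padded = list(win) + [0.0] * (cap - len(win))  # zero-pad to capacity
        buckets[cap].append((idx, padded, len(win)))   # keep true length for masking
    return buckets

# Toy example: four windows with 3, 20, 5, and 40 voxels.
windows = [[1.0] * 3, [2.0] * 20, [3.0] * 5, [4.0] * 40]
grouped = bucket_windows(windows)
print(sorted(grouped.keys()))               # → [16, 32, 64]
print([len(v) for _, v, _ in grouped[16]])  # → [16, 16]
```

Batching by bucket trades a little wasted padding for the ability to run attention over many windows at once, which is where the efficiency claim for sparse windows comes from.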