Unmanned Systems Technology (无人系统技术)

2025, Vol. 8, No. 06, pp. 122-132

Multi-Object Localization and Tracking Algorithm Based on Improved YOLO and DeepSORT

礼冬雪¹ 张宁¹ 王宏宇² 袁春阳¹ 徐长振¹

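1. Beijing Institute of Computer Technology and Applications 2. Beijing Institute of Petrochemical Technology

Foundation: National Natural Science Foundation of China (62206183)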


DOI: 10.19942/j.issn.2096-5915.2025.06.60

Received: 2025-08-25

Final review: 2025-12-27

Review cycle (years): 1

Released online: 2025-12-15

Published: 2025-12-15



Abstract:

To address the limited accuracy of traditional algorithms for multi-object detection and tracking in complex scenes, this study develops an improved algorithm based on You Only Look Once (YOLO) and Deep Simple Online and Realtime Tracking (DeepSORT). First, YOLO is optimized in two ways for multi-object detection in complex scenes: an improved Convolutional Block Attention Module (CBAM) is integrated into the feature extraction network, where enlarging the convolution kernel of the spatial attention branch strengthens the response to small-target features; and the anchor boxes are regenerated with K-means clustering that uses Intersection over Union (IoU) instead of Euclidean distance as its metric, which improves anchor matching for targets of different scales and yields a higher average IoU than traditional clustering. Second, the observation association strategy of DeepSORT is improved by constructing a three-stage “motion prediction-dynamic feature fusion-cascaded matching” model: a Kalman filter predicts each target's motion state, a dynamic descriptor is generated by fusing the appearance feature of the current frame with the mean of the trajectory's historical features, and hierarchical association combines IoU-based motion matching with cosine-similarity appearance matching, which improves trajectory continuity under occlusion. Finally, experiments on an autonomous driving dataset for unmanned systems show that, in the tracking task, Multiple Object Tracking Accuracy (MOTA) reaches 0.65 and the Identity F1 Score (IDF1) reaches 0.72 (72%). Comparative analysis indicates that, through joint optimization of the detection and tracking modules, the proposed algorithm significantly improves the scale adaptability of multi-object detection and the spatiotemporal robustness of tracking in complex scenes, providing a high-performance solution for target perception tasks in fields such as intelligent transportation and security monitoring.
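
The IoU-based K-means step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common convention of clustering only box widths and heights, with every box aligned at a shared corner and the distance taken as 1 - IoU. The function names (`wh_iou`, `kmeans_anchors`, `mean_best_iou`), the mean-based centroid update, and the toy data are illustrative assumptions.

```python
import numpy as np

def wh_iou(boxes, centroids):
    """IoU between (N, 2) box sizes and (K, 2) centroid sizes.

    Boxes are compared as if they shared the same top-left corner,
    so only width and height matter (an assumption of this sketch).
    """
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance d = 1 - IoU; returns k anchors sorted by area."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        # Minimising 1 - IoU is the same as maximising IoU.
        assign = np.argmax(wh_iou(boxes, centroids), axis=1)
        new = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):          # keep the old centroid if a cluster empties
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]

def mean_best_iou(boxes, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return wh_iou(np.asarray(boxes, dtype=float), anchors).max(axis=1).mean()

# Toy data (hypothetical): three small boxes and three large ones.
toy_boxes = np.array([[10, 10], [12, 11], [11, 12],
                      [100, 50], [110, 55], [105, 52]], dtype=float)
toy_anchors = kmeans_anchors(toy_boxes, k=2)
print(toy_anchors, mean_best_iou(toy_boxes, toy_anchors))
```

With Euclidean distance, large boxes dominate the clustering error; the IoU metric is scale-invariant, which is why it tends to give anchors that match small and large targets more evenly.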

Keywords: unmanned system navigation; intelligent perception; target detection; improved YOLO algorithm; target localization and tracking; DeepSORT algorithm; computer vision
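
The dynamic feature fusion described in the abstract (current-frame appearance feature combined with the mean of the trajectory's historical features, then matched by cosine similarity) might look roughly like the sketch below. The abstract does not give the fusion rule, so the weighted sum, the weight `alpha`, the history length `budget`, and the names `TrackAppearance` and `cosine_cost` are all assumptions made for illustration.

```python
from collections import deque
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

class TrackAppearance:
    """Appearance state of one track (hypothetical helper, not from the paper)."""

    def __init__(self, first_feature, alpha=0.5, budget=30):
        self.alpha = alpha                      # assumed weight of the current frame
        self.history = deque(maxlen=budget)     # assumed cap on stored features
        first = l2_normalize(first_feature)
        self.history.append(first)
        self.descriptor = first

    def update(self, feature):
        """Fuse the matched detection's feature with the mean of past features."""
        current = l2_normalize(feature)
        history_mean = np.stack(list(self.history)).mean(axis=0)
        fused = self.alpha * current + (1.0 - self.alpha) * history_mean
        self.descriptor = l2_normalize(fused)
        self.history.append(current)
        return self.descriptor

def cosine_cost(track_descriptors, detection_features):
    """Cost matrix of cosine distances (1 - cosine similarity), shape (tracks, detections)."""
    t = l2_normalize(track_descriptors)
    d = l2_normalize(detection_features)
    return 1.0 - t @ d.T

# Usage with made-up 2-D features; real re-identification features are much longer.
track = TrackAppearance([1.0, 0.0])
fused = track.update([0.0, 1.0])
cost = cosine_cost([track.descriptor], [[1.0, 1.0], [-1.0, 0.0]])
print(fused, cost)
```

In a full tracker, this cost matrix would typically be gated by a distance threshold and passed to an assignment solver, with IoU matching applied to the tracks and detections left unmatched.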


Basic information:

CLC number: TP391.41

Citation:

[1] 礼冬雪, 张宁, 王宏宇, et al. Multi-object localization and tracking algorithm based on improved YOLO and DeepSORT[J]. Unmanned Systems Technology, 2025, 8(06): 122-132. DOI: 10.19942/j.issn.2096-5915.2025.06.60.

