Research on Human Pose Estimation Model Based on Long-Range Fine-Grained Modeling
Existing human pose estimation models cannot learn long-range spatial positional relationships, and they lose too much fine-grained feature information during spatial feature pooling. To address these issues, a novel human pose estimation model based on the YOLO-Pose network is proposed. First, an enhanced coordinate attention module is embedded into the backbone network to give the model long-range spatial position modeling capability. Second, a fine-grained cascaded spatial pyramid pooling module is proposed to mitigate the loss of fine-grained feature information caused by spatial feature pooling. Finally, an implicit knowledge learning module is incorporated to reduce the number of model parameters and strengthen the model's multi-task joint optimization.
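The first module builds on coordinate attention. The abstract does not describe what the "enhanced" variant changes, so the sketch below covers only the baseline coordinate attention computation. Features are average-pooled separately along the width and along the height. The pooled descriptors then pass through a shared 1×1 convolution, are split, and become per-row and per-column sigmoid gates. Each gate summarizes an entire row or column, so the reweighting at position (i, j) depends on all of row i and column j. That is the long-range positional dependence the abstract refers to. This is a minimal sketch under stated assumptions. Plain Python lists stand in for tensors (layout `[C][H][W]`). A ReLU replaces the batch normalization and nonlinearity of the original design. The parameter names `w_shared`, `w_h` and `w_w` are hypothetical.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # layout [C][H][W]
Matrix = List[List[float]]         # layout [rows][cols]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(vectors: Matrix, weight: Matrix) -> Matrix:
    """A 1x1 convolution applied to a list of per-position channel vectors.

    vectors: [L][C_in], weight: [C_out][C_in] -> result: [L][C_out]
    """
    return [[sum(w * v for w, v in zip(row, vec)) for row in weight] for vec in vectors]


def coordinate_attention(x: Tensor3, w_shared: Matrix, w_h: Matrix, w_w: Matrix) -> Tensor3:
    """Baseline coordinate attention, simplified (ReLU instead of BN + nonlinearity).

    Hypothetical parameters: w_shared is [C_mid][C]; w_h and w_w are [C][C_mid].
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Direction-aware pooling: averaging along the width gives one C-vector per row...
    z_h = [[sum(x[c][i]) / W for c in range(C)] for i in range(H)]
    # ...and averaging along the height gives one C-vector per column.
    z_w = [[sum(x[c][i][j] for i in range(H)) / H for c in range(C)] for j in range(W)]
    # Concatenate the H row descriptors and W column descriptors, then apply the
    # shared 1x1 convolution followed by a ReLU.
    f = [[max(0.0, v) for v in vec] for vec in conv1x1(z_h + z_w, w_shared)]
    f_h, f_w = f[:H], f[H:]
    # Separate 1x1 convolutions plus a sigmoid yield per-row and per-column gates.
    g_h = [[sigmoid(v) for v in vec] for vec in conv1x1(f_h, w_h)]  # [H][C]
    g_w = [[sigmoid(v) for v in vec] for vec in conv1x1(f_w, w_w)]  # [W][C]
    # Each output value is gated by its whole row (g_h) and its whole column (g_w).
    return [[[x[c][i][j] * g_h[i][c] * g_w[j][c] for j in range(W)]
             for i in range(H)] for c in range(C)]


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]]]  # C=1, H=2, W=2
    zeros = [[0.0]]                 # C_mid=1; all-zero weights make every gate 0.5
    print(coordinate_attention(x, zeros, zeros, zeros))  # prints [[[0.25, 0.5], [0.75, 1.0]]]
```

The gates are products of two sigmoids, so they lie in (0, 1). The module can only rescale features, never flip their sign, and it leaves the feature-map shape unchanged. That is why it can be dropped into an existing backbone.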
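The second module is a cascaded spatial pyramid pooling module. The abstract does not say how its fine-grained variant is built. The sketch below therefore shows only the generic serial cascade, in the style of the SPPF layer used in YOLOv5. It is not the paper's module. A single stride-1 k×k max pool is applied repeatedly, and the input plus every intermediate result are kept. Max-pooling windows compose, so two and three successive 5×5 pools cover the same neighbourhoods as single 9×9 and 13×13 pools. The cascade therefore reproduces a 5/9/13 pyramid with smaller kernels. The repeated max operations also show the information loss the abstract mentions. Each level keeps only the largest response in a growing neighbourhood, so weaker local detail is overwritten. The function names and the single-channel list-of-lists layout are illustrative assumptions.

```python
from typing import List

Grid = List[List[float]]  # one feature-map channel, layout [H][W]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 k x k max pooling with 'same' output size.

    Cells outside the map are ignored, which matches padding with -infinity.
    """
    r = k // 2
    H, W = len(x), len(x[0])
    return [[max(x[a][b]
                 for a in range(max(0, i - r), min(H, i + r + 1))
                 for b in range(max(0, j - r), min(W, j + r + 1)))
             for j in range(W)]
            for i in range(H)]


def cascaded_spp(x: Grid, k: int = 5, levels: int = 3) -> List[Grid]:
    """Serial pyramid pooling: each level re-pools the previous level's output.

    In a network the returned maps would be concatenated along the channel axis.
    """
    outputs = [x]
    y = x
    for _ in range(levels):
        y = max_pool_same(y, k)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    x = [[float((7 * i + 3 * j) % 11) for j in range(15)] for i in range(15)]
    pyramid = cascaded_spp(x)
    # Two and three serial 5x5 pools equal single 9x9 and 13x13 pools.
    print(pyramid[2] == max_pool_same(x, 9), pyramid[3] == max_pool_same(x, 13))  # prints True True
```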
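The third module relies on implicit knowledge. The following is a hedged illustration that assumes the module follows the additive and multiplicative implicit vectors introduced with YOLOR (named ImplicitA and ImplicitM there). The sketch places a learned additive vector before a 1×1 prediction convolution and a learned multiplicative vector after it. Neither vector depends on the input, so both can be folded into the convolution's weight and bias after training, using m ⊙ (W(x + a) + b) = (diag(m) W) x + m ⊙ (W a + b). The folded head costs no more at inference than a plain convolution. The function names and example values are hypothetical. The 1×1 convolution is reduced to a matrix-vector product on one feature vector.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # layout [C_out][C_in]


def conv1x1(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """A 1x1 convolution evaluated at a single spatial position."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def head_with_implicit(x: Vector, weight: Matrix, bias: Vector,
                       implicit_add: Vector, implicit_mul: Vector) -> Vector:
    """Additive implicit vector before the conv, multiplicative one after it."""
    shifted = [v + a for v, a in zip(x, implicit_add)]
    return [m * y for m, y in zip(implicit_mul, conv1x1(weight, bias, shifted))]


def fold_implicit(weight: Matrix, bias: Vector, implicit_add: Vector,
                  implicit_mul: Vector) -> Tuple[Matrix, Vector]:
    """Merge both implicit vectors into one plain 1x1 conv for inference.

    m * (W(x + a) + b) = (m * W) x + m * (W a + b), where m scales output rows.
    """
    new_weight = [[m * w for w in row] for row, m in zip(weight, implicit_mul)]
    new_bias = [m * (sum(w * a for w, a in zip(row, implicit_add)) + b)
                for row, b, m in zip(weight, bias, implicit_mul)]
    return new_weight, new_bias


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]  # C_out=2, C_in=3
    b = [0.1, -0.2]
    a = [0.02, -0.01, 0.03]  # example additive implicit vector
    m = [1.01, 0.98]         # example multiplicative implicit vector
    x = [1.0, 2.0, 3.0]
    W2, b2 = fold_implicit(W, b, a, m)
    print(head_with_implicit(x, W, b, a, m))
    print(conv1x1(W2, b2, x))  # same values up to floating-point rounding
```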