Image Classification and Object Detection Algorithm Based on Convolutional Neural Network
Abstract
Traditional image classification methods struggle to process very large image datasets and cannot meet requirements for classification accuracy and speed. Convolutional neural networks (CNNs) have achieved a series of breakthrough results in image classification, object detection, and semantic image segmentation. They have broken through the bottleneck of traditional methods and become the mainstream approach to image classification, and their powerful feature learning and classification capabilities have attracted widespread attention. How to use CNNs effectively for image classification has become a research hotspot. Following a systematic study of CNNs and of their application in image processing, this paper presents the mainstream structural models used in CNN-based image classification, their advantages and disadvantages, their time and space complexity, the problems that may be encountered during model training, and the corresponding solutions. The generative adversarial network and the capsule network, which extend deep learning-based image classification models, are also introduced. Simulation experiments verify that, in terms of classification accuracy, CNN-based image classification is superior to traditional image classification methods. The performance differences between currently popular CNN models are also comprehensively compared, further verifying the advantages and disadvantages of the various models. Finally, experiments and analysis are carried out on the overfitting problem, dataset construction methods, and the performance of the generative adversarial network and the capsule network.
Keywords
Convolutional Neural Network, Deep Learning, Feature Expression, Transfer Learning

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.