In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation
Human Detection in the Wild
Human detection remains challenging because of the problems caused by varying degrees of occlusion. Visible-body bounding boxes are typically used as an extra supervision signal to improve full-body human detection. However, visible-body assisted approaches produce a large number of false positives because they lack adequate, discriminative full-body contextual information. Because the face is the most discriminative part of the head and of the human body, face detection has also attracted much attention. Despite great progress in accurate face detection, detecting faces across multiple scales, especially small faces, remains a challenging problem. Existing approaches to multi-scale face detection fall into two directions. (1) Two-stage face detectors deploy input pyramids or multi-scale feature maps so that the network can learn discriminative facial features at various scales, especially for small faces. However, these designs can increase the training difficulty and complexity of the network. (2) To enhance the representation power of learned features for faces of various scales, state-of-the-art face detectors combine feature maps from different levels and apply context aggregation modules. However, treating reliable information and noise equally during fusion can introduce substantial noise into the fused features. In addition, some context aggregation modules rely on stacked or dilated convolutions to enhance contextual information, which is inefficient and can introduce gridding artifacts. To address these problems, this dissertation presents three works. A decoupled visible region network (DVRNet) is designed to address the occlusion problem in human detection.
A robust two-stage face detector (SSFD+) is proposed to detect multi-scale faces by learning multi-scale facial features from a single-scale feature map. A smoothed attention network for single-stage face detection (SANet) is designed to detect multi-scale faces by using an attention mechanism in place of stacked and dilated convolutions.
Date: Monday, July 13, 2020
Time: 10:00 - 11:00 AM
Place: Online Presentation - Zoom meeting
Advisor: Dr. Ioannis Kakadiaris
Faculty, students, and the general public are invited.