TY - GEN
T1 - Berkeley MHAD
T2 - 2013 IEEE Workshop on Applications of Computer Vision, WACV 2013
AU - Ofli, Ferda
AU - Chaudhry, Rizwan
AU - Kurillo, Gregorij
AU - Vidal, Rene
AU - Bajcsy, Ruzena
PY - 2013
Y1 - 2013
N2 - Over the years, a large number of methods have been proposed to analyze human pose and motion information from images, videos, and recently from depth data. Most methods, however, have been evaluated on datasets that were too specific to each application, limited to a particular modality, and more importantly, captured under unknown conditions. To address these issues, we introduce the Berkeley Multimodal Human Action Database (MHAD) consisting of temporally synchronized and geometrically calibrated data from an optical motion capture system, multi-baseline stereo cameras from multiple views, depth sensors, accelerometers and microphones. This controlled multimodal dataset provides researchers with an inclusive testbed to develop and benchmark new algorithms across multiple modalities under known capture conditions in various research domains. To demonstrate a possible use of MHAD for action recognition, we compare results using the popular Bag-of-Words algorithm adapted to each modality independently with the results of various combinations of modalities using Multiple Kernel Learning. Our comparative results show that multimodal analysis of human motion yields better action recognition rates than unimodal analysis.
UR - https://www.scopus.com/pages/publications/84875595728
U2 - 10.1109/WACV.2013.6474999
DO - 10.1109/WACV.2013.6474999
M3 - Conference contribution
AN - SCOPUS:84875595728
SN - 9781467350532
T3 - Proceedings of IEEE Workshop on Applications of Computer Vision
SP - 53
EP - 60
BT - 2013 IEEE Workshop on Applications of Computer Vision, WACV 2013
PB - IEEE Computer Society
Y2 - 15 January 2013 through 17 January 2013
ER -