A Novel Multi-Stage Training Approach for Human Activity Recognition from Multimodal Wearable Sensor Data

The prevalence of wearable devices makes human activity recognition widely applicable. Nevertheless, selecting effective features remains difficult when many sensors are involved. A recent paper on arXiv.org proposes a novel multi-stage training methodology to overcome these difficulties.

Fitness tracker. Image credit: StockSnap via Pixabay, CC0 Public Domain


A novel deep convolutional neural network architecture enables feature extraction from numerous transformed spaces instead of relying on a single one. The separate networks are then combined through multi-stage sequential training to obtain a robust and accurate feature representation.
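The core idea of extracting features from several transformed spaces can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the transformation choices and the pooled-statistics "extractor" below are illustrative stand-ins for the individually trained CNN branches the paper describes.

```python
import numpy as np

def transformed_views(window):
    """Return several representations of one sensor window (channels x time)."""
    return {
        "raw": window,                               # original time series
        "fft": np.abs(np.fft.rfft(window, axis=1)),  # frequency-domain magnitude
        "diff": np.diff(window, axis=1),             # first-order differences
    }

def branch_features(view):
    """Toy per-branch extractor: pooled statistics instead of a trained CNN."""
    return np.concatenate([view.mean(axis=1), view.std(axis=1), view.max(axis=1)])

def fused_features(window):
    """Concatenate features extracted from every transformed space."""
    views = transformed_views(window)
    return np.concatenate([branch_features(v) for v in views.values()])

# Example: a 6-channel window of 128 samples (e.g. accelerometer + gyroscope).
window = np.random.randn(6, 128)
print(fused_features(window).shape)
```

Each transformed view exposes different structure in the same window (temporal shape, dominant frequencies, rate of change), which is why combining them yields a more diverse feature set than any single space.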

The method converges with a smaller amount of training data and is robust to noise and other perturbations. It outperforms state-of-the-art approaches by an average margin of 11.49%. The scheme can also be applied in other fields that require training neural networks on transformed representations of data.

Deep neural networks are an effective choice for automatically recognizing human actions from the data of various wearable sensors. These networks automate the process of feature extraction, relying entirely on data. However, noise in the time-series data and complex inter-modal relationships among sensors make this process more complicated. In this paper, we propose a novel multi-stage training approach that increases diversity in the feature extraction process, enabling accurate recognition of actions by combining varieties of features extracted from diverse perspectives. Initially, instead of a single type of transformation, numerous transformations are applied to the time-series data to obtain variegated representations of the features encoded in the raw data. An efficient deep CNN architecture is proposed that can be trained individually to extract features from each transformed space. These CNN feature extractors are then merged into an optimal architecture, finely tuned to optimize the diversified extracted features through a combined training stage or multiple sequential training stages. This approach offers the opportunity to explore the features encoded in the raw sensor data using multifarious observation windows, with immense scope for efficient feature selection before final convergence. Extensive experiments on three publicly available datasets show consistently outstanding performance, with average five-fold cross-validation accuracies of 99.29% on the UCI HAR database, 99.02% on the USC HAR database, and 97.21% on the SKODA database, outperforming other state-of-the-art approaches.
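The staged training schedule described above can be sketched under strong simplifications: here each "branch" is a linear model fit by ridge regression rather than a CNN, the two synthetic views stand in for transformed sensor spaces, and the fusion stage trains only the combining model on frozen branch outputs. All names and data are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression: a stand-in for training one branch."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: two "transformed views" of the same 200 windows, binary labels.
n = 200
y = rng.integers(0, 2, n).astype(float)
view_a = np.column_stack([y + 0.1 * rng.standard_normal(n), rng.standard_normal(n)])
view_b = np.column_stack([rng.standard_normal(n), y + 0.1 * rng.standard_normal(n)])

# Stage 1: train each branch separately on its own transformed space.
w_a = ridge_fit(view_a, y)
w_b = ridge_fit(view_b, y)

# Stage 2: freeze the branches, concatenate their outputs, and train a fuser.
branch_out = np.column_stack([view_a @ w_a, view_b @ w_b])
w_fuse = ridge_fit(branch_out, y)

pred = (branch_out @ w_fuse > 0.5).astype(float)
accuracy = (pred == y).mean()
print(f"fused accuracy: {accuracy:.2f}")
```

The design choice mirrors the paper's sequential training: optimizing each extractor in isolation first, then tuning a combined model over the already-learned representations, rather than training everything jointly from scratch.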

Link: https://arxiv.org/abs/2101.00702
