
Human Activity Recognition Networks (HARNets)

Introduction

Predicting human activities and detecting subsequent actions is crucial for improving interaction between humans and robots during collaborative operations. Deep-learning techniques are increasingly applied to recognize human activities, including in industrial settings. However, the lack of sufficient datasets in the industrial domain and the complexity of some industrial activities, such as screw driving and assembling small parts, hamper the development and testing of activity-recognition models. Recently, the InHARD dataset (Industrial Human Action Recognition Dataset) was published to facilitate industrial human activity recognition for better human-robot collaboration, but it still lacks extended evaluation. In this regard, we employ human activity recognition memory and sequential networks (HARNets), which combine convolutional neural network (CNN) and long short-term memory (LSTM) techniques.
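
As a concrete illustration, the following is a minimal sketch of the general CNN + LSTM pattern for video-based activity recognition: a small CNN encodes each frame, and an LSTM models the temporal order of the resulting frame features. The layer sizes and values below are illustrative assumptions, not the exact architecture trained in this project.

```python
# Minimal CNN + LSTM sketch for video-based activity recognition.
# All layer sizes are illustrative, not this project's exact architecture.
from tensorflow.keras import layers, models

SEQ_LEN, H, W, C = 50, 300, 500, 3  # frames per clip; frame height/width/channels
NUM_CLASSES = 9                     # activity classes (as configured later in this README)

model = models.Sequential([
    # A small CNN applied to every frame independently via TimeDistributed...
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu"),
                           input_shape=(SEQ_LEN, H, W, C)),
    layers.TimeDistributed(layers.MaxPooling2D(4)),
    layers.TimeDistributed(layers.Flatten()),
    # ...followed by an LSTM that models the temporal order of the frame features.
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```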

Prerequisites

Objective

Methodology

Employ existing deep-learning techniques based on CNN and LSTM for human activity recognition on publicly available datasets, and evaluate the results across different datasets and deep-learning models.

Three types of datasets and deep-learning techniques have been employed for evaluating the InHARD dataset and our own dataset captured in a lab environment.

Results

Dividing the InHARD dataset into short activities (SA) and long activities (LA), we obtained the results below. The results cover the IMU data, the RGB side view (RGB), and the RGB spatial view (RGB-SP), which is typically a top view.

| No. of hidden layers | Scenario | Training acc. % (batch size 4) | Training acc. % (batch size 8) | Training acc. % (batch size 16) | Validation acc. % (batch size 4) | Validation acc. % (batch size 8) | Validation acc. % (batch size 16) |
|---|---|---|---|---|---|---|---|
| 1 | IMU-SA | 91 | 98 | 90 | 64 | 65 | 70 |
|   | IMU-LA | 95 | 97 | 99 | 65 | 72 | 68 |
|   | RGB-SA | 85 | 88 | 92 | 68 | 69 | 67 |
|   | RGB-LA | 90 | 88 | 90 | 62 | 61 | 58 |
|   | RGB-SP-SA | 74 | 75 | 75 | 70 | 72 | 73 |
|   | RGB-SP-LA | 71 | 71 | 73 | 69 | 70 | 71 |
| 2 | IMU-SA | 91 | 94 | 96 | 67 | 70 | 69 |
|   | IMU-LA | 90 | 98 | 95 | 65 | 76 | 72 |
|   | RGB-SA | 85 | 91 | 91 | 67 | 70 | 67 |
|   | RGB-LA | 90 | 92 | 92 | 63 | 63 | 61 |
|   | RGB-SP-SA | 70 | 77 | 79 | 69 | 75 | 75 |
|   | RGB-SP-LA | 75 | 75 | 78 | 68 | 72 | 71 |
| 3 | IMU-SA | 90 | 85 | 94 | 70 | 70 | 72 |
|   | IMU-LA | 91 | 96 | 98 | 70 | 68 | 69 |
|   | RGB-SA | 87 | 81 | 89 | 69 | 68 | 75 |
|   | RGB-LA | 89 | 84 | 91 | 65 | 61 | 61 |
|   | RGB-SP-SA | 72 | 74 | 79 | 68 | 72 | 71 |
|   | RGB-SP-LA | 75 | 76 | 74 | 68 | 69 | 70 |

Getting started with HARNets

Spatial-based activity recognition using RGB data

  1. Download the videos from the InHARD dataset and place them according to the following hierarchy. Each activity class should have its own folder, as shown below.
    | data/test
        | Assembly
            | Activity_sample_01.mp4
            | Activity_sample_02.mp4
              :
        | Picking_front
            | Activity_sample_01.mp4
            | Activity_sample_02.mp4
              :
    | data/train
        | Assembly
            | Activity_sample_01.mp4
            | Activity_sample_02.mp4
              :
        | Picking_front
            | Activity_sample_01.mp4
            | Activity_sample_02.mp4
              :
        ...
  2. Extract frames from the videos using the extract_files.py script in the data folder (a minimal sketch of what this step does follows this list).
$ python extract_files.py
  3. The following parameters can be set as per the user's requirements in rgb_training.py before training (see the configuration sketch after this list):
          i) sequence_length = 50, 20, or 70 - should be the minimum needed to process all of the data; the default is 50.
          ii) class_limit = 9 - the number of classes.
          iii) image_height = 300 - the height of the video frame; the default is 300.
          iv) image_width = 500 - the width of the video frame; the default is 500.
    
  4. Run the rgb_training.py script to train the CNN + LSTM model with the sequence_length, class_limit, image_height, and image_width arguments.
$ python rgb_training.py
  5. The best model will be saved in the Data/Checkpoints folder.

  6. To evaluate the model on the test dataset and get a confusion matrix, run the command below. (Note: before running the script, copy the saved model into the same folder and update the model file name in the script accordingly.)

$ python rgb_evaluation.py
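
For reference, here is a minimal sketch of what the frame-extraction step (step 2) does conceptually: each video under data/train and data/test is unpacked into numbered JPEG frames stored next to it. The glob pattern and frame naming are assumptions for illustration, not necessarily what extract_files.py does internally.

```python
# Minimal frame-extraction sketch (what step 2 does conceptually).
# The glob pattern and frame naming are assumptions for illustration.
import glob
import os

import cv2  # pip install opencv-python

for video_path in glob.glob("data/*/*/*.mp4"):   # data/{train,test}/<class>/*.mp4
    stem = os.path.splitext(video_path)[0]
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                               # end of video
            break
        cv2.imwrite(f"{stem}-{idx:04d}.jpg", frame)
        idx += 1
    cap.release()
```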
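
Likewise, a hedged sketch of the step-3 configuration as it might appear at the top of rgb_training.py; load_sequences() and build_model() are hypothetical stand-ins for the project's own data loader and model builder.

```python
# Hedged sketch of the step-3 configuration in rgb_training.py.
# load_sequences() and build_model() are hypothetical stand-ins
# for the project's own helpers.
sequence_length = 50    # frames per clip (20 and 70 are also possible)
class_limit     = 9     # number of activity classes
image_height    = 300   # frame height in pixels
image_width     = 500   # frame width in pixels

X_train, y_train, X_val, y_val = load_sequences(
    "data/train", "data/test",
    sequence_length, class_limit, image_height, image_width)

model = build_model(sequence_length, class_limit, image_height, image_width)
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          batch_size=8,    # batch sizes 4/8/16 were compared in the table above
          epochs=100)      # epoch count is an assumption
```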

Prediction

  1. To predict the activity from a video, copy the trained weight file to the main directory.

  2. Set the following parameters in the predict.py file (a settings sketch follows this list):

        sequence_length = number of frames to process per sequence (must equal the value used during training).
        class_limit = number of classes considered during training.
        saved_model_file = name of the trained model weight file.
        video_filename = name of the video file you want to predict.
  3. Use the predict.py script to predict the activity from the video.
$ python predict.py
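
A hedged sketch of the settings block from step 2 in predict.py; the weight-file and video names are placeholders, and the frame sampling is left to the project's own code.

```python
# Hedged sketch of the step-2 settings in predict.py; file names are placeholders.
from tensorflow.keras.models import load_model

sequence_length  = 50                          # must equal the training value
class_limit      = 9                           # classes considered during training
saved_model_file = "checkpoint.h5"             # placeholder name for the trained weights
video_filename   = "Activity_sample_01.mp4"    # video to classify

model = load_model(saved_model_file)
# The project's own code turns video_filename into an array of shape
# (1, sequence_length, image_height, image_width, 3), then:
# probabilities = model.predict(frames)
# predicted_class = probabilities.argmax()
```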

IMU-based activity recognition using BVH data

The following are the steps for training an LSTM model and running prediction on skeleton data.

  1. Place the skeleton BVH files in the Dataset folder. Each activity class should have its own folder, as shown below.
     | Dataset/
               | Assemble_system/
                    |Activity_sample_01.bvh
                    |Activity_sample_02.bvh
                              :
               | Picking_front/
                    |Activity_sample_01.bvh
                    |Activity_sample_02.bvh
                              :
               | Turn_Sheets/
                    |Activity_sample_01.bvh
                    |Activity_sample_02.bvh
                              :
         ...
    
  2. The following parameters can be set as per the user's requirements (an LSTM sketch matching them follows this list):
        i) nrows = 200 or 150 - the number of rows to read from each CSV; the default is 200.
        ii) time_steps = 200 or 150 - the number of rows considered for training (the length of the activity).
        iii) batch_size = 4, 8, or 16; the default is 8.
        iv) epochs = as per the user; the default is 300.
        Note: nrows and time_steps should be equal for good results.
  3. After setting the above parameters manually, run the Skeleton_LSTM_Training.py script to train the LSTM model on the skeleton data.
$ python Skeleton_LSTM_Training.py
  4. The script saves the trained model weights as Skeleton_model.h5, and it also plots and saves the confusion matrix and the training accuracy and training loss graphs.
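
For orientation, the block below sketches an LSTM of the shape that the parameters above imply: each sample is a (time_steps × features) sequence read from a BVH-derived CSV. The feature and class counts are assumptions, since they depend on the BVH channel layout and the chosen classes.

```python
# Hedged LSTM sketch matching the parameters above; n_features and
# n_classes are assumptions, since they depend on the BVH channel layout.
from tensorflow.keras import layers, models

nrows      = 200   # rows read from each CSV (keep equal to time_steps)
time_steps = 200   # sequence length fed to the LSTM
n_features = 66    # assumed numeric channels per row (joint rotations etc.)
n_classes  = 13    # assumed number of activity classes

model = models.Sequential([
    layers.LSTM(64, input_shape=(time_steps, n_features)),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# X: (samples, time_steps, n_features), y: one-hot labels, then e.g.:
# model.fit(X, y, batch_size=8, epochs=300)
model.save("Skeleton_model.h5")   # the file name used by the training script
```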

Prediction

  1. Copy the saved model weight file (.h5) to the directory that contains the Skeleton_Prediction.py script.

  2. Put the BVH files you want to classify into the file_to_predict folder.

  3. Run the Skeleton_Prediction.py script to get class predictions (a conceptual sketch follows the command below).

$ python Skeleton_Prediction.py
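
Conceptually, this prediction step amounts to the following sketch; read_bvh_as_array() is a hypothetical stand-in for the project's own BVH loader.

```python
# Hedged sketch of the prediction step; read_bvh_as_array() is a
# hypothetical stand-in for the project's own BVH loader.
import glob

import numpy as np
from tensorflow.keras.models import load_model

model = load_model("Skeleton_model.h5")
for path in glob.glob("file_to_predict/*.bvh"):
    seq = read_bvh_as_array(path)            # -> (time_steps, n_features)
    probs = model.predict(seq[np.newaxis])   # add the batch dimension
    print(path, "->", int(probs.argmax()))   # index of the predicted class
```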

OpenPose-based activity recognition using RGB data

This approach employs a pose-estimation model from OpenPose.
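
While this part of the pipeline is not documented in detail here, a typical pattern is to run OpenPose over the video (its --write_json option writes one keypoint JSON file per frame) and feed the resulting keypoint sequences to a temporal classifier such as the LSTM above. The sketch below shows that conversion; the paths and the 25-keypoint BODY_25 layout are assumptions.

```python
# Hedged sketch: turn OpenPose per-frame JSON output (from --write_json)
# into a pose-feature sequence for a temporal classifier. Paths are assumptions.
import glob
import json

import numpy as np

frames = []
for path in sorted(glob.glob("openpose_output/*_keypoints.json")):
    with open(path) as f:
        people = json.load(f)["people"]
    if people:  # take the first detected person's 2D keypoints (x, y, score triplets)
        frames.append(np.array(people[0]["pose_keypoints_2d"]))

sequence = np.stack(frames)  # shape: (n_frames, 75) for the 25-keypoint BODY_25 model
```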

Dataset

References

Contribution

This work has been conducted at the Institute of Production Technology, University of Siegen, Germany.

Citation

 @inproceedings{Tuli-cpsl2022,
   title={Industrial Human Activity Prediction and Detection Using Sequential Memory Networks},
   author={Tuli, Tadele Belay and Patel, Valay Mukesh and Manns, Martin},
   booktitle={Conference on Production Systems and Logistics (CPSL 2022)},
   year={2022},
 }
 
 @software{tuli_HARNets,
   author       = {Tadele Belay Tuli and
                   Valay Mukesh Patel and
                   Martin Manns},
   title        = {Human Activity Recognition Networks (HARNets)},
   month        = mar,
   year         = 2022,
   publisher    = {Zenodo},
   version      = {v0.1},
   doi          = {10.5281/zenodo.6366665},
   url          = {https://doi.org/10.5281/zenodo.6366665}
 }

Disclaimer

We do not hold any licenses for the modules or packages used in our model. Please check the specific license requirements before further use.