3D body shape estimation from video

Use your smartphone camera to estimate a person's precise body shape from a video.

Intro

Per-frame 3D body shape estimation results - right: video from Vlada Karpovich, left: video from Polina Tankilevitch on Pexels.

The very first project at Luxolis was concerned with using AI to estimate a person’s 3D mesh from a video stream. And by that I mean an accurate, body-shape-aware estimation.

To make absolutely clear what I mean by “body-shape aware”, here’s an excerpt from the abstract of the SHAPY paper:

While methods that regress 3D human meshes from images have progressed rapidly, the estimated body shapes often do not capture the true human shape. This is problematic since, for many applications, accurate body shape is as important as pose. The key reason that body shape accuracy lags pose accuracy is the lack of data

In fact, SHAPY can predict all of the standard chest, waist, and hips (CWH) measurements, in addition to height, weight, and a range of other body measurements. While many other 3D mesh estimation methods use just a default human mesh, SHAPY produces an accurate human body mesh, in addition to an accurate 3D posture for that mesh. Below is a video explaining the model details:

The authors of the paper explain how the model works.

Replicating the results

Replicating SHAPY’s results was not easy. During my work, I came across a multitude of issues and errors, mainly from installing dependencies. Here’s an excerpt from my sufferingdev logs:

I’m currently working on building a human-shape-from-image paper called SHAPY, from CVPR 2022. It’s a solid paper with solid results; however, it is an absolute PITA to install and run. The documentation is lacking. A concrete example: you need to install libturbojpeg, and there is no mention of this anywhere except the issues tab (what would it take to update the README.md, like, 20 minutes?). Worse still, the library depends on OpenPose, a library that’s really complicated to install on Linux.

Here’s the step-by-step process to creating a working Dockerfile for this thing:

  • Clone the shapy repo and follow the instructions precisely (if you get an “import attributes” error, that means you need to look for the export $PATH command in the install guide and run it). Another error comes from a missing dependency, libturbojpeg. I ran sudo apt-get install libturbojpeg, or some variation of the package’s name, can’t recall.
  • Do a sample run of regression/demo.py. To run on new images, you have to change the openpose dataset location arg.
  • SHAPY depends on JSON pose keypoints, which are generated by OpenPose. Here’s where the pain begins: OpenPose doesn’t provide Linux builds, so you have to build one yourself, within or outside of Docker.
  • I managed to run OpenPose within a Docker container using this guide. This took over 3 hours. :/ The installation alone takes a good 30 minutes or more on a 12 vCPU machine.
  • Once done, check the OpenPose docs for CLI usage; you will need the --write_json arg for this.
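Since the docs are sparse, here’s a minimal sketch of building that headless OpenPose invocation from Python. The binary path is an assumption (the default CMake build output location); adjust it to wherever you built OpenPose.

```python
import subprocess

# NOTE: this path is an assumption; it is the default CMake build output
# location, and depends on where you actually built OpenPose.
OPENPOSE_BIN = "./build/examples/openpose/openpose.bin"

def openpose_cmd(frames_dir: str, json_dir: str) -> list:
    """Build the OpenPose CLI call that writes one keypoint JSON per
    input image (the format SHAPY's regressor consumes)."""
    return [
        OPENPOSE_BIN,
        "--image_dir", frames_dir,   # directory of extracted frames
        "--write_json", json_dir,    # emit *_keypoints.json files here
        "--display", "0",            # headless: no GUI window
        "--render_pose", "0",        # skip rendering for speed
    ]

# Example (only runs if you have an OpenPose build):
# subprocess.run(openpose_cmd("frames/", "keypoints/"), check=True)
```

Disabling display and rendering matters inside Docker, where there is no X server and rendering only slows the batch down.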

In short, here’s what I built (the whole pipeline):

  • Use FFmpeg to extract frames from a video
  • Use OpenPose to create keypoint JSONs for the frames
  • Use the SHAPY regressor to predict shape -> ply file
  • Use SHAPY’s virtual measurements tool to extract measurements from the generated shapes, per frame
  • Average measurements out for more accuracy
  • Return average measurements from API
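The pipeline above can be sketched in Python. The OpenPose and SHAPY steps are external calls, so this shows the two parts that are easy to make self-contained: frame extraction with FFmpeg and the averaging step. Function names are mine, not from the SHAPY code base.

```python
import subprocess
from pathlib import Path
from statistics import mean

def extract_frames(video: str, out_dir: str, fps: int = 2) -> None:
    """Step 1: sample `fps` frames per second from the video with FFmpeg."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
         str(Path(out_dir) / "frame_%05d.png")],
        check=True,
    )

def average_measurements(per_frame: list) -> dict:
    """Step 5: average each measurement across frames; per-frame noise
    (pose, occlusion, motion blur) mostly cancels out in the mean."""
    return {key: mean(frame[key] for frame in per_frame)
            for key in per_frame[0]}
```

For example, two noisy per-frame estimates like `{"chest": 100.0}` and `{"chest": 102.0}` average out to `{"chest": 101.0}`, which is the value the API returns.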

The work specifics

Life isn’t simple, and business usually isn’t happy with just the outputs that the SHAPY code base provides (mesh, chest, waist, hips, mass, height). We also need to calculate arm span, leg length, torso height, and head circumference.
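One way to derive such extra measurements is from the 3D joint positions of the T-posed mesh. This is a sketch of that idea, not SHAPY code: the joint names are hypothetical labels for whatever skeleton you regress, and head circumference is omitted because it needs the mesh surface itself, not just joints.

```python
from math import dist

def _midpoint(a, b):
    # component-wise midpoint of two 3D points
    return tuple((x + y) / 2 for x, y in zip(a, b))

def extra_measurements(joints: dict) -> dict:
    """Derive extra body measurements from 3D joint positions of a
    T-posed mesh. Joint names here are hypothetical, not a SHAPY API."""
    return {
        # wrist -> shoulder -> shoulder -> wrist along outstretched arms
        "arm_span": (dist(joints["l_wrist"], joints["l_shoulder"])
                     + dist(joints["l_shoulder"], joints["r_shoulder"])
                     + dist(joints["r_shoulder"], joints["r_wrist"])),
        # hip to ankle on one leg
        "leg_length": dist(joints["l_hip"], joints["l_ankle"]),
        # distance between the shoulder midpoint and the hip midpoint
        "torso_height": dist(
            _midpoint(joints["l_shoulder"], joints["r_shoulder"]),
            _midpoint(joints["l_hip"], joints["r_hip"])),
    }
```

Because these are simple joint-to-joint distances, they can be computed per frame and fed into the same averaging step as the SHAPY measurements.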

This interactive demo shows the final 3D T-pose result, generated by averaging shape parameters across frames.
Even without a video, you can generate a reliable 3D body shape.
This 3D body shape is in an unusual pose, leaning against a wall.