Questions about ML models & techniques for feature requests

Bodysoulspirit · June 13, 2021, 2:55pm

I have some questions about submitting feature requests based on ML models & techniques.
Some blob tracking feature requests are already live, along with some other discussions and questions:

• Keithlang’s CoreML integration discussion, regarding multiple hand tracking
• Cremaschi’s ML-based background removal discussion

My question is: can I submit feature requests for the following?

Image segmentation for people / background ?
Using non-depth cameras, for example to separate just the person from the background in a webcam feed, something like TensorFlow’s BodyPix.
I wonder whether this would be the best fit for head segmentation (webcam face / background segmentation), though, since it is whole-body segmentation.
Cremaschi’s topic mentioned above also seems to be an alternative based on Apple’s code, though broader, not only for people segmentation.

Also, some skeletal tracking without Kinect, using TensorFlow’s BodyPix mentioned above, TensorFlow’s Pose Detection, or Apple’s PoseNet Model & BodyPose API.

Face Landmark Detection
We already have “Find Faces” with eye recognition, but what about something like a 3D mesh of the face, as in TensorFlow’s Face Landmarks Detection?

•••

Regarding Apple’s models vs. libraries like TensorFlow’s models, I guess it’s again kind of the same question as Vulkan/OpenGL or Metal? If a Windows version is still in the pipeline, the Vuo Team would have to implement CoreML for Mac users and then take on extra work to implement different techniques for other platforms, so open source multi-platform tools would require less effort?
Of course, some Apple tools are really optimised for Apple products, so I guess it’s about testing performance and possibilities and finding the right balance? How, for example, does Apple’s skeletal tracking perform vs. TensorFlow’s?
I’ve stumbled across some technologies that seem very efficient, like Banuba’s, which of course come with a paid license. Maybe those are the technologies used by Zoom for background removal (I thought Snapchat would use them too, but I see they acquired another startup in this domain for their technology).
I say this because the open source and free libraries sometimes seem less performant, and I guess it’s up to the team to find those that work best, but those TensorFlow models seem to work pretty well!?

jmcc · July 6, 2021, 6:09pm

@Bodysoulspirit,

First, in response to Image segmentation for people - background, the best path might be for you to add to the ML-based background remover feature request the more specific scenarios and skeleton parts (fingers, feet, etc.) that you’d like supported. That might encourage other community members to add theirs too. Then, whenever we are able to work on this, we’ll have the information to cover the specific cases the community is interested in. At that time, we can pick the best library or libraries to implement. As with all feature requests that involve external libraries, we’re limited to incorporating libraries whose licenses are compatible with Vuo’s license.

Second, we’ve changed the title on what was Node set for skeletal tracking with kinect to reflect that we might implement a solution that doesn’t need a depth camera.

Third, if you create a feature request for Face Landmark Detection, we’ll need to know more about what you want to do. There is a difference between outputting a list of points (MediaPipe) and a mesh. So the solution here would be to investigate libraries and determine an appropriate workflow.
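
To make the points-versus-mesh distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the four landmark coordinates, the triangle indices, and the names `LandmarkMesh` and `build_mesh` are all hypothetical, and none of it comes from Vuo, MediaPipe, or any other library. A list of points is just positions; a mesh is the same positions plus connectivity (which points form which triangles), which is what a renderer needs to draw or texture a surface.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Point3 = Tuple[float, float, float]   # one landmark position (x, y, z)
Triangle = Tuple[int, int, int]       # three indices into the point list


@dataclass
class LandmarkMesh:
    """Hypothetical container: landmark points plus triangle connectivity."""
    points: List[Point3]
    triangles: List[Triangle]

    def edges(self) -> Set[Tuple[int, int]]:
        """Return the unique undirected edges implied by the triangles."""
        unique: Set[Tuple[int, int]] = set()
        for a, b, c in self.triangles:
            for i, j in ((a, b), (b, c), (c, a)):
                unique.add((min(i, j), max(i, j)))
        return unique


def build_mesh(points: List[Point3], triangles: List[Triangle]) -> LandmarkMesh:
    """Combine a plain point list with a topology, checking every index."""
    for tri in triangles:
        for index in tri:
            if not 0 <= index < len(points):
                raise IndexError(f"triangle {tri} refers to missing point {index}")
    return LandmarkMesh(list(points), list(triangles))


# Made-up data: four landmarks (a plain list of points)...
landmarks: List[Point3] = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.1),
    (0.0, 1.0, 0.1),
    (1.0, 1.0, 0.0),
]
# ...and a made-up topology that turns them into two triangles sharing an edge.
topology: List[Triangle] = [(0, 1, 2), (1, 3, 2)]

mesh = build_mesh(landmarks, topology)
print(len(mesh.points), "points,", len(mesh.triangles), "triangles,",
      len(mesh.edges()), "edges")
```

The point of the sketch is that the two outputs call for different downstream workflows: a point list suits tracking or measuring individual features, while a mesh also has to carry (or be given) a topology before it can be rendered.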

Thanks for your continued interest in and support of Vuo!