An aspect of video calls that many of us take for granted is the way they can switch between feeds to highlight whoever’s speaking. Great — if speaking is how you communicate. Silent speech like sign language doesn’t trigger those algorithms, unfortunately, but this research from Google might change that.
It’s a real-time sign language detection engine that can tell when someone is signing (as opposed to just moving around) and when they’re done. Of course it’s trivial for humans to tell this sort of thing, but it’s harder for a video call system that’s used to just pushing pixels.
A new paper from Google researchers, presented (virtually, of course) at ECCV, shows how it can be done efficiently and with very little latency. It would defeat the point if the sign language detection worked but resulted in delayed or degraded video, so their goal was to make sure the model was both lightweight and reliable.
The system first runs the video through a model called PoseNet, which estimates the positions of the body and limbs in each frame. This simplified visual information (essentially a stick figure) is sent to a model trained on pose data from video of people using German Sign Language, which compares the live poses to what it has learned signing looks like.
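To make that pipeline concrete, here is a minimal sketch in Python of how a pose-based signing detector along these lines could be wired together. Everything in it (the keypoint count, the motion features, the LSTM sizes, and the dummy input standing in for real PoseNet output) is illustrative, not the researchers' exact architecture.

```python
# Sketch of a pose-based signing detector: pose keypoints in,
# per-frame signing probability out. Sizes are illustrative.
import numpy as np
import torch
import torch.nn as nn

NUM_KEYPOINTS = 17  # PoseNet-style body keypoints, (x, y) per joint


class SigningDetector(nn.Module):
    """Classifies per-frame pose motion as signing / not signing."""

    def __init__(self, hidden=64):
        super().__init__()
        # Input per frame: magnitude of each keypoint's motion since
        # the previous frame, i.e. a cheap motion proxy over the
        # stick figure rather than the full video frame.
        self.lstm = nn.LSTM(input_size=NUM_KEYPOINTS, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, motion):  # motion: (batch, frames, joints)
        out, _ = self.lstm(motion)
        return torch.sigmoid(self.head(out))  # per-frame signing probability


def keypoint_motion(poses):
    """poses: (frames, joints, 2) array of normalized keypoint coordinates.
    Returns per-joint motion magnitude between consecutive frames."""
    deltas = np.diff(poses, axis=0)         # frame-to-frame displacement
    return np.linalg.norm(deltas, axis=-1)  # (frames - 1, joints)


# Usage with dummy data standing in for real pose-estimator output:
poses = np.random.rand(30, NUM_KEYPOINTS, 2).astype(np.float32)
motion = torch.from_numpy(keypoint_motion(poses)).unsqueeze(0)
probs = SigningDetector()(motion)  # (1, 29, 1) signing probabilities
```

Working on the stick figure instead of raw pixels is what keeps the model lightweight: a handful of coordinates per frame is far cheaper to classify than full video.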
This simple process already produces 80 percent accuracy in predicting whether a person is signing or not, and with some additional optimizing gets up to 91.5 percent accuracy. Considering how the “active speaker” detection on most calls is only so-so at telling whether a person is talking or coughing, those numbers are pretty respectable.
In order to work without adding some new "a person is signing" signal to existing calls, the system pulls a clever little trick. It uses a virtual audio source to generate a 20 kHz tone, which is outside the range of human hearing but picked up by computer audio systems. This tone is generated whenever the person is signing, making the speech detection algorithms register them as speaking out loud.
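Generating the tone itself is simple; a sketch like the following would do, assuming numpy and sounddevice are available. Routing it into the call as a virtual microphone is platform-specific and not shown, and whether a 20 kHz tone actually survives transmission depends on the call's audio codec and sample rate.

```python
# Sketch of the ultrasonic "I am signing" tone described above.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 48_000  # must exceed 40 kHz so a 20 kHz tone is representable
TONE_HZ = 20_000      # at the upper edge of human hearing


def play_signing_tone(duration_s=0.5, volume=0.1):
    """Emit a short 20 kHz tone while signing is being detected."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    tone = (volume * np.sin(2 * np.pi * TONE_HZ * t)).astype(np.float32)
    sd.play(tone, samplerate=SAMPLE_RATE)
    sd.wait()  # block until playback finishes


# Called in a loop while the detector reports signing, e.g.:
# if signing_prob > 0.5:
#     play_signing_tone()
```

The appeal of this approach is that the call software needs no changes at all: its existing active-speaker logic hears "audio" and switches the feed, exactly as it would for a voice.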