A video codec for machines seems like a good topic for the first of April, or an article in The Onion. But based on a recent press release by Gyrfalcon Technology, this may become a real thing: the company has partnered with China Telecom and proposed a new video codec called “Video Coding for Machines” (VCM) that provides compression coding for machine vision and human-machine hybrid vision.
Apparently, according to a study published by Cisco in 2018, humans will become bit players in the “video watching business”, and Machine-to-Machine (M2M) applications will represent the greatest usage of Internet video traffic over the next four years. So the goal of the VCM group will be to establish a new standard that improves on previous-generation video coding and decoding standards such as H.264 (AVC), H.265 (HEVC) and H.266 (VVC).
Few details are provided so far, and I can’t find any VCM group in a web search. Obviously, this will not be for robots watching TV ;), but more likely for AI and IoT applications that may use computer vision. Humans like to have plenty of colors (10-bit/12-bit color depth) and resolution (1080p and 4K), but recent machine learning algorithms are often happy with 4-bit depth and fairly low resolutions (320×240 or even lower), so I can only assume VCM will be optimized for those use cases.
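To give a rough sense of the gap between those two use cases, here is a minimal Python sketch comparing the raw (uncompressed) data rates of a 4K 10-bit stream and a 320×240 4-bit stream. The function name `raw_bitrate_bps`, the 30 fps frame rate, and the three full-resolution color channels are my own assumptions for illustration; none of this comes from the VCM proposal, and real codecs would of course compress both streams heavily.

```python
def raw_bitrate_bps(width, height, bit_depth, fps=30, channels=3):
    """Uncompressed video data rate in bits per second.

    Assumes `channels` full-resolution color channels (no chroma
    subsampling) and a constant frame rate; both are illustrative
    assumptions, not values taken from any VCM document.
    """
    return width * height * bit_depth * channels * fps


# "Human" viewing: 4K resolution with 10-bit color depth
human_bps = raw_bitrate_bps(3840, 2160, 10)

# "Machine" viewing: 320x240 with 4-bit depth
machine_bps = raw_bitrate_bps(320, 240, 4)

if __name__ == "__main__":
    print(f"4K 10-bit raw rate:     {human_bps / 1e6:,.1f} Mbit/s")
    print(f"320x240 4-bit raw rate: {machine_bps / 1e6:,.1f} Mbit/s")
    print(f"Ratio: {human_bps // machine_bps}x")
```

Under these assumptions the raw machine-vision stream is 270 times smaller than the 4K one before any compression is applied, which hints at why a codec tuned for such inputs could look quite different from one tuned for human viewers.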
Besides China Telecom and Gyrfalcon, the VCM group will include Joanneum Research of Graz, Austria; Leibniz University of Hannover, Germany; Peking University; Zhejiang University; the Institute of Computing Technology of the Chinese Academy of Sciences; Huawei; ZTE; Lulu; Sony; NEC; Softbank; Honda; Samsung; and LG.
If you want to follow the progress of the standard, a mailing list has been set up for this purpose, and a presentation entitled “Requirements of video analysis and semantic compression_Yuan ZHANG.pptx” was shared in one of the threads.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.
7 Replies to “MPEG Video Coding for Machines (VCM) is in the Works”
Regarding Cisco’s predictions… I attended one of their presentations 20 years ago where they showed an awesome graph with one straight line representing their prediction of data usage over the next decades, and an exponential curve representing voice usage, thus justifying their move to the VoIP business…. Nowadays people have “phones” in their pockets, that are usable and used for everything but voice: games, facebook, movies, … So if you don’t mind I’ll take this new prediction with a grain of salt!
Now I know where they got those predictions. Cisco’s Visual Networking Index: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-741490.pdf
Every forecast for IoT, of devices connected and equipped with sensors and cameras, shows the population of machines being in the tens of billions (30b-100b) in the next decade. Machines aren’t like people: they don’t sleep, and they can constantly collect, monitor and act on information and data. Combine this insight with AI, and the prediction of machines dominating the generation and use of video is one of the most believable I have encountered in the last 10 years.
The vast majority of such devices, by far, will just be basic sensors, collecting simple data such as temperature every minute or so, and sharing it over MQTT. Also it’s important to consider the global power consumption: at 100 billion devices, you cannot expect to make them too smart, because if you eat even just 1 watt per device, you’ll need 100 additional nuclear reactors on the earth just to power them. We’re simply not even able to build them in just a decade, so power usage will necessarily have to be limited, and device intelligence as well.
Why are you talking about power consumption? These “new” devices will replace older ones and only a small part will be for new usages. In most cases, you can replace one big box with many sensors (including a PC + monitor + keyboard + mouse) with a few IoT MCUs, dividing the power needed by 100.
Just because if we predict that every single person in the world (babies included) will have 2 to 14 devices connected all the time, you will see that it will definitely require a lot more juice than what we’re currently using. All of these will *not* replace PCs but will complement them. You’ll still need the human-machine interface to communicate with them (be it a smartphone, tablet or PC) and you’ll see that the vast majority of people on this planet who still don’t have one will need one just to access such devices.
Added VCM presentation at the end. Note that I was told that “the slides were just some primitive thoughts and VCM has changed over time”.