Can computer science take the glitch and stall out of real-time video?

A Stanford team replaces the current patchwork of video conferencing technologies with an integrated approach to compressing and transmitting moving images over the internet.

Anyone who has participated in an internet video conference has experienced the frrrrrrrrrrrrreezes, st-st-st-st-st-utters and stalls.

Usually, the streaming resumes and the herky-jerkies end. Other times, the screen goes dark. Such glitches can range from an annoyance in a teleconference to a life-threatening emergency in remote-controlled tele-surgery.

Now Stanford computer scientists have developed a way to compress and transmit video over the internet that dramatically reduces delays and greatly improves picture quality relative to such familiar services as Skype, FaceTime, Google Hangouts and the WebRTC protocol built into the Chrome browser.

Doctoral student Sadjad Fouladi, who led the project, is presenting the research at a networking conference. The Stanford scientists are making their “clean slate” approach freely available so that companies that create and deploy streaming video may incorporate some or all of these new ideas into their products and services.

“Internet video has been developed for decades, and today’s systems have evolved into a bit of a patchwork,” said Keith Winstein, an assistant professor of computer science. “Sadjad has shown how to put the pieces together in a new and different way that improves the overall quality and robustness.”

Winstein cautioned that improvements may not be immediate. “We are rethinking how live video could someday make new applications like tele-surgery or robotic operation more reliable,” he said. “These improvements are going to be harder to incorporate into existing systems.”

New approach, new name

The Stanford team has code-named the video framework Salsify. Fouladi said it solves a problem that stems from the fact that today’s video conferencing programs are built from two separate pieces. One is a “codec” that compresses video. The other is a “transport protocol” that transmits packets of data based on an estimate of how fast data can be streamed without overloading the network.

These components have been designed and improved separately over many years, often by different companies, then combined into programs such as Skype or FaceTime. As Fouladi explained, the transport protocol and codec must work together to calculate how much data the network can carry, which depends on a variety of factors, including the strength of the connection.

“When the transport protocol and codec get out of sync, we get glitches or stalls,” he said.

In Salsify, the Stanford team designed a codec that is closely integrated with a transport protocol. Salsify unites the frame-by-frame control of compression and packet-by-packet control of transmission into a single algorithm. This lets the video stream track the network’s capacity at each moment in time.

“Even a single bad frame can cause a glitch,” Fouladi said. “Salsify never sends a frame that might congest the network.”
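To make the idea of a single, joint decision concrete, here is a minimal sketch in Python. It is an illustration only, not the Salsify code: the names `EncodedFrame`, `budget_from_estimate` and `pick_frame`, the notion of several candidate encodings per frame, and the target delay parameter are all assumptions made for this example. The sketch turns a network capacity estimate into a byte budget, then sends the best-looking candidate that fits, or skips the frame when nothing fits.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EncodedFrame:
    """One compressed candidate for the current frame (hypothetical type)."""
    quality: int      # higher means a better picture and usually more bytes
    size_bytes: int


def budget_from_estimate(capacity_bytes_per_s: float,
                         bytes_in_flight: int,
                         target_delay_s: float = 0.1) -> int:
    """Bytes that can be sent now without queuing past the target delay.

    The capacity estimate and target delay are assumed inputs; a real
    transport would derive them from packet timing and acknowledgments.
    """
    return max(0, int(capacity_bytes_per_s * target_delay_s) - bytes_in_flight)


def pick_frame(candidates: Sequence[EncodedFrame],
               budget_bytes: int) -> Optional[EncodedFrame]:
    """Return the highest-quality candidate that fits, or None to skip."""
    fitting = [c for c in candidates if c.size_bytes <= budget_bytes]
    if not fitting:
        return None  # sending anything now would risk congesting the network
    return max(fitting, key=lambda c: c.quality)


if __name__ == "__main__":
    candidates = [EncodedFrame(quality=1, size_bytes=4_000),
                  EncodedFrame(quality=2, size_bytes=9_000)]
    budget = budget_from_estimate(100_000, bytes_in_flight=2_000,
                                  target_delay_s=0.5)
    print(budget, pick_frame(candidates, budget))
```

The point of the sketch is that the compression choice and the sending choice are made in one place, per frame, using the same up-to-date estimate, rather than by two components that can drift out of sync.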

Seeing is believing

The researchers used a battery of tests to compare Salsify with Microsoft’s Skype, Google Hangouts, Apple’s FaceTime and the internet-standard WebRTC protocol implemented in the Google Chrome browser. On average, Salsify reduced delays fourfold and improved picture quality by at least 60% on a standardized scale known as structural similarity, or SSIM. (The researchers have created a side-by-side comparison of Salsify and WebRTC, along with a website to share their research.)
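For readers curious about what SSIM measures, the following is a simplified sketch in Python. It computes a single SSIM value over two whole grayscale images given as flat lists of pixel values. This is an assumption-laden simplification: standard SSIM is computed over small local windows and then averaged, and this sketch is not the measurement pipeline the researchers used. The function name `global_ssim` is hypothetical; the constants 0.01 and 0.03 are the conventional stabilizing values from the SSIM definition.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                peak: float = 255.0) -> float:
    """Single-window SSIM over two equal-length grayscale pixel lists."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    # Small constants that keep the ratio stable when means or variances
    # are close to zero.
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) * (a - mx) for a in x])
    var_y = fmean([(b - my) * (b - my) for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    # Compares brightness (means) and contrast/structure (variances, covariance).
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (var_x + var_y + c2))


if __name__ == "__main__":
    original = [10.0, 50.0, 90.0, 130.0, 170.0, 210.0]
    degraded = [12.0, 45.0, 95.0, 120.0, 180.0, 200.0]
    print(global_ssim(original, original))  # identical images score highest
    print(global_ssim(original, degraded))
```

A score of 1 means the two images are identical under this measure, and lower scores indicate more visible distortion.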

The Stanford team has released Salsify as open-source software, meaning anyone is free to download, study and change the code to use, improve or adapt it. They hope elements of their approach will percolate into practice.

“We’ve all had bad experiences with video conferencing,” Fouladi said. “It’s fun to work on a problem that’s been a big headache for a lot of people.”

Other co-authors on the research are PhD students John Emmons and Riad S. Wahby, master’s student Emre Orbay, and Catherine Wu, an intern who is now a junior at Saratoga High School in Saratoga, California.