Why is Video Conferencing so SH*T — PART I
Surely we can do better than this?
A job with 50 flights a year, a life in back-to-back audio and video calls, and the same question over and over again in my mind: Why are all these conferencing platforms so shit? We’ve got AI making deep fakes of dead people in 4K with Dolby surround, and my colleague is still an unintelligible, delayed, out-of-sync single pixel falling off the edge of the screen, and no one ever seems able to share the right screen. No, I’m still seeing the Presenter View.
I quit a dream job with a multinational technology company last year to try and fix this. This is not my first rodeo, but I quickly found out the problem was a lot deeper than I expected. Then Covid happened, and now everyone’s asking the same question I was asking: Why are all these conferencing platforms so shit?
Let me take you down the rabbit hole I find myself in. It’s all about choosing where to attack the value stack, the dynamics of an oddly sized, fragmented market, and what I think could be one of the greatest business opportunities I’ve seen in ages.
There have been numerous surveys on the problems with video conferencing, and you’ll know the list: audio issues, sync issues between video and voice, poor picture quality, latency, echo and hearing yourself, people not hearing you, not being able to interrupt them. The list goes on, long and familiar.
And then there have been many psychological studies on why the experience feels so negative, from the lack of eye contact suggesting dishonesty, to the mental overload caused by dividing people into little boxes on the screen, to the detrimental effects of hearing yourself. So why haven’t the titans of technology, the Googles and Microsofts of this world, or the younger customer-focused companies like Zoom, fixed these already?
The Heart of the Problem
The reason these tools are not great is actually really obvious if you’ve ever used specialised commercial software or services, like a homemade HR or holiday booking system in your company, or one of those websites for a small local government service. They’re all terrible to use, full of bugs, and don’t work well or as expected. The reason is that there just isn’t a big enough userbase to justify the investment to build something better. Building robust, reliable services with well-thought-out UX takes a lot of investment.
Before Covid hit, the global video conferencing market was worth only $3.85B (USD), which isn’t that big a number if you think about it. It was split about 50/50 between HW and SW/Service. On top of that, corporate conferencing, the bit you first think of, is only about a quarter of the market, with other submarkets like Education, Healthcare, Government and Defence making up the other pieces. If you think the corporate platforms are not great, you should see the other specialised ones; they’re shockingly bad. So we’ve started with not that great a number, divided it by 2, and then divided it by 4. You’re now in the hundreds of millions of dollars range.
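As a rough back-of-the-envelope check of that arithmetic, here is a minimal sketch in Python. The 50% and 25% shares are the approximate splits quoted above, and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope market sizing using the approximate figures quoted above.
# The shares are rough splits from the text, not precise market data.
TOTAL_MARKET_USD = 3.85e9      # pre-Covid global video conferencing market
SOFTWARE_SERVICE_SHARE = 0.5   # "about 50/50" between HW and SW/Service
CORPORATE_SHARE = 0.25         # corporate is "about a quarter" of the market


def corporate_software_market(total: float, sw_share: float, corp_share: float) -> float:
    """Return the slice of the market left after both splits are applied."""
    return total * sw_share * corp_share


addressable = corporate_software_market(
    TOTAL_MARKET_USD, SOFTWARE_SERVICE_SHARE, CORPORATE_SHARE
)
print(f"Corporate SW/Service slice: ${addressable / 1e6:.0f}M")
```

That comes out at a little under half a billion dollars, and that is for the whole slice, before it is shared out among all the vendors competing for it.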
And then, of course, the market is fragmented, so each company has only a smaller piece of this. At those levels of revenue, quite frankly, the return on investment in building, say, a new advanced echo cancellation system isn’t great.
If you look at the teams behind many of these products, a large part of the operational cost goes on the sales organisation, winning and keeping those customers, while the engineering teams are grossly under-resourced and overloaded. The teams are working hard; they just don’t have the resources to implement all your AI-powered signal processing desires as well as evolving and tailoring their UX for different use cases.
Which brings me to the other characteristic of this market: the need for extremely high reliability. Imagine you were using one of those virtual background capabilities that are very fashionable at the moment, the ones that cut people out of their background and make them look like they’re on the beach (for some reason), and it didn’t properly recognise the face of your colleague who has darker skin, so they disappeared. You can imagine that happening, and that’s the exact reason you wouldn’t use that feature in a serious meeting.
For a capability to reach enterprise-level quality it needs to be properly qualified for all edge scenarios, with all biases removed, and again that adds significant cost and reduces the ROI further. And that’s the main reason these tools aren’t great. Making them better isn’t outside the realm of human capability; it’s just not worth it for the companies to do. All the best engineers in the world are off pushing the envelope on deep fakes, they don’t have time for this stuff.
How to Fix it
To find the solution we need to go back to those budget local government websites and free service apps again. In the last 5 years some of them have got a lot better. Why? In large part it’s due to the availability of various intermediate technology stacks that you can build your service on top of.
Everyone knows you’re not gonna go out and buy your own servers now; you’ll use AWS/GCP/Azure for the compute and serving capability. But a lot of the underlying SW you need for capabilities like accounts, billing, data management, security, and UI frameworks is also available, either for free or as great paid-for services, and it is so much better than anything you could ever make yourself, because it has the economy of scale. You’re left to make only the application SW, so your investment goes a lot further.
The world of video and audio conferencing doesn’t have this. You can’t buy in all those well-developed underlying capabilities you need, well, not yet anyway. You’d have to build a lot of it yourself.
Say you set out to build your own video conferencing platform tomorrow, perhaps with a more usable screen sharing flow, or with features tailored to remote exercise classes. You’d likely get the HW on AWS, use open source for a lot of the infrastructure, and likely use WebRTC, which is open source. You’d even be able to find a great white-label video conferencing SW platform, free or paid, to take and tailor, with media streaming already implemented. But if you build your application on top of these pieces, you’d likely have poor audio quality, echo problems, and disappointing video performance.
If you want better quality there, even on par with the Zooms and Microsofts of this world, you’ll have to sink millions into building your own audio/video signal processing, since there isn’t anything out there, free or paid for, that’s any good and properly qualified. And yes, big guys like Microsoft, Google and Zoom have their own, but they’re not gonna sell just that capability to you so you can build a competing platform, and theirs isn’t even that good, because they couldn’t justify the investment needed in the first place (as explained above).
Go a bit more application-specific: for that background swap feature, you’d have to build your own and fully qualify it to make sure it works for all kinds of webcams, room shapes, backgrounds, skin tones, etc. That’s a reasonable investment, and that’s only to get you on par with the incumbents.
So unfortunately, despite your best intentions, your platform will be just as bad, if not worse, and even though you have a better screen sharing flow or more suitable application features, people will not want to use your product.
So how do we get out of this situation? How do we break the cycle? Join me next week for Part II, where we’ll look at how this market has started evolving and the great opportunities opening up to truly disrupt it.
If you want to learn and read more on the space, here are some links to the studies, market analysis and companies I talked about in the article above:
OWLLabs — State of Video Conferencing 2018
A nice survey by OWLLabs on the state of video conferencing in 2018, with some commentary on what some of the biggest challenges in using video conferencing are
Ksenia Klykova — The impact of videoconferencing on business travel: an historical point of view
Some commentary in the article on a study of the effect of the lack of eye contact in video conferencing, and how it causes a lack of trust
Tom Warren — Microsoft Teams’ new Together Mode is designed for pandemic-era meetings
A new feature from Microsoft Teams; interestingly, their study found that having people in separate boxes on the screen seems to make you more tired
Grand View Research — Video Conferencing Market Size, Share & Trends Analysis Report
If you want to find out more information about the video conferencing market sizes and segmentations
Fortune Business Insights — Video Conferencing Market Size, Share & Industry Analysis, By Type, By Application, By Enterprise Size and Regional Forecast
Another look at the video conferencing market, with sub market breakdown — Corporate, Education, Healthcare, etc
OpenVidu and Kurento
There are a few, but OpenVidu, for example, is a free or paid-for white-label video conferencing platform, based on Kurento, that you can take and build your own video conferencing platform on top of pretty quickly