[Note: this is a guest post by Ori Pessach. Ori is one of the most gifted software developers I have ever had the pleasure to work with and, fortunately, also the senior-most developer on our video platform.]
I was asked to be on a conference call with a customer a few weeks ago. The customer had been using our system for a while and it was working well, but then, mysteriously and suddenly, video streaming stopped working. It was a baffling problem – the customer assured us that he had not installed any new software on his PC and that every other application that uses the network was still working fine, and we just couldn’t figure out what could be causing it.
I was asked to be on that call because I’m in charge of developing our video player. One of the challenges we had to tackle around the time I joined Envysion two years ago was making video playback rock solid. We install our recorders in many different environments, from small mom and pop stores with the cheapest DSL connection that money can buy to tightly controlled corporate computing environments where the customer’s IT staff is very wary of any new piece of equipment that appears on their network. Our system is designed to work reliably with no need to configure anything on the video recorder – in most cases, installation involves plugging everything in and powering the system up.
Since our client is web based, it works everywhere our customers can get connected to the internet – if you can browse the web, you can use our application. But there is one exception: the video player is a browser plug-in that opens its own network connections to stream video from our system, and we found that it can be a little tricky to get that to work reliably. Sometimes, web proxies didn’t like the way we formatted our video stream and blocked it. We found that some anti-virus products handled our network traffic incorrectly and blocked it. We fixed those issues, and did so relatively quickly, largely thanks to a feature of our video player that most of our users will never see – the debug log.
I’ve been writing software for over 20 years, and over that time I have come to realize that software can be broadly (and, I have to admit, glibly) categorized into two groups: software that produces extensive debug logs, and software with mysterious, hard-to-reproduce bugs that never get fixed.
A debug log is simply a file containing messages that the software outputs during its normal operation. Those messages aren’t meant to be seen by users, but they can be an invaluable tool when tracking down bugs. It’s a bit as if the software is talking to itself while it’s working, describing everything it does, sometimes in great detail.
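To make that concrete, here is a minimal sketch of a component that talks to itself while it works. It is not our player's code: it uses Python's standard logging module, and the names make_debug_logger, play, and connect are made up for illustration. The caller passes in whatever stream or file the messages should go to.

```python
import io
import logging


def make_debug_logger(stream):
    """Build a logger that writes every message, DEBUG and up, to `stream`."""
    logger = logging.getLogger("player_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()      # avoid duplicate output if called twice
    logger.propagate = False     # keep messages out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def play(logger, url, connect):
    """Try to start playback, narrating each step to the debug log.

    `connect` is a stand-in for whatever opens the network connection;
    it is expected to raise OSError when the connection fails.
    """
    logger.debug("play requested: %s", url)
    logger.debug("opening connection")
    try:
        connect(url)
    except OSError as exc:
        # The user only sees "connecting..."; the log records why it failed.
        logger.error("connection failed: %s", exc)
        return False
    logger.debug("connection established, starting playback")
    return True
```

The user of such a component still sees nothing but a "connecting" message, but the log now records which step failed and what the network said in response.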
Why is that important? Remember that we were trying to track down various issues that manifested themselves in different network configurations, on machines running various combinations of software that may or may not have an effect on the problem. In almost all the cases when playback failed, the bug report was identical: The user tries to play video, the player displays a message saying that it’s connecting, but video never plays.
This sort of problem can be very difficult to fix because the developer most likely never sees it. And it happens all the time – take this cnet report of a bug in Adobe’s Flash plugin:
http://news.cnet.com/8301-17939_109-10027752-2.html
and this response from an Adobe employee:
http://blogs.adobe.com/jd/2008/08/firefox_video_dropouts.html
If you’re not in the mood to read two lengthy articles about an obscure bug in Flash, here’s the executive summary: The cnet blog post talks about a bug that causes Flash to stop playing video after 2 seconds, and the Adobe employee’s reply boils down to “it works for us, and saying that it doesn’t work does not a bug report make.”
Which is very true. And also very wrong. It’s technically true, because most bug reports don’t provide enough useful information for a developer to fix the bug, or even know what the bug might be. In most cases, a developer fixing a bug in a piece of software spends a lot more time trying to figure out what’s broken than actually fixing the problem, and it’s a long and tedious process which can involve spending time with the customer trying to understand what it is that they’re doing, and what exactly they mean when they say “it doesn’t work.”
But it’s very wrong, because the customer doesn’t care about any of those things. The customer sent you a check, and he expects the product to work. He doesn’t want to spend time talking to technical support, and he doesn’t want to read obscure error messages over the phone. He has better things to do than help us fix a product he paid good money for, thinking that it was working fine because that’s what we told him when we sold it to him.
And this is exactly why I like software that logs extensively – because it makes customers happy. How? Simple – now, if a customer sees a problem with our player, I see the problem, because the player tells me where the problem is. I don’t have to ask the customers and have them repeat their actions while describing them over the phone, because I have a log of their actions and everything that the player did in response to those actions. Typically, once I have the log it takes me a few minutes to figure out what went wrong and offer a workaround.
I was reminded of this a few weeks ago while on the conference call with our customer. I was getting ready to run some diagnostics on the customer’s PC when someone in the room suggested that we use the Big Red Button.
The Big Red Button is a feature in our application that submits an instant ticket report from the application itself. One of the things that the ticket report includes is a short summary of the player log. I helped implement that feature, but I had completely forgotten about it. Still, there it was, and in less than a minute I was looking right at the problem – a new firewall had recently been installed on the customer’s network, right when our application started misbehaving, and the firewall’s response to our network messages appeared in the player log – the player was being blocked. We were able to offer a workaround on the spot, and to confirm that it worked.
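For readers wondering what a "short summary of the player log" might look like, here is one possible sketch. It assumes the log is a list of text lines whose level comes first (for example "ERROR ..."), and it keeps every error line plus the last few lines of the log. The function names and the ticket layout are hypothetical and are not a description of how our ticket reports are actually built.

```python
def summarize_log(lines, tail=5):
    """Keep every ERROR line plus the last `tail` lines, in original order."""
    lines = list(lines)
    # Indexes of the most recent lines.
    keep = set(range(max(0, len(lines) - tail), len(lines)))
    # Indexes of every error line, however old.
    keep.update(i for i, line in enumerate(lines) if line.startswith("ERROR"))
    return [lines[i] for i in sorted(keep)]


def build_ticket(description, log_lines, tail=5):
    """Bundle the user's description with a log summary (hypothetical layout)."""
    return {
        "description": description,
        "log_summary": summarize_log(log_lines, tail),
    }
```

The point of keeping older error lines is that the interesting message, such as a firewall refusing a connection, is often followed by a stretch of routine output that would otherwise push it out of the summary.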
We didn’t implement that feature because it was cool. We implemented it because it’s 100% necessary for serving our customers and making sure that they’re happy. Our job isn’t done when we cash the customer’s check – it’s just beginning, and that’s exactly why what we do is called Managed Video As A Service.
