Capturing A/V Multimedia Streams From the Browser

Background

I’ve recently been working on finishing up an application that helps gather presentation feedback when demoing a talk for the first time in front of a live audience. The app plays back a video stream and overlays the audience’s comments on top of the video in real time. However, I haven’t open sourced it yet because it depended on launching an external video application to do the recording and having the user upload the result to the server. I wanted the application to do “in browser” audio/video recording so that it felt more seamless with the overall web application.

It turns out A/V recording is still pretty experimental on the web, but support is slowly rolling out in major browsers. The key component to understand when dealing with A/V capture is the getUserMedia API, an in-progress API for accessing audio and video streams. You can find a decent intro to the API here.
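To make that concrete, here’s a minimal sketch of requesting a combined audio/video stream and previewing it in a video element. Support varies between browsers right now, so the vendor-prefixed versions are checked as a fallback:

navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia;

navigator.getUserMedia({ audio: true, video: true },
  function (stream) {
    // Success: preview the live stream in a <video> element.
    var video = document.querySelector('video');
    video.src = window.URL.createObjectURL(stream);
    video.play();
  },
  function (err) {
    console.log('getUserMedia error:', err);
  });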

I was then wondering how to capture the streams produced by the API and found this article, which covers video-only recording. The MediaStreamRecorder API is currently unimplemented in Chrome, but whammy.js provides a nice interface that the author uses to demonstrate recording video. However, we also need sound, and judging by the comments on that article, others are looking for that too.
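Roughly, that approach works by drawing the live video onto a canvas at a fixed rate and feeding the frames to whammy.js, which compiles them into a WebM blob. Here’s a sketch; the frame rate and the frameTimer handle are my own choices, not from the article:

var encoder = new Whammy.Video(15); // target 15 fps
var video = document.querySelector('video');
var canvas = document.querySelector('canvas');
var context = canvas.getContext('2d');

var frameTimer = setInterval(function () {
  // Copy the current video frame onto the canvas and hand it to whammy.
  context.drawImage(video, 0, 0, canvas.width, canvas.height);
  encoder.add(context);
}, 1000 / 15);

// When recording stops:
// clearInterval(frameTimer);
// var videoBlob = encoder.compile();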

There is another GitHub project I found called recorder.js which can help us record the audio stream. It uses the AudioContext interface to export the audio as a blob.
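In outline, recorder.js hangs off a Web Audio node: you wrap the getUserMedia stream in a media stream source, record from it, and export a WAV blob when done. A sketch, assuming the prefixed webkitAudioContext where needed:

var AudioCtx = window.AudioContext || window.webkitAudioContext;
var audioContext = new AudioCtx();

// 'stream' is the MediaStream obtained from getUserMedia above.
var source = audioContext.createMediaStreamSource(stream);
var recorder = new Recorder(source);

recorder.record();

// When recording stops:
// recorder.stop();
// recorder.exportWAV(function (audioBlob) { /* upload audioBlob */ });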

Combining these two ideas, I can capture separate audio and video streams, which then need to be merged. Theoretically this could be done with a JavaScript encoder, but I wasn’t sure one exists that fits this use case, or how much computing power it would take on the client side. Therefore I chose to encode the streams server side with avconv (a fork of ffmpeg). When the recording is complete, the client sends HTTP POST requests to the server with the contents of each stream. The video is in WebM format and the audio is in WAV.
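Uploading the two blobs is straightforward with FormData. A sketch; the /upload endpoint, field names, and file names below are placeholders, not the actual routes from my app:

function uploadBlob(url, fieldName, blob, filename) {
  var formData = new FormData();
  formData.append(fieldName, blob, filename);

  var xhr = new XMLHttpRequest();
  xhr.open('POST', url, true);
  xhr.send(formData);
}

uploadBlob('/upload', 'video', videoBlob, 'input.webm');
uploadBlob('/upload', 'audio', audioBlob, 'input.wav');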

The command to do our transcoding is pretty simple. On the server side, after you have received the files from the POST requests, you would run something like this, replacing the input file names with those from the recording:

avconv -i input.wav -i input.webm -acodec copy -vcodec copy output.mkv

The resulting output.mkv file will contain the video muxed together with your audio stream.
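If your server happens to run Node.js, shelling out to avconv is only a few lines. This is a hedged sketch rather than my actual server code; the function name and file paths are illustrative:

var spawn = require('child_process').spawn;

function muxAudioVideo(wavPath, webmPath, outPath, callback) {
  // Copy both streams into a Matroska container without re-encoding.
  var avconv = spawn('avconv', [
    '-i', wavPath,
    '-i', webmPath,
    '-acodec', 'copy',
    '-vcodec', 'copy',
    outPath
  ]);
  avconv.on('close', function (code) {
    callback(code === 0 ? null : new Error('avconv exited with code ' + code));
  });
}

muxAudioVideo('input.wav', 'input.webm', 'output.mkv', function (err) {
  if (err) throw err;
  console.log('wrote output.mkv');
});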

Implementation

If you’re interested in an example implementation you can find my code here. My next step is to dig deeper into WebRTC and see if I can send both streams to my server and transcode them on the fly, instead of consuming client-side resources holding the audio and video streams in memory until the recording is complete. If you have a better way of accomplishing A/V recording please let me know in the comments!