A couple of months ago, I posted a "call to action" regarding the state of Google's MediaStream API, which is meant to be used with cool new web technology like WebRTC. You've probably seen WebRTC already. It lets you record video and audio through your webcam without having to use Flash or some other third-party technology--it's all handled through the browser. This is particularly cool because it allows for things like real-time video chatting without all parties having to download extra software (for example, Chat Roulette runs on Flash... if you hate Flash, you're fucked). With WebRTC, as long as your browser supports it, you're in.
A lot of people don't know about WebRTC, though, and shy away from it because of that, so I'm going to share my experience with it in a non-complainy context, because it's kind of awesome!
Right now, the recording backend of the MediaStream API is unimplemented, and as a result, most of the web applications using it today only use it for exporting single-frame images. For example, if you're a Tumblr user, you've probably seen WebRTC in use! From your dashboard, if you click to add a photo and then select "Take a Photo," Tumblr will ask your permission to use your camera through your browser. It'll then give you the option to take a single-frame photo or build an animated GIF.
I'm such a nice young lady.
This is kind of mysterious though, right? Is the NSA spying on me through my webcam every time I log in to RedTube and watch porn? I mean, I don't know about you guys, but I do some pretty weird crap when I'm left alone with my laptop, like make weird dinosaur faces and yell at things and sing songs that I've ad-libbed, so I don't want iSight to just randomly start recording and sending my shit off to some rogue website.
But it's not really mysterious if you look into it some.
So here's where we begin:
For the sake of avoiding hairy spaghetti code, I'm condensing all the different browsers' vendor-prefixed implementations of getUserMedia into a single navigator.getUserMedia so that I can reuse it.
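A minimal sketch of that shim (the prefixes shown are the ones Chrome and Firefox shipped at the time):

```javascript
// Normalize the vendor-prefixed implementations onto one name,
// so the rest of the code can just call navigator.getUserMedia.
navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||  // Chrome/Chromium
                         navigator.mozGetUserMedia;       // Firefox
```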
getUserMedia is an HTML5 API that has been implemented. This is the API that fetches input from your webcam (or another video input device) or microphone. If you grant the request permission and the stream is successfully initiated, it passes a stream object to the success callback. The most obvious thing to do with this stream object is to give the user some affirmation that it's working by showing it back to them.
Your browser will ask for your permission every single time a website tries to access your camera, unless the site is served over SSL, in which case you're given the option to permanently allow access for that one site.
So for the sake of simplifying things, let's pretend we're just recording video!
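Here's a sketch of that video-only request, assuming the shimmed navigator.getUserMedia from earlier; the callback names are my own:

```javascript
// Ask for video only (no audio), per the simplification above.
navigator.getUserMedia(
  { video: true, audio: false },
  function success(stream) {
    // We got a MediaStream object; the next step is showing it to the user.
    console.log('Recording initiated!');
  },
  function failure(err) {
    // Permission was denied, or no camera was available.
    console.error('getUserMedia failed:', err);
  }
);
```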
So what is that going to get us? Well.
We can use the HTML5 File API to take our stream and turn it into a URI that is usable by elements on the webpage, using URL.createObjectURL. This is pretty nifty and can take either a file or a blob object (which streams are!). In my example above, self.videoRecorder is a reference to a <video> element on the page:
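A sketch of what that wiring might look like inside the success callback, assuming self.videoRecorder has already been pointed at the <video> element:

```javascript
// Turn the MediaStream into a URL the <video> element can play,
// so the user sees their own camera feed while recording.
function success(stream) {
  self.videoRecorder.src = URL.createObjectURL(stream);
  self.videoRecorder.play();
}
```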
But I can't really do anything with this, can I? I mean, it's just a read-only video at this point. That's incredibly useless unless I just want a mirror to look at while I put on makeup or whatever. So what now?
This is when it gets interesting and kind of hackishly amusing.
Canvas Exporting to an Array. Oh god, why.
This is a recursively called function that runs for as long as the user is "recording." It draws an individual frame of the stream, as it plays inside the video element, onto our canvas (which isn't attached to the page's DOM anywhere) and converts each of those canvas images to a data URL in webp format. toDataURL is a method of canvas; you could pass it a format other than webp, such as png or jpeg, and it defaults to png if you don't pass that parameter. This code stores all of the frames in an array.
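A sketch of that frame grabber, with the frames array, the recording flag, and the 15fps rate all being my own assumptions:

```javascript
// The canvas is never attached to the DOM; it's just a scratch surface.
var canvas  = document.createElement('canvas');
var context = canvas.getContext('2d');
var frames  = [];

function captureFrame() {
  if (!self.recording) return; // stop once the user stops "recording"

  // Match the canvas to the video's dimensions, then paint the current frame.
  canvas.width  = self.videoRecorder.videoWidth;
  canvas.height = self.videoRecorder.videoHeight;
  context.drawImage(self.videoRecorder, 0, 0, canvas.width, canvas.height);

  // webp is what we'll want for webm encoding later; toDataURL
  // would default to png if we left this argument out.
  frames.push(canvas.toDataURL('image/webp'));

  // Call ourselves again for the next frame (~15 frames per second).
  setTimeout(captureFrame, 1000 / 15);
}
```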
If we were just trying to take a photo with the webcam, we'd be done... we'd have our one image, no need to recursively call this method over and over again. But we're recording a video, which has several frames so we still have more work ahead of us.
But now we just have an array of still frames. "What the fuck am I going to do with that, Aimee?" you ask.
We're only halfway done, chill.
My Blobby Babby
We want to push this back into a blob once more. I've used Whammy.js to do that. It's probably the most popular tool for encoding webm using JS alone.
I also created another data URL from that blob and fed that to the <video> tag that we were showing the stream in during the recording session, so that the user can watch what they just recorded.
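A sketch of those two steps, assuming frames is the array of webp data URLs from the capture function and 15fps matches the capture rate:

```javascript
// Whammy.fromImageArray takes an array of webp frames plus a frame
// rate and returns a webm Blob.
var webmBlob = Whammy.fromImageArray(frames, 15);

// Feed the finished webm back into the same <video> element so the
// user can watch what they just recorded.
self.videoRecorder.src = URL.createObjectURL(webmBlob);
self.videoRecorder.play();
```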
What you do with this blob from here is your own business. In my case, I pass it over an ajax request to my Rails controller to do further processing through ffmpeg (which has its own varied caveats, including configuring nginx to not barf trying to receive binary data larger than 3MB and extending request timeouts, but that's beyond the scope of WebRTC).
If you want to continue with the trend of doing everything sans server, however, you can easily offer the video to the user to download by passing the data URL into a link!
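A sketch of that, assuming webmBlob is the blob Whammy produced; the link text and filename are made up:

```javascript
// The "download" attribute tells the browser to save the file
// instead of navigating to it.
var link = document.createElement('a');
link.href = URL.createObjectURL(webmBlob);
link.download = 'my-recording.webm';
link.textContent = 'Download your video';
document.body.appendChild(link);
```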
That's the long and short of it. There's a lot of really cool stuff you can do with WebRTC, especially if you pair it with a NodeJS backend to handle the server side. So the next time someone mentions WebRTC to you, don't get that glazed-over look in your eyes. Think about the possibilities!