Thursday 24 October 2013

Using Akka for video editing.

I am super lucky to work for Quantel, a UK-based company which makes high-end video-editing hardware and software for post-production studios and TV broadcasters.

Our equipment has been used to edit many movies over the years (Avatar, Lord of the Rings) and is used by major broadcasters throughout the world (the BBC, BSkyB, Fox, etc.).

Quantel's broadcast systems roughly follow a client/server architecture. A server is a beefy machine which can ingest a large number of high-definition feeds very quickly and make them available over speedy network connections to video editing workstations. For reasons of speed and efficiency, the server is a blend of custom hardware (DSPs, FPGAs), a custom operating system, a custom network stack, and so on. The video editing workstations are also mostly C++, again because they have to move and display things very quickly.

At Quantel, I work in the broadcast domain, mostly building JVM RESTful web services to help our video editing workstations with metadata-level workflows (finding video clips with certain properties, organising assets so they can be found).

Although a lot of the current stack uses custom hardware and proprietary software, there has been a trend in the industry towards solutions based on off-the-shelf operating systems, industry-standard storage and standard video formats. To that end, Quantel has started a new product architecture called RevolutionQ (more information here) which is based around a video editing standard called AS-02.

Unlike the existing server technology, the new stack relies heavily on Scala, Akka and the JVM for some of the video processing.

We are building Scamp, a video processing framework on top of Akka which works in a similar fashion to GStreamer, but is entirely based on actors and runs on the JVM. We use actors to implement video processing pipelines where each stage is a standalone media processing operation: extracting H.264 video access units, manipulating AAC audio blocks, AC-3 data, timecode, metadata, and so on. Most of what we do does not require transcoding (I think this would be too costly); we mostly do "transwrapping", which is the process of extracting bits from one format and wrapping them in a different format.
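To make the transwrapping idea concrete, here is a minimal sketch, independent of Akka and of Scamp's actual code; the envelope types, field names and magic number are hypothetical:

```scala
// Transwrapping in miniature: pull the essence bytes out of one
// (hypothetical) container and wrap them, unchanged, in another.
// No transcoding happens; the essence is identical in both envelopes.
case class SourceEnvelope(header: Vector[Int], essence: Vector[Int])
case class TargetEnvelope(magic: Int, essence: Vector[Int])

def transwrap(in: SourceEnvelope): TargetEnvelope =
  TargetEnvelope(magic = 0x42, essence = in.essence)
```

In Scamp each such step is an actor in a pipeline rather than a plain function, but the data flow is the same: unwrap, keep the essence intact, rewrap.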

Currently we are using Scamp to build a video recorder which can record a large number of satellite feeds carrying high-bitrate content (over 50 Mb/s) and make that content available for near-live video editing via HTTP. The recorder works in a similar way to a PVR. It receives a multiplexed signal (an MPEG-2 Transport Stream), which is a series of small packets. We use a chain of actors to re-combine packets into their corresponding tracks (video track, audio track) and further actors to persist the video and audio data to storage. The framework is quite flexible: you essentially plug actors together, so you can further process the video and audio essence for long-term storage or re-streaming via HTTP or other protocols. So for example, you could build a system which records a satellite feed and makes the content available straight away via HTTP Live Streaming or MPEG-DASH.
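The packet-to-track recombination can be sketched without the actor machinery. A transport stream packet is 188 bytes, begins with the sync byte 0x47, and carries a 13-bit PID identifying the track it belongs to; grouping by PID is the heart of the demux. This is a toy illustration, not Scamp's code:

```scala
// Toy MPEG-2 TS demux. A TS packet is 188 bytes: the sync byte 0x47,
// then a 13-bit PID (low 5 bits of byte 1, all 8 bits of byte 2).
case class TsPacket(bytes: Vector[Int]) {
  require(bytes.length == 188 && bytes.head == 0x47, "not a TS packet")
  def pid: Int = ((bytes(1) & 0x1f) << 8) | bytes(2)
}

// Re-combine a packet sequence into per-PID (per-track) streams,
// preserving packet order within each track.
def demux(packets: Seq[TsPacket]): Map[Int, Seq[TsPacket]] =
  packets.groupBy(_.pid)
```

In the recorder this grouping happens incrementally, packet by packet, with one actor per concern, rather than over a fully materialised sequence.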

But more importantly, the system is built to be non-blocking and purely reactive. As Akka takes care of threading and memory management, we don't have to worry about them, and the system scales up very well (we run on 24-core machines and Akka has no trouble spreading the workload around). The code is also largely immutable thanks to the use of the ByteString structure, and in my experience immutability makes it much easier to reason about the code. If you've looked at the source code of existing C++ video frameworks like VLC or some of the Intel video processing libraries, you will have found a lot of rather hairy threading / locking / shared-memory code which is hard to write properly.

I appreciate that we are not writing code which runs as fast as its C++ equivalent, but we can write it faster, with the confidence that it will scale automatically because the framework takes care of that.

Akka has many interesting features. The most recent one I've been experimenting with is FSM for representing state machines. There is an industry standard for modelling a processing job which captures video (think of it as if someone had standardised the protocol of a PVR). A job can be in different states and, based on its state, can perform certain actions. It turns out Akka's FSM support is a very elegant way of modelling this in very few lines of code.
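The shape of such a state machine can be sketched as a plain transition function; Akka's FSM DSL (when/goto) expresses the same table, adding timers and actor messaging on top. The state and event names below are made up for illustration, not the ones from the standard:

```scala
// Hypothetical states and events of a capture job.
sealed trait JobState
case object Queued    extends JobState
case object Recording extends JobState
case object Complete  extends JobState
case object Failed    extends JobState

sealed trait JobEvent
case object Start extends JobEvent
case object Stop  extends JobEvent
case object Error extends JobEvent

// The whole protocol is one transition table.
def transition(s: JobState, e: JobEvent): JobState = (s, e) match {
  case (Queued, Start)   => Recording
  case (Recording, Stop) => Complete
  case (_, Error)        => Failed
  case (other, _)        => other // events that do not apply are ignored
}
```

The appeal of Akka FSM is that each `when(state)` block reads exactly like one row of this table, so the code stays close to the spec.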

Another interesting feature is the new Akka I/O layer and its use of pipelines for encoding and decoding. A lot of transwrapping is just that: extracting binary data from an envelope, combining it with more binary data into a different structure, and so on. For example, when you demultiplex an RTP stream, you build the following chain: UDP -> RTP -> TS packets -> PES packets. The new Akka pipeline approach (which comes from Spray) makes it very easy to build typed, composable and CPU-efficient chains like those.
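The first stages of that chain can be sketched as composed, typed functions; the Akka pipeline API packages the same idea with context and event handling on top. The 12-byte RTP fixed header and the 188-byte TS packet size are per the specs, but the type names and everything else here are simplified assumptions:

```scala
// Typed, composable decode stages sketched as plain functions.
case class UdpDatagram(payload: Vector[Int])
case class RtpPacket(payload: Vector[Int])
case class TsPacket(bytes: Vector[Int])

// Strip the 12-byte RTP fixed header to expose the RTP payload.
def decodeRtp(d: UdpDatagram): RtpPacket = RtpPacket(d.payload.drop(12))

// Split the RTP payload into 188-byte TS packets.
def decodeTs(r: RtpPacket): Seq[TsPacket] =
  r.payload.grouped(188).map(TsPacket(_)).toSeq

// The stages compose into one typed chain: UDP -> RTP -> TS packets.
val udpToTs: UdpDatagram => Seq[TsPacket] = (decodeRtp _) andThen decodeTs
```

Because each stage has a precise input and output type, the compiler rejects a mis-assembled chain, which is a large part of what makes the pipeline style pleasant for binary formats.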

And finally, I am also looking at using Spray (soon to be Akka HTTP) to build a web interface, so potentially we will have a RESTful way of controlling the recording of many streams at the same time.

Towards a giant PVR.




2 comments:

  1. This is relevant to my professional interests! :) Is Scamp open source?

  2. At the moment, Scamp is under heavy development. Currently, there are no plans for making it Open Source.
