As IT and consumer technology make increasing advances into the broadcast production environment, an area which still remains largely hardware-dependent is the transmission suites themselves. The production and delivery of a moderately complex television channel may involve a dozen or more discrete hardware components linked together by an automation system.
With the continued emergence of new platforms and distribution channels, the costs associated with this conventional approach are a considerable barrier to entry into the broadcast market for new players. Furthermore, the industry is seeing an increased demand from new and emerging markets for less complex solutions.
COMPLEXITY IN ACTION
In the broadcast industry the drive for new technology has been relentless and broadcasters are always looking to build increasingly advanced production facilities with higher levels of automation and more sophisticated workflows.
On top of this, the ever-changing market is demanding more channels from a wider range of delivery platforms at lower cost.
So what has been the industry's answer to this?
Increasingly complex and specialized products from a wide range of manufacturers, brought together on a customer-by-customer basis in an attempt to offer a fully-integrated solution from a number of discrete parts. In other industries the explosion of IT and the switch from dedicated hardware solutions to software applications have drastically altered business models.
And, while huge advances have been made in our industry in the increased use of IT solutions for storage, editing and content production, the actual act of pulling all these elements together to produce a finished transmission channel remains highly specialized and hardware intensive.
Let's take a look at a typical complex transmission chain (Figure 1).
When we add automation to this, it gets even more complicated.
Typical Transmission Chain with Automation Layer
Each device in the chain has to be interfaced to the automation; sometimes this is done via a network connection but, in many cases, it is still done with serial cables, all of which use different standards and protocols.
Of course, not all transmission chains are as complex as this but, in many cases, this is because the costs and complexity of owning and operating a fully-featured transmission chain can be prohibitive. Many large multi-channel broadcasters simply resort to cutting content together using a router or directly in the video server and have little or no branding except for that recorded on the video clips.
The market, however, is demanding ever more sophistication and enhanced production values in order to differentiate between the increasing number of channels on offer to consumers.
There is a huge growth opportunity for new television channels, especially in emerging markets such as Africa and Asia and community television in the US, and this is being held back by the cost and complexity of producing the linear transmission stream.
With the growth of IP distribution do we even need linear transmission streams any more? Is it not enough to just produce your programs and release them to the Internet for people to download as and when they like? For a few simple reasons, I believe the answer is a very firm 'yes': we still need the linear transmission stream.
Broadcast television is still by far the most efficient way of distributing a program to a large audience, the vast majority of which will want to watch it as soon as it is available. And, although there is a trend towards 'lean forward' viewing habits - especially amongst the younger generation, who wish to take more control over what they watch and when they watch it - there will always be a time when people will want to sit back and watch an evening of quality television that has been selected and assembled for them.
Another compelling argument is radio. In theory, there is no reason for us to need radio stations at all. People can download music from the Internet and build their own playlists; many radio stations now keep an archive of all their programs online for you to listen to at your leisure.
So, have technology and IP distribution killed radio? I think not. In fact, there has been an increase in the number of radio stations and the cost of building new radio stations has fallen considerably driven, interestingly, by exactly the same kind of technology revolution that we are talking about here.
Ten years ago radio was exactly where television is today - dependent on highly specialized custom hardware and proprietary systems. Nowadays, the average small radio station is little more than a network of PCs and some hard disk storage and I see no reason why the same transition cannot occur in television.
SCOPE OF THE PROJECT
Approximately eighteen months ago we began a project to see how far we could get in building a complete end-to-end playout solution using only standard computer hardware and software. We all know that computers can play back video and that, with the addition of a specialist output card, high-spec machines are quite capable of playing broadcast-quality 601 video.
But for this project to deliver genuine benefits, it needed to offer far more than this and consequently the goals we set ourselves were fairly aggressive:
(1) Multi-format video content with real-time resolution conversion - MPEG, WM, DV etc.
(2) Real-time master control vision effects
(3) 2D DVE for squeeze and tease
(4) Sophisticated audio transitions
(5) Voice-over with ducking
(6) Multiple audio tracks including AES and AC-3
(7) Multiple simultaneous linear keys
(8) Animated logos
(9) Still store
(10) Character generator
(11) Preservation of VANC data through the video path
(12) Real-time insertion of VANC data on the output for subtitles, V-chip etc.
The first decision we had to make was the Operating System and environment. From my perspective the choice was simple: Microsoft Windows XP and 2003 Server. Yes, I know Windows is not a real-time Operating System and the chances are that your own IT department has difficulty keeping your email system running for more than a few hours at a time. But the reality is that Windows-based machines run some of the largest and most mission-critical systems in the world and the huge advantages that come from using an open and well-supported development environment far outweigh the perceived reliability issues.
On the hardware side, the AMD architecture was particularly suited to the extremely graphics-intensive operations that we were going to demand. We worked closely with HP and AMD during the development period and have standardized on the Opteron line of processors.
We decided to design the system around the Windows DirectShow Framework which enabled us to take maximum advantage of existing technology such as software decoding and to selectively replace sections of functionality where the demands of the broadcaster were beyond what is provided as standard.
The first challenge was to achieve reliable full-resolution video playback. Windows Media Player will quite happily play back a broadcast-resolution video file but you will find that, if you play a 2-hour movie, it might play for 2 hours and 30 seconds or 2 hours and 3 seconds depending on what else your machine is doing at the time.
What we needed was deterministic performance: one video frame off the disc, through the PC and out of the video card every frame period, without any frames being dropped due to processor usage or disc/network activity.
This was done by developing our own input buffers upstream of the software decoders and an accurate timecode-locked reference acting as a master clock for the software.
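The shape of that arrangement can be sketched as follows. This is illustrative Python only: a bounded queue stands in for the real input buffers upstream of the decoders, and a simple monotonic tick (deliberately sped up for the demo) stands in for the timecode-locked reference clock.

```python
import queue
import threading
import time

FRAME_INTERVAL = 1 / 25.0  # PAL frame rate; the real system locks to house timecode

class PlayoutBuffer:
    """Input buffer upstream of the decoder: decouples bursty disc/network
    reads from the strictly paced video output."""

    def __init__(self, depth=16):
        self.frames = queue.Queue(maxsize=depth)

    def fill(self, source_frames):
        # Reader thread: pushes frames as fast as storage allows,
        # blocking whenever the buffer is full.
        for frame in source_frames:
            self.frames.put(frame)
        self.frames.put(None)  # end-of-stream marker

    def play(self, interval=FRAME_INTERVAL):
        # Output thread: pops exactly one frame per tick of the master clock.
        played = []
        next_tick = time.monotonic()
        while True:
            frame = self.frames.get()
            if frame is None:
                break
            played.append(frame)
            next_tick += interval
            time.sleep(max(0.0, next_tick - time.monotonic()))
        return played

buf = PlayoutBuffer(depth=8)
reader = threading.Thread(target=buf.fill, args=(range(50),))
reader.start()
out = buf.play(interval=0.001)  # sped-up tick for the demo
reader.join()
print(len(out))  # prints 50: every frame delivered, none dropped
```

Because the reader blocks on a full buffer and the player paces itself off the clock rather than off the reader, storage jitter is absorbed by the buffer depth instead of disturbing the output cadence.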
The next and most complex phase of the project was the development of our own software vision mixer that would enable us to perform all the standard master control effects, including 2D DVE moves. While Windows supplies mechanisms for blending video pictures together, it does not provide anything like the level of performance we were looking for and, at best, only operated at frame rate when, to achieve silky-smooth DVE moves, the image must be manipulated every field.
The first transition we focused on was the humble mix, on the grounds that if we could do this, we could do all the others. Blending two full-resolution D1 images together at field rate involves doing mathematical calculations on every pixel and is not for the fainthearted.
Early results were encouraging. The sheer horse-power of AMD processors has allowed us to blend two full-screen broadcast images together in less than 30 milliseconds. This was within our window of the single video frame, but did not leave a lot of time to do other things. However, by utilizing the DSP functionality contained within the latest-generation chips, we were able to make a significant improvement in this performance and reduce the average time to blend two full-resolution video frames together to less than two milliseconds.
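The per-pixel arithmetic of the basic mix reduces to a weighted sum of the two frames. In this sketch, numpy's vectorized array operations stand in for the SIMD/DSP instructions used in the real system, and the frame dimensions are simply the D1/601 raster:

```python
import numpy as np

def mix(frame_a, frame_b, position):
    """Master-control mix: per-pixel linear blend of two frames.
    `position` runs from 0.0 (all A) to 1.0 (all B) across the transition."""
    blended = (1.0 - position) * frame_a.astype(np.float32) \
              + position * frame_b.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)

# Two illustrative 720x576 frames: black and white
black = np.zeros((576, 720, 3), dtype=np.uint8)
white = np.full((576, 720, 3), 255, dtype=np.uint8)

halfway = mix(black, white, 0.5)
print(halfway[0, 0])  # mid-grey at the 50% point of the transition
```

Stepping `position` from 0.0 to 1.0 over the transition duration, one step per field, produces the mix; every other standard transition (wipes, cuts, fades to black) is a variation on the same weighted-sum arithmetic with different per-pixel weights.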
Windows is a multi-tasking environment and, by default, the performance is not guaranteed. To achieve the real-time performance we needed for a broadcast system, a lot of effort went into developing a control mechanism for threads. This now ensures that the transitions between consecutive videos are smooth and the output frame/field is always there at the correct time.
A similar technique was used to give us independent control of downstream logos with full linear key support. But, being software based, we were not limited in the number of logos we could have on screen at any one time. In fact, we found that we can completely cover the screen with independent logos and still maintain the performance.
The completed software architecture looks like this:
This high-level view of the software architecture shows how we constructed the software equivalent of a typical transmission path.
In order to do vision effects we need to manage at least two streams at the same time; these are read from the storage across the LAN in chunks and stored in software buffers before being passed to the DEMUX, which splits the stream into its component parts. These are then passed to the appropriate decoder.
The software decoders use the plug-in architecture of DirectShow and give us the ability to select from a wide variety of third-party decoders for different content types.
After the decoder, the image is scaled/sub-sampled into the correct size and shape for the defined output resolution. This can range from web quality SIF and QSIF right up to HD 1920x1080, although HD obviously requires much more processing horse-power and so runs on AMD's dual-core processor systems.
The vision mixer is the heart of the system and uses DSP instructions to perform highly optimized, full-screen transitions including all the standard master control effects.
The vision mixer module is also responsible for doing 2D DVE moves and allows the independent sizing and positioning of 2 channels of full resolution video over a background graphic. One of the unexpected benefits we found of moving to a software-based solution is that we do not suffer from frame delays when the DVE is put in and out of circuit.
One of the areas where a software system really comes into its own is audio mixing and effects. Rather than simply having the audio follow the video on transitions, we are able to provide a much higher degree of control that is increasingly being demanded by the customer base. This allows the independent control of the incoming and the outgoing audio levels, sometimes referred to as Lead and Lag.
Real-time audio level control is also a trivial task in a software environment and, for the first time, offers the capability of adjusting levels on individual clips in the play list.
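The lead/lag behaviour described above reduces to two independent gain envelopes applied before summing. A simplified per-sample sketch (illustrative Python; a real implementation would operate on buffers of samples, not a Python loop):

```python
def audio_transition(outgoing, incoming, out_ramp, in_ramp, in_delay=0):
    """Sum two audio streams with independently controlled fades.
    out_ramp / in_ramp are fade lengths in samples; in_delay offsets
    the incoming fade-up to create the 'lead' or 'lag' effect."""
    length = max(len(outgoing), in_delay + len(incoming))
    mixed = [0.0] * length
    for i, sample in enumerate(outgoing):
        # Outgoing audio fades down over out_ramp samples
        gain = max(0.0, 1.0 - i / out_ramp) if out_ramp else 0.0
        mixed[i] += sample * gain
    for i, sample in enumerate(incoming):
        # Incoming audio fades up over in_ramp samples, offset by in_delay
        gain = min(1.0, i / in_ramp) if in_ramp else 1.0
        mixed[in_delay + i] += sample * gain
    return mixed

# A 10-sample crossfade of two steady full-level tones
mixed = audio_transition([1.0] * 10, [1.0] * 10, out_ramp=10, in_ramp=10)
print(mixed[5])  # 1.0: the two linear ramps sum to unity throughout
```

Setting `in_delay` greater than zero makes the incoming audio lag the video transition; a non-zero negative offset (or equivalently starting the outgoing ramp later) gives the lead case.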
The next stage in the chain is the linear keyer. This allows any standard graphics file to be re-sized, positioned and used as a logo. If the file contains transparency data, this is used to blend the image over the underlying video stream.
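The keying operation itself is the same per-pixel arithmetic as a mix, but with the graphic's own alpha channel supplying the blend factor at each pixel. A sketch of the compositing step (illustrative Python/numpy; the logo size, position and levels are arbitrary):

```python
import numpy as np

def linear_key(background, logo_rgb, logo_alpha, x, y):
    """Composite a logo over program video using its linear key:
    out = logo * alpha + background * (1 - alpha), per pixel."""
    out = background.astype(np.float32)
    h, w = logo_rgb.shape[:2]
    alpha = (logo_alpha.astype(np.float32) / 255.0)[..., None]
    out[y:y+h, x:x+w] = (logo_rgb.astype(np.float32) * alpha
                         + out[y:y+h, x:x+w] * (1.0 - alpha))
    return out.astype(np.uint8)

video = np.zeros((576, 720, 3), dtype=np.uint8)    # black program video
logo = np.full((64, 128, 3), 200, dtype=np.uint8)  # flat grey logo
alpha = np.full((64, 128), 128, dtype=np.uint8)    # ~50% transparency

keyed = linear_key(video, logo, alpha, x=560, y=32)
```

Because each keyer is just another pass of this arithmetic, stacking multiple simultaneous keys is a matter of repeating the operation, which is why the software approach imposes no fixed limit on the number of on-screen logos.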
Animated logos are also becoming extremely common and animated .gif files can be produced with many standard software applications. These can also be handled by the software and looped to provide continuous playback.
Although the vision mixer module was capable of doing audio mixing, we decided to follow the broadcast chain analogy and provide an additional downstream audio mixer for the insertion of recorded voice-overs, usually in the form of .wav files. The module provides the ability to adjust the amount of background audio suppression as well as the speed of the ramps.
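The ducking behaviour can be sketched as a gain envelope applied to the background audio, with the suppression depth and the ramp speed as the two exposed parameters (illustrative Python, per-sample for clarity; the parameter values are arbitrary):

```python
def duck(program, voice_over, vo_start, suppression=0.3, ramp=480):
    """Duck program audio under a voice-over: ramp the background down to
    `suppression` gain over `ramp` samples before the VO starts, hold it
    for the duration of the VO, then ramp back up to unity."""
    out = list(program)
    vo_end = vo_start + len(voice_over)
    for i in range(len(out)):
        if vo_start - ramp <= i < vo_start:      # ramp down
            t = (i - (vo_start - ramp)) / ramp
            gain = 1.0 - t * (1.0 - suppression)
        elif vo_start <= i < vo_end:             # hold under the VO
            gain = suppression
        elif vo_end <= i < vo_end + ramp:        # ramp back up
            t = (i - vo_end) / ramp
            gain = suppression + t * (1.0 - suppression)
        else:
            gain = 1.0
        out[i] *= gain
    for i, sample in enumerate(voice_over):      # add the VO itself
        out[vo_start + i] += sample
    return out

ducked = duck([1.0] * 2000, [0.0] * 500, vo_start=1000, ramp=100)
```

Adjusting `suppression` changes how far the background is pulled down, and `ramp` changes how quickly it gets there - the two controls the module exposes.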
The final stage of the process is the re-insertion of any VANC data - such as closed captions - that was captured at the ingest point. As there is no mechanism for handling this as part of the video stream, it is stored separately in the database and frame-accurately reinserted at the final output stage. In addition we retain the ability to insert additional VANC data, such as V-chip data, in real time. The flexibility of the software model makes it very simple to add additional customer-specific VANC data without the need to keep adding downstream hardware.
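A minimal sketch of that reinsertion model (illustrative Python; the packet contents and the database layer are stand-ins for the real ancillary-data store):

```python
class VancInserter:
    """Frame-accurate reinsertion of ancillary data at the output stage:
    packets captured at ingest are stored per frame number, and additional
    real-time packets (e.g. V-chip) can be merged in as each frame goes out."""

    def __init__(self, stored_packets):
        self.stored = stored_packets   # {frame_number: [packets]} from ingest
        self.live = {}                 # packets queued in real time

    def queue_live(self, frame, packet):
        self.live.setdefault(frame, []).append(packet)

    def packets_for(self, frame):
        # Stored data first, then any real-time insertions for this frame
        return self.stored.get(frame, []) + self.live.pop(frame, [])

vanc = VancInserter({10: ["cc_packet"]})
vanc.queue_live(10, "vchip_packet")
print(vanc.packets_for(10))  # ['cc_packet', 'vchip_packet']
```

Adding a new customer-specific data type is then just another entry in the per-frame packet list, rather than another box in the downstream chain.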
Integration with the automation system is also extremely important for this project, so we decided to build a next-generation automation product around the capabilities of this playout solution. Everything can be controlled from directly in the playlist, providing more creative control yet at the same time making the system very intuitive to use.
In traditional systems, graphics and captions are stored on external systems and merely triggered at the correct time by the automation. Changes and updates must first be made in the external system and then referenced from the automation. Using a new integrated approach, the captions and graphics are stored directly in the playlist and can be previewed and edited in place using the same user interface.
The main automation interface uses both a playlist and a timeline to represent the schedule. Other displays can show many channels at the same time and the timeline makes it very easy to represent multiple overlapping secondary events.
The result of this attention to the user interface and complete integration to the master control chain is that learning times and operational errors are greatly reduced. Experience with Beta users has shown that most customers are building and running complex schedules in a matter of hours.
Having demonstrated that the system could work, the next question we needed to consider was 'is it reliable?' Well, the simple answer is 'yes'. We can quite happily play complex schedules with master control effects, voice-overs, logos etc. 24/7.
As the first real-world systems are deployed I believe that, in the long term, they will prove to be inherently more reliable than conventional technology simply because there are fewer moving parts and, consequently, less to go wrong.
All well and good, but what if the single server that replaces all of this goes wrong? All your eggs are in that one basket. The solution is simple: have two baskets. The huge reduction in cost of the software approach makes it financially viable to have 100% redundancy of your entire transmission chain with the simple addition of a single 1U HP DL145 server.
Let's take a look at the hardware, then. What would a typical IT-based TV station look like?
What are the key advantages of this approach?
Firstly, a massively-improved user experience. Since all the software, including the automation engine, is integrated in a single package the users have everything immediately available at their fingertips. They can create and edit captions, check and adjust subtitles, create DVE moves, preview and modify clips - all without leaving the single user interface.
Next, it provides an open media architecture. A large range of standard file formats can be easily imported into the system and taken to air. Video clips in a variety of formats and resolutions, both standard and high definition, can be mixed in the same playlist. Graphics and logos can be created using standard off-the-shelf packages such as Final Cut Pro and Photoshop and imported into the system simply by dropping them onto a network fileshare.
Installation and training are simplified as a result of the reduced number of components.
The cost of ownership and maintenance is drastically reduced due to the lack of specialist hardware. Furthermore, hardware replacement parts are easily available from standard IT sources.
The system is easily expandable to accommodate changing business requirements. Additional channels can be added with minimal financial outlay: a fully-featured channel requires just 1U of rack space and correspondingly reduced cooling and power requirements.
And finally, as a software solution, enhancements and upgrades are easily applied, thereby providing a future-proof environment.
THE END OF THE LINE? So does this mean the end of conventional broadcast equipment as we know it? I don't believe so. It will inevitably challenge many hardware manufacturers to investigate areas where they can add value over and above standard IT solutions. However, the broadcast arena will always remain a very demanding and high-performance industry and will therefore continue to require a large amount of specialist hardware and control panels to surround these software systems.