Programmable Media Processor
TriMedia TM-1000

The TriMedia™ TM-1000 is a general-purpose microprocessor for real-time processing of audio, video, graphics, and communications datastreams. In a single chip, TM-1000 combines an ultra-high-performance, low-cost CPU with a full complement of I/O and coprocessing peripheral units.

In consumer electronics appliances and personal computing products, the TM-1000 media processor performs multimedia functions with the advantages of special-purpose, embedded DSP solutions -- low cost and single-chip packaging -- and the programmability of general-purpose CPUs. It improves time-to-market through high-level C/C++ language programmability and delivers throughput of up to four billion operations per second.

MULTIMEDIA APPLICATIONS
TM-1000 is an ideal building block for any multimedia application that requires processing of video, audio, graphics, and communications datastreams. It is well suited for applications ranging from single-purpose systems such as videophones, to reprogrammable, multipurpose devices such as set-top boxes or web browsers.
FEATURES
+ Processes audio, video, graphics, and communications datastreams on a single chip
+ Powerful, fine-grain parallel, 100 MHz VLIW CPU with separate instruction and data caches
+ Independent, DMA-driven multimedia I/O units to format data, and multimedia coprocessors to offload specific multimedia algorithms from the TriMedia CPU
+ High-performance bus and memory system to manage communication between TriMedia processing units
+ Instruction set includes RISC, multimedia, SIMD-type DSP, and IEEE-compliant floating point operations
+ Robust software development tools and libraries that enable multimedia application development entirely in C/C++ programming languages
+ Configurable for standalone and plug-in card applications in consumer electronics and PC products

TM-1000 easily implements popular multimedia standards such as MPEG-1 and MPEG-2, but its orientation around a powerful, programmable general-purpose CPU makes it capable of a variety of multimedia algorithms, whether open or proprietary.

HARNESSING THE POWER OF VLIW
TM-1000 delivers top performance through its elegant implementation of a fine-grain parallel architecture known as very-long instruction word, or VLIW. Unique to the TriMedia processor's VLIW implementation, parallelism is optimized at compile time by the TriMedia compilation system. No specialized scheduling hardware is required to parallelize code during execution. Hardware saved by eliminating complex scheduling logic reduces cost and allows the integration of multimedia-specific features.

With the capacity to pack multiple operations into one VLIW instruction and 27 functional units in which to process them, TM-1000 can execute up to five operations in parallel with each clock cycle. Such parallel processing is an ideal complement to the inherently parallel nature of multimedia applications.

Another key contributor to TM-1000's top performance is its use of conditional execution.
During program creation, an instruction scheduler adds conditional code to each operation to enable guarded execution -- a technique that increases fine-grain parallelism and significantly decreases code branching and execution time.

TM-1000: a single-chip multimedia workhorse
First in the family of TriMedia processors, the TM-1000 is more than just an integrated microprocessor with unusual peripherals. It is a fluid single-chip computer system controlled by a small real-time operating system kernel running on a VLIW CPU.

PROGRAMMABLE VLIW CPU
At the heart of the TM-1000 is a powerful DSP-like, 32-bit CPU core. Its VLIW architecture utilizes a five-issue-slot engine. Parallelism is achieved by simultaneously targeting up to five of the 27 pipelined functional units in the TM-1000 processor within one clock cycle. The most common operations have their results available in one clock cycle; more complex operations have multi-cycle latencies.

Functional units include integer and floating-point arithmetic units and data-parallel DSP-like units. They can access 128 fully general-purpose, 32-bit registers during execution. The registers are not separated into banks; any operation can use any register for any operand.

TM-1000's instruction set includes common RISC operations, special DSP operations that perform powerful SIMD functions, custom multimedia functions, and a full complement of 32-bit, IEEE-compliant, floating point operations. Both big and little endian byte ordering are supported. The TriMedia CPU provides special support for instruction and data breakpoints, useful in debugging and program development.
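As a rough illustration of what a SIMD-type DSP operation of the kind mentioned above computes, the following portable C sketch adds four unsigned 8-bit values packed into one 32-bit register and clamps each result instead of letting it wrap. The name `sat_add_u8x4` is hypothetical and is not a TriMedia operation name. This document does not list the actual special operations or their semantics, so treat this only as a model of the general technique.

```c
#include <stdint.h>

/* Hypothetical helper (not a TriMedia operation name): adds four unsigned
 * 8-bit lanes packed into 32-bit words, clamping each lane at 255 instead
 * of letting it wrap around. */
uint32_t sat_add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;  /* extract lane from a */
        uint32_t y = (b >> (8 * lane)) & 0xFFu;  /* extract lane from b */
        uint32_t sum = x + y;
        /* Saturate: clamp the lane sum to the 8-bit maximum. */
        sum = (sum > 255u) ? 255u : sum;
        result |= sum << (8 * lane);
    }
    return result;
}
```

For example, `sat_add_u8x4(0x10FF8001u, 0x20017F02u)` yields `0x30FFFF03`: the lane holding 0xFF + 0x01 stays at 0xFF rather than wrapping to 0x00. Written out in plain C this costs a loop with a comparison per lane, which is the kind of work a single data-parallel operation is meant to absorb.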
Figure: TRIMEDIA INSTRUCTION EXECUTION. TM-1000's unique VLIW CPU utilizes separate instruction and data caches, five issue slots, 27 pipelined functional units, and 128 general-purpose, 32-bit registers to process up to five operations in one clock cycle.

DEDICATED INSTRUCTION AND DATA CACHE
TM-1000's CPU is supported by separate, dedicated on-chip data and instruction caches. To improve cache behavior and performance, both caches have a locking mechanism. Cache coherency is maintained by software.

The data cache is dual-ported to allow two simultaneous accesses. It is non-blocking; thus, the handling of cache misses and CPU cache accesses can proceed simultaneously. Early restart techniques reduce read-miss latency. Background copyback reduces CPU stalls. Partial word (8-bit and 16-bit) memory operations are supported.

To reduce internal bus bandwidth requirements, instructions in main memory and cache use a compressed format. Instructions are decompressed in the instruction cache decompression unit before being processed by the CPU.

No external second-level cache is required to deliver media performance an order of magnitude greater than that of x86 processors.

GLUELESS MEMORY SYSTEM INTERFACE
The TM-1000 memory system balances cost and performance by coupling substantial on-chip caches with a glueless interface to synchronous DRAM (SDRAM). Higher bandwidth SDRAM permits the TM-1000 to use a narrower and simpler interface than would be required to achieve similar performance with standard DRAM.

TM-1000's memory interface provides sufficient drive capacity for a memory system of up to 100 MHz and 8 MB (four 2Mx8 SDRAMs). Larger memories can be implemented by using lower memory system clock frequencies or external buffers. Programmable speed ratios allow SDRAM to have a different clock speed than the TM-1000 CPU.
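The memory figures quoted in this document can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope model: the function names are made up for illustration, the bandwidth figure assumes one transfer per clock (real sustained throughput will be lower), and the speed ratios are taken from the specifications section (1:1, 5:4, 4:3, 3:2, 2:1, CPU to memory).

```c
#include <stdint.h>

/* Peak transfer rate in MB/sec for a bus `width_bits` wide clocked at `mhz`,
 * assuming one transfer per clock (an idealization). */
uint32_t peak_bandwidth_mb_per_s(uint32_t width_bits, uint32_t mhz)
{
    return (width_bits / 8u) * mhz;
}

/* Total capacity in MB of `chips` devices, each `mega_words` M words deep
 * and `bits_per_word` bits wide (a "2Mx8" part is mega_words=2, bits=8). */
uint32_t memory_size_mb(uint32_t chips, uint32_t mega_words,
                        uint32_t bits_per_word)
{
    return chips * mega_words * bits_per_word / 8u;
}

/* SDRAM clock in kHz for a given CPU clock and CPU:memory ratio,
 * e.g. ratio_cpu=5, ratio_mem=4 for the 5:4 setting. Result is truncated. */
uint32_t sdram_clock_khz(uint32_t cpu_khz, uint32_t ratio_cpu,
                         uint32_t ratio_mem)
{
    return cpu_khz * ratio_mem / ratio_cpu;
}
```

With these definitions, a 32-bit interface at 100 MHz gives 400 MB/sec, four 2Mx8 devices give 8 MB, and a 100 MHz CPU at the 5:4 and 3:2 ratios gives memory clocks of 80 MHz and roughly 66.7 MHz, which lines up with the 66/80/100 MHz memory speeds listed in the specifications.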
Support for a variety of memory types, speeds, bus widths, and off-chip bank sizes allows a range of TM-1000-based systems to be configured.

HIGH-SPEED INTERNAL BUS (DATA HIGHWAY)
TM-1000's internal bus, or data highway, connects all internal function units together and provides access to control registers in each function unit, to external SDRAM, and to the external PCI bus. It consists of separate 32-bit data and address buses; bus transactions use a block transfer protocol. On-chip peripheral units and coprocessors can be masters or slaves on the bus. Programmable bandwidth allocation enables the data highway to maintain real-time responsiveness in a variety of applications.

Figure: TM-1000 ARCHITECTURE. On a single chip, the TM-1000 incorporates a powerful VLIW CPU and peripherals to accelerate processing of audio, video, graphics, and communications data. (Block diagram: SDRAM, main memory interface, video in, video out, audio in, audio out, VLD coprocessor, image coprocessor, timers, I2C interface, synchronous serial interface, VLIW CPU with instruction and data caches, and PCI interface to the PCI bus.)

Multimedia I/O and coprocessing units
To streamline data throughput, TM-1000 incorporates independent DMA-driven multimedia I/O and coprocessing units. These on-chip units manage input, output, and formatting of video, audio, graphics, and communications datastreams and perform operations specific to key multimedia algorithms.

VIDEO INPUT
The video input (VI) unit reads digital video from an off-chip source, demultiplexes the YUV data, subsamples as needed, and writes it to SDRAM. Input is accepted from any CCIR656-compliant device that outputs 8-bit parallel, 4:2:2 YUV time-multiplexed video data at up to 19 Mpix/sec. Such devices include digital video camera systems (which can connect gluelessly to TM-1000) or devices connected through ECL-level converters to the standard D1 parallel interface.

When needed, the VI unit can be programmed to perform on-the-fly 2X horizontal resolution subsampling. This enables high-resolution images (640 or 720 pixels/line) to be captured and converted to 320 or 360 pixels/line without burdening the CPU. When lower resolution video is eventually desirable, performing subsampling during data capture can drastically reduce initial storage and bus bandwidth requirements.

Useful in multiprocessor designs, the VI unit can also be used to receive raw data and unidirectional messages from another TM-1000's video out port at up to 38 MB/sec.

VIDEO OUTPUT
Essentially, the TM-1000 video out (VO) unit performs the inverse function of the VI unit. The VO generates an 8-bit, multiplexed YUV datastream by gathering bits from the separate Y, U, and V data structures in SDRAM. It performs any programmed processing tasks, then outputs digital video data to off-chip video subsystems such as a digital video encoder chip, digital video recorder, or other CCIR656-compatible device. The VO unit outputs continuous digital video in arbitrary formats including PAL or NTSC at up to 40 Mpix/sec.

While generating the multiplexed stream, the VO unit can provide optional horizontal 2X upscaling. For simultaneous display of graphics and live video, it can also generate sophisticated graphics overlays with alpha blending of arbitrary size and position within the output image.

The VO unit can either supply or receive video clock and/or synchronizing signals from the external interface. Clock and timing can be precisely controlled through programmable registers.

Programmable interrupts and dual buffers facilitate continuous data streaming by allowing the CPU to set up a buffer while another is being emptied by the VO unit. Like the VI unit, the VO unit can also be used to pass raw data and unidirectional messages from one TM-1000 to another.
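The demultiplexing and 2X horizontal subsampling that the VI unit performs in hardware can be modeled in software for one video line, as in the C sketch below. It assumes the active-video samples arrive in the Cb Y Cr Y (U0 Y0 V0 Y1) order used by CCIR656 4:2:2 streams, that timing reference codes have already been stripped, and that subsampling averages neighbouring pairs. The document does not say which filter the VI unit applies, and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split one line of multiplexed 4:2:2 samples (U0 Y0 V0 Y1 ...) into
 * separate planes. `width` is the number of luma pixels and must be even;
 * y receives width samples, u and v receive width/2 samples each. */
void demux_uyvy_line(const uint8_t *in, size_t width,
                     uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(width % 2 == 0);
    for (size_t i = 0; i < width / 2; i++) {
        u[i]         = in[4 * i + 0];
        y[2 * i]     = in[4 * i + 1];
        v[i]         = in[4 * i + 2];
        y[2 * i + 1] = in[4 * i + 3];
    }
}

/* Halve the horizontal resolution of one plane by averaging neighbouring
 * pairs, rounding halves up. `n` must be even; `out` receives n/2 samples.
 * Pair averaging is an assumed filter, not the documented VI behaviour. */
void halve_line(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out[i] = (uint8_t)((in[2 * i] + in[2 * i + 1] + 1) / 2);
    }
}
```

Applying `halve_line` to the luma plane of a 720-pixel line produces 360 pixels, halving the storage and bus traffic for that plane. Doing this per line on the CPU would cost a pass over every captured byte, which is the burden the VI unit removes.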
AUDIO INPUT AND AUDIO OUTPUT
The TM-1000 incorporates audio input (AI) and audio output (AO) units which use autonomous DMA to service datastreams required by common serial audio DAC and ADC chips. Both units support glueless I/O of stereo 16-bit audio data at sample rates up to 100 kHz. A small amount of glue logic enables output of up to eight channels. The audio interfaces are highly programmable, providing adaptability to custom protocols and future standards.

TM-1000's audio interfaces can be programmed to provide the master clock to over-sampled ADCs and DACs. The clock generated on chip can be controlled with a resolution of 0.0006 ppm. This high resolution gives programmers subtle control over sampling frequency, allowing them to simplify the synchronization algorithms required in complex multimedia systems.

IMAGE COPROCESSOR
The image coprocessor (ICP) offloads image processing and manipulation tasks, such as copying an image from SDRAM to a host's video frame buffer, from the TriMedia CPU. It can operate as either a memory-to-memory or a memory-to-PCI coprocessor device. In memory-to-memory mode, the ICP can perform horizontal or vertical image filtering and scaling. In memory-to-memory and memory-to-PCI modes, it can perform horizontal scaling and filtering followed by YUV to RGB color-space conversion for screen display.

The ICP also provides display support for live video in overlapping windows, the number and sizes of which are limited only by bandwidth. The final resampled and converted image pixels are transmitted over the PCI bus to an optional off-chip graphics card/frame buffer.

VARIABLE LENGTH DECODER
TM-1000's variable length decoder (VLD) offloads the processing-intensive task of decoding Huffman-encoded video datastreams such as MPEG-1 and MPEG-2. The lower bit rate required by videoconferencing applications can be adequately handled by the TriMedia CPU without the coprocessor.

I2C INTERFACE
TM-1000's I2C interface enables inter-chip connection to and control of other I2C devices. This allows TM-1000 to configure and inspect the status of peripheral video devices such as video decoders and encoders and some camera types. It is also used at boot time to read the boot program from the EPROM.

SYNCHRONOUS SERIAL INTERFACE
TM-1000's synchronous serial interface (SSI) provides serial access for a variety of multimedia applications, such as video phones or videoconferencing, and for general data communications in PC systems.

The SSI contains all the buffers and logic necessary to interface with simple analog modem front ends. When combined with the TriMedia V.34 software library, the SSI provides fully V.34-compliant modem capability. The TriMedia CPU performs the data pump, fax protocols, AT command handling, and error correction/detection. Alternatively, the TM-1000 SSI can connect to an ISDN interface chip to provide advanced digital modem capabilities.

TIMERS
The TM-1000 contains four timers: three are available to programmers, the fourth is reserved for the system.

HIGH-SPEED PCI BUS INTERFACE
TM-1000's PCI interface connects the VLIW CPU and on-chip I/O and coprocessing units to a PCI bus. In PC-based applications, TM-1000 can gluelessly interface to the standard PCI bus, allowing it to be placed directly on the PC mainboard or on a plug-in card. In embedded applications where TM-1000 is the main processor, the PCI bus can be used to interface to peripheral devices that implement functions not provided by on-chip peripherals.

Figure: HOST-ASSISTED COPROCESSOR and STANDALONE configurations. The first member of the TriMedia family, the TM-1000 is designed for use both as a coprocessor in a PC-hosted environment and as the sole CPU in standalone systems. (Diagram labels, host-assisted: SDRAM, VCR, TV monitor, camera, graphic card, audio, RGB image sequences, PCI bus, host CPU, memory. Standalone: SDRAM, camera, VCR, TV monitor, peripherals, audio, PCI bus, ROM/flash.)
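To make the image coprocessor's YUV to RGB color-space conversion step concrete, here is a minimal software sketch of such a conversion for one pixel. It uses the widely published integer approximation of the ITU-R BT.601 (CCIR601) studio-range matrix. That choice is an assumption: this document does not state which coefficients or rounding the ICP uses, and the names `rgb8` and `yuv_to_rgb_bt601` are illustrative only.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;

/* Divide a fixed-point value (8 fractional bits) by 256 and clamp to 0..255.
 * Negative inputs are clamped before shifting, so no signed shift occurs. */
static uint8_t scale_and_clamp(int32_t n)
{
    if (n < 0) return 0;
    n >>= 8;
    return (uint8_t)(n > 255 ? 255 : n);
}

/* Convert one studio-range YUV pixel (Y 16..235, U/V centred on 128) to
 * 8-bit RGB using assumed BT.601 integer coefficients. */
rgb8 yuv_to_rgb_bt601(uint8_t y, uint8_t u, uint8_t v)
{
    int32_t c = (int32_t)y - 16;
    int32_t d = (int32_t)u - 128;
    int32_t e = (int32_t)v - 128;
    rgb8 out;
    out.r = scale_and_clamp(298 * c + 409 * e + 128);
    out.g = scale_and_clamp(298 * c - 100 * d - 208 * e + 128);
    out.b = scale_and_clamp(298 * c + 516 * d + 128);
    return out;
}
```

Under these coefficients, (Y, U, V) = (16, 128, 128) maps to black, (235, 128, 128) to white, and (81, 90, 240) to saturated red. Each pixel needs several multiplies plus clamping, which is why performing the conversion in a dedicated unit on the way to the frame buffer frees CPU time for other work.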
UPWARD COMPATIBILITY
TM-1000 is the first member of a family of chips that will carry investments in C/C++ media software forward in time. Software compatibility between family members is defined at the source code level, giving Philips the freedom to strike the optimum balance between cost and performance for all the chips in the TriMedia family. Powerful compilers ensure that programmers never need to resort to nonportable assembler programming.

ROBUST SOFTWARE ENVIRONMENT
The TriMedia software development environment (SDE) includes a full suite of system software tools to compile and debug code, analyze and optimize performance, and simulate execution for the TM-1000 processor. By enabling development of multimedia applications entirely in the C and C++ programming languages, the SDE dramatically lowers development costs, reduces time-to-market, and ensures code portability to next generation architecture.

TriMedia software libraries shortcut development of many applications by providing a variety of standards-compliant algorithms to handle multimedia data. These C-callable routines are optimized for top performance on the TriMedia architecture and include such functions as MPEG-1 and MPEG-2 decode, V.34 modem, H.32x videoconferencing, audio synthesis, 2D graphics, and more.

TRIMEDIA SPECIAL, C-CALLABLE OPERATIONS
In addition to standard RISC and 32-bit floating point operations, the TriMedia instruction set includes highly parallelized custom and multimedia operations that accelerate the performance of SIMD (single instruction, multiple data) computations and saturation arithmetic common in multimedia applications. These DSP-like special operations are invoked with familiar function-call syntax consistent with the C programming language. They are automatically scheduled to take full advantage of the TM-1000's highly parallel VLIW implementation.

TRIMEDIA REAL-TIME OPERATING SYSTEM KERNELS
For multimedia applications requiring system resource and task management, the TM-1000 media processor supports the pSOS+™ (single processor) or pSOS+m™ (multiprocessor) embedded real-time operating system kernels. Developed by Integrated Systems, Inc. (ISI), the pSOS+ kernels are based on open system standards and are optimized to deliver the deterministic response essential for multimedia applications.

TM-1000 Specifications

CENTRAL PROCESSING UNIT
Clock Speed: 100 MHz
Instruction Length: variable (2 to 23 bytes); compressed
Instruction Set: RISC ops.; load/store ops.; special multimedia and DSP ops.; IEEE-compliant floating pt. ops.
Issue Slots: 5
Functional Units: 27, pipelined

  Name                 Quantity  Latency  Recovery
  constant                 5        1        1
  integer ALU              5        1        1
  memory load/store        2        3        1
  shift                    2        1        1
  DSPALU                   2        2        1
  DSP multiply             2        3        1
  branch                   3        3        1
  float ALU                2        3        1
  integer/float mul.       2        3        1
  float compare            1        1        1
  float sqrt./divide       1       17       16

Registers: 128, 32-bit length
Special Operations: total number: 37; functions: DSP, multimedia, SIMD

PCI INTERFACE
Speed: 33 MHz
Bus Width: 32-bit
Address Space: 32 bits (4 GB)
Voltage: drive and receive at 3.3 V or 5 V
Standard Compliance: PCI Local Bus Specification Rev. 2.1

MEMORY SYSTEM
Speed: 66/80/100 MHz
CPU/Memory Speed Ratios: programmable: 1:1, 5:4, 4:3, 3:2, and 2:1
Off-chip Banks: up to four
Devices Supported: SDRAM (x4, x8, x16); SGRAM (x32)
Width: 32-bit bus
Memory Size: 512 KB to 64 MB
Bandwidth: 400 MB/sec (32-bit width at 100 MHz)
Interface: glueless up to 4 chips at 100 MHz; more chips with slower clock and/or external buffers
Signal Levels: 3.3 V LVTTL

CACHES
Data: 16 KB, 8-way set-associative with LRU replacement
Instruction: 32 KB, 8-way set-associative with LRU replacement

INTERNAL DATA HIGHWAY
Protocol: 64-byte block-transfer; separate 32-bit data and 32-bit address buses

VIDEO IN
Supported Signals: CCIR 656 8-bit video up to 19 Mpix/sec; raw 8-10-bit data up to 38 MB/sec
Image Sizes: all sizes, subject to sample rate

VIDEO OUT
Image Sizes: flexible, including CCIR601; maximum 4K x 4K pixels (subject to 80 MB/sec data rate)
Input Formats: YUV 4:2:2, YUV 4:2:0
Output Format: YUV 4:2:2 in CCIR656 format
Clock Rates: programmable (4-80 MHz), typically 27 MB/sec (13.5 Mpixels/sec for NTSC, PAL)
Transfer Speeds: 80 MB/sec in data-streaming and message passing modes; 40 Mpix/sec in YUV 4:2:2 mode

AUDIO IN/AUDIO OUT
Sample Size: 8 or 16 bits
Sample Rates: 0 to 100 kHz, programmable with 0.0006 ppm resolution
Clock Source: internal or external
Number of Channels: 2 input; 8 output
Native Protocol: I2S and other serial 3-wire protocols

IMAGE COPROCESSOR
Functions: horizontal or vertical scaling and filtering of individual Y, U, or V; horizontal scaling and filtering with color conversion and overlay:
  - YUV to RGB
  - RGB overlay and alpha blending
  - bit mask blanking
Scaling: programmable scale factor (0.2X to 10X)
Filter: 32-polyphase, each instance 5-tap, fully programmable filter coefficients
Performance: horizontal scaling and filtering: 80 MB/sec; vertical scaling and filtering: 30 MB/sec; horizontal scaling and filtering with color conversion: 33 Mpixels/sec peak for RGB output, 50 Mpixels/sec peak for YUV 4:2:2 output

I2C INTERFACE
Supported Modes: single master only
Addressing: 7- and 10-bit
Rates: up to 400 kbps
External Interface: 2 pins

SYNCHRONOUS SERIAL INTERFACE
Data Formats: variable slots/frame
External Interface: 6 pins (2 can be used for tip and ring for phone connections); compatible with a majority of telecom devices; can be configured with multiple chips
Frame Synch: external or internal
Clock Source: separate transmit, receive, frame synch; transmit/receive clocks external source; automatic frame synch error detection; settable edge polarity for transmit, receive, and frame synch

PHYSICAL
Process: C75: CMOS 0.35 micron; 4-layer metal
Packaging: MQUAD
Number of Pins: 240
Power: supply: 3.3 V +/- 5%; dissipation: 4 W (typical); management: dynamic standby <200 mW

FOR MORE INFORMATION CONTACT:
PHILIPS SEMICONDUCTORS TRIMEDIA BUSINESS LINE
811 EAST ARQUES AVENUE M/S 71, SUNNYVALE CA 94088-3409
PH 800-914-9239 (NORTH AMERICA), 408-991-3838 (WORLDWIDE)
FX 408-991-3300, E-MAIL info@trimedia.sv.sc.philips.com
WEBSITE www.trimedia.philips.com

Philips Semiconductors - a worldwide company
Argentina: see South America Australia: Tel. +61 2 9805 4455 Fax. +61 2 9805 4466 Austria: Tel. +43 1 60 1010, Fax. +43 1 60 101 1210 Belarus: Tel. +375 172 200 733, Fax. +375 172 200 773 Belgium: see The Netherlands Brazil: see South America Bulgaria: Tel. +359 2 689 211, Fax. +359 2 689 102 Canada: Tel. +1 800 234 7381 China/Hong Kong: Tel. +852 2319 7888, Fax.
+852 2319 7700 Colombia: see South America Czech Republic: see Austria Denmark: Tel. +45 32 88 2636, Fax. +45 31 57 0044 Finland: Tel. +358 9 615800, Fax. +358 9 61580920 France: Tel. +33 1 40 99 6161, Fax. +33 1 40 99 6427 Germany: Tel. +49 40 23 53 60, Fax. +49 40 23 536 300 Greece: Tel. +30 1 4894 339/239, Fax. +30 1 4814 240 Hungary: see Austria India: Tel. +91 22 493 8541, Fax. +91 22 493 0966 Indonesia: see Singapore Ireland: Tel. +353 1 7640 000, Fax. +353 1 7640 200 Israel: Tel. +972 3 645 0444, Fax. +972 3 649 1007 Italy: Tel. +39 2 6752 2531, Fax. +39 2 6752 2557 Japan: Tel. +81 3 3740 5130, Fax. +81 3 3740 5077 Korea: Tel. +82 2 709 1412, Fax. +82 2 709 1415 Malaysia: Tel. +60 3 750 5214, Fax. +60 3 757 4880 Mexico: Tel. +9-5 800 234 7381 Middle East: see Italy Netherlands: Tel. +31 40 27 82785, Fax. +31 40 27 88399 New Zealand: Tel. +64 9 849 4160, Fax. +64 9 849 7811 Norway: Tel. +47 22 74 8000, Fax. +47 22 74 8341 Philippines: Tel. +63 2 816 6380, Fax. +63 2 817 3474 Poland: Tel. +48 22 612 2831, Fax. +48 22 612 2327 Portugal: see Spain Romania: see Italy Russia: Tel. +7 095 755 6918, Fax. +7 095 755 6919 Singapore: Tel. +65 350 2538, Fax. +65 251 6500 Slovakia: see Austria Slovenia: see Italy South Africa: Tel. +27 11 470 5911, Fax. +27 11 470 5494 South America: Tel. +55 11 821 2333, Fax. +55 11 821 2382 Spain: Tel. +34 3 301 6312, Fax. +34 3 301 4107 Sweden: Tel. +46 8 632 2000, Fax. +46 8 632 2745 Switzerland: Tel. +41 1 488 2686, Fax. +41 1 488 3263 Taiwan: Tel. +886 2 2134 2865, Fax. +886 2 2134 2874 Thailand: Tel. +66 2 745 4090, Fax. +66 2 398 0793 Turkey: Tel. +90 212 279 2770, Fax. +90 212 282 6707 Ukraine: Tel. +380 44 264 2776, Fax. +380 44 268 0461 United Kingdom: Tel. +44 181 730 5000, Fax. +44 181 754 8421 United States: Tel. +1 800 234 7381 Uruguay: see South America Vietnam: see Singapore Yugoslavia: Tel. +381 11 625 344, Fax. 
+381 11 635 777 For all other countries apply to: Philips Semiconductors, International Marketing & Sales Communications, Building BE-p, P.O. Box 218, 5600 MD EINDHOVEN, The Netherlands, Fax. +31 40 27 24825 Internet: http://www.semiconductors.philips.com Philips Electronics N.V. 1998 SCS57 The PHILIPS wordmark and shield are trademarks of Philips Electronics N.V. TriMedia and TriMedia & design are trademarks of Philips Electronics North America Corporation. pSOS+, and pSOS+m are trademarks of Integrated Systems, Inc. Other brands and product names are trademarks of their respective owners. All rights are reserved. Reproduction in whole or in part is prohibited without the prior written consent of the copyright owner. The information presented in this document does not form part of any quotation or contract, is believed to be accurate and reliable and may be changed without notice. No liability will be accepted by the publisher for any consequence of its use. Publication thereof does not convey nor imply any license under patent- or other industrial or intellectual property rights. Printed in The Netherlands. Date of release: March 1998 Document order number: 9397 750 03407