## MICROPROCESSOR REPORT www.MPRonline.com THE INSIDER'S GUIDE TO MICROPROCESSOR HARDWARE

## BEST EXTREME PROCESSOR: XELERATED X10Q

Massively Pipelined NPU Is the First 40Gb/s Packet Processor

By Tom R. Halfhill {2/9/04-16}


We have chosen **Xelerated's Xelerator X10q** for the *Microprocessor Report* Analysts' Choice Award as **Best Extreme Processor** of 2003. This award category was especially difficult to judge, because the processors are so radically different from each other and from conventional processors. We believe the Xelerator X10q deserves the award for both its extreme design, even by the standards of extreme processors, and its focused design, which doesn't allow complexity to obscure its utility.

Although massively parallel processors are becoming almost commonplace (*MPR* covered several in 2003), the X10q steps forward with a massively *pipelined* architecture. This unusual approach is justified for a high-performance packet processor that performs repetitive tasks in serial fashion. For this application, a long, narrow architecture makes more sense than a wide architecture.

Even so, as logical as the design appears in retrospect, one must admire the moxie of a design team willing to lay a pipeline more than a thousand stages long. And with 200 identical VLIW processor cores, plus additional on-chip function units and resources, the X10q certainly isn't statistically challenged when compared with other extreme processors, no matter what problems they're intended to solve.

Being the first 40Gb/s packet processor is admirable, too, even if the market isn't yet clamoring for the level of performance the X10q can deliver. While the market catches up, Xelerated has some time to refine its development tools and marketing strategy. Unlike some other extreme processors, the X10q has a sharply defined target application: mainly, layer 2–4 packet processing for the Internet Protocol. To succeed, the X10q needn't do a variety of things extremely well; doing only one thing extremely well is sufficient.

Programmers can largely ignore the physical complexity of the X10q and write their software as if the chip were a single-threaded, single-core processor. In effect, the X10q has 200 instruction slots per packet, divided into 10 blocks of 20 instructions, plus some access points between the blocks. The requirements of the packet-processing algorithms will determine how to use the access points. Figure 1 shows how the access points and packet-processing functions are interleaved as a packet moves through the pipeline.
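The virtual programming model described above can be pictured with a small sketch. This is only a toy illustration in Python, not Xelerated's tools or instruction set: the block and slot counts come from the article, while `run_packet`, `ROUTES`, `decrement_ttl`, and `lookup_port` are hypothetical names invented here, and the access point is reduced to a plain function call placed after a block.

```python
from typing import Callable, Dict, List, Optional

BLOCKS = 10           # per the article: 10 blocks ...
SLOTS_PER_BLOCK = 20  # ... of 20 instruction slots each (200 per packet)

Packet = Dict[str, int]
Step = Callable[[Packet], None]


def run_packet(packet: Packet,
               blocks: List[List[Step]],
               access_points: Optional[Dict[int, Step]] = None) -> Packet:
    """Run one packet through the blocks in order, like a single-threaded program.

    access_points maps a block index to a step executed right after that
    block (a hypothetical stand-in for a lookup made at an access point).
    """
    access_points = access_points or {}
    if len(blocks) > BLOCKS:
        raise ValueError(f"program needs {len(blocks)} blocks, budget is {BLOCKS}")
    for index, block in enumerate(blocks):
        if len(block) > SLOTS_PER_BLOCK:
            raise ValueError(f"block {index} exceeds {SLOTS_PER_BLOCK} slots")
        for step in block:
            step(packet)
        if index in access_points:
            access_points[index](packet)
    return packet


ROUTES = {7: 3}  # hypothetical table: destination -> output port


def decrement_ttl(packet: Packet) -> None:
    packet["ttl"] -= 1


def lookup_port(packet: Packet) -> None:
    packet["port"] = ROUTES.get(packet["dst"], 0)


result = run_packet({"dst": 7, "ttl": 64}, [[decrement_ttl]], {0: lookup_port})
print(result)
```

The point of the sketch is the budget: the programmer reasons about one packet and a fixed number of slots per block, and the only structural decision is which work goes into a block and which goes to an access point.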

Despite the X10q's architectural complexity, we believe that programming it effectively will be relatively easy, at least by the standards of extreme processors and other NPUs. The X10q's architecture is daunting at first glance, but the virtual programming model is more straightforward and manageable than the architecture implies.

**Figure 1.** This flow diagram shows how the X10q pipeline handles the forwarding of Internet Protocol packets. Distributed throughout the pipeline are 11 engine access points (EAP) that manage programmed accesses to the function units, off-chip memory, and external coprocessors. The functions shown here would use about one-third of the X10q's 1,000-plus pipeline stages.

Xelerated is preparing three speed grades of the X10q. Relatively speaking, the X10q-w is the hot rod, with a core clock rate of 200MHz. At \$1,300, it's also the most expensive chip. The midrange X10q-m runs at 180MHz and costs \$690, and the low-end X10q-e runs at 160MHz and costs \$490. Don't be fooled by those low clock frequencies; the X10q-w can process as many as 100 million packets per second, and even the X10q-e can process up to 60 million packets per second. All three processors are well-balanced designs that have enough I/O bandwidth to support their prodigious appetites for data.
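A quick back-of-envelope calculation shows why the modest clock rates are not a handicap. The sketch below uses only the clock and packet rates quoted for the X10q-w and X10q-e. The article gives no packet sizes, so the bytes-per-packet figure is an inference: it assumes a fully loaded 40Gb/s line, ignores framing overhead, and assumes the same line rate for the X10q-e, which the article does not state. The function names are made up for this example.

```python
LINE_RATE_BPS = 40e9  # 40Gb/s, the line rate named in the article

# (core clock in Hz, peak packets per second) as quoted for two speed grades.
# The article gives no packet rate for the X10q-m, so it is left out.
GRADES = {
    "X10q-w": (200e6, 100e6),
    "X10q-e": (160e6, 60e6),
}


def cycles_per_packet(clock_hz: float, packets_per_s: float) -> float:
    """Core clock cycles available between successive packets at peak rate."""
    return clock_hz / packets_per_s


def bytes_per_packet(line_rate_bps: float, packets_per_s: float) -> float:
    """Average packet size at which the packet rate exactly fills the line."""
    return line_rate_bps / packets_per_s / 8


for name, (clock_hz, pps) in GRADES.items():
    print(f"{name}: {cycles_per_packet(clock_hz, pps):.2f} cycles/packet, "
          f"{bytes_per_packet(LINE_RATE_BPS, pps):.1f} bytes/packet at 40Gb/s")
```

With only a few clock cycles between successive packets, the per-packet work cannot run on a single core; it has to be spread along the pipeline, which is the argument the article makes for a long, narrow design.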

Future designs can build on the Xelerator architecture. Xelerated can extend the pipeline even further and increase the number of function units without breaking the basic design. Clock speeds will rise with advances in fabrication processes. Unless Xelerated fundamentally changes the architecture, programmers should be able to recompile their software or port it to run on new implementations with relatively little effort.

Most of Xelerated's competitors are concentrating on packet processors for 2.4Gb/s and 10Gb/s applications. The X10q is the only single-chip solution currently available for 40Gb/s packet processing that relies on predictable and moderately complex algorithms. By late 2004, the X10q may have closer competition, but by then, Xelerated will be closer to introducing a next-generation product. The faster the communications industry recovers from the tech recession and begins spending again, the more Xelerated will be able to exploit its early advantage.  $\diamondsuit$ 

To subscribe to Microprocessor Report, phone 480.609.4551 or visit www.MDRonline.com

© IN-STAT/MDR