# MICROPROCESSOR

www.MPRonline.com

THE INSIDER'S GUIDE TO MICROPROCESSOR HARDWARE

## ARM Wrestles PicoTurbo in Court

ARM-Compatible Cores From Startup Draw Patent Lawsuit

By Tom R. Halfhill {4/17/00-01}

PicoTurbo, a two-year-old startup based in Milpitas, Calif., has a new twist on ARM: a family of embedded-processor cores that's compatible with the ARM architecture. Indeed, the cores are apparently *too* compatible for ARM, which has filed a patent-infringement lawsuit against picoTurbo in U.S. District Court in San Jose.

ARM alleges that picoTurbo infringes three of ARM's U.S. patents. One patent describes shadow registers that temporarily store the contents of data registers during exception processing. The other two patents are related to ARM's Thumb instructions—a subset of the normal 32-bit instruction set that uses 16-bit instruction words for greater code density. PicoTurbo maintains that its cores do not infringe on ARM's patents, because they either don't perform the patented functions or perform similar functions in different ways, with an independently designed "clean room" microarchitecture. In fact, picoTurbo has applied for four patents of its own.

One thing is certain: there's a big opportunity for an ARM-compatible core. Since spinning off from Acorn Computer as an independent company in 1990, ARM has steadily climbed toward the summit of the embedded-processor market. Last year, according to MDR estimates, ARM licensees shipped more than 150 million chips, outselling every other 32-bit embedded-processor architecture in the world—including, for the first time, Motorola's ubiquitous 68K (see MPR 1/17/2000-01, "Embedded Market Breaks New Ground"). Market researchers at Dataquest estimate that shipments of ARM-based chips could exceed one billion units by 2005.

Given ARM's popularity, it was probably only a matter of time before somebody designed an ARM-compatible core.

The same thing happened to MIPS Technologies in 1998 when Lexra introduced a MIPS-like embedded-processor core (see MPR 2/16/98-03, "Lexra ASIC Core Dupes MIPS R3000"). Since then, MIPS and Lexra have fought a series of bitter legal battles over trademark issues, product claims, and alleged patent infringement (see MPR 12/6/99-03, "MIPS vs. Lexra: Definitely Not Aligned").

**Figure 1.** PicoTurbo's pT-110 core is similar to the ARM9 and executes ARMv4T instructions. The pT-100 is identical to the pT-110, except for the elements highlighted in purple.

Legal action hasn't stopped Lexra from rolling out new cores and signing up licensees, and it probably won't stop picoTurbo either. In both cases, there's an opportunity for workalike cores, because the newcomers offer different features, better performance, more flexible licensing terms, or lower costs.

#### Strong Family Resemblance

PicoTurbo's pT-100, pT-110, and pT-120 cores are based on a similar design with several variations. Like the ARM9, they are 32-bit uniscalar RISC processors with five-stage pipelines and fully static cores. To address different segments of the market, picoTurbo removed some elements from the pT-110 to produce the lower-end pT-100, and it added some features to produce the higher-end pT-120. But even the pT-100 retains a 32-bit Wallace-tree multiplier, a separate Thumb decoder, a 32-bit barrel shifter, and power-management logic. As with ARM cores, the picoTurbo cores have fully conditional instruction sets and can perform a shift and an ALU operation with one instruction in a single clock cycle.
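To make the last point concrete, here is a minimal Python model of one conditional ARM-style instruction that folds a left shift into an add, in the spirit of `ADDEQ Rd, Rn, Rm, LSL #n`. It is only a behavioral sketch: the function names are hypothetical, only three condition codes are modeled, and nothing here reflects how either company implements the datapath.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def cond_passes(cond, zero_flag):
    """Evaluate a small subset of ARM condition codes against the Z flag."""
    if cond == "AL":  # always
        return True
    if cond == "EQ":  # equal: Z set
        return zero_flag
    if cond == "NE":  # not equal: Z clear
        return not zero_flag
    raise ValueError(f"condition {cond!r} not modeled in this sketch")


def add_lsl(cond, zero_flag, rd_old, rn, rm, shift):
    """Model ADD<cond> Rd, Rn, Rm, LSL #shift and return the new Rd value.

    If the condition fails, the instruction has no effect and Rd keeps
    its old value.
    """
    if not cond_passes(cond, zero_flag):
        return rd_old
    shifted = (rm << shift) & MASK32  # barrel shifter feeds the ALU
    return (rn + shifted) & MASK32    # ALU add, wrapped to 32 bits
```

A typical use is array indexing: `add_lsl("AL", False, 0, base, index, 2)` computes `base + index * 4` in what would be a single instruction.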

Figure 1 shows a block diagram of the pT-110. The elements highlighted in purple were omitted from the pT-100: the instruction cache, data cache, write buffer, MMU, and internal PLL clock multiplier. Both cores execute the ARMv4T instruction set and are available now.

The higher-end pT-120 adds 2-bit branch prediction, a 512-entry branch target buffer, and support for the ARMv5T instruction set. PicoTurbo recently taped out the first version of this core and hopes to make it available this quarter. In the fall, picoTurbo plans to have the pT-120 running in a 0.13-micron copper process. A variation of this core, the pT-120D, is on picoTurbo's roadmap for introduction in 4Q00. The pT-120D will have some fixed-point DSP instructions and a Windows CE–compatible MMU.
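For readers unfamiliar with the technique, the sketch below shows the textbook form of 2-bit branch prediction paired with a branch target buffer. PicoTurbo has not described its indexing scheme, allocation policy, or counter encoding, so everything except the 512-entry size is an assumption, and the class and method names are hypothetical.

```python
# Textbook 2-bit saturating-counter predictor with a direct-mapped
# branch target buffer (BTB). Illustrative only: the indexing,
# allocation policy, and initial counter state are assumptions, not
# picoTurbo's documented design.

BTB_ENTRIES = 512  # entry count quoted for the pT-120


class BranchPredictor:
    def __init__(self, entries=BTB_ENTRIES):
        self.entries = entries
        # Each slot holds (tag, target, counter); counter ranges 0..3.
        self.table = [None] * entries

    def _index(self, pc):
        # Assumes 4-byte-aligned instructions: drop the low two bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return the predicted target, or None for 'not taken'."""
        slot = self.table[self._index(pc)]
        if slot is None or slot[0] != pc:
            return None  # BTB miss: predict fall-through
        _tag, target, counter = slot
        return target if counter >= 2 else None

    def update(self, pc, taken, target):
        """Train the predictor with the resolved branch outcome."""
        i = self._index(pc)
        slot = self.table[i]
        if slot is None or slot[0] != pc:
            # Allocate only on a taken branch, starting at weakly taken.
            if taken:
                self.table[i] = (pc, target, 2)
            return
        counter = slot[2]
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.table[i] = (pc, target, counter)
```

The point of using two bits rather than one is hysteresis: a loop branch that has been taken repeatedly sits at the strongly taken state, so a single not-taken outcome at loop exit does not flip the prediction for the next run of the loop.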

PicoTurbo is also integrating some industry-standard buses with its processor cores to make it easier for embedded developers to use existing peripheral macros and other intellectual property. One of those buses is PCI, and the other, ironically, is AMBA (Advanced Microcontroller Bus Architecture). AMBA was originally developed by ARM but is now a freely licensed bus specification that's gaining popularity with embedded developers.

Objective comparisons between picoTurbo's cores and ARM's cores are difficult, because independent benchmarks aren't available, and the two companies tend to characterize performance in different ways. PicoTurbo often quotes typical or best-case clock frequencies for a given IC process, while ARM says it quotes worst-case clock frequencies. On balance, we believe the raw performance of ARM's and picoTurbo's cores will be roughly comparable when they are fabricated in comparable processes. ARM's cores, however, should enjoy some advantage in clock speed by virtue of better circuit design. Unlike ARM, picoTurbo employs no circuit designers, relying instead on automated design tools and standard-cell logic.

ARM says an ARM9 implementation will run at least as fast as 275MHz in a 0.25-micron process and consume about 220mW at that frequency. In a leading-edge 0.18-micron process, the ARM9's worst-case frequency is 329MHz and power consumption is 122mW, according to ARM. Those numbers are better than previously available estimates, which pegged the ARM9 at 200MHz at 0.25 micron and at 300MHz at 0.18 micron.

PicoTurbo says the pT-110 typically runs at 250MHz in a 0.25-micron process while consuming 750mW. At 0.18 micron, says picoTurbo, the pT-110's typical frequency is 300MHz. But the company hasn't tested actual silicon at that frequency and doesn't know how much power it will consume.

The pT-100 is more suitable for very low-power applications, because it consumes only about half as much power at a given clock frequency as the pT-110 when fabricated in the same IC process. Its maximum clock frequency is lower, too—partly for marketing reasons, and partly because of minor differences in its critical paths and bus interface (it has an ARM7-compatible bus instead of an ARM9-compatible bus). The pT-100 consumes only 120mW at 100MHz in a 0.25-micron process and only 75mW at 150MHz in a 0.18-micron process.

Of course, numerous factors could affect these power/performance estimates, such as the relative efficiency of the circuit design and cache implementations. PicoTurbo surely sacrifices some performance by relying on standard-cell logic. Even the caches are generated with standard SRAM macrocells, which explains why the 2.5mm² die of the 0.25-micron pT-110 expands to 5.5mm² after adding only 8K of cache. That's also why picoTurbo's caches are limited to direct mapping or two-way set-associativity, while some of ARM's caches are 64-way set-associative. PicoTurbo says the pT-120 will address these issues by including some custom-designed SRAM. Table 1 summarizes the power/performance data available for the pT-100, pT-110, pT-120, and ARM9.

| Feature              | picoTurbo pT-100 | picoTurbo pT-100 | picoTurbo pT-110 | picoTurbo pT-110 | picoTurbo pT-120 | ARM ARM9 | ARM ARM9 |
|----------------------|------------------|------------------|------------------|------------------|------------------|----------|----------|
| Architecture         | ARMv4T           | ARMv4T           | ARMv4T           | ARMv4T           | ARMv5T           | ARMv4T   | ARMv4T   |
| IC process           | 0.25μ            | 0.18μ            | 0.25μ            | 0.18μ            | 0.18μ            | 0.25μ    | 0.18μ    |
| Clock speed*         | 100MHz           | 150MHz           | 250MHz           | 300MHz           | 500MHz           | 275MHz   | 329MHz   |
| Core voltage         | 2.5V             | 1.8V             | 2.5V             | 1.8V             | 1.8V             | 2.5V     | 1.8V     |
| Power (mW/MHz)       | 1.2              | 0.5              | 3.0              | n/a              | n/a              | 0.8      | 0.37     |
| Core size (no cache) | 2mm²             | 1mm²             | 2.5mm²           | 1.9mm²           | 2.1mm²           | 2mm²     | 1mm²     |
| Availability (soft)  | Now              | Now              | Now              | Now              | 2Q00             | Now      | Now      |

**Table 1.** These vendor-supplied estimates of performance and power consumption may vary widely with different implementations. \*ARM says these clock frequencies are worst-case estimates, while picoTurbo quotes typical or best-case clock frequencies. (n/a = data not available)
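The mW/MHz figures in Table 1 follow directly from the power and clock numbers the vendors quote. The short Python sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative, and the inputs are simply the vendor-supplied figures cited in the text.

```python
# Recompute power efficiency (mW/MHz) from the vendor-quoted power and
# clock figures cited in the article. Names here are illustrative.

QUOTED = {
    # (core, process): (power in mW, clock in MHz)
    ("pT-100", "0.25u"): (120, 100),
    ("pT-100", "0.18u"): (75, 150),
    ("pT-110", "0.25u"): (750, 250),
    ("ARM9", "0.25u"): (220, 275),
    ("ARM9", "0.18u"): (122, 329),
}


def mw_per_mhz(power_mw, clock_mhz):
    """Power consumed per megahertz of clock frequency."""
    return power_mw / clock_mhz


for (core, process), (power, clock) in QUOTED.items():
    print(f"{core} @ {process}: {mw_per_mhz(power, clock):.2f} mW/MHz")
```

Keep in mind that the inputs are not like for like: ARM says its clock frequencies are worst-case, while picoTurbo quotes typical or best-case values, so the ratios are a rough guide rather than a controlled comparison.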

At the recent IP2000 conference in Santa Clara, picoTurbo demonstrated sample pT-110 chips running on evaluation boards. The company says it spent 11 months verifying software compatibility and went through five tapeouts to get it right, mainly because the engineers kept discovering more undocumented registers in the ARM architecture.

The pT-100 and pT-110 are available as soft cores, firm cores, or hard cores in multiple formats—including Verilog source code, encrypted RTL, gate-level netlists, and GDSII streams. The multiplicity of formats allows customers to change the aspect ratios of the layouts and port the cores to almost any IC process. The hard cores are available from picoTurbo's foundry partners, TSMC (Taiwan Semiconductor Manufacturing Co.) and UMC.

PicoTurbo says ten customers had signed license agreements by early April. Eight of those customers have licensed firm cores and two have licensed hard cores. PicoTurbo says eight more licenses are in negotiations, four of them for soft cores. None of the licensees wish to be named at this time. (Although picoTurbo's contract indemnifies customers against patent-related legal liability, the licensees may have competitive or other reasons for avoiding public disclosure.)

Licensing terms are negotiable; manufacturing royalties average only about 6 cents per chip. Licenses signed before July 30 will cap the royalties at 10 million units—beyond that, customers will pay no royalties at all. ARM doesn't publicly disclose similar information, but picoTurbo claims a customer could save 50% or more by choosing one of its cores over an ARM9, and that a picoTurbo firm core costs about the same as an ARM hard core while offering more flexibility. The picoTurbo license allows customers to take the cores to TSMC, UMC, or a foundry of their choice and to move a core between foundries without renegotiating the license.
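The effect of the royalty cap is easy to work out. The sketch below assumes a flat 6 cents per chip (the article gives that only as an approximate average, and actual terms are negotiable) and the 10-million-unit cap; the function and constant names are hypothetical.

```python
# Royalty arithmetic under two simplifying assumptions: a flat
# per-chip rate equal to the quoted average, and the early-signing cap.

ROYALTY_CENTS_PER_CHIP = 6       # "about 6 cents" average
ROYALTY_CAP_UNITS = 10_000_000   # cap for licenses signed before July 30


def total_royalty_dollars(units_shipped):
    """Total royalties owed, in dollars, for a given shipment volume."""
    billable = min(units_shipped, ROYALTY_CAP_UNITS)
    return billable * ROYALTY_CENTS_PER_CHIP / 100
```

Under these assumptions, a licensee's lifetime royalties top out at $600,000 no matter how many chips ship beyond the cap.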

#### Left Arm vs. Right ARM

There are some differences between picoTurbo's cores and the latest ARM10, but those differences won't matter to some customers. Table 2 shows the instructions added to the ARM architecture in the ARMv5T and ARMv5TE extensions. None of these instructions is supported by the pT-100 or pT-110, which adhere to the ARMv4T definition.

Besides the instructions shown in Table 2, picoTurbo also doesn't support any vector floating-point instructions. In 1998, ARM announced a vector floating-point coprocessor at the same time it described the next-generation ARM10 (see *MPR 11/16/98-03*, "ARM10 Points to Set-Tops, Handhelds"). PicoTurbo doesn't rule out a floating-point implementation in the future, but for now the cores are integer-only devices.

| Instruction | Description                          | Architecture |
|-------------|--------------------------------------|--------------|
| BLX         | Branch and link with exchange        | ARMv5T       |
| BKPT        | Breakpoint (prefetch abort or debug) | ARMv5T       |
| CLZ         | Count leading zeroes                 | ARMv5T       |
| POP         | Pop and return with exchange         | ARMv5T       |
| SMULxy      | Signed 16b x 16b multiply            | ARMv5TE      |
| SMULWy      | Signed 32b x 16b multiply            | ARMv5TE      |
| SMLAxy      | Signed 16b x 16b accumulate          | ARMv5TE      |
| SMLAWy      | Signed 32b x 16b accumulate          | ARMv5TE      |
| SMLALxy     | Signed 16b x 16b accumulate long     | ARMv5TE      |
| QADD        | Saturating add                       | ARMv5TE      |
| QDADD       | Double saturating add                | ARMv5TE      |
| QSUB        | Saturating subtract                  | ARMv5TE      |
| QDSUB       | Double saturating subtract           | ARMv5TE      |

**Table 2.** PicoTurbo's ARMv4T-compatible cores currently don't support these new instructions added to the ARMv5T and ARMv5TE architectural extensions.

Apart from its floating-point advantage, the ARM10 is likely to exceed the performance of picoTurbo's initial cores, thanks partly to a longer pipeline that enables higher clock frequencies. As Figure 2 shows, the picoTurbo pipeline is virtually identical to the ARM9's, while the ARM10 splits the decode/register-read stage into two separate stages (see MPR 11/15/99-en, "ARM Extends Reach of ARM10 Pipeline"). ARM expects the ARM10 to hit 300MHz in a 0.25-micron process—20% faster than a comparable pT-110. The ARM10 is scheduled to ship in 2Q00.

In all other important respects, picoTurbo says the pT-100 and pT-110 are compatible with the ARMv4T instruction set. They will run the same development tools, RTOSs, and applications as ARM9-based chips. The company says its cores have successfully run ARM's test suite, Wind River's VxWorks test suite, and Integrated Systems' pSOS. PicoTurbo provides customers with an instruction-accurate simulator and other test-bench tools as part of its package.

#### Designed by Lawyers?

Although picoTurbo's cores weren't really designed by lawyers, the company says its legal counsels worked with the engineers from the beginning of the project to avoid stepping on ARM's intellectual property, which includes more than 40 patents related to RISC technology. PicoTurbo's patent counsel is Robert Yoches of Finnegan, Henderson, Farabow, Garrett & Dunner in Palo Alto, Calif., who also represents Lexra in its patent-infringement dispute with MIPS.

**Figure 2.** The recent addition of an extra stage to the ARM10 pipeline should help it reach higher clock frequencies than the ARM9 and picoTurbo cores in comparable fabrication processes.

ARM says picoTurbo is infringing three of its U.S. patents: number 5,386,563 ("Register Substitution During Exception Processing"), issued on January 31, 1995; number 5,568,646 ("Multiple Instruction Set Mapping"), issued on October 22, 1996; and number 5,740,461 ("Data Processing With Multiple Instruction Sets"), issued on April 14, 1998. ARM's complaint offers no technical explanations for the allegations, and the company has declined to comment on the case, other than to issue a short statement that says little.

The '563 patent describes the use of shadow registers that temporarily store the contents of data registers during exception handling, such as interrupt processing. This technique allows faster exception handling, because the CPU doesn't have to save and restore the contents of its registers in an off-chip memory stack. The patent has 22 claims, of which 21 are apparatus claims and one is a method claim. Method claims are more difficult to circumvent, because they attempt to cover any method that achieves the same result, not just a specific apparatus.
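The general idea of shadow registers can be illustrated with a toy model. The Python sketch below is not a description of ARM's patented mechanism or of picoTurbo's alternative; the register count, the choice of which registers are shadowed, and all names are assumptions made purely for illustration.

```python
# Toy model of shadow (banked) registers. On an exception the CPU
# switches to a second physical copy of some registers, so the handler
# can use them without spilling the interrupted program's values to a
# memory stack. All sizes and names are illustrative assumptions.

NUM_REGS = 8
SHADOWED = range(4, 8)  # hypothetical: r4-r7 have shadow copies


class ToyCPU:
    def __init__(self):
        self.normal = [0] * NUM_REGS
        self.shadow = {r: 0 for r in SHADOWED}
        self.in_exception = False

    def read(self, r):
        if self.in_exception and r in self.shadow:
            return self.shadow[r]
        return self.normal[r]

    def write(self, r, value):
        if self.in_exception and r in self.shadow:
            self.shadow[r] = value
        else:
            self.normal[r] = value

    def enter_exception(self):
        # No save loop: only the register-select state changes.
        self.in_exception = True

    def exit_exception(self):
        # No restore loop either: the original values were never touched.
        self.in_exception = False
```

In this model, a handler that confines itself to the shadowed registers leaves the interrupted program's state intact at zero memory cost, whereas any unshadowed register it wants to use would still have to be saved by software.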

PicoTurbo says its cores handle exceptions in a different way and don't infringe the '563 patent. If ARM's lawsuit reaches a jury trial, picoTurbo could also try to defend itself by challenging the validity of this patent. ARM's claims in the '563 patent describe techniques that appear similar to the register windowing used by other microprocessors before ARM applied for the patent on October 13, 1992. An independent patent expert consulted by MDR says he designed such a scheme in 1975 and holds some patents that have similar claims.

The '646 and '461 patents describe techniques related to ARM's Thumb instructions, although they don't mention Thumb by name and are rather narrowly written. The gist of these patents is that a processor can execute multiple instruction sets of different word lengths; that the different instruction sets can manipulate operands of the same data width; and that the processor's decoding logic can map the shorter instructions to the longer instructions without the need for redundant decoding logic or execution pipelines. That's how the ARM7 core works: at run time, it maps or "decompresses" 16-bit Thumb instructions into equivalent 32-bit instructions, then decodes and executes the longer instructions normally. The 16-bit Thumb instructions require only half as much memory as standard ARM instructions, so they help reduce system costs, yet the processor remains compatible with both instruction sets and the same data types (see MPR 3/27/95-01, "Thumb Squeezes ARM Code Size").
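As a rough illustration of what mapping a 16-bit instruction to a 32-bit equivalent means, the Python sketch below expands one Thumb instruction format (move/compare/add/subtract with an 8-bit immediate) into the corresponding 32-bit ARM data-processing encoding. It handles only that one format, the function name is hypothetical, and it models the encoding relationship in software rather than the hardware logic described in the patents.

```python
# Expand Thumb "move/compare/add/subtract immediate" instructions
# (top bits 001) into equivalent 32-bit ARM encodings. A software
# sketch of the mapping idea only; other Thumb formats are not handled.

COND_ALWAYS = 0xE << 28  # Thumb data ops are unconditional

# Thumb op field -> ARM data-processing base (immediate form, S bit set)
_ARM_BASE = {
    0b00: 0x03B00000,  # MOVS Rd, #imm8
    0b01: 0x03500000,  # CMP  Rd, #imm8
    0b10: 0x02900000,  # ADDS Rd, Rd, #imm8
    0b11: 0x02500000,  # SUBS Rd, Rd, #imm8
}


def thumb_imm_to_arm(halfword):
    """Return the 32-bit ARM word equivalent to a 16-bit Thumb halfword."""
    if (halfword >> 13) != 0b001:
        raise ValueError("not a Thumb move/compare/add/subtract immediate")
    op = (halfword >> 11) & 0b11
    rd = (halfword >> 8) & 0b111
    imm8 = halfword & 0xFF
    word = COND_ALWAYS | _ARM_BASE[op] | imm8
    if op != 0b00:        # MOV has no first-operand register
        word |= rd << 16  # Rn field
    if op != 0b01:        # CMP writes no destination register
        word |= rd << 12  # Rd field
    return word
```

For example, the Thumb halfword `0x3105` (add 5 to r1) expands to the ARM word `0xE2911005`: the 3-bit register field and 8-bit immediate are copied across, and the fields Thumb leaves implicit (the always condition, the flag-setting bit, and the repeated register) are filled in.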

There are 13 claims in the '646 patent, including one method claim. The lone method claim describes a six-step process for decoding and translating one instruction subset into another. The claims are narrowly written and use "means plus function" language, which courts have interpreted as "the means disclosed in the patent, or its equivalent." Therefore, it might be possible for picoTurbo to circumvent the '646 patent by decoding Thumb instructions in a different way. And, in fact, that's what picoTurbo claims: that its cores use separate decoders for 16- and 32-bit instructions instead of translating or mapping the 16-bit instructions into 32-bit instructions and decoding them normally. This approach requires extra logic for the separate decoder, but picoTurbo says the effect on performance is minimal.

ARM evidently reached a similar conclusion a few years ago, because the ARM9 doesn't decode Thumb instructions in the same way as the ARM7. Instead of mapping 16-bit instructions to equivalent 32-bit instructions, the ARM9 decodes and executes Thumb instructions directly, much as picoTurbo says its cores do. Because the '646 patent appears to cover the instruction-mapping technique, it may not apply to picoTurbo's cores.

PicoTurbo's workaround might circumvent the '461 patent as well. This patent has 15 claims, including four method claims. The claims describe additional details about decoding, instruction mapping, and mode switching. Although this patent appears to be stronger than the '646 patent, picoTurbo's cores may avoid infringement by using separate instruction decoders instead of a mapping and mode-switching scheme. Only a court of law can determine this, of course.

#### Embedded Industry Echoes x86 Wars

It's easy to see why ARM is giving picoTurbo the cold shoulder. Millions of dollars are at stake for both companies. ARM feels compelled to defend its hard-won market share against an invader that is undercutting its license fees and royalties. PicoTurbo stands to gain a lucrative chunk of the market by riding the coattails of the popular ARM architecture. We don't expect this case to be settled anytime soon.

Coming on the heels of the MIPS-Lexra battle, it's also an indication that the embedded industry is repeating some unpleasant history of the PC industry. Intel and AMD fought a similar war over the x86 architecture that dragged on for years and enriched dozens of intellectual-property lawyers. There's a significant difference between the Intel-AMD conflict and the ARM-picoTurbo and MIPS-Lexra lawsuits (patent infringement wasn't the central issue with Intel and AMD), but in all cases, the foundation of the dispute is whether or how a challenger can sell microprocessors that are compatible with somebody else's architecture. Ultimately, Intel and AMD reached a settlement that allows AMD to continue selling x86-compatible processors but that excludes AMD's chips from using Intel's socket interfaces after Socket 7. Although the MIPS and ARM architectures don't dominate the embedded market in the same way the x86 rules the PC market, they are popular enough to attract workalike competitors, even if the price of entry is lengthy litigation.

Another possible target for compatible competition is Hitachi's SuperH. According to MDR estimates, SuperH was the fourth-most-popular 32-bit embedded architecture last year, following ARM, 68K, and MIPS. Hitachi doesn't broadly license SuperH for ASIC integration as MIPS and ARM do, and synthesizable versions of SuperH cores aren't available to embedded developers. There weren't any synthesizable MIPS cores available for licensing before Lexra appeared on the scene either.

That's why the outcomes of the ARM-picoTurbo and MIPS-Lexra cases will bear close inspection. Out-of-court settlements or narrow court rulings may have no relevance for future cases. But a broader court ruling that makes it easier to design compatible CPU cores could have far-reaching implications for other popular architectures and for competitors that recognize a good business opportunity when they see one.

#### For More Information

PicoTurbo's pT-100 and pT-110 cores are available now in Verilog, encrypted RTL, netlists, and GDSII formats. The pT-120 is scheduled to be available this quarter. Licensing terms are negotiable. For more information, go to www.picoturbo.com. ARM's ARM9-family cores are available now. For more information, go to www.arm.com. To look up ARM's patents, go to IBM's Intellectual Property Network at www.patents.ibm.com/ibm.html.

To subscribe to Microprocessor Report, phone 408.328.3900 or visit www.MDRonline.com