IIT 3C87
This IIT chip was introduced in 1989, about the same time as the Cyrix 83D87. Both coprocessors are faster than Intel's 387DX coprocessor. The IIT 3C87 also provides extra functions not available on any other 387 chip [38]. It has 24 user-accessible floating-point registers organized into three register banks. Three additional instructions (FSBP0, FSBP1, FSBP2) allow switching from one bank to another. (Transfers between registers in different banks are not supported, however, so this feature by itself is of limited usefulness. Also, there seems to be only one status register [containing the stack top pointer], so it has to be manually loaded and stored when switching between banks with a different number of registers in use [40]). The register bank's main purpose is to aid the fourth additional instruction the 3C87 has (F4X4), which does a full multiply of a 4x4 matrix by a 4x1 vector, an operation common in 3D-graphics applications [39]. The built-in matrix multiply speeds this operation up by a factor of 6 to 8 when compared to a programmed solution according to the manufacturer [38]. Tests show the speed-up to be indeed in this range [40]. I measured the F4X4 to execute in about 280 clock cycles, during which time it executes 16 multiplications and 12 additions. The built-in matrix multiply speeds up the matrix-by- vector multiply by a factor of 3 compared with a programmed solution according to IIT [39]. The results for my own TRNSFORM benchmark support this claim (see results below), showing a performance increase by a factor of about 2.5. This makes matrix multiplies on the IIT 3C87 nearly as fast as on an Intel 486 at the same clock frequency. As desirable as the F4X4 instruction may seem, however, there are very few applications that make use of it when an IIT coprocessor is detected at run time (among them Schroff Development's Silver Screen and Evolution Computing's Fast-CAD 3-D [25]). These IIT-specific instructions also work correctly when using a Chips & Technologies 38600DX or a Cyrix 486DLC CPU, which are both marketed as faster replacements for the Intel 386DX CPU. Tests I ran with the IEEETEST program show that the 3C87 is not fully compatible with the IEEE-754 standard for floating-point arithmetic, although the manufacturer claims otherwise. It is indeed possible that the reported errors are due to personal interpretations of the standard by the program's author that have been incorporated into IEEETEST and that the standard also supports the different interpretation chosen by IIT. On the other hand, the IEEE test vectors incorporated into IEEETEST have become somewhat of an industry standard [66] and Intel's 387, 486, and RapidCAD chips pass the test without a single failure, so the fact that the IIT 3C87 fails some of the tests indicates that it is not fully compatible with the Intel 387 coprocessor. My tests also show that the IIT 3C87 does not support denormals for the double extended format. It is not entirely clear whether the IEEE standard mandates support for extended precision denormals, as the IEEE-754 document explicitly only mentions single and double-precision denormals. Missing support for denormals is not a critical issue for most applications, but there are some programs for which support of denormals is at the very least quite helpful [41]. In any case, failure of the 3C87 to support extended precision denormal numbers does represent an incompatibility with the Intel 387 and 486 chips. The 3C87 is implemented in an advanced CMOS process and has low power requirements, typically about 600 mW. Like the 387 'clones' from Cyrix and ULSI, the 3C87 does not support asynchronous operation of the CPU and the coprocessor, but always runs at the full speed of the CPU. It is available in 16, 20, 25, 33, and 40 MHz versions.