Fast IDEA for Pentium MMX compatibles

SIMD is the kind of parallelism where a single instruction operates on multiple data items at the same time. Superscalar parallelism allows two or more (possibly different) instructions to be executed at the same time. MMX is a relatively new multimedia extension to the superscalar Pentium family that provides a limited form of SIMD. MMX extensions are incorporated in every new Intel processor, including the one that is currently without doubt the most popular, the Pentium II. The SIMD parallelism provided by MMX significantly increases the performance of a very limited set of so-called multimedia applications.
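As a rough illustration of the SIMD idea, the sketch below emulates in Python what a packed 16-bit addition does on a 64-bit register: four independent lanes are added by one operation, each wrapping around on overflow. This mirrors the behaviour of the MMX PADDW instruction, but it is only an emulation; the helper names pack4, unpack4 and packed_add16 are made up for this example.

```python
MASK16 = 0xFFFF

def pack4(lanes):
    """Pack four 16-bit values into one 64-bit integer, lane 0 in the low bits."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & MASK16) << (16 * i)
    return word

def unpack4(word):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(word >> (16 * i)) & MASK16 for i in range(4)]

def packed_add16(x, y):
    """Add the four lanes of x and y independently, wrapping modulo 2^16."""
    return pack4([a + b for a, b in zip(unpack4(x), unpack4(y))])

x = pack4([1, 2, 3, 0xFFFF])
y = pack4([10, 20, 30, 1])
print(unpack4(packed_add16(x, y)))  # [11, 22, 33, 0]
```

The last lane overflows and wraps to 0 without disturbing its neighbours, which is what distinguishes a packed add from an ordinary 64-bit add.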

From the cryptographic point of view, there seem to be only narrow prospects for using MMX technology to speed up existing cryptographic primitives. The reasons are manifold. Public key cryptography (RSA, for example) relies on long multiplication, not on the parallel execution of many short multiplications. Most block ciphers in general use are inherently nonparallelisable due to their use of S-boxes and/or table lookups.

Some primitives, though, are outstandingly suitable for MMX. One such primitive is IDEA, as the following table demonstrates. (The speeds are scaled for a hypothetical 3200 MHz machine; entries were last updated Nov 13 2000 unless explicitly stated otherwise. NB: 1 MB/s = 1024^2 B/s.)

Block cipher    Block size  Cycles  MBytes/s  Author        Processor                     Notes
--------------  ----------  ------  --------  ------------  ----------------------------  ---------------
Square          128         192     254.4     Lipmaa        Pentium II
RC6             128         219     222.8     Lipmaa        Pentium II/III                17.05.02
4-way IDEA      4x64        440     222.0     Lipmaa        Pentium III
Rijndael        128         226     216.0     Lipmaa        Pentium II/III                04.04.02
Square          128         244     200.0     Bosselaers    Pentium
4-way IDEA      4x64        543     180.0     Lipmaa        Pentium MMX
SC2000          128         270     180.8     Lipmaa        Pentium II/III, gcc (no asm)  New, 04.04.2002
4-way IDEA      4x64        554     176.4     Lipmaa        AMD Athlon                    New, 01.10.2003
Twofish         128         277     176.4     Aoki, Lipmaa  Pentium II/III
Rijndael        128         300     162.8     Gladman       Pentium III                   New, 15.10.2001
Camellia        128         302     161.6     Aoki          Pentium II/III
MARS            128         306     160.0     Lipmaa        Pentium II/III
Blowfish        64          158     154.4     Bosselaers    Pentium
RC5-32/16       64          199     122.8     Bosselaers    Pentium
CAST5           64          220     110.8     Bosselaers    Pentium
DES             64          340     72.0      Bosselaers    Pentium
IDEA            64          358     68.0      Lipmaa        Pentium MMX
SAFER (S)K-128  64          418     58.4      Bosselaers    Pentium
Shark           64          585     41.6      Bosselaers    Pentium
IDEA            64          590     41.2      Bosselaers    Pentium
3DES            64          928     26.4      Bosselaers    Pentium
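The MBytes/s column appears to follow from the cycle counts by simple scaling. The sketch below shows that arithmetic under two assumptions that are inferred from the note above the table rather than stated there: the cycle count covers one call of the primitive (four 64-bit blocks for 4-way IDEA), and 1 MB is 1024^2 bytes. The function name mbytes_per_second is made up for this example.

```python
def mbytes_per_second(cycles, block_bits, clock_hz=3200e6):
    """Throughput in MB/s (1 MB = 1024^2 bytes) for a primitive that
    processes block_bits bits in the given number of clock cycles."""
    block_bytes = block_bits / 8
    calls_per_second = clock_hz / cycles
    return calls_per_second * block_bytes / 1024**2

# Square (Lipmaa): 128-bit block in 192 cycles
print(round(mbytes_per_second(192, 128), 1))
# 4-way IDEA on the Pentium III: 4 x 64 bits in 440 cycles
print(round(mbytes_per_second(440, 4 * 64), 1))
# 3DES (Bosselaers): 64-bit block in 928 cycles
print(round(mbytes_per_second(928, 64), 1))
```

The results agree with the table to within a few tenths of a MB/s; the small differences presumably come from rounding of the measured cycle counts.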

Why IDEA?

IDEA has been known to the cryptographic community for ten years, and it is still unbroken. Many cryptographers think that it is one of the most secure block ciphers available. However, IDEA has also generally been considered a slow cipher (cf. Bosselaers' implementation of IDEA in the table above) due to the costly 16-bit multiplications involved. This is no longer the case: on a Pentium MMX compatible machine, IDEA encryption is faster than DES, RC5 or Blowfish encryption, to name a few other well-known block ciphers.
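The multiplication in question is multiplication modulo 2^16 + 1, where the 16-bit value 0 stands for 2^16. The sketch below compares a direct definition with the well-known "low-high" formulation, which needs only the low and high 16-bit halves of the 32-bit product and therefore maps naturally onto packed 16-bit multiply instructions. It is an illustration of the operation only, not the FastIDEA code; the function names are made up for this example.

```python
def mul_direct(a, b):
    """IDEA multiplication by definition: modulo 2^16 + 1, with 0 meaning 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & 0xFFFF

def mul_lowhigh(a, b):
    """Same result, using only the 16-bit halves of the product.
    Relies on 2^16 being congruent to -1 modulo 2^16 + 1."""
    if a == 0:                       # a represents 2^16, i.e. -1
        return (1 - b) & 0xFFFF
    if b == 0:                       # b represents 2^16, i.e. -1
        return (1 - a) & 0xFFFF
    product = a * b
    lo = product & 0xFFFF
    hi = product >> 16
    # hi * 2^16 + lo is congruent to lo - hi; add 1 when the subtraction borrows
    return (lo - hi + (lo < hi)) & 0xFFFF

samples = [0, 1, 2, 3, 255, 256, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF]
print(all(mul_direct(a, b) == mul_lowhigh(a, b) for a in samples for b in samples))
```

Because the low-high form consists of one multiplication, a subtraction and a comparison per 16-bit lane, four such multiplications fit side by side in a 64-bit register.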

Compared to leading AES candidates, 4-way IDEA is only a little slower than RC6 and Rijndael on the Pentium II, but faster than Twofish and MARS. On the Pentium III, 4-way IDEA is even faster than RC6 and Rijndael. If one prefers a block cipher with time-proven security margins, IDEA is definitely the choice over AES algorithms.

However, in light of the ongoing AES process and the amount of cryptanalysis applied to the leading AES candidates, especially to the winner, it might very soon become desirable to switch over to the proposed AES, Rijndael.

Parallel Encryption Modes

The term "x-way" means that x encryptions/decryptions are done in parallel. To use such a parallel implementation effectively, special encryption modes have to be used. I have currently implemented the 4-interleaved CBC4 and XORC4 modes (the XORC mode is better known as the counter mode and, by a recent analysis of Bellare et al., is almost ideally secure in a random oracle model). The XORC4 mode also provides a significant memory-time tradeoff: the so-called counter may be encrypted beforehand ("off-line"). Later, "on-line", one only has to xor the encrypted counter with the plaintext. CBC4 en/decryption and XORC4 en/de/precryption run at about 260-275 Mbit/s on a 450 MHz Pentium III.
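The two modes can be sketched as follows. The exact definitions used by the library are not given on this page, so the code assumes one common reading: in interleaved CBC, block i belongs to chain i mod 4 and each chain has its own IV; in counter mode, the keystream is the encryption of successive counter values. The toy_encrypt and toy_decrypt functions are insecure placeholders standing in for IDEA, and all names are made up for this example.

```python
MASK64 = (1 << 64) - 1

def toy_encrypt(key, block):
    """Placeholder 64-bit permutation standing in for IDEA. NOT secure."""
    return ((block ^ key) + key) & MASK64

def toy_decrypt(key, block):
    """Inverse of toy_encrypt."""
    return ((block - key) & MASK64) ^ key

def cbc4_encrypt(key, ivs, blocks):
    """Interleaved CBC: four independent chains, block i goes to chain i % 4."""
    prev = list(ivs)
    out = []
    for i, plain in enumerate(blocks):
        lane = i % 4
        prev[lane] = toy_encrypt(key, plain ^ prev[lane])
        out.append(prev[lane])
    return out

def cbc4_decrypt(key, ivs, blocks):
    prev = list(ivs)
    out = []
    for i, cipher in enumerate(blocks):
        lane = i % 4
        out.append(toy_decrypt(key, cipher) ^ prev[lane])
        prev[lane] = cipher
    return out

def xorc_keystream(key, counter, n):
    """Off-line step: encrypt n successive counter values in advance."""
    return [toy_encrypt(key, (counter + i) & MASK64) for i in range(n)]

def xorc_apply(keystream, blocks):
    """On-line step: xor the data with the precomputed keystream.
    Encryption and decryption are the same operation."""
    return [k ^ b for k, b in zip(keystream, blocks)]

key = 0x0123456789ABCDEF
message = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666]
ivs = [1, 2, 3, 4]

cipher = cbc4_encrypt(key, ivs, message)
print(cbc4_decrypt(key, ivs, cipher) == message)

stream = xorc_keystream(key, counter=1000, n=len(message))
print(xorc_apply(stream, xorc_apply(stream, message)) == message)
```

In the interleaved CBC loop, four consecutive iterations touch four different chains, so they have no data dependency on each other and can be handed to a 4-way primitive together. In the counter mode, no keystream block depends on the data at all, which is what allows the off-line precomputation.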

To further simplify usage, the CBC1 and XORC1 modes (corresponding to the usual CBC and counter modes) have also been implemented. In these modes, FastIDEA can be seen as a drop-in replacement for the popular OpenSSL library. Since CBC1 decryption and XORC1 encryption (and decryption, which is identical to encryption) internally use the 4-way IDEA, the library achieves almost the same speed in these modes as in the low-level CBC4 and XORC4 modes: about 235-260 Mbit/s. CBC1 encryption is not parallelizable and therefore does not achieve such speed.
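The asymmetry between CBC encryption and decryption can be seen in a small sketch. In ordinary CBC, each ciphertext block depends on the previous one, so encryption is serial; during decryption all ciphertext blocks are already known, so the block decryptions can be done four at a time. The code below illustrates this under the assumption that a 4-way primitive simply performs four independent block operations per call; the toy cipher is an insecure placeholder for IDEA and all names are made up for this example.

```python
MASK64 = (1 << 64) - 1

def toy_encrypt(key, block):
    """Placeholder 64-bit permutation standing in for IDEA. NOT secure."""
    return ((block ^ key) + key) & MASK64

def toy_decrypt(key, block):
    """Inverse of toy_encrypt."""
    return ((block - key) & MASK64) ^ key

def toy_decrypt4(key, batch):
    """Stand-in for a 4-way primitive: up to four independent decryptions."""
    return [toy_decrypt(key, c) for c in batch]

def cbc_encrypt(key, iv, blocks):
    """Ordinary CBC encryption: every step needs the previous ciphertext."""
    out, prev = [], iv
    for plain in blocks:
        prev = toy_encrypt(key, plain ^ prev)
        out.append(prev)
    return out

def cbc_decrypt_batched(key, iv, blocks):
    """Ordinary CBC decryption, with the block decryptions done in batches."""
    blocks = list(blocks)
    chain = [iv] + blocks[:-1]        # the previous ciphertext for every block
    out = []
    for start in range(0, len(blocks), 4):
        batch = blocks[start:start + 4]
        for j, decrypted in enumerate(toy_decrypt4(key, batch)):
            out.append(decrypted ^ chain[start + j])
    return out

key = 0x0123456789ABCDEF
message = [10, 20, 30, 40, 50, 60]
cipher = cbc_encrypt(key, 7, message)
print(cbc_decrypt_batched(key, 7, cipher) == message)
```

The batched decryption produces exactly the same plaintext as a block-by-block CBC decryption; only the order in which the independent work is scheduled changes.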

Availability

More information on this project is available by email (for example, if you are interested in commercial use). Note that the information on this page corresponds to version 1.1a of the library.

If you want to learn more about my implementations, don't hesitate to mail to lipmaa(at)cyber.ee. In the mail please specify your interest! :-)

NEW: I have also finished fast implementations of MARS and Rijndael, two of the AES finalist ciphers. Information about them can be obtained from the AES Speed page or by sending an email to lipmaa(at)cyber.ee.

Comparison with other implementations

Interestingly, the fastest available IDEA hardware coprocessor runs at 40 MHz and achieves about 300 Mbit/s in the 3-way IDEA mode and about 100 Mbit/s in the standard IDEA mode. (See links.) On a 500 MHz Pentium III, my 4-way IDEA implementation achieves 290 Mbit/s, and already on a 550 MHz Pentium III my software implementation is faster than the Ascom hardware implementation. An 866 MHz Pentium III achieves 500 Mbit/s, which is in the same range as the hardware implementations of the AES candidates.

A totally new version of this page: 10.08.98.
Helger Lipmaa