AES Ciphers: speed

Cross-table

NB! This table is no longer updated. For updated information about the AES finalists, see here. However, if you have any new information about non-finalists, I'd be happy to update this table too.
Machine/compiler  CAST-256  Crypton  DEAL  DFC  E2  Frog  Mars  RC6  Rijndael  SAFER+  Serpent  Twofish  HPC
32-bit software
Pentium Pro/Pentium II
BC++ 5.0 1790 630 NTT 2600 920 616 1738 640 3500
MSVC++ 4.0/5.0/DS97 452 (5.0, CHL) 2432 (4.0) 561 (5.0, NTT) 262 (DS97) 2080 (5.0)
DJGPP/egcs 390 (djgpp) ~365 (djgpp, MR) 370 (egcs, HL)
assembly 815 381 (CHL) 392 (DM, RH, TM, DB) 355 NTT 243 (TK) 283 (HL) 258
BG MVC++ 6.0 668 478 2339 1203 734 2572 376 270 374 1746 992 378 1468
Pentium
BG MVC++ 6.0 1131 816 3172 1079 760 758 702 2449 1279 673
assembly 609 (DB, RH, DM, TM) 320 290
EGCS 1.0.2 950
Alpha
AXP 21164 C (DEC CC) 323 (RH) ~420
AXP 21164 C (gcc) 351 (RH) 490 (LG)
AXP 21164, DEC CC (LG) 749 499 2752 1230 679 2752 507 559 516 1502 998 490 930
AXP 21164, gcc 281 (LG) 979 593 2928 1946 813 4197 771 625 617 3347 1444 511 3372
AXP 21164 asm 310 (RH) 587 600
AXP 21164 asm (K.A.) 600 408 2528 304 471 478 467 340 656 915 360 380
AXP 21264 asm 402
SPARC
UltraSparc-II Sun C 5.0 (HL) 694 477 2781 2692 711 2337 840 1161 334 3002 996 487 1465
UltraSparc C 1180 575 775 (RH) 328 (HL) 750
Sparc 170 C 969
Sparc 170 asm 802
Other (RISC, 680x0, ...)
Merced asm 625 170 720 232-
McKinley asm 525 142- 720 181
PPC 604e C 300
PPC 750 C 590
StrongARM asm 442 (DS)
PA-RISC 8200 547 (DW) 186 (DW) 610 (DW) 222 (DW)
TMS320C541 C (BG), TI C 1.20 10773 4155 37991 22668 9652 8908 8231 3518 10288 14703 4672 21759
68040 C 3500
8-bit processors
6805 31524 (GK) 35000 32731 (GK) 14324 (GK) 26500 (GK)
68xx 26000 (6811) 8390 (68HC08)
Z80 17,900 35000
MCS51, 8051 100,000 (KA) 13,535 3168 (1016 bytes) 80,000?
H8/300 6374
Hardware (estimates)
cycles 50 100 8 16
Mbit/s @50MHz 400 640 1300 800 400
cell count 130,000? 70,000 80000

Units. The entries of this table have the format "cycles per block (optional implementer)". For example, in row BC++ 5.0 and column E2, the entry 711 means that the best known implementation of E2 for the Pentium Pro/Pentium II using the BC++ 5.0 compiler takes 711 cycles to encrypt one block.
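To relate a cycles-per-block entry to more familiar units, the following is a minimal sketch of the conversion. It assumes the 128-bit block size required of all AES candidates; the function names and the 200 MHz clock in the usage note are illustrative choices, not taken from the table.

```python
# Minimal sketch: convert "cycles per block" into cycles per byte and
# into throughput at a given clock rate.
# Assumption: every cipher in the table processes 128-bit blocks.

BLOCK_BITS = 128


def cycles_per_byte(cycles_per_block, block_bits=BLOCK_BITS):
    """Cycles spent per byte of data for one block encryption."""
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return cycles_per_block / (block_bits / 8)


def throughput_mbit_per_s(cycles_per_block, clock_mhz, block_bits=BLOCK_BITS):
    """Throughput in Mbit/s at a clock of clock_mhz MHz.

    blocks per second = clock / cycles_per_block, and each block
    carries block_bits bits, so Mbit/s = block_bits * MHz / cycles.
    """
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return block_bits * clock_mhz / cycles_per_block
```

For instance, `throughput_mbit_per_s(711, 200)` gives the throughput of a 711-cycle implementation on a hypothetical 200 MHz processor, and `cycles_per_byte(711)` gives the same figure in cycles per byte.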

As a comparison of Brian Gladman's implementations shows, the relative speed of the ciphers depends very heavily on the processor used (Pentium vs. Pentium II), but see the notes.
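How much the ordering shifts between platforms can be checked directly from the table. The sketch below uses the two rows "Pentium Pro/Pentium II, BG MVC++ 6.0" and "AXP 21164, DEC CC (LG)", chosen only because both have an entry in all 13 columns. It assumes the values in such a row map to the ciphers in header order; the helper names are illustrative. The two rows come from different implementers and compilers, so the differences reflect more than the processor alone.

```python
# Minimal sketch: compare cipher rankings across two complete table rows.
# Assumption: a row with 13 values lists them in header (column) order.

CIPHERS = ["CAST-256", "Crypton", "DEAL", "DFC", "E2", "Frog", "Mars",
           "RC6", "Rijndael", "SAFER+", "Serpent", "Twofish", "HPC"]

# Cycles per block, copied from the table.
PENTIUM_II_BG = [668, 478, 2339, 1203, 734, 2572, 376,
                 270, 374, 1746, 992, 378, 1468]
ALPHA_DEC_CC_LG = [749, 499, 2752, 1230, 679, 2752, 507,
                   559, 516, 1502, 998, 490, 930]


def by_cipher(row):
    """Map cipher name -> cycles per block for a complete row."""
    if len(row) != len(CIPHERS):
        raise ValueError("row must have one value per cipher")
    return dict(zip(CIPHERS, row))


def ranking(row):
    """Cipher names ordered from fastest to slowest."""
    cycles = by_cipher(row)
    return sorted(cycles, key=cycles.get)


def relative_to(row, baseline="Rijndael"):
    """Cycle counts divided by the baseline cipher's cycle count."""
    cycles = by_cipher(row)
    return {name: c / cycles[baseline] for name, c in cycles.items()}


if __name__ == "__main__":
    print(ranking(PENTIUM_II_BG)[:3])
    print(ranking(ALPHA_DEC_CC_LG)[:3])
```

In these two rows RC6 is the fastest entry on the Pentium II but slower than Rijndael on the Alpha, which is the kind of platform dependence the paragraph above describes.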

Quoted numbers. Most of the estimates in the table are taken from the original papers. Exceptions:

  1. The assembly implementation of RC6 is by Ted Krovetz. The djgpp implementation of RC6 is reported by Matthew Robshaw.
  2. The E2 data is from its homepage. The P2 assembly figure (355 cycles) was achieved by implementing an exhaustive code scheduler (2.45 uops per cycle!). Note that the Pentium Pro implementations are slower than those for the P2 (which are quoted here). The 8051 encryption timing also includes key scheduling (to fit into the 256 bytes of RAM). KA = Kazumaro Aoki.
  3. Crypton's cell count is from Lim's AES2 talk. Some other entries (marked by CHL) are due to personal communication.
  4. The PA-RISC 8200 implementation is as communicated by Doug Whiting, but done by the HP folks.
  5. The TMS320C541 (Texas Instruments DSP) timings are taken from Enigma SOI's AES First Round official comments. They used basically Brian Gladman's code, but had to modify it in some cases to fit it to this architecture.
  6. MVC++ 6.0 data is by Brian Gladman (personal communication). His data given in the table differs from the data given on his homepage. The main reason is that it also includes the endianness conversion times. Another difference is that I only provide 128-bit key encryption times, not the mean of encryption and decryption.
  7. SAFER+ hardware estimates are by Rieks Joosten.
  8. Some of the RC6 data have been taken from its homepage.
  9. DFC's Alpha C implementations are by Robert Harley; the C implementation for the Sparc 170 is by Fabrice Noilhan. The UltraSparc C (using the floats), 21164 asm and Pentium asm implementations are by Robert Harley, again. The Pentium/P2 assembler implementations are by Terje Mathisen & Co. The ARM implementation is by David Seal (as reported by Robert Harley).
  10. Twofish assembly data is taken from a posting by Doug Whiting at aes.nist.gov.
  11. A separate row is for Kenneth Almquist's estimates of speeds on the AXP 21164.
  12. A separate row is for Louis Granboulan's timings on AXP 21164 (see the notes).
  13. Rijndael's Pentium II assembler and UltraSparc C implementations are my own.
  14. Merced/McKinley (the first- and second-generation Intel IA64 processors) estimates are courtesy of Doug Whiting.

Maintained by Helger Lipmaa. Don't hesitate to email me if you have any corrections/additions/comments.