AES Ciphers: speed

This page has not been updated since 15.10.2001. Updated information about Rijndael implementations can be obtained from here. (Old information about Rijndael stays here until it starts to cause major misunderstandings!)

Cross-table

Machine/compiler Mars RC6 Rijndael Serpent Twofish
Cycles Avail.
32-bit software
Pentium Pro/Pentium II/Pentium III
ADA 95 792 Gisle Sælensminde
BC++ 5.0 920 616 1738 640
MSVC++ 4.0/5.0/DS97 262 (DS97)
DJGPP/gcc 347 (gcc, Li) 241 (gcc, Li) 354 (gcc, Li) [$] 759 (gcc, Osvik) 402 (gcc, Li)

assembly: cycles 306 (Li) 219 (Li) Upd: 17.05.02 229 (Li) [$] 771 (Gl, asm in C++, PIII) 258 (self-modifying)
277 (Ao, Li)
--''-- (the same), Mbit/s @1 GHz 398.9 547.4 533.1 158.3 440.7

Gl, MVC++ 6.0 367 267 364 Free 945 358
Pentium
Gl, MVC++ 6.0 760 758 702 N/A 1279 673
assembly 320 N/A 290
EGCS 1.0.2 950 Free
Alpha
AXP 21164 C (gcc) 490 (LG) ?
AXP 21164, DEC CC (LG) 507 559 516 ? 998 490
AXP 21164, gcc 281 (LG) 771 625 617 ? 1444 511
AXP 21164 C
Weiss etc
701 571 439 984 442
AXP 21164 asm (Almquist) 478 467 340 915 360

AXP 21264 C
Weiss etc
515 428 294 535 (Weiss @ aes.nist.gov) 316
AXP 21264 asm
Weiss
(estimated)
375 360 210 570 255
SPARC
SPARCv9 Sun C 6.0 (Li) 825 1144 270 1003 354
IA-64 family
Merced asm 625 170 720 232-
McKinley asm 525 142- 720 181
McKinley snapshot asm
Worley etc
511 490 124 419 (Worley, p.c.) 182
McKinley snapshot asm
Eric Young
470
IA-64++
Worley etc
255 150 124 468 182
Macintosh-compatible processors
PPC 604e C 300 226 (Denis Ahrens, MWP)
PPC 750 C 590
Other (RISC, 680x0, ...)
PA-RISC 7000 (Gl) 950 1085 735 1345 755
PA-RISC 8200 547 (Whiting) 186 (Whiting) 610 (Whiting) 222 (Whiting)
PA-RISC 8500
Worley etc
540 493 185 580 205
68040 C 3500
ARM-based SmartCards 790 (UC) 1467 (UC)? 8406 (UC)
DSPs
Gl, TMS320C541, TI C 1.20 8908 8231 3518 14703 4672
TMS320C6201 Wollinger etc 406 292 228 871 308
8-bit processors
6805 32731 (GK) 14324 (GK) Free 26500 (GK)
68xx 8390 (68HC08)
MCS51, 8051 13,535 14,500 (UC) 3168 (1016 bytes)
FPGA
Xilinx Virtex XCV 1000
Mbit/s (cycles per block/MHz/CLB Slices)
Gaj etc
61.0 (//2744) 142.7 (//1137) 414.2 (//2507) 431.4 (//4507) 177.3 (//1076)
Xilinx Virtex XCV1000BG560-4 FPGA
Mbit/s (cycles per block/MHz/CLB Slices)
Elbirt etc
126.5 (20/19.8/3189) 300.1 (6/14.1/5302) 444.2 (4/13.9/7964) 127.7 (16/16.0/2695)
XC4028XL FPGA 85.6 (16/10.7/911)
Xilinx Virtex
Mbit/s (slices per core w/o key setup) Dandalis, Prasanna, Rolim
101.88 (4621) 112.87 (1749) 353.00 (4312) 148.95 (1250) 173.06 (2809)
Xilinx Virtex
Mbit/s (slices per core) Gaj etc
39.8 (2737) 103.9 (1139) 331.5 (2902) 339.4 (4438) 177.3 (1076)
Altera Flex10KE Mbit/s (slices per core) Fischer 232.7 (1585) 125.5 (3678) 81.5 (1935)
Hardware
NSA Hardware Test
Mbit/s(area um2, trans count)
56.71 (126827662; 1941371) 102.83 (19248830; 307247) 605.77 (33851050; 641681) 202.30 (23274086; 345483) 105.14 (16110756; 264058)
Mitsubishi 0.35micron CMOS ASIC 225.55 Ichikawa etc 203.96 1950.03 931.58 394.08
Hi/fn 0.25micron ASIC
Mbit/s (trans count)
1280 (39,000) 800 (77,500) 528 (86,000)

Units. The entries of this table have the format "cycles per block (optional implementer)". For example, in row BC++ 5.0 and in column E2 the entry 711/ means that the best known implementation of E2 for the Pentium Pro/Pentium II using the BC++ 5.0 compiler takes 711 cycles to encrypt one block.
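Cycle counts and the Mbit/s figures used in the FPGA and hardware rows can be related by a simple conversion. The following is a minimal sketch, assuming a 128-bit block (the AES block size); the function name `throughput_mbit_s` is made up for this illustration. As a sanity check, the XC4028XL entry 85.6 (16/10.7/911), read as 16 cycles per block at 10.7 MHz, is consistent with this formula.

```python
def throughput_mbit_s(cycles_per_block: float, clock_mhz: float,
                      block_bits: int = 128) -> float:
    """Throughput in Mbit/s for a cipher needing `cycles_per_block` cycles per block.

    Assumes one block is processed at a time (no pipelining across blocks)
    and a 128-bit block unless `block_bits` says otherwise.
    """
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    # blocks/s = clock_mhz * 1e6 / cycles_per_block
    # Mbit/s   = blocks/s * block_bits / 1e6, so the factors of 1e6 cancel.
    return block_bits * clock_mhz / cycles_per_block


# A hypothetical 256-cycle implementation on a 1 GHz processor.
print(throughput_mbit_s(256, 1000))

# The XC4028XL FPGA entry: 16 cycles per block at 10.7 MHz.
print(round(throughput_mbit_s(16, 10.7), 1))
```

The same function works in reverse for rough comparisons: dividing `block_bits * clock_mhz` by a quoted Mbit/s figure gives the implied cycles per block.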

As seen by comparing Brian Gladman's implementations, the speed ratio of different ciphers depends very heavily on the processor (Pentium/Pentium II) used (but see the notes).
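The dependence on the processor can be made concrete by normalising each of Brian Gladman's two MVC++ 6.0 rows to its fastest cipher. This is only a sketch: it assumes the five cycle counts in each row map, in order, to Mars, RC6, Rijndael, Serpent and Twofish, and the helper name `relative_cost` is made up for this illustration.

```python
CIPHERS = ["Mars", "RC6", "Rijndael", "Serpent", "Twofish"]

# Cycles per block from the two "Gl, MVC++ 6.0" rows of the table
# (column order assumed to follow the table header).
GLADMAN_CYCLES = {
    "Pentium Pro/II/III": [367, 267, 364, 945, 358],
    "Pentium": [760, 758, 702, 1279, 673],
}


def relative_cost(cycles):
    """Return each cipher's cycle count divided by the fastest one in the row."""
    fastest = min(cycles)
    return {name: round(c / fastest, 2) for name, c in zip(CIPHERS, cycles)}


for cpu, cycles in GLADMAN_CYCLES.items():
    print(cpu, relative_cost(cycles))
```

Under that column assumption, RC6 is the fastest of the five on the Pentium Pro family, while on the Pentium it falls behind both Twofish and Rijndael, so the ranking itself changes with the processor.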

Only iterative (feedback) modes are considered.

Quoted numbers. Most of the estimates below are taken from the original papers. Exceptions:

  1. Ao - Kazumaro Aoki
  2. Gl - Brian Gladman. See his own page.
  3. Li - Helger Lipmaa
  4. UC - UCL/CRYPTO. See their own page.
  5. Some of the RC6 data have been taken from its homepage.
  6. The PA-RISC 8200 implementation is as communicated by Doug Whiting, but done by the HP folks.
  7. The TMS320C541 (Texas Instruments DSP) timings are taken from Enigma SOI's AES First Round official comments. They used basically Brian Gladman's code, but had to modify it in some cases to fit it to this architecture.
  8. Twofish assembly data is taken from a posting of Doug Whiting at aes.nist.gov
  9. A separate row is for Kenneth Almquist's estimates of speeds on the AXP 21164.
  10. A separate row is for Louis Granboulan's timings on AXP 21164 (see the notes).
  11. Merced/McKinley (the first and the second generation Intel IA-64 processors) estimates are courtesy of Doug Whiting.

Maintained by Helger Lipmaa. Don't hesitate to email me if you have any corrections/additions/comments.
