A Low-cost Reconfigurable Architecture for AES Algorithm

Yibo Fan, Takeshi Ikenaga, Yukiyasu Tsunoo, and Satoshi Goto

Abstract—This paper proposes a low-cost reconfigurable architecture for the AES algorithm. The proposed architecture separates SubBytes and MixColumns into two parallel data paths and supports a different bit width for each of them. As a result, a varying number of S-boxes can be used, and the throughput and power consumption can be adjusted by changing the number of S-boxes running in the design. Using the TSMC 0.18μm CMOS standard cell library, a very low-cost implementation of about 7K gates is obtained at 182 MHz. The maximum throughput is 360 Mbps when 4 S-boxes operate simultaneously, and the minimum throughput is 114 Mbps when only 1 S-box is used.

Keywords—AES, Reconfigurable architecture, low cost.

I. INTRODUCTION

ADVANCED Encryption Standard (AES) [1] was selected by the National Institute of Standards and Technology (NIST) in Oct. 2000 as the new encryption standard to replace the Data Encryption Standard (DES). The AES algorithm is a symmetric block cipher that processes 128-bit data blocks. Each block is processed by 10, 12 or 14 rounds of transformations for key lengths of 128, 192 or 256 bits, respectively.
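The relation between key length and round count follows the rule Nr = Nk + 6 from the AES specification [1], where Nk is the key length in 32-bit words. A minimal Python sketch (the function name `aes_rounds` is hypothetical):

```python
def aes_rounds(key_bits: int) -> int:
    """Return the number of AES rounds Nr for a key of key_bits bits.

    The AES specification defines Nr = Nk + 6, where Nk is the key length
    in 32-bit words.
    """
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32  # number of 32-bit words in the cipher key
    return nk + 6


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(f"AES-{bits}: {aes_rounds(bits)} rounds")
```

Running it prints 10, 12 and 14 rounds for AES-128, AES-192 and AES-256.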

Many hardware implementations of the AES algorithm have already been proposed. They can be classified into two types: high-speed designs and low-cost designs. With growing personal security requirements and mobile device usage, low-cost design has become the trend in AES implementation.

Among the existing low-cost designs, some focus on architecture design, such as Satoh’s work in [2]; some focus on S-box design, such as Canright’s work in [3]; and some focus on ultra-low-power AES designs for RFID, such as [4]. However, there are few proposals for reconfigurable designs, which can be configured for different throughputs according to the requirements of the system. For power-limited systems, a reconfigurable design that offers different levels of performance and power consumption is much more attractive than a fixed architecture.

In this paper, a very low-cost reconfigurable architecture for the AES algorithm is proposed. This architecture can be configured into different modes, each with its own throughput and power consumption. Reconfiguration is achieved by changing the number of S-boxes running in the data path: using more S-boxes increases performance, but it also increases power consumption.

This paper is organized as follows. The AES algorithm is introduced in Section II. The reconfigurable architecture is presented in Section III. The experimental results and comparison are given in Section IV. Finally, conclusions are provided in Section V.

II. AES ALGORITHM

AES, also known as Rijndael, is the most widely used algorithm in symmetric-key cryptography. AES operates on a 4×4 array of bytes termed the State. For encryption, it applies a round function 10, 12 or 14 times, depending on the key length. The encryption and decryption flows of the AES algorithm are shown in Fig. 1 (a) and (b). Four transformations, SubBytes, ShiftRows, MixColumns and AddRoundKey, are performed in the encryption process, and their inverses are performed in the decryption process (AddRoundKey is its own inverse). A separate KeyExpansion unit generates the round keys for each round. To simplify the hardware implementation and support both encryption and decryption, a hybrid dataflow is proposed, as shown in Fig. 1 (c). This dataflow adjusts the order of some transformations, which reduces the number of MixColumns modules from 2 to 1. (Normally, 2 MixColumns modules are needed in AES, as in Satoh’s work in [2].)
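Reordering transformations in a combined encryption/decryption dataflow typically relies on two well-known algebraic facts, which the sketch below checks numerically. First, InvMixColumns is linear over XOR, so it can be moved across AddRoundKey if the round key is also passed through InvMixColumns. Second, any bytewise substitution commutes with the ShiftRows byte permutation. This is a minimal Python illustration that assumes Fig. 1 (c) exploits these properties; it is not the exact dataflow of Fig. 1 (c), and the names `mul_column`, `shift_rows` and `check_reordering` are hypothetical.

```python
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


MIX = (0x02, 0x03, 0x01, 0x01)      # first row of the MixColumns matrix
INV_MIX = (0x0E, 0x0B, 0x0D, 0x09)  # first row of the InvMixColumns matrix


def mul_column(col, coeffs):
    """Multiply a 4-byte column by the circulant matrix whose first row is coeffs."""
    out = []
    for i in range(4):
        acc = 0
        for j in range(4):
            acc ^= gf_mul(coeffs[(j - i) % 4], col[j])
        out.append(acc)
    return out


def xor_bytes(a, b):
    return [x ^ y for x, y in zip(a, b)]


def shift_rows(state):
    """ShiftRows on a 16-byte column-major State: row r rotates left by r bytes."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def check_reordering(trials: int = 1000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    sub = list(range(256))
    rng.shuffle(sub)  # an arbitrary bytewise substitution stands in for (Inv)SubBytes
    for _ in range(trials):
        s = [rng.randrange(256) for _ in range(4)]
        k = [rng.randrange(256) for _ in range(4)]
        # InvMixColumns(s ^ k) == InvMixColumns(s) ^ InvMixColumns(k)
        if mul_column(xor_bytes(s, k), INV_MIX) != xor_bytes(
            mul_column(s, INV_MIX), mul_column(k, INV_MIX)
        ):
            return False
        # Substituting bytes before or after ShiftRows gives the same State.
        st = [rng.randrange(256) for _ in range(16)]
        if shift_rows([sub[b] for b in st]) != [sub[b] for b in shift_rows(st)]:
            return False
    return True


if __name__ == "__main__":
    print(check_reordering())
```

Both checks hold for every input, so the script prints `True`.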

Fig. 2 shows the operations in the AES algorithm. A brief introduction to each is given below:

1) SubBytes: The SubBytes operation is a non-linear substitution that operates independently on each byte of the State using a substitution table (S-box).
2) ShiftRows: In the ShiftRows operation, the bytes in the last three rows of the State are cyclically shifted by different numbers of bytes (1, 2 and 3 bytes for rows 1, 2 and 3).
3) MixColumns: A mixing operation that operates on the columns of the State, combining the four bytes of each column by a linear transformation.
4) AddRoundKey: A Round Key is added to the State by a simple bitwise XOR operation.
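To make the data widths of these transformations concrete, the following minimal Python sketch models them in software, following the definitions in [1]. It is a reference model, not a description of the hardware in this paper; the column-major 16-byte State layout and the function names are assumptions for illustration. SubBytes and AddRoundKey act on single bytes (or bits), whereas MixColumns needs a whole 32-bit column, which is the observation summarized in Table I.

```python
# Minimal software reference model of the four AES round transformations.
# The State is a list of 16 bytes in column-major order: state[r + 4 * c].


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8) computed as a^254; maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(a: int) -> int:
    """S-box entry: affine transformation of the GF(2^8) inverse."""
    s = gf_inv(a)
    return s ^ rotl8(s, 1) ^ rotl8(s, 2) ^ rotl8(s, 3) ^ rotl8(s, 4) ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]


def sub_bytes(state):
    # Each of the 16 bytes is substituted independently (an 8-bit operation).
    return [SBOX[b] for b in state]


def shift_rows(state):
    # Row r is rotated left by r byte positions.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    # Each 4-byte (32-bit) column is multiplied by the fixed MixColumns matrix.
    out = []
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c: 4 * c + 4]
        out += [
            gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
            a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
            a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
            gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
        ]
    return out


def add_round_key(state, round_key):
    # Bitwise XOR of the State with the 128-bit round key.
    return [s ^ k for s, k in zip(state, round_key)]
```

For example, `SBOX[0x53]` evaluates to `0xED`, the S-box example given in [1].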

A detailed description of these operations can be found in [1]. Many hardware designs of these sub-modules have been proposed; in particular, numerous hardware reuse methods can be found in [2-6]. These methods are valuable for low-cost implementation of the AES algorithm.

III. RECONFIGURABLE ARCHITECTURE FOR AES ALGORITHM

A. Reconfigurable Architecture

The proposed reconfigurable architecture is shown in Fig. 3. It differs from existing designs by introducing the following new ideas.

Parallel Data Paths with Different Bit Widths: There are two parallel data paths in this design: a) the SubBytes data path (8-bit, 16-bit or 32-bit) and b) the MixColumns data path (32-bit). This design has two advantages:

Firstly, it provides much more flexibility than a serial data path. Most low-cost designs use a serial data path that connects SubBytes and MixColumns in series, so all operations of the AES algorithm must use the same bit width. This is inefficient, because the operations have different minimum bit widths (Table I). In contrast, our architecture separates SubBytes and MixColumns into two data paths, so each can operate at its own bit width.

Secondly, it achieves good performance. Because the SubBytes and MixColumns modules sit in parallel data paths, the critical path of our design is shorter than that of a serial data path design.

Reconfigurable S-box: Because our architecture supports different bit widths for SubBytes, the number of S-boxes in the data path can be configured. As shown in Table I, SubBytes is an 8-bit operation, and each S-box in Fig. 3 executes one byte substitution. There are 4 S-boxes in total, so up to 4 SubBytes operations can run simultaneously. Since the S-box module consumes a large share of the power, power awareness can be achieved by adjusting the number of S-boxes running in the data path: with fewer S-boxes, the throughput becomes lower and the power consumption is reduced.
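Under a simple timing model in which each active S-box performs one byte lookup per clock cycle (an assumption; the actual schedule is given in Fig. 5 and Table II), substituting the 16 State bytes takes 16/n cycles per round for n active S-boxes. A minimal Python sketch with a hypothetical function name:

```python
import math

STATE_BYTES = 16  # the 128-bit State holds 16 bytes, each needing one S-box lookup


def subbytes_cycles_per_round(n_sboxes: int) -> int:
    """Cycles to substitute all State bytes with n_sboxes S-boxes in parallel.

    Hypothetical timing model: one lookup per S-box per clock cycle.
    """
    if n_sboxes not in (1, 2, 4):
        raise ValueError("this architecture provides 1, 2 or 4 active S-boxes")
    return math.ceil(STATE_BYTES / n_sboxes)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} S-box(es): {8 * n}-bit SubBytes path, "
              f"{subbytes_cycles_per_round(n)} cycles per round")
```

Under this model the SubBytes work per round drops from 16 cycles with 1 S-box to 8 and 4 cycles with 2 and 4 S-boxes.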

<table>
<thead>
<tr>
<th>Operations</th>
<th>Min. bit width for operations</th>
</tr>
</thead>
<tbody>
<tr>
<td>(Inv)SubBytes</td>
<td>8-bit</td>
</tr>
<tr>
<td>(Inv)ShiftRows</td>
<td>-</td>
</tr>
<tr>
<td>(Inv)MixColumns</td>
<td>32-bit</td>
</tr>
<tr>
<td>AddRoundKey</td>
<td>1-bit</td>
</tr>
<tr>
<td>KeyExpansion</td>
<td>8-bit</td>
</tr>
</tbody>
</table>

32-bit MixColumns & AddRoundKey Module: From Table I, the minimum bit width for MixColumns is 32 bits, so we implement this module with a 32-bit I/O width. The structure of this module is based on the proposed hybrid dataflow in Fig. 1 (c), so only one MixColumns module is needed in this design.
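A 32-bit MixColumns and AddRoundKey module of this kind can be modeled as a function on one packed column word. The sketch below assumes that the row-0 byte occupies the most significant byte and that the key XOR follows the mixing step. To let decryption reuse the same MixColumns logic, it uses the well-known decomposition InvMixColumns = MixColumns × ({04}x² + {05}). This is one way to share the hardware, not necessarily the structure of Fig. 1 (c), and all function names are hypothetical.

```python
def xtime(b: int) -> int:
    """Multiply a byte by {02} in GF(2^8) (AES polynomial 0x11B)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def unpack(word: int) -> list:
    """Split a 32-bit column word into bytes, row 0 in the most significant byte."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def pack(col: list) -> int:
    return (col[0] << 24) | (col[1] << 16) | (col[2] << 8) | col[3]


def mix_column(col: list) -> list:
    """MixColumns on one column using only xtime and XOR ({03}*a = xtime(a) ^ a)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,
        xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3),
    ]


def inv_preprocess(col: list) -> list:
    """Multiply the column by {04}x^2 + {05}; mix_column() of the result is InvMixColumns."""
    a0, a1, a2, a3 = col
    u = xtime(xtime(a0 ^ a2))  # {04} * (a0 ^ a2)
    v = xtime(xtime(a1 ^ a3))  # {04} * (a1 ^ a3)
    return [a0 ^ u, a1 ^ v, a2 ^ u, a3 ^ v]


def mix_add_round_key(word: int, round_key: int, inverse: bool = False) -> int:
    """Hypothetical 32-bit module: (Inv)MixColumns on one column, then XOR with the key word."""
    col = unpack(word)
    if inverse:
        col = inv_preprocess(col)  # extra stage used only for decryption
    return pack(mix_column(col)) ^ round_key
```

With a zero key word, `mix_add_round_key(0xDB135345, 0)` returns `0x8E4DA1BC` (a commonly used MixColumns test column), and `inverse=True` maps that result back to `0xDB135345`. The preprocessing stage adds only XORs and two xtime steps per byte pair in this model, which is why such decompositions are popular for sharing one MixColumns block.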

Compared with Satoh’s work in [2] and Feldhofer’s work in [4], our implementation achieves a lower hardware cost than Satoh’s design with comparable performance, and a higher performance than Feldhofer’s design at the cost of extra hardware.

B. Mode Configurations

There are three mode configurations in our design. Each configuration uses a different number of S-boxes in the data path, so each has a different throughput and power consumption. Fig. 4 shows these three configurations. Config.a uses only 1 S-box in the data path, with an 8-bit SubBytes width; it has the lowest power consumption and the lowest speed. Config.b and Config.c use 2 and 4 S-boxes, respectively. Config.c uses a 32-bit width for SubBytes, so it has the highest throughput and the highest power consumption.
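The three configurations can be captured as a small table of parameters. In this minimal Python sketch, the SubBytes width follows from 8 bits per active S-box. The enable mask, which marks which S-boxes are active so that idle ones could be clock-gated, is a hypothetical policy (enable S-box 0 up to n-1); Fig. 4 defines the actual assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    name: str
    n_sboxes: int  # active S-boxes in the SubBytes data path

    @property
    def subbytes_width(self) -> int:
        # Each active S-box substitutes one byte per cycle.
        return 8 * self.n_sboxes

    @property
    def sbox_enable_mask(self) -> int:
        # Hypothetical policy: enable S-box 0 .. n-1; idle S-boxes could be gated off.
        return (1 << self.n_sboxes) - 1


CONFIGS = {
    cfg.name: cfg
    for cfg in (ModeConfig("Config.a", 1), ModeConfig("Config.b", 2), ModeConfig("Config.c", 4))
}

if __name__ == "__main__":
    for cfg in CONFIGS.values():
        print(f"{cfg.name}: {cfg.n_sboxes} S-box(es), {cfg.subbytes_width}-bit SubBytes, "
              f"enable mask {cfg.sbox_enable_mask:04b}")
```

This prints 8-, 16- and 32-bit SubBytes widths with masks `0001`, `0011` and `1111` for Config.a, Config.b and Config.c.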

C. Dataflow

The cycle-accurate dataflow of the three configurations is shown in Fig. 5. The round function of the AES algorithm is divided into three steps: First Round, Round Loop and Last Round. Because our architecture has parallel data paths, some AES operations can be executed in parallel, such as {SubBytes & MixColumns} and {KeyExpansion & MixColumns}.

Table II shows the number of clock cycles consumed in each round and the total number of clock cycles needed for AES encryption with a 128-bit key. As Table II shows, Config.a, which uses only 1 S-box, needs many more clock cycles than the other two configurations, whereas Config.c uses 4 S-boxes simultaneously and therefore saves many clock cycles.
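Clock cycles per block relate to throughput through the usual formula for a core that processes one 128-bit block at a time: throughput = 128 × f / cycles. A small Python helper, assuming no overlap between consecutive blocks and excluding key setup (both assumptions), can be used to cross-check cycle counts against reported throughputs:

```python
def throughput_mbps(freq_mhz: float, cycles_per_block: float, block_bits: int = 128) -> float:
    """Throughput in Mbps of a core that needs cycles_per_block cycles per block."""
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return block_bits * freq_mhz / cycles_per_block


def cycles_per_block(freq_mhz: float, throughput: float, block_bits: int = 128) -> float:
    """Inverse relation: cycles per block implied by a throughput in Mbps."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return block_bits * freq_mhz / throughput


if __name__ == "__main__":
    # Illustrative numbers only: 128 cycles per block at 100 MHz gives 100 Mbps.
    print(throughput_mbps(100.0, 128))
```

Because frequency is in MHz, the result comes out directly in Mbps.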

With the proposed reconfigurable architecture, mode configurations and dataflows, different levels of performance and power consumption can be achieved. This design is well suited to power-limited systems such as mobile phones, where the performance and power consumption of AES can be adapted to the bandwidth conditions or to the throughput required by the top-level system.
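A system controller could adapt the mode by choosing the configuration with the fewest active S-boxes that still meets the required throughput. The minimal Python sketch below takes fewer active S-boxes as a proxy for lower power, a hypothetical policy. It uses only the Config.a and Config.c throughputs quoted in the abstract; Config.b is omitted because its throughput is not stated in the text.

```python
from typing import List, Tuple

# (name, active S-boxes, throughput in Mbps)
Option = Tuple[str, int, float]


def select_config(required_mbps: float, options: List[Option]) -> str:
    """Return the configuration with the fewest active S-boxes meeting required_mbps.

    Fewer active S-boxes is used as a proxy for lower power (hypothetical policy).
    """
    feasible = [opt for opt in options if opt[2] >= required_mbps]
    if not feasible:
        raise ValueError(f"no configuration reaches {required_mbps} Mbps")
    return min(feasible, key=lambda opt: opt[1])[0]


# Throughputs of Config.a and Config.c as quoted in the abstract.
OPTIONS: List[Option] = [("Config.a", 1, 114.0), ("Config.c", 4, 360.0)]

if __name__ == "__main__":
    print(select_config(100.0, OPTIONS))  # Config.a
    print(select_config(200.0, OPTIONS))  # Config.c
```

A 100 Mbps requirement is met by Config.a alone, while 200 Mbps requires switching to Config.c.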

IV. EXPERIMENTAL RESULTS

Using the TSMC 0.18 µm CMOS standard cell library and Synopsys Design Compiler, we obtained the implementation results shown in Table III. The total hardware cost of our design is 6986 gates, and the operating frequency is 182 MHz. The power consumption of the three mode configurations is estimated directly from the synthesis results, without any power optimization. All of these data were obtained with a 182 MHz system clock and a 1.62 V supply voltage.

Table IV compares our design with other low-cost implementations. Satoh’s design in [2] is a 32-bit serial data path design; we implemented it ourselves to compare serial and parallel data path designs under the same conditions. Zhao’s design in [6] is a 32-bit pipelined serial data path design. Feldhofer’s design in [4] is an 8-bit parallel data path design, and Pramstaller’s design in [7] is a 32-bit parallel data path design.

Compared with the other designs, ours achieves both low hardware cost and reasonable throughput. Moreover, it has three mode configurations that provide different throughput and power consumption. To further reduce power consumption, other power optimization techniques such as clock gating can also be applied to this design.

V. CONCLUSION

In this paper, we introduced a low-cost VLSI design of the AES algorithm. A reconfigurable architecture that supports a varying number of S-boxes running in the data path was proposed, making both performance and power consumption configurable. This design is well suited to power-limited mobile systems.

TABLE III
EXPERIMENTAL RESULTS @ 182MHz, 1.62V

<table>
<thead>
<tr>
<th>AES Components</th>
<th>Area</th>
<th>Config.a Power</th>
<th>Config.b Power</th>
<th>Config.c Power</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Gates</td>
<td>mW</td>
<td>%</td>
<td>mW</td>
</tr>
<tr>
<td>ShiftRows+ Data Registers</td>
<td>1380</td>
<td>2.03</td>
<td>19.8%</td>
<td>2.50</td>
</tr>
<tr>
<td>S-Box 0</td>
<td>618</td>
<td>0</td>
<td>0%</td>
<td>0</td>
</tr>
<tr>
<td>S-Box 1</td>
<td>643</td>
<td>0</td>
<td>0%</td>
<td>0</td>
</tr>
<tr>
<td>S-Box 2</td>
<td>651</td>
<td>3.45</td>
<td>26.8%</td>
<td>3.56</td>
</tr>
<tr>
<td>S-Box 3</td>
<td>693</td>
<td>3.25</td>
<td>39.5%</td>
<td>3.56</td>
</tr>
<tr>
<td>MixColumns/InvMixColumns</td>
<td>576</td>
<td>0.83</td>
<td>10.1%</td>
<td>0.82</td>
</tr>
<tr>
<td>Key Expander+Key Registers</td>
<td>1590</td>
<td>1.34</td>
<td>16.3%</td>
<td>1.49</td>
</tr>
<tr>
<td>Controller</td>
<td>250</td>
<td>0.26</td>
<td>3.2%</td>
<td>0.33</td>
</tr>
<tr>
<td>Others</td>
<td>585</td>
<td>0.52</td>
<td>6.3%</td>
<td>0.71</td>
</tr>
<tr>
<td>Total</td>
<td>6986</td>
<td>8.24</td>
<td>100%</td>
<td>12.87</td>
</tr>
</tbody>
</table>

TABLE IV
COMPARISON WITH OTHER WORK

<table>
<thead>
<tr>
<th>Ref</th>
<th>Tech</th>
<th>Gates</th>
<th>Freq</th>
<th>Throughput</th>
<th>Configurations</th>
</tr>
</thead>
<tbody>
<tr>
<td>[*]</td>
<td>0.18μm</td>
<td>7226</td>
<td>138MHz</td>
<td>327Mbps</td>
<td></td>
</tr>
<tr>
<td>[4]</td>
<td>0.35μm</td>
<td>3595</td>
<td>100kHz</td>
<td>12.6kbps</td>
<td></td>
</tr>
<tr>
<td>[6]</td>
<td>0.25μm</td>
<td>12000</td>
<td>100MHz</td>
<td>256Mbps</td>
<td></td>
</tr>
<tr>
<td>[7]</td>
<td>0.6μm</td>
<td>8541</td>
<td>50MHz</td>
<td>70Mbps</td>
<td></td>
</tr>
<tr>
<td>Ours</td>
<td>0.18μm</td>
<td>6986</td>
<td>180MHz</td>
<td>360Mbps</td>
<td>Config.c</td>
</tr>
</tbody>
</table>

[*] 32-bit serial design using Satoh’s architecture in [2]

ACKNOWLEDGMENT
This research was supported by “Ambient SoC Global COE Program of Waseda University” of the Ministry of Education, Culture, Sports, Science and Technology, Japan.

REFERENCES