Performance and Memory Trade-offs in SM4 Implementation on Embedded ARM Cortex-M Microcontrollers

Hoang-Gia Vu, Khanh-Tuong Tran, Tuan-Khang Nguyen, Dinh-Tuan Nguyen, Khanh-Nghia Truong, Hoai-Luan Pham

Abstract


The SM4 block cipher is widely used in embedded security applications, particularly as part of China’s national cryptographic standards. However, deploying it on resource-constrained microcontrollers remains challenging because processing power and memory are limited. This paper investigates the trade-offs between performance and memory usage across SM4 software implementations on ARM Cortex-M microcontrollers. We evaluate and compare three implementation strategies: (1) S-box lookup tables stored in Flash or SRAM, (2) T-tables that merge the substitution and linear transformation steps into a single lookup, and (3) direct computation of the S-box using Galois field (GF) logic. Each implementation is benchmarked on a 32-bit STM32 microcontroller to measure encryption latency, SRAM and Flash usage, and code complexity. The results show that the T-table implementation in SRAM achieves the lowest encryption latency, at the cost of high SRAM consumption. Conversely, the GF logic implementation minimizes memory usage but has the slowest execution time. These results can guide the choice of an SM4 implementation strategy for embedded systems with differing resource constraints and real-time requirements.
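
The difference between strategies (1) and (2) can be illustrated with a minimal C sketch. It is not the authors' code. It assumes the linear transformation L from the SM4 standard, L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24), and uses a single 1 KiB table with rotations, which is only one possible T-table layout (four tables of 1 KiB each avoid the rotations at four times the memory). All function names are hypothetical, and the S-box filled in by the demo function is a toy byte permutation standing in for the real SM4 S-box, used only to check that both paths agree.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left/right by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32u - n)); }
static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

/* SM4 linear transformation L of the round function. */
static uint32_t sm4_L(uint32_t b) {
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24);
}

/* Strategy (1): four byte-wise S-box lookups (256 B table), then L. */
static uint32_t sm4_T_sbox(const uint8_t sbox[256], uint32_t a) {
    uint32_t b = ((uint32_t)sbox[(a >> 24) & 0xFF] << 24) |
                 ((uint32_t)sbox[(a >> 16) & 0xFF] << 16) |
                 ((uint32_t)sbox[(a >> 8) & 0xFF] << 8) |
                 (uint32_t)sbox[a & 0xFF];
    return sm4_L(b);
}

/* Strategy (2): precompute t0[x] = L(sbox[x] << 24), a 1 KiB table.
 * On typical Cortex-M toolchains a const table is linked into Flash,
 * while a table built at run time like this one lives in SRAM. */
static void sm4_build_ttable(const uint8_t sbox[256], uint32_t t0[256]) {
    for (unsigned x = 0; x < 256; x++) {
        t0[x] = sm4_L((uint32_t)sbox[x] << 24);
    }
}

/* L is an XOR of rotations, so it commutes with rotation: the entries
 * for the lower three bytes are rotated copies of t0. */
static uint32_t sm4_T_ttable(const uint32_t t0[256], uint32_t a) {
    return t0[(a >> 24) & 0xFF] ^
           rotr32(t0[(a >> 16) & 0xFF], 8) ^
           rotr32(t0[(a >> 8) & 0xFF], 16) ^
           rotr32(t0[a & 0xFF], 24);
}

/* Equivalence check with a PLACEHOLDER S-box (not the SM4 S-box). */
static int sm4_ttable_demo(void) {
    uint8_t sbox[256];
    static uint32_t t0[256];
    for (unsigned x = 0; x < 256; x++) {
        sbox[x] = (uint8_t)(x * 7u + 3u); /* toy permutation of 0..255 */
    }
    sm4_build_ttable(sbox, t0);

    uint32_t a = 0x01234567u;
    int checked = 0;
    for (int i = 0; i < 1000; i++) {
        assert(sm4_T_sbox(sbox, a) == sm4_T_ttable(t0, a));
        a = a * 1664525u + 1013904223u; /* simple LCG for test inputs */
        checked++;
    }
    return checked;
}
```

Calling `sm4_ttable_demo()` on a host or on the target confirms that the two paths compute the same word for any S-box. What differs is the cost profile: the table path replaces the shifts and XORs of L with lookups into a larger table, which is the memory-for-latency trade the abstract describes.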

DOI: http://dx.doi.org/10.21553/rev-jec.421

Copyright (c) 2026 REV Journal on Electronics and Communications


ISSN: 1859-378X

Copyright © 2011-2026
Radio and Electronics Association of Vietnam
All rights reserved