How the RISC-V Multiply Extension Adds an Efficient 32-bit Multiply to the RV32I
The RISC-V instruction set architecture (ISA) originated at the University of California, Berkeley in 2010. While RISC stands for Reduced Instruction Set Computer/Core, manufacturers can’t resist taking a RISC ISA and adding an instruction here and a new addressing mode there, filling the opcode map until it’s more CISC than RISC. The Berkeley developers of RISC-V, however, were strict about keeping their core a true RISC. The RV32I RISC-V ISA was designed to have only 47 base instructions (a number oddly meaningful to traditional Star Trek fans), and 11 years later it still has the same number.
The original philosophy behind keeping the number of base instructions low is that a complex CISC instruction can be reproduced as a series of simple RISC instructions. In my experience, whether this actually improves code efficiency and reduces code size depends on the application. The trade-off has been real enough in the past that Arm added complex instructions to its own opcode map.
While additional instructions can help improve performance, things get more complicated when you have a 32-bit core with 32-bit instructions and you want to add the ability to compress some of them into 16-bit instructions to save space. To add 16-bit instructions, the core needs room in the opcode map for these compressed encodings, and every CISC-style instruction added reduces the number of available opcodes.
This is where RISC-V really shines. Arm added compressed instructions after the fact: its 16-bit Thumb instructions (later extended by Thumb-2) were fitted in as a separate instruction set alongside the original 32-bit Arm ISA. RISC-V, by contrast, was designed from the start with an option for compressed instructions, so 16-bit and 32-bit instructions are freely intermixed within a single ISA. This keeps the core simple and efficient, and also simplifies semiconductor design and test.
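Because 16-bit and 32-bit instructions share one instruction stream, the hardware only has to look at the two lowest bits of each 16-bit parcel to know how long an instruction is: if they are not binary 11, it is a compressed instruction. The C sketch below illustrates that rule. The function names are hypothetical, and encodings longer than 32 bits are deliberately left unhandled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes of a RISC-V instruction, given its
 * first (lowest-addressed) 16-bit parcel. Returns 0 for the longer
 * (48-bit and up) encodings, which this sketch does not handle. */
static size_t rv_insn_length(uint16_t parcel)
{
    if ((parcel & 0x3u) != 0x3u)
        return 2;   /* bits [1:0] != 11: 16-bit compressed ("C") instruction */
    if ((parcel & 0x1Cu) != 0x1Cu)
        return 4;   /* bits [1:0] == 11 and bits [4:2] != 111: 32-bit */
    return 0;       /* bits [4:0] == 11111: longer format, not handled */
}

/* Hypothetical helper: counts instructions in a mixed 16/32-bit stream
 * stored as little-endian 16-bit parcels. */
static size_t rv_count_insns(const uint16_t *code, size_t nparcels)
{
    size_t count = 0;
    size_t i = 0;
    while (i < nparcels) {
        size_t len = rv_insn_length(code[i]);
        if (len == 0 || i + len / 2 > nparcels)
            break;  /* unsupported encoding or truncated final instruction */
        i += len / 2;
        count++;
    }
    return count;
}
```

For example, a buffer holding a 16-bit C.NOP (0x0001), a 32-bit NOP encoded as ADDI x0, x0, 0 (0x00000013, stored as the parcels 0x0013 and 0x0000), and another C.NOP counts as three instructions, with no mode switch between them.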
Enhancing the RISC-V RV32I ISA with a multiply instruction
Manufacturers can expand on the 47-instruction base ISA by adding standardized instruction extensions (Figure 1). Because the base ISA has no multiply or divide instructions, the M extension provides that functionality. For example, an RV32I core with the M extension is designated RV32IM.
Figure 1: The RISC-V base ISA of 47 instructions can be expanded by adding standardized instruction extensions, denoted by a letter suffix after the core name. (Image source: RISC-V.org)
An example of a board built around a core with the M extension is the SparkFun Electronics RED-V Thing Plus, which uses the open-source 150 megahertz (MHz) Freedom E310 (FE310) 32-bit RISC-V microcontroller. The FE310 core is designated RV32IMAC. Referring to Figure 1, this means that besides the base integer (I) instructions, it supports integer multiply and divide (M), atomic instructions (A), and compressed instructions (C).
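The naming convention is mechanical enough to decode in a few lines. The C sketch below, with hypothetical helper names, pulls the register width and single-letter extensions out of a name such as RV32IMAC. It stops at the first underscore (where multi-letter extensions begin) and does not expand the G shorthand.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true if single-letter extension `ext` appears in an
 * ISA name such as "RV32IMAC" (case-insensitive). */
static bool rv_has_extension(const char *isa, char ext)
{
    if (toupper((unsigned char)isa[0]) != 'R' ||
        toupper((unsigned char)isa[1]) != 'V')
        return false;
    const char *p = isa + 2;
    while (isdigit((unsigned char)*p))   /* skip the register width, e.g. "32" */
        p++;
    for (; *p != '\0' && *p != '_'; p++) /* scan base letter and extensions */
        if (toupper((unsigned char)*p) == toupper((unsigned char)ext))
            return true;
    return false;
}

/* Hypothetical helper: register width (XLEN) in the name, or 0 if absent. */
static int rv_xlen(const char *isa)
{
    if (toupper((unsigned char)isa[0]) != 'R' ||
        toupper((unsigned char)isa[1]) != 'V')
        return 0;
    return atoi(isa + 2);                /* atoi stops at the first letter */
}
```

With this sketch, "RV32IMAC" reports a width of 32 and the I, M, A, and C letters, while a query for F (single-precision floating point) returns false.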
The SparkFun DEV-15799 RED-V (pronounced “red five”) RISC-V evaluation board (Figure 2) has 32 megabytes (Mbytes) of QSPI flash program memory and a USB-C connector that interfaces to a host computer for power, programming, and debugging. An additional connector can be used to provide battery power.
Figure 2: The SparkFun DEV-15799 board is used to evaluate the open-source 150 MHz FE310 RV32IMAC RISC-V core. It interfaces to a host computer via a USB-C interface. (Image source: SparkFun Electronics)
The M extension adds the signed and unsigned 32/32 divide instructions DIV and DIVU, as well as the signed and unsigned remainder instructions REM and REMU. It also adds four multiply instructions:
- MUL performs a 32 x 32 register multiply and stores the lower 32 bits of the 64-bit result in a register.
- MULH and MULHU perform signed and unsigned register multiply, respectively, and store the upper 32 bits of the 64-bit result in a register.
- MULHSU performs a signed x unsigned register multiply and stores the upper 32 bits of the 64-bit result in a register.
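One way to pin down exactly what each M-extension instruction computes is a small C reference model that treats every register as a 32-bit bit pattern. This is a sketch for experimentation, not SiFive or RISC-V International code, and the function names are hypothetical. The divide-by-zero and overflow results follow the RISC-V unprivileged specification, in which these cases return a defined value instead of trapping.

```c
#include <stdint.h>

/* Hypothetical reference model of the RV32M instructions. Registers are
 * uint32_t bit patterns; the signed view uses the two's-complement
 * conversion to int32_t that mainstream C compilers implement. */
static int32_t as_signed(uint32_t x) { return (int32_t)x; }

/* MUL: low 32 bits of the product (identical for signed and unsigned). */
static uint32_t rv_mul(uint32_t rs1, uint32_t rs2)
{
    return (uint32_t)((uint64_t)rs1 * rs2);
}

/* MULH: high 32 bits of signed x signed. */
static uint32_t rv_mulh(uint32_t rs1, uint32_t rs2)
{
    int64_t p = (int64_t)as_signed(rs1) * as_signed(rs2);
    return (uint32_t)((uint64_t)p >> 32);
}

/* MULHU: high 32 bits of unsigned x unsigned. */
static uint32_t rv_mulhu(uint32_t rs1, uint32_t rs2)
{
    return (uint32_t)(((uint64_t)rs1 * rs2) >> 32);
}

/* MULHSU: high 32 bits of signed rs1 x unsigned rs2. */
static uint32_t rv_mulhsu(uint32_t rs1, uint32_t rs2)
{
    int64_t p = (int64_t)as_signed(rs1) * (int64_t)rs2;
    return (uint32_t)((uint64_t)p >> 32);
}

/* DIV: signed quotient, truncated toward zero (as in C). */
static uint32_t rv_div(uint32_t rs1, uint32_t rs2)
{
    int32_t a = as_signed(rs1), b = as_signed(rs2);
    if (b == 0)
        return UINT32_MAX;          /* divide by zero: all ones (-1) */
    if (a == INT32_MIN && b == -1)
        return rs1;                 /* overflow: result is the dividend */
    return (uint32_t)(a / b);
}

/* DIVU: unsigned quotient. */
static uint32_t rv_divu(uint32_t rs1, uint32_t rs2)
{
    return rs2 == 0 ? UINT32_MAX : rs1 / rs2;
}

/* REM: signed remainder; its sign follows the dividend (as in C). */
static uint32_t rv_rem(uint32_t rs1, uint32_t rs2)
{
    int32_t a = as_signed(rs1), b = as_signed(rs2);
    if (b == 0)
        return rs1;                 /* remainder by zero: the dividend */
    if (a == INT32_MIN && b == -1)
        return 0;                   /* overflow: remainder is zero */
    return (uint32_t)(a % b);
}

/* REMU: unsigned remainder. */
static uint32_t rv_remu(uint32_t rs1, uint32_t rs2)
{
    return rs2 == 0 ? rs1 : rs1 % rs2;
}
```

The all-ones operand shows why the three high-half variants exist: 0xFFFFFFFF is -1 when signed and 4,294,967,295 when unsigned, so MULH, MULHU, and MULHSU return 0, 0xFFFFFFFE, and 0xFFFFFFFF respectively, while MUL returns 1 in every case.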
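So for a 32 x 32 = 64 unsigned multiply, the recommended code sequence is:
MULHU rdh, rs1, rs2
MUL rdl, rs1, rs2
Here, registers rs1 and rs2 hold the multiplicand and multiplier, and registers rdh and rdl receive the upper and lower 32 bits of the result, respectively. For this sequence, the RISC-V specification also states that the source registers must appear in the same order in both instructions and that rdh cannot be the same as rs1 or rs2, so that the pair can be recognized as a unit and MULHU does not overwrite an operand that MUL still needs.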
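The C sketch below models the two-instruction sequence to show that the two 32-bit destination registers together hold the complete 64-bit product. The struct and function names are hypothetical; each field is computed the way the corresponding instruction defines it.

```c
#include <stdint.h>

/* Hypothetical model of "MULHU rdh, rs1, rs2" followed by "MUL rdl, rs1, rs2". */
typedef struct {
    uint32_t rdh;   /* upper 32 bits, as written by MULHU */
    uint32_t rdl;   /* lower 32 bits, as written by MUL */
} rv_pair;

static rv_pair umul32x32_64(uint32_t rs1, uint32_t rs2)
{
    rv_pair r;
    r.rdh = (uint32_t)(((uint64_t)rs1 * rs2) >> 32);  /* MULHU */
    r.rdl = (uint32_t)((uint64_t)rs1 * rs2);          /* MUL   */
    return r;
}

/* Reassembles the 64-bit product from the two 32-bit destination registers. */
static uint64_t join64(rv_pair r)
{
    return ((uint64_t)r.rdh << 32) | r.rdl;
}
```

Concatenating rdh:rdl reproduces the full product; for example, 0xFFFFFFFF x 0xFFFFFFFF leaves 0xFFFFFFFE in rdh and 0x00000001 in rdl, which joins to 0xFFFFFFFE00000001.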
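By splitting the 64-bit multiply result across two instructions that each write one 32-bit register, the ISA avoids a complex 32 x 32 = 64 CISC instruction, which would also need two destination registers, something none of the base RISC-V instruction formats provides. This is consistent with the RISC philosophy of using simple instructions to perform CISC operations.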
While most of the instructions in the base RV32I ISA execute in a single clock cycle, each of these multiply instructions on the RED-V’s FE310 takes five, so the recommended code sequence above takes ten clock cycles. That may be acceptable at 150 MHz, where ten cycles is about 67 ns. However, I’ve seen very low-power, low-clock-speed microcontroller applications where interrupts were so critical that a ten-cycle multiply at 5 MHz (2 µs) was too long to wait before servicing a crucial interrupt. In these cases, I’ve seen firmware developers perform the multiply with a longer assembly subroutine that could be interrupted partway through.
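To make that trade-off concrete, here is a hypothetical C version of a shift-and-add 32 x 32 = 64 unsigned multiply that uses only operations RV32I already has (shifts, adds, and branches). It is not the routine those developers wrote. Compiled for a core without the M extension, it becomes a loop of short instructions, any of which an interrupt can follow, at the cost of up to 32 iterations instead of two multiply instructions.

```c
#include <stdint.h>

/* Hypothetical software multiply: classic shift-and-add, one multiplier bit
 * per loop iteration, with no multiply instruction required. */
static uint64_t soft_umul32x32_64(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t acc = 0;
    uint64_t addend = multiplicand;  /* multiplicand shifted to the current bit */
    while (multiplier != 0) {
        if (multiplier & 1u)
            acc += addend;           /* add the partial product for this bit */
        addend <<= 1;
        multiplier >>= 1;
    }
    return acc;
}
```

The loop exits as soon as the remaining multiplier bits are zero, so small multipliers finish quickly, but the worst case (the top bit set) still runs all 32 iterations.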
However, the FE310 core can take consecutive instructions and internally fuse them into a single, faster operation via macro-ops fusion. Here, the core microarchitecture can fuse the two multiply instructions into one internal instruction that completes in fewer than ten cycles. RISC-V microarchitectures that implement fusion can apply it automatically to other code sequences as well, such as indexed loads, load-pair, and store-pair idioms, significantly improving execution speed. Even better, because the FE310 supports the “C” extension, fusing two compatible 16-bit compressed instructions provides both a code-size and an execution-speed advantage.
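The RISC-V specification describes the MULH[[S]U] rdh, rs1, rs2 followed by MUL rdl, rs1, rs2 pattern as one a microarchitecture can fuse, provided the source registers appear in the same order and rdh differs from rs1 and rs2. The C sketch below checks those conditions on raw 32-bit encodings. It is a hypothetical illustration of the matching step only; which pairs a particular core such as the FE310 actually fuses, and how, is implementation-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* RV32M instructions use the OP major opcode with funct7 = 0000001. */
enum { OPC_OP = 0x33, FUNCT7_MULDIV = 0x01 };
enum { F3_MUL = 0, F3_MULH = 1, F3_MULHSU = 2, F3_MULHU = 3 };

/* Hypothetical encoder for an R-type M-extension instruction. */
static uint32_t rv_encode_m(uint32_t funct3, uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    return ((uint32_t)FUNCT7_MULDIV << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | (uint32_t)OPC_OP;
}

/* R-type field extractors. */
static uint32_t rd_of(uint32_t i)     { return (i >> 7) & 0x1Fu; }
static uint32_t funct3_of(uint32_t i) { return (i >> 12) & 0x7u; }
static uint32_t rs1_of(uint32_t i)    { return (i >> 15) & 0x1Fu; }
static uint32_t rs2_of(uint32_t i)    { return (i >> 20) & 0x1Fu; }

static bool is_m_op(uint32_t i)
{
    return (i & 0x7Fu) == OPC_OP && (i >> 25) == FUNCT7_MULDIV;
}

/* Hypothetical check: does `first` followed by `second` match the
 * MULH[[S]U] rdh, rs1, rs2 ; MUL rdl, rs1, rs2 idiom? */
static bool rv_mul_pair_fusible(uint32_t first, uint32_t second)
{
    if (!is_m_op(first) || !is_m_op(second))
        return false;
    uint32_t f3 = funct3_of(first);
    if (f3 != F3_MULH && f3 != F3_MULHSU && f3 != F3_MULHU)
        return false;               /* first must produce the high half */
    if (funct3_of(second) != F3_MUL)
        return false;               /* second must produce the low half */
    if (rs1_of(first) != rs1_of(second) || rs2_of(first) != rs2_of(second))
        return false;               /* sources must appear in the same order */
    uint32_t rdh = rd_of(first);
    return rdh != rs1_of(first) && rdh != rs2_of(first);
}
```

For example, MULHU x10, x11, x12 followed by MUL x13, x11, x12 matches, while swapping the sources in the MUL, or writing the high half into x11, does not.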
Arm added macro-ops fusion to its architectures later, but, just as with compressed instructions, RISC-V was designed with macro-ops fusion in mind from the start. The best way to really understand the advantages of code compression, and when macro-ops fusion kicks in, is to observe these behaviors on an evaluation board like the SparkFun DEV-15799. Code can be examined in the debugger to see how the FE310 microarchitecture fetches and executes each instruction. This gives you a better understanding of the generated assembly code, which can help you write efficient code with a C compiler that supports code compression and macro-ops fusion.
Conclusion
The RISC-V ISA proudly promotes itself as a truly reduced instruction set with only 47 base instructions. This base can be enhanced with standardized extensions such as the “M” extension, which adds multiply and divide instructions. Macro-ops fusion, which the RISC-V ISA was designed to accommodate, can speed execution of compatible instruction sequences such as consecutive multiply instructions, while the “C” compressed extension reduces code size. Together, compressed instructions and macro-ops fusion can give you significant performance advantages over other architectures.

Have questions or comments? Continue the conversation on TechForum, DigiKey's online community and technical resource.