yaobin.wen

Yaobin's Blog

14 July 2024

Understanding why memory alignment is needed from the hardware perspective

by yaobin.wen

To understand why memory alignment makes memory access faster, we first need to understand how the hardware is built.

The memory

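The smallest unit of information storage in the memory is one bit, i.e., a value of 0 or 1. On the hardware level, one bit can be stored by a circuit such as an S-R latch. (Real memory chips may use other cells, for example one capacitor per bit in DRAM, but the latch is a convenient model.)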

The smallest addressable unit in the memory is one byte, i.e., an array of 8 bits. Every byte is assigned an address. To read or write a byte, the CPU must specify the address of that byte and read or write it as a whole. In other words, a byte is also the smallest readable/writable unit of information. On the hardware level, eight S-R latches are put together to implement one byte of storage.
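As a small illustration of "a byte is the smallest readable/writable unit", the sketch below models the memory as a Python `bytearray`. This is only a stand-in for real hardware, and `set_bit` is a hypothetical helper. Changing a single bit still means reading the whole byte, modifying it, and writing the whole byte back.

```python
# A toy model: memory as a sequence of bytes, each with its own address.
memory = bytearray(4)  # addresses 0..3, every byte starts as 0


def set_bit(mem, address, bit):
    """Set one bit (0 = least significant) of the byte at `address`.

    A single bit has no address of its own, so the whole byte is read,
    modified, and written back.
    """
    value = mem[address]   # read the whole byte
    value |= 1 << bit      # change one bit in the copy
    mem[address] = value   # write the whole byte back


set_bit(memory, 2, 0)
set_bit(memory, 2, 7)
print(list(memory))  # [0, 0, 129, 0]
```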

System buses

On the motherboard, the CPU and the memory are separate pieces of hardware. They are connected via the system bus, which consists of three buses: the address bus, the data bus, and the control bus.

Every bus is just a group of wires. Each wire has two states, 0 or 1, and so represents one bit of information. The width of a bus is the number of wires it contains. Width matters, because a wider bus carries more bits at once. For example, a data bus of width 16 can transfer two bytes at a time.
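To make "width matters" concrete, here is a minimal sketch of the arithmetic, assuming each wire carries one bit per transfer. The function names are hypothetical, and reading the first result as "addresses an address bus can name" is an assumption that the text above does not state.

```python
def distinct_values(width):
    """Number of distinct bit patterns a bus of `width` wires can carry."""
    return 2 ** width


def bytes_per_transfer(width):
    """Whole bytes a data bus of `width` wires can move in one transfer."""
    return width // 8


# Assumption: an address bus with 16 wires can name 2**16 byte addresses.
print(distinct_values(16))     # 65536
# A data bus with 16 wires moves 2 bytes at a time; 32 wires move 4.
print(bytes_per_transfer(16))  # 2
print(bytes_per_transfer(32))  # 4
```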

Therefore, for the CPU and the memory to exchange data, both must be wired to the system buses.

Wiring

Let’s use a data bus of width 16 as the example for discussion.

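The memory is wired to the data bus this way: every even-addressed byte (0, 2, ...) is wired to wires 0 ~ 7, and every odd-addressed byte (1, 3, ...) is wired to wires 8 ~ 15.

Figure: Wiring between memory and data bus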

From the diagram, we can see that the data bus can transfer byte 0 and byte 1 together, and byte 2 and byte 3 together. Byte 1 and byte 2, however, cannot be transferred together, because byte 1 is wired to wires 8 ~ 15 on the data bus, while byte 2 is wired to wires 0 ~ 7. The order in which they should be accessed (i.e., byte 1 followed by byte 2) is the opposite of the order in which they are wired to the data bus (i.e., byte 2 is wired to the lower 8 wires while byte 1 is wired to the higher 8 wires).
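The wiring can be sketched as a mapping from a byte address to a row and a group of wires. The sketch assumes the 16-bit data bus used in this section, with even addresses on wires 0 ~ 7 and odd addresses on wires 8 ~ 15. The names `lane_of` and `same_transfer` are hypothetical.

```python
BUS_BYTES = 2  # a 16-bit data bus carries 2 bytes per transfer


def lane_of(address):
    """Return (row, wires) for a byte address.

    `row` identifies the group of bytes that are transferred together;
    `wires` is the range of data-bus wires the byte is attached to.
    """
    row, offset = divmod(address, BUS_BYTES)
    return row, range(offset * 8, offset * 8 + 8)


def same_transfer(first, second):
    """True if two bytes sit in the same row and so share one transfer."""
    return lane_of(first)[0] == lane_of(second)[0]


for addr in range(4):
    row, wires = lane_of(addr)
    print(f"byte {addr}: row {row}, wires {wires.start} ~ {wires.stop - 1}")

print(same_transfer(0, 1))  # True
print(same_transfer(1, 2))  # False
```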

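As a result, if the CPU needs to get byte 1 and byte 2 as a pair, it needs to fetch them in two cycles: one transfer for the pair that contains byte 1 (bytes 0 and 1), and another transfer for the pair that contains byte 2 (bytes 2 and 3).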

This is the hardware-level reason that memory alignment is recommended: with this 16-bit data bus, if the data you want to access starts at address 0, 2, 4, etc., it can be accessed in fewer cycles than misaligned data.
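The cycle counts can be checked with a small sketch, assuming a 16-bit data bus where each transfer moves one aligned 2-byte pair. `bus_cycles` is a hypothetical helper, and this toy model ignores caches and the wider buses found in real CPUs.

```python
def bus_cycles(address, size, bus_bytes=2):
    """Count the transfers needed to read `size` bytes starting at `address`.

    Each transfer moves one aligned group of `bus_bytes` bytes, so the
    answer is the number of aligned groups the access touches.
    """
    first_group = address // bus_bytes
    last_group = (address + size - 1) // bus_bytes
    return last_group - first_group + 1


print(bus_cycles(0, 2))  # 1: bytes 0 and 1 share one transfer
print(bus_cycles(2, 2))  # 1: bytes 2 and 3 share one transfer
print(bus_cycles(1, 2))  # 2: byte 1 and byte 2 need separate transfers
```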

Tags: Tech - Hardware