Memory Emulation

Memory emulation in TEMU is very flexible, the memory system uses a memory space object to carry out address decoding. The memory space object enables the arbitrary mapping of objects to different address ranges. The emulator will handle the address decoding, which is done very efficiently through a multi-level page table.

Memory Spaces

TEMU provides dynamic memory mapping. Memory mapping is done using the MemorySpace class. A CPU needs one memory space object connected to it. The memory space object does not contain actual memory, but rather it contains a memory map. It is possible to map in objects such as RAM, ROM and device models in a memory space.

The requirement is that the object being mapped implements the MemAccess interface. It can optionally implement the Memory interface as well (in which case the mapped object will support block accesses).

The memory space mapping, currently implements a 36 bit physical memory map (which corresponds to the SPARCv8 architecture definition).

Because it would be inefficient to access through this structure and to build up the memory transaction objects for the memory access interface for every memory access (including fetches), the translations are cached in an Address Translation Cache. The ATC maps virtual to host address for RAM and ROM only. Note that there are six ATCs: one each for read, write and execute operations, and in different variants for user and supervisor privileges.

Memory may have attributes set in some cases (such as for example breakpoints, watchpoints and SEU bits). If memory attributes are set on a page, that page cannot be put into the ATC. Therefore, attribute bits should be set only in exceptional cases.

To map an object in memory, there are two alternatives, one is to use the command line interface command memory-map. The other is to use the function temu_memoryMap().

Memory Transactions

Memory transactions are passed through the memory space via the temu_MemAccessIface interface. The memory hierarchy may need to pass along a lot of data for the simulation, so memory transactions are rather large objects that are passed as pointers.

By passing transaction objects around, users always have access to both virtual and physical addresses.

The memory transaction object contains a Value field. The field is 64 bit (8 byte) in size, however the data unit passed in those 64 bits can be either 1, 2, 4 or 8 bytes. The data unit is encoded in the Size field’s lower two bits using the log-size of data unit size:

0: 1-byte transaction
1: 2-byte transaction
2: 4-byte transaction
3: 8-byte transaction

Large Transactions

Large transactions are special transactions where more than one data unit is transferred. Devices can opt-in to large transactions by setting the LargeTransactions in its capabilities.

Since data units are at most 8 byte in size, this means that large transactions primarily handle transfers of data blocks of > 8 bytes.

The purpose of these is to for example emulate DMA traffic, (subject to IOMMU handling), send / transfer list loading, etc.

In addition, large transactions can be used to load memories like RAM and ROM with data from the simulation infrastructure.

TEMU 2 only supported single data unit transactions. Large data blocks could be transferred with the MemoryIface, however these were not subject to IOMMU handling. Large transactions is a new feature in TEMU 3. The MemoryIface is now deprecated in favour of large transactions.

In a normal transaction, the single data unit is transferred by copying it in the Value field. In a large transaction, the Value field is instead expected to contain a pointer to a buffer with the data being transferred.

The buffer can consist of either 1-, 2-, 4- or 8-byte data units.

The type of the buffer is still encoded in the Size field’s lower two bits. A transaction is large if the upper 62 bits in the Size field is non-zero.

For large transactions, the Size field contains both the data unit type, and the number of data units that is pointed at in the Value field.

In the following example, the way to create a normal and large transaction is illustrated:

void
makeBuffer32(temu_MemTransaction *mt, size_t numWords, uint32_t *data)
{
  if (numWords == 1) {
    // Create a normal transaction
    mt->Value = data[0];
    mt->Size = 2;
  } else {
    // Create a large transaction
    // Value contains a pointer to the data
    mt->Value = (uintptr_t)(void*)data;
    // Data unit size is 4 bytes (use value 2 for encoding)
    // Number of words is the number of items going into the upper 62 bit
    mt->Size = (numWords << 2) | 2;
  }
}

Devices do not automatically support large transactions. Devices must opt-in to these using memory access capabilities.

In general, there is no need to support large accesses in a normal device, however if a custom memory is implemented (RAM/ROM/flash/etc), the device should opt-in to handle large transactions.

A large transactions are not subject to automatic endianness swap. Such devices should check the endianness bit in mt.Flags with TEMU_MT_BIG_ENDIAN or TEMU_MT_LITTLE_ENDIAN explicitly. This check should only be done for large transactions.

Endianness

Endianness support for memory transaction is built by both a device specifying its endianness and the transaction endianness being set in the Flags field.

The following two constants can be used to set the Flags accordingly:

TEMU_MT_BIG_ENDIAN
TEMU_MT_LITTLE_ENDIAN

Devices can specify device endianness using the Endianness field in its capabilities. If no endianness is specified (i.e. the capabilities function in the temu_MemAccessIface is not implemented), the system assumes big endian for the device.

If the memory space detects that a transaction is sent to a device of opposite endianness, the memory space will automatically preform byte swapping for non-large transactions.

Capabilities

Memory transactions are routed by the memory space to the correct device. However, some device models might emulate little endian devices. Other devices may for example only support word accesses.

By implementing the getCapabilities function in the memory access interface, a device can signal to the memory emulation system the endianness, or access sizes supported by the device.

Address Translation Cache

In order to get high performance of the emulation for systems with a paged memory management unit (MMU), the emulator caches virtual to physical to host address translations on a per page level. The lookup in the cache is very fast, but includes a two instruction hash followed by a tag check for every memory access (including instruction fetches).

In the case of an Address Translation Cache (ATC) miss, the emulator will call the memory space object’s memory access interface which will forward the access to the relevant device model.

Only RAM and ROM is cached in the ATC, and only if the relevant page does not contain any memory attributes (breakpoints, SEU, MEU etc).

It is possible for models or simulators to purge the ATC in a processor if needed. The means to do this is provided in the CPU interface. Example is given below.

// Purge 100 pages in the ATC starting with address 0
Device->Cpu.Iface->invalidateAtc(Device->Cpu.Obj, 0, 100, 0);

Note that in normal cases, models do not need to purge the ATC and it can safely be ignored, it is mostly needed by MMU models (that cannot be modelled by the user at present).

Figure 1. Memory access flow using the ATC

Memory Hierarchy and Caches

It is possible to manipulate the memory hierarchy when assembling your machine and connecting the object graph. A cache model can be inserted in the memory space object for more accurate performance modelling. Note that, unless the cache estimates the needed stall cycles on a per page basis, this means that the ATC cannot be used while a cache model is connected. Cache models therefore clears the Page pointer field in the memory transaction object to ensure that the ATC is not used for the memory access.

When the ATC is disabled, the performance of the emulator drops considerably and when a cache model is used that emulates the cache in an accurate manner, it drops even more.

Cache models can be connected to the preTransaction and postTransaction interfaces. Caches should typically only cache RAM and ROM, at present, the user needs to set the TEMU_MT_CACHEABLE flag when mapping a device which is cacheable. In principle the MMU should handle this, but at present the SR-MMU does not use the cacheable bits.

Cache models and any other models suitable for handling the pre and post transaction semantics should provide a way to chain an additional model after it. This way, multiple levels of caches and tracing modules can be inserted at will.

At present, cacheable objects are only respected as such if they have a size in multiple of page sizes.

To insert a cache model, the typical command sequence is:

# Remember to set the cacheable flag on cacheable memories.
memory-map memspace=mem0 addr=0x40000000 length=0x8000000 \
           object=ram0 cacheable=1

# Connect pre- and post- MemTransactionIfaces
connect a=mem0.preTransaction b=l1Cache0:PreAccessIface
connect a=mem0.postTransaction b=l1Cache0:PostAccessIface

A pre-transaction handler will intercept memory transactions before they are executed, it can therefore modify written data. A post-transaction handler will intercept memory transactions after they have been executed, the post transaction handler can therefore modify read data.

Currently, the memory system will look at cache timing from the pre-transaction handlers, but the post transaction handler must be connected to ensure that it can clear the Page pointer in the MemTransaction object.

The Generic Cache Model

The emulator (as of TEMU 2.1) comes bundled with a generic cache model. This model can be used to emulate caches with different number of associativity and line sizes. Most standard cache parameters can be configured in the system. Including the replacement policy (at the moment LRU, LRR and RND are supported), line size, word size, number of sets and number of ways. The generic cache model can also be configured as a split (Harward-architecture) cache, where instructions and data have their own blocks.

Note that when the cache is not split, the parameters (including the tags etc) will be turned into identical values.

The generic cache implements two copies of the cache interface, one for instructions and one for data. These are effectively identical if the caches are not split, so in that case which interface is not relevant.

Tracing Memory Accesses

It is possible to utilize the pre- and post-access handlers in the memory space to trace memory accesses. To do so, implement a model exposing the memory transaction interface. In the postAccess handler, the tracing model should clear the page pointer in the transaction object to disable ATC insertion of the memory access. Note that the pre access handler have access to the written value, and the post access handler have access to the read value. While the written value is normally there also in the postAccess handler, it will not be there for atomic exchange operations.

Interfaces

Memory Access Interface

The memory access interface defines the interface used by objects connected to the emulated memory system. The memory accesses are invoked by a CPU and can be either fetch, read or write operations.

typedef struct temu_MemTransaction {
  uint64_t Va; // 64 bit virtual for unified 32/64 bit interface.
  uint64_t Pa; // 64 bit physical address
  uint64_t Value; // Value
  uint64_t Size; // Size (see note above)
  uint64_t Offset; // Offset in bytes from start of mapping (used for determining register)
  temu_InitiatorType InitiatorType;
  temu_Object_ *Initiator; // Initiating object (normally processor, may be null)
  void *Page; // Page will be cached in the ATC for this memoy page
  uint64_t Cycles; // CPU cycles this memory transaction take
  uint32_t Flags;  // Flags for use in the memory hierarchy.

  void *IR; // Internal pointer
} temu_MemTransaction;

// Exposed to the emulator core by a memory object.
struct temu_MemAccessIface {
  void (*fetch)(void *Obj, temu_MemTransaction *Mt);
  void (*read)(void *Obj, temu_MemTransaction *Mt);
  void (*write)(void *Obj, temu_MemTransaction *Mt);
  void (*exchange)(void *Obj, temu_MemTransaction *Mt); // Optional
  void (*mapped)(void *Obj, uint64_t Pa, uint64_t Len); // Optional, called when interface is mapped into memoryspace
  const temu_MemAccessCapabilities *(*getCapabilities)(void *Obj); // Optional, return capabilities of device.
};

Memory Interface

Deprecation Notice

The memory interface has been deprecated in favor of large memory transactions. The memory interface is scheduled to be removed in TEMU 4.

The memory interface is a common interface for memory storage devices. It provides procedures for writing and reading larger blocks of memory. The interface takes an offset from the base address of the object is mapped (normally you use a memory space object to cover the physical address space).

The Size parameter is in bytes, and the Swap parameter specify the log-size in bytes of the data units to read or write. Note that the address/offset is assumed to be aligned at the unit size and the size will be truncated if it does not represent a whole number of data units.

I.e. when reading 64 bit words, the size should be 8, 16, 24, ... and the swap argument should be set to 3.

typedef struct temu_MemoryIface {
  void (*readBytes)(void *Obj,
                    void *Dest, uint64_t Offs, uint32_t Size,
		    int Swap);
  void (*writeBytes)(void *Obj,
                     uint64_t Offs, uint32_t Size, void *Src,
		     int Swap);
} temu_MemoryIface;