Inside Arm64 MMU: Unicorn Emulator vs Apache NuttX RTOS

📝 30 Mar 2025

Spotted in Unicorn Emulator: A Demo of Arm64 Memory Management Unit (MMU)… in 18 Lines of Arm64 Assembly! (Pic above)

Today we decipher the code inside the Arm64 MMU Demo and figure out how it works. This turns out to be surprisingly helpful for emulating Apache NuttX RTOS, compiled for Arm64 SBCs…

Arm64 Memory Management Unit

What’s this MMU again? (Pic above)

We need the Arm64 Memory Management Unit for…

If we don’t configure MMU correctly…

We dive deeper inside MMU…

§1 Memory Management Unit

Ah so MMU will allow this switcheroo business?

  1. MMU is Disabled initially

    Without MMU

  2. We read from Physical Address 0x4000_0000

  3. Enable the MMU: Map Virtual Address 0x8000_0000 to Physical Address 0x4000_0000

    Arm64 Memory Management Unit

  4. We read from Virtual Address 0x8000_0000

  5. Both reads produce the same value

Indeed! That’s precisely what our MMU Demo above shall do…

  1. Read from Physical Address 0x4000_0000

    // Read data from physical address
    // Into Register X1
    ldr X0, =0x4000_0000
    ldr X1, [X0]

  2. Map Virtual Address to Physical Address:

    0x8000_0000 becomes 0x4000_0000

    // Init the MMU Registers
    ldr X0, =0x1_8080_3F20
    msr TCR_EL1, X0
    ldr X0, =0xFFFF_FFFF
    msr MAIR_EL1, X0
    
    // Set the MMU Page Table
    adr X0, ttb0_base
    msr TTBR0_EL1, X0

    (We’ll explain this)

  3. Enable the MMU

    // Enable Caches and the MMU
    mrs X0, SCTLR_EL1
    orr X0, X0, #0x1         // M bit (MMU)
    orr X0, X0, #(0x1 << 2)  // C bit (data cache)
    orr X0, X0, #(0x1 << 12) // I bit (instruction cache)
    msr SCTLR_EL1, X0
    dsb SY
    isb

    (We’ll explain this)

  4. Read from Virtual Address 0x8000_0000

    // Read the same Memory Area through Virtual Address
    // Into Register X2
    ldr X0, =0x8000_0000
    ldr X2, [X0]

  5. Assuming that Physical Address 0x4000_0000 is filled with 44 44 44 44 …

    Both reads will produce the same value

    // Register X1 == Register X2
    x1 = 0x4444_4444_4444_4444
    x2 = 0x4444_4444_4444_4444

Yeah the steps for “Map Virtual Address” and “Enable The MMU” are extremely cryptic. We break them down…

§2 Level 1 Page Table

What’s this mystery code from above?

// Init the MMU Registers:
// TCR_EL1 becomes 0x1_8080_3F20
ldr X0, =0x1_8080_3F20  // Load 0x1_8080_3F20 into Register X0
msr TCR_EL1, X0         // Write X0 into System Register TCR_EL1

// MAIR_EL1 becomes 0xFFFF_FFFF
ldr X0, =0xFFFF_FFFF  // Load 0xFFFF_FFFF into Register X0
msr MAIR_EL1, X0      // Write X0 into System Register MAIR_EL1

// Set the MMU Page Table:
// TTBR0_EL1 becomes ttb0_base
adr X0, ttb0_base  // Load ttb0_base into Register X0
msr TTBR0_EL1, X0  // Write X0 into System Register TTBR0_EL1

This code will Map Virtual Addresses to Physical Addresses, so that 0x8000_0000 (virtually) becomes 0x4000_0000.

Later we’ll explain TCR and MAIR, but first…

What’s TTBR0_EL1? Why set it to ttb0_base?

That’s the Translation Table Base Register 0 for Exception Level 1.

It points to the Level 1 Page Table, telling MMU our Virtual-to-Physical Mapping. Suppose we’re mapping Four Chunks of 1 GB each…

Virtual Address | Physical Address | Size
0x0000_0000 | 0x0000_0000 | 1 GB
0x4000_0000 | 0xC000_0000 | 1 GB
0x8000_0000 | 0x4000_0000 | 1 GB
0xC000_0000 | 0x8000_0000 | 1 GB

Our Level 1 Page Table (TTBR0_EL1) will be this…

Level 1 Page Table

Which we Store in RAM (ttb0_base) as…

Address | Value | Because
0x1000 | 0x0000_0741 | Page Table Entry #0
0x1008 | 0xC000_0741 | Page Table Entry #1
0x1010 | 0x4000_0741 | Page Table Entry #2
0x1018 | 0x8000_0741 | Page Table Entry #3

(See the Unicorn Log)

(And the Unicorn Code)

What if we read from 0x4000_0000 AFTER enabling MMU? (Physical Address 0xC000_0000)

We’ll see CC CC CC CC… because that’s how we populated Physical Address 0xC000_0000. Yep our MMU can remap memory in fun convoluted ways.
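The lookup that the MMU performs on this Level 1 Page Table can be sketched in plain Rust. This is only a model of the demo's four 1 GB Block Entries, not how Unicorn implements it: the names `L1_TABLE`, `OUTPUT_ADDR_MASK` and `translate` are our own, and we assume the Output Address of a 1 GB Block sits in Bits 30 to 47 of the entry.

```rust
/// Level 1 Page Table from the MMU Demo.
/// Entry N covers the 1 GB of Virtual Addresses starting at N * 0x4000_0000.
const L1_TABLE: [u64; 4] = [
    0x0000_0741, // Entry #0: VA 0x0000_0000 -> PA 0x0000_0000
    0xC000_0741, // Entry #1: VA 0x4000_0000 -> PA 0xC000_0000
    0x4000_0741, // Entry #2: VA 0x8000_0000 -> PA 0x4000_0000
    0x8000_0741, // Entry #3: VA 0xC000_0000 -> PA 0x8000_0000
];

/// Each Level 1 Block spans 2^30 bytes (1 GB)
const BLOCK_BITS: u32 = 30;

/// Assumption: Output Address of a 1 GB Block is Bits 30 to 47 of the entry
const OUTPUT_ADDR_MASK: u64 = 0x0000_FFFF_C000_0000;

/// Translate a Virtual Address through the Level 1 Page Table.
/// Returns None if the address is outside the table,
/// or if the entry is not a Block Descriptor (Bits 0-1 == 1).
fn translate(table: &[u64], va: u64) -> Option<u64> {
    let index = (va >> BLOCK_BITS) as usize;
    let entry = *table.get(index)?;
    if entry & 0b11 != 0b01 {
        return None;
    }
    let offset = va & ((1u64 << BLOCK_BITS) - 1);
    Some((entry & OUTPUT_ADDR_MASK) | offset)
}

fn main() {
    for va in [0x0000_0000u64, 0x4000_0000, 0x8000_0000, 0xC000_0010] {
        match translate(&L1_TABLE, va) {
            Some(pa) => println!("VA {va:#010x} -> PA {pa:#010x}"),
            None => println!("VA {va:#010x} -> Translation Fault"),
        }
    }
}
```

Reading Virtual Address 0x4000_0000 lands on Entry #1, which is why we see the CC CC CC CC… that lives at Physical Address 0xC000_0000.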

Why map 0x0000_0000 to itself?

Our code runs at 0x0000_0000. If we don’t map 0x0000_0000 to itself, there won’t be any runway for our demo.

For TTBR0_EL1: Why Exception Level 1?

Our code (NuttX Kernel) runs at Exception Level 1. Later we’ll run NuttX Apps at Exception Level 0, which has Less Privilege. That’s how we protect NuttX Kernel from getting messed up by NuttX Apps.

§3 Page Table Entry

In the Page Table Entries above: Why 741?

We decode the Page Table Entry based on VMSAv8-64 Block Descriptors (Page D8-6491). 0x741 says…

VMSAv8-64 Block Descriptors

NuttX defines the whole list here: arm64_mmu.h

// PTE descriptor can be Block descriptor or Table descriptor or Page descriptor
#define PTE_BLOCK_DESC              1U
#define PTE_TABLE_DESC              3U

// Block and Page descriptor attributes fields
#define PTE_BLOCK_DESC_MEMTYPE(x)   ((x) << 2)
#define PTE_BLOCK_DESC_NS           (1ULL << 5) // Non-Secure
#define PTE_BLOCK_DESC_AP_USER      (1ULL << 6) // User Read-Write
#define PTE_BLOCK_DESC_AP_RO        (1ULL << 7) // Kernel Read-Only
#define PTE_BLOCK_DESC_AP_RW        (0ULL << 7) // Kernel Read-Write
#define PTE_BLOCK_DESC_AP_MASK      (3ULL << 6)
#define PTE_BLOCK_DESC_NON_SHARE    (0ULL << 8)
#define PTE_BLOCK_DESC_OUTER_SHARE  (2ULL << 8)
#define PTE_BLOCK_DESC_INNER_SHARE  (3ULL << 8)
#define PTE_BLOCK_DESC_AF           (1ULL << 10) // Access Flag
#define PTE_BLOCK_DESC_NG           (1ULL << 11) // Non-Global
#define PTE_BLOCK_DESC_DIRTY        (1ULL << 51) // Dirty Flag
#define PTE_BLOCK_DESC_PXN          (1ULL << 53) // Kernel Execute Never
#define PTE_BLOCK_DESC_UXN          (1ULL << 54) // User Execute Never
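To see how 0x741 falls out of these definitions, here is a small Rust sketch that splits the low 12 bits of a Block Descriptor into fields. The struct and function names are made up for illustration. The bit positions are copied from the NuttX macros above.

```rust
/// Low-order fields of a Block Descriptor, using the same bit
/// positions as the PTE_BLOCK_DESC_* macros in arm64_mmu.h
#[derive(Debug, PartialEq)]
struct BlockDescriptor {
    desc_type: u64,    // Bits 0-1: 1 = Block, 3 = Table
    mem_type: u64,     // Bits 2-4: PTE_BLOCK_DESC_MEMTYPE
    non_secure: bool,  // Bit 5:    PTE_BLOCK_DESC_NS
    ap: u64,           // Bits 6-7: 1 = PTE_BLOCK_DESC_AP_USER
    share: u64,        // Bits 8-9: 3 = PTE_BLOCK_DESC_INNER_SHARE
    access_flag: bool, // Bit 10:   PTE_BLOCK_DESC_AF
    non_global: bool,  // Bit 11:   PTE_BLOCK_DESC_NG
}

/// Split a Page Table Entry into its attribute fields
fn decode_descriptor(pte: u64) -> BlockDescriptor {
    BlockDescriptor {
        desc_type: pte & 0b11,
        mem_type: (pte >> 2) & 0b111,
        non_secure: (pte >> 5) & 1 == 1,
        ap: (pte >> 6) & 0b11,
        share: (pte >> 8) & 0b11,
        access_flag: (pte >> 10) & 1 == 1,
        non_global: (pte >> 11) & 1 == 1,
    }
}

fn main() {
    // All four Page Table Entries in the MMU Demo end with 0x741
    println!("{:#?}", decode_descriptor(0x741));
}
```

So 0x741 reads as: Block Descriptor, Memory Type 0, User Read-Write, Inner Shareable, Access Flag set.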

Why Stage 1? Not Stage 2?

We’re doing Stage 1 Only: Single-Stage Translation from Virtual Address (VA) to Physical Address (PA). No need for Stage 2 and Intermediate Physical Address (IPA) (Page D8-6448)

Translation Table Walk

Why Inner vs Outer Shareable? Something about “Severance”?

Inner / Outer Sharing is for Multiple CPU Cores, which we’ll ignore for now (Page B2-293)

Inner / Outer Sharing

(PE = Processing Element = One Arm64 Core)

§4 Translation Control Register

What’s TCR_EL1? Why set it to 0x1_8080_3F20?

// Init the MMU Registers:
// TCR_EL1 becomes 0x1_8080_3F20
ldr X0, =0x1_8080_3F20  // Load 0x1_8080_3F20 into Register X0
msr TCR_EL1, X0         // Write X0 into System Register TCR_EL1

// MAIR_EL1 becomes 0xFFFF_FFFF
ldr X0, =0xFFFF_FFFF  // Load 0xFFFF_FFFF into Register X0
msr MAIR_EL1, X0      // Write X0 into System Register MAIR_EL1

That’s the Translation Control Register for Exception Level 1. According to TCR_EL1 Doc, 0x1_8080_3F20 decodes as…

Translation Control Register
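The same decoding can be done in a few lines of Rust. This is a sketch under assumptions: the struct `Tcr` and the helper `bits` are our own names, and the field positions follow our reading of the TCR_EL1 Doc (we treat IPS as the three bits starting at Bit 32).

```rust
/// Fields of TCR_EL1 that matter for the MMU Demo
#[derive(Debug, PartialEq)]
struct Tcr {
    t0sz: u64,  // Bits 0-5:   Virtual Address Size is 64 - T0SZ
    irgn0: u64, // Bits 8-9:   Inner Cacheability
    orgn0: u64, // Bits 10-11: Outer Cacheability
    sh0: u64,   // Bits 12-13: Shareability
    tg0: u64,   // Bits 14-15: Granule Size for TTBR0 (0 = 4 KB)
    epd1: bool, // Bit 23:     Disable Table Walks for TTBR1
    tg1: u64,   // Bits 30-31: Granule Size for TTBR1 (2 = 4 KB)
    ips: u64,   // Bits 32-34: Physical Address Size (1 = 36 bits)
}

/// Extract `width` bits of `value`, starting at bit `lo`
fn bits(value: u64, lo: u32, width: u32) -> u64 {
    (value >> lo) & ((1u64 << width) - 1)
}

/// Split a TCR_EL1 value into its fields
fn decode_tcr(tcr: u64) -> Tcr {
    Tcr {
        t0sz: bits(tcr, 0, 6),
        irgn0: bits(tcr, 8, 2),
        orgn0: bits(tcr, 10, 2),
        sh0: bits(tcr, 12, 2),
        tg0: bits(tcr, 14, 2),
        epd1: bits(tcr, 23, 1) == 1,
        tg1: bits(tcr, 30, 2),
        ips: bits(tcr, 32, 3),
    }
}

fn main() {
    let tcr = decode_tcr(0x1_8080_3F20);
    println!("{tcr:#?}");
    println!("Virtual Address Bits: {}", 64 - tcr.t0sz);
}
```

For the MMU Demo, T0SZ is 0x20, so the Virtual Address Space is 64 - 32 = 32 bits. That's exactly enough for our Four Chunks of 1 GB.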

What about MAIR?

// MAIR_EL1 becomes 0xFFFF_FFFF
ldr X0, =0xFFFF_FFFF  // Load 0xFFFF_FFFF into Register X0
msr MAIR_EL1, X0      // Write X0 into System Register MAIR_EL1

Hmmm 0xFFFF_FFFF looks kinda fake? Unicorn Emulator probably ignores the MAIR Bits. We’ll see a Real MAIR in a while.
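For reference, MAIR_EL1 is a pack of eight 8-bit Memory Attributes (Attr0 to Attr7), and the Memory Type field of a Page Table Entry (Bits 2-4) picks one of them. A minimal Rust sketch of the unpacking, with `mair_attrs` as a made-up helper name:

```rust
/// Split MAIR_EL1 into its eight attribute bytes.
/// Attr0 is the lowest byte, Attr7 is the highest byte.
fn mair_attrs(mair: u64) -> [u8; 8] {
    mair.to_le_bytes()
}

fn main() {
    // MAIR_EL1 from the MMU Demo
    for (index, attr) in mair_attrs(0xFFFF_FFFF).iter().enumerate() {
        println!("Attr{index} = {attr:#04x}");
    }
}
```

In the MMU Demo, Attr0 to Attr3 are all 0xFF. Our Page Table Entries (0x741) have Memory Type 0, so they select Attr0. As far as we can tell from the Arm docs, 0xFF means Normal Memory with Write-Back Caching, but since the whole register is just filled with 1s, we wouldn't read too much into it.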

§5 Enable the MMU

Wrapping up our Mystery Code: This is how we Enable the MMU

// Read System Register SCTLR_EL1 into X0
mrs X0, SCTLR_EL1

// In X0: Set the bits to Enable MMU, Data Cache and Instruction Cache
orr X0, X0, #0x1         // M bit (MMU)
orr X0, X0, #(0x1 << 2)  // C bit (Data Cache)
orr X0, X0, #(0x1 << 12) // I bit (Instruction Cache)

// Write X0 into System Register SCTLR_EL1
msr SCTLR_EL1, X0

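// Barriers, not Cache Flushes:
// dsb waits for the SCTLR_EL1 update to complete,
// isb refetches the following instructions with the MMU enabled
dsb SY ; isb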

SCTLR_EL1 is for?

The System Control Register in Exception Level 1. We set these bits to Enable the MMU with Caching

System Control Register
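The three `orr` instructions are a classic Read-Modify-Write: we keep whatever bits were already in SCTLR_EL1 and switch on three more. A tiny Rust model of that, with constant and function names of our own choosing:

```rust
const SCTLR_M: u64 = 1 << 0;  // M bit: Enable MMU
const SCTLR_C: u64 = 1 << 2;  // C bit: Enable Data Cache
const SCTLR_I: u64 = 1 << 12; // I bit: Enable Instruction Cache

/// Return the new SCTLR_EL1 value: same as before,
/// plus the MMU, Data Cache and Instruction Cache bits
fn enable_mmu_and_caches(sctlr: u64) -> u64 {
    sctlr | SCTLR_M | SCTLR_C | SCTLR_I
}

fn main() {
    // Hypothetical starting value, just to show that other bits survive
    let before = 0x30D0_0800u64;
    let after = enable_mmu_and_caches(before);
    println!("before = {before:#x}, after = {after:#x}");
}
```

The three bits together form the mask 0x1005.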

We’re ready to run the demo…

Arm64 Memory Management Unit

§6 Run the MMU Demo

This is how we run the MMU Demo in Unicorn Emulator: main.rs

// Arm64 Machine Code for our MMU Demo, based on https://github.com/unicorn-engine/unicorn/blob/master/tests/unit/test_arm64.c#L378-L486
// Disassembly: https://github.com/lupyuen/nuttx-arm64-emulator/blob/qemu/src/main.rs#L556-L583
let arm64_code = [
  0x00, 0x81, 0x00, 0x58, 0x01, 0x00, 0x40, 0xf9, 0x00, 0x81, 0x00, 0x58, 0x40, 0x20, 0x18,
  0xd5, 0x00, 0x81, 0x00, 0x58, 0x00, 0xa2, 0x18, 0xd5, 0x40, 0x7f, 0x00, 0x10, 0x00, 0x20,
  0x18, 0xd5, 0x00, 0x10, 0x38, 0xd5, 0x00, 0x00, 0x7e, 0xb2, 0x00, 0x00, 0x74, 0xb2, 0x00,
  0x00, 0x40, 0xb2, 0x00, 0x10, 0x18, 0xd5, 0x9f, 0x3f, 0x03, 0xd5, 0xdf, 0x3f, 0x03, 0xd5,
  0xe0, 0x7f, 0x00, 0x58, 0x02, 0x00, 0x40, 0xf9, 0x00, 0x00, 0x00, 0x14, 0x1f, 0x20, 0x03,
  0xd5, 0x1f, 0x20, 0x03, 0xd5, 0x1F, 0x20, 0x03, 0xD5, 0x1F, 0x20, 0x03, 0xD5,       
];

// Init the Emulator in Arm64 mode
let mut unicorn = Unicorn::new(
  Arch::ARM64,
  Mode::LITTLE_ENDIAN
).expect("failed to init Unicorn");

// Enable the MMU Translation
let emu = &mut unicorn;
emu.ctl_tlb_type(unicorn_engine::TlbType::CPU).unwrap();

// Map the Read/Write/Execute Memory at 0x0000 0000
emu.mem_map(
  0,       // Address
  0x2000,  // Size
  Permission::ALL  // Read/Write/Execute Access
).expect("failed to map memory");

// Write the Arm64 Machine Code to the emulated Executable Memory
const ADDRESS: u64 = 0;
emu.mem_write(
  ADDRESS, 
  &arm64_code
).expect("failed to write instructions");
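How does `ldr X0, =0x4000_0000` fit into 4 bytes of Machine Code? It doesn't carry the constant. The assembler emits a PC-Relative Literal Load, and the constant itself lives elsewhere in memory. Here's a hedged Rust sketch that decodes just this one instruction encoding (`LDR Xt, <label>`, 64-bit). The function name is ours, and we only checked it against the bytes in the array above.

```rust
/// Decode an Arm64 `LDR Xt, <label>` (64-bit Literal Load).
/// Returns (Register Number, Address of the Literal),
/// or None if the 4 bytes are some other instruction.
fn decode_ldr_literal(pc: u64, code: [u8; 4]) -> Option<(u32, u64)> {
    let insn = u32::from_le_bytes(code);
    // Top byte 0x58 marks the 64-bit Literal Load
    if insn & 0xFF00_0000 != 0x5800_0000 {
        return None;
    }
    let rt = insn & 0x1F;
    let imm19 = (insn >> 5) & 0x7_FFFF;
    // Sign-extend the 19-bit offset, then scale by 4 bytes
    let offset = i64::from(((imm19 << 13) as i32) >> 13) * 4;
    Some((rt, pc.wrapping_add(offset as u64)))
}

/// The four `ldr X0, =...` instructions in arm64_code: (Address, Bytes)
const LDR_SITES: [(u64, [u8; 4]); 4] = [
    (0x00, [0x00, 0x81, 0x00, 0x58]),
    (0x08, [0x00, 0x81, 0x00, 0x58]),
    (0x10, [0x00, 0x81, 0x00, 0x58]),
    (0x3C, [0xe0, 0x7f, 0x00, 0x58]),
];

fn main() {
    for (pc, code) in LDR_SITES {
        if let Some((rt, literal)) = decode_ldr_literal(pc, code) {
            println!("{pc:#04x}: ldr X{rt}, [literal at {literal:#x}]");
        }
    }
}
```

The four loads point at 0x1020, 0x1028, 0x1030 and 0x1038. That's why we'll populate those addresses with data in the next step, right after the Page Table.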

We populate the Level 1 Page Table from earlier: main.rs

// Generate the Page Table Entries...
// Page Table Entry @ 0x1000: 0x0000_0741
// Physical Address: 0x0000_0000
// Bit 00-01: PTE_BLOCK_DESC=1
// Bit 06-07: PTE_BLOCK_DESC_AP_USER=1
// Bit 08-09: PTE_BLOCK_DESC_INNER_SHARE=3
// Bit 10:    PTE_BLOCK_DESC_AF=1  
let mut tlbe: [u8; 8] = [0; 8];
tlbe[0..2].copy_from_slice(&[0x41, 0x07]);
emu.mem_write(0x1000, &tlbe).unwrap();

// Page Table Entry @ 0x1008: 0xC000_0741
// Page Table Entry @ 0x1010: 0x4000_0741
// Page Table Entry @ 0x1018: 0x8000_0741
...

// Not the Page Table, but
// Data Referenced by our Assembly Code:
// Data @ 0x1020: 0x4000_0000
tlbe[0..4].copy_from_slice(&[0x00, 0x00, 0x00, 0x40]);
emu.mem_write(0x1020, &tlbe).unwrap();

// Data @ 0x1028: 0x1_8080_3F20
// Data @ 0x1030: 0xFFFF_FFFF
// Data @ 0x1038: 0x8000_0000
...
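The hand-written byte slices above are simply the Little-Endian form of each 64-bit value. As an alternative sketch (not the code in main.rs), we could let Rust compute the bytes for us. `block_entry` and `PHYSICAL_ADDRESSES` are illustrative names. The resulting arrays are what we would pass to `mem_write`.

```rust
/// Physical Addresses for Page Table Entries #0 to #3
const PHYSICAL_ADDRESSES: [u64; 4] = [0x0000_0000, 0xC000_0000, 0x4000_0000, 0x8000_0000];

/// Build the 8 bytes of a Level 1 Block Entry:
/// Physical Address plus the 0x741 attributes, in Little-Endian order
fn block_entry(physical_address: u64) -> [u8; 8] {
    (physical_address | 0x741).to_le_bytes()
}

fn main() {
    for (index, pa) in PHYSICAL_ADDRESSES.iter().enumerate() {
        // Entries are 8 bytes apart, starting at 0x1000
        let address = 0x1000 + 8 * index as u64;
        println!("{address:#x}: {:02x?}", block_entry(*pa));
    }
}
```

Entry #0 comes out as 41 07 00 00…, matching the `[0x41, 0x07]` that we wrote by hand.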

To verify that it works: We Fill the Physical Memory with 0x44 then 0x88 then 0xCC: main.rs

// 3 Chunks of Data filled with 0x44, 0x88, 0xCC respectively
let mut data:  [u8; 0x1000] = [0x44; 0x1000];
let mut data2: [u8; 0x1000] = [0x88; 0x1000];
let mut data3: [u8; 0x1000] = [0xcc; 0x1000];

// 0x4000_0000 becomes 0x44 44 44 44...
// 0x8000_0000 becomes 0x88 88 88 88...
// 0xC000_0000 becomes 0xCC CC CC CC...
emu.mem_map_ptr(0x40000000, 0x1000, Permission::READ,
  data.as_mut_ptr() as _).unwrap();
emu.mem_map_ptr(0x80000000, 0x1000, Permission::READ,
  data2.as_mut_ptr() as _).unwrap();
emu.mem_map_ptr(0xc0000000, 0x1000, Permission::READ,
  data3.as_mut_ptr() as _).unwrap();

Finally we Start the Emulator: main.rs

// Start the Unicorn Emulator
let err = emu.emu_start(0, 0x44, 0, 0);
println!("err={:?}", err);

// Read registers X0, X1, X2
let x0 = emu.reg_read(RegisterARM64::X0).unwrap();
let x1 = emu.reg_read(RegisterARM64::X1).unwrap();
let x2 = emu.reg_read(RegisterARM64::X2).unwrap();

// Check the values
assert!(x0 == 0x80000000);
assert!(x1 == 0x4444444444444444);
assert!(x2 == 0x4444444444444444);

And it works!

## Here are Registers X0, X1 and X2
err = Ok(())
x0  = 0x8000_0000
x1  = 0x4444_4444_4444_4444
x2  = 0x4444_4444_4444_4444

(See the Unicorn Log)

§7 NuttX crashes in Unicorn

What’s Unicorn Emulator got to do with Apache NuttX RTOS?

Two Years Ago: We tried creating a PinePhone Emulator with NuttX and Unicorn. But NuttX kept crashing on Unicorn…

## Compile Simplified NuttX for QEMU Arm64 (Kernel Build)
git clone https://github.com/lupyuen2/wip-nuttx nuttx --branch unicorn-qemu-before
git clone https://github.com/lupyuen2/wip-nuttx-apps apps --branch unicorn-qemu
cd nuttx
tools/configure.sh qemu-armv8a:knsh
make -j

## Dump the disassembly to nuttx.S
aarch64-none-elf-objdump \
  --syms --source --reloc --demangle --line-numbers --wide --debugging \
  nuttx \
  >nuttx.S \
  2>&1

## NuttX boots OK on QEMU.
## NSH Shell won't appear yet because we haven't compiled the NuttX Apps.
qemu-system-aarch64 \
  -semihosting \
  -cpu cortex-a53 \
  -nographic \
  -machine virt,virtualization=on,gic-version=3 \
  -net none \
  -chardev stdio,id=con,mux=on \
  -serial chardev:con \
  -mon chardev=con,mode=readline \
  -kernel ./nuttx

## But NuttX crashes in Unicorn Emulator (Remember to Disable MMU Logging)
## Here's the funny thing: Unicorn is actually based on QEMU!
git clone https://github.com/lupyuen/nuttx-arm64-emulator --branch qemu \
  $HOME/nuttx-arm64-emulator
cp nuttx nuttx.bin nuttx.S \
  $HOME/nuttx-arm64-emulator/nuttx/
cd $HOME/nuttx-arm64-emulator
cargo run

## err=Err(EXCEPTION)
## PC=0x402805f0
## call_graph:  setup_page_tables --> ***_HALT_***
## call_graph:  click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
## env.exception = { syndrome:2248146949, fsr:517, vaddress:1344798719, target_el:1 }

Two Years Later: The bug stops here! Let’s fix it today.

Where does it crash?

According to Unicorn Log: Our Simplified NuttX crashes here in Unicorn Emulator: arm64_mmu.c

// NuttX enables the MMU for Exception Level 1
static void enable_mmu_el1(unsigned int flags) {

  // Set the MAIR, TCR and TTBR registers
  write_sysreg(MEMORY_ATTRIBUTES, mair_el1);
  write_sysreg(get_tcr(1), tcr_el1);
  write_sysreg(base_xlat_table, ttbr0_el1);

  // Ensure the above updates are committed
  // before we enable the MMU: `dsb sy ; isb`
  UP_MB();

  // Read the System Control Register (Exception Level 1)
  uint64_t value = read_sysreg(sctlr_el1);

  // Update the System Control Register (Exception Level 1)
  // Enable the MMU, Data Cache and Instruction Cache
  write_sysreg(
    value 
    | (1 <<  0)  // Set Bit 00: M_BIT (Enable MMU)
    | (1 <<  2)  // Set Bit 02: C_BIT (Enable Data Cache)
    | (1 << 12), // Set Bit 12: I_BIT (Enable Instruction Cache)
    sctlr_el1
  );

  // Oops! Unicorn Emulator fails with an Arm64 Exception
  // syndrome = 2248146949, fsr = 517, vaddress = 1344798719, target_el = 1

(NuttX defines SCTLR_EL1 in arm64_arch.h)

Which is mighty similar to the MMU Demo that we saw earlier…

// MMU Demo Works OK:
// Read System Register SCTLR_EL1 into X0
mrs X0, SCTLR_EL1

// In X0: Set the bits to Enable MMU, Data Cache and Instruction Cache
orr X0, X0, #0x1         // M bit (MMU)
orr X0, X0, #(0x1 << 2)  // C bit (Data Cache)
orr X0, X0, #(0x1 << 12) // I bit (Instruction Cache)

// Write X0 into System Register SCTLR_EL1
msr SCTLR_EL1, X0

Maybe our Page Tables are bad? Or Translation Control Register? We investigate…

§8 Level 1 and 2 Page Tables

NuttX on Unicorn Emulator will fail with this Arm64 Exception

env.exception =
  Syndrome:        0x8600_0005
  FSR:             0x0000_0205
  Virtual Address: 0x5027_ffff (Why?)
  Target Exception Level: 1

Which means: “Oops! Can’t enable MMU”
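We can squeeze a bit more out of the Syndrome. Assuming Unicorn's `syndrome` follows the Arm64 ESR_EL1 layout (plausible, since Unicorn is based on QEMU), the top 6 bits are the Exception Class and the bottom 6 bits are the Fault Status Code. This Rust sketch decodes only the handful of values we care about. The names and the descriptions are our own reading of the Arm docs.

```rust
/// Describe the Exception Class: Bits 26-31 of the Syndrome
fn exception_class(syndrome: u32) -> &'static str {
    match syndrome >> 26 {
        0x20 => "Instruction Abort from a lower Exception Level",
        0x21 => "Instruction Abort from the same Exception Level",
        0x24 => "Data Abort from a lower Exception Level",
        0x25 => "Data Abort from the same Exception Level",
        _ => "Other (not decoded in this sketch)",
    }
}

/// Describe the Fault Status Code: Bits 0-5 of the Syndrome.
/// The lowest two bits are the Page Table Level.
fn fault_status(syndrome: u32) -> String {
    let code = syndrome & 0x3F;
    let level = code & 0b11;
    match code {
        0x04..=0x07 => format!("Translation Fault, Level {level}"),
        0x08..=0x0B => format!("Access Flag Fault, Level {level}"),
        0x0C..=0x0F => format!("Permission Fault, Level {level}"),
        _ => format!("Other ({code:#x})"),
    }
}

fn main() {
    // Decimal values from the Unicorn Log
    let syndrome: u32 = 2248146949;
    let vaddress: u32 = 1344798719;
    println!("syndrome = {syndrome:#x}, vaddress = {vaddress:#x}");
    println!("{}", exception_class(syndrome));
    println!("{}", fault_status(syndrome));
}
```

If our reading is right: The MMU hit a Level 1 Translation Fault while fetching an instruction, right after the MMU was switched on.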

To troubleshoot, we enable MMU Logging: arm64_mmu.c

// Enable MMU Logging
#define CONFIG_MMU_ASSERT   1
#define CONFIG_MMU_DEBUG    1
#define CONFIG_MMU_DUMP_PTE 1
#define trace_printf _info
#undef sinfo
#define sinfo _info

We simplify the Memory Regions: qemu_boot.c

Virtual Address | Physical Address | Size
0x0000_0000 | 0x0000_0000 | 1 GB
0x4000_0000 | 0x4000_0000 | 8 MB

// NuttX Memory Regions for Arm64 MMU (Simplified)
struct arm_mmu_region g_mmu_regions[] = {

  // Memory Region for I/O Memory
  MMU_REGION_FLAT_ENTRY(
    "DEVICE_REGION",  // Name
    0x00000000,       // Start Address: 0x0000_0000
    0x40000000,       // Size: 1 GB (0x4000_0000)
    MT_DEVICE_NGNRNE | MT_RW),  // Read-Write I/O Memory

  // Memory Region for RAM
  MMU_REGION_FLAT_ENTRY(
    "DRAM0_S0",   // Name
    0x40000000,   // Start Address: 0x4000_0000
    0x00800000,   // Size: 8 MB (0x0080_0000)
    MT_NORMAL | MT_RW | MT_EXECUTE),  // Allow Read, Write and Execute

};  // Other Memory Regions? We removed them all

According to NuttX QEMU Log: NuttX creates a Two-Level Page Table

Level 1 Page Table for NuttX

(PXN / UXN = Privileged / User Never-Execute)

Why Two Levels? Because we’re mapping 8 MB of RAM, instead of a Complete 1 GB Chunk. Thus we break up into Level 2 with Smaller 2 MB Chunks

Level 2 Page Table for NuttX
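Here's the arithmetic behind the Two Levels, as a Rust sketch. We assume 4 KB Granules, so each Level 1 Entry covers 1 GB (Bit 30 upwards) and each Level 2 Entry covers 2 MB (Bits 21 to 29). The function names are ours, not from arm64_mmu.c.

```rust
/// Size of a Level 2 Block: 2 MB
const BLOCK_2MB: u64 = 1 << 21;

/// Index into the Level 1 Page Table (1 GB per entry)
fn level1_index(va: u64) -> u64 {
    (va >> 30) & 0x1FF
}

/// Index into the Level 2 Page Table (2 MB per entry)
fn level2_index(va: u64) -> u64 {
    (va >> 21) & 0x1FF
}

/// List the 2 MB Blocks needed for a Memory Region:
/// (Virtual Address, Level 1 Index, Level 2 Index)
fn blocks(start: u64, size: u64) -> Vec<(u64, u64, u64)> {
    (start..start + size)
        .step_by(BLOCK_2MB as usize)
        .map(|va| (va, level1_index(va), level2_index(va)))
        .collect()
}

fn main() {
    // DRAM0_S0: 8 MB of RAM at 0x4000_0000
    for (va, l1, l2) in blocks(0x4000_0000, 0x0080_0000) {
        println!("VA {va:#010x}: Level 1 Entry #{l1}, Level 2 Entry #{l2}");
    }
}
```

8 MB of RAM becomes Four Blocks of 2 MB, all hanging off Level 1 Entry #1.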

Looks legit, we move on…

§9 Translation Control Register for NuttX

What about the Translation Control Register?

We check the NuttX QEMU Log, with MMU Logging Enabled

get_tcr: Virtual Address Bits: 36
get_tcr: Bit 32-33: TCR_EL1_IPS=1
get_tcr: Bit 23:    TCR_EPD1_DISABLE=1
get_tcr: Bit 00-05: TCR_T0SZ=0x1c
get_tcr: Bit 08-09: TCR_IRGN_WBWA=1
get_tcr: Bit 10-11: TCR_ORGN_WBWA=1
get_tcr: Bit 12-13: TCR_SHARED_INNER=3
get_tcr: Bit 14-15: TCR_TG0_4K=0
get_tcr: Bit 30-31: TCR_TG1_4K=2
get_tcr: Bit 37-38: TCR_TBI_FLAGS=0

enable_mmu_el1: tcr_el1   = 0x1_8080_351C
enable_mmu_el1: mair_el1  = 0xFF_440C_0400
enable_mmu_el1: ttbr0_el1 = 0x402B_2000

According to TCR_EL1 Doc, 0x1_8080_351C decodes as…

Translation Control Register for NuttX

Hmmm something looks different…

(We spoke about Innies and Outies earlier)

(Decoding the Bits with JavaScript)

§10 NuttX vs MMU Demo

MMU Demo works OK, but NuttX doesn’t. How are they different?

Based on the info above, we compare NuttX vs MMU Demo for the Translation Control Register…

NuttX QEMU | MMU Demo
T0SZ = 0x1C: 36 bits of Virtual Address Space | T0SZ = 0x20: 32 bits of Virtual Address Space
IRGN0_WBWA = 1: Write-Allocate Cacheable (Inner) | IRGN0_WBNWA = 3: No Write-Allocate Cacheable (Inner)
ORGN0_WBWA = 1: Write-Allocate Cacheable (Outer) | ORGN0_WBNWA = 3: No Write-Allocate Cacheable (Outer)
Won’t Boot On Unicorn | Works OK On Unicorn
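We can also let the computer spot the differences. This Rust sketch walks a small table of TCR_EL1 fields and reports the ones that differ between two register values. The field table is our own (positions per our reading of the TCR_EL1 Doc) and covers only the fields discussed in this article.

```rust
/// TCR_EL1 fields: (Name, Lowest Bit, Width in Bits)
const FIELDS: [(&str, u32, u32); 8] = [
    ("T0SZ", 0, 6),
    ("IRGN0", 8, 2),
    ("ORGN0", 10, 2),
    ("SH0", 12, 2),
    ("TG0", 14, 2),
    ("EPD1", 23, 1),
    ("TG1", 30, 2),
    ("IPS", 32, 3),
];

/// Return the fields whose values differ: (Name, Value in a, Value in b)
fn differing_fields(a: u64, b: u64) -> Vec<(&'static str, u64, u64)> {
    FIELDS
        .iter()
        .filter_map(|&(name, lo, width)| {
            let mask = (1u64 << width) - 1;
            let (x, y) = ((a >> lo) & mask, (b >> lo) & mask);
            if x != y { Some((name, x, y)) } else { None }
        })
        .collect()
}

fn main() {
    let nuttx = 0x1_8080_351C; // TCR_EL1 for NuttX QEMU
    let demo = 0x1_8080_3F20;  // TCR_EL1 for MMU Demo
    for (name, x, y) in differing_fields(nuttx, demo) {
        println!("{name}: NuttX = {x:#x}, MMU Demo = {y:#x}");
    }
}
```

Only three fields differ: T0SZ, IRGN0 and ORGN0. Everything else in the two registers is identical.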

Ah we see a major discrepancy…

We fix the Virtual Addresses…

NuttX Boot Flow with MMU Enabled

§11 32 Bits of Virtual Address

Remember NuttX was using 36 Bits for Virtual Address Space? We cut down to 32 Bits: knsh/defconfig

## Set the Virtual Address Space to 32 bits
CONFIG_ARM64_VA_BITS=32

## Previously: Virtual Address Space was 36 bits
## CONFIG_ARM64_VA_BITS=36

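Inside Translation Control Register (TCR_EL1): T0SZ becomes 0x20 (32 decimal). The Virtual Address Size is 64 minus T0SZ, so we now have 32 bits of Virtual Address Space…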

get_tcr: Virtual Address Bits: 32
get_tcr: Bit 32-33: TCR_EL1_IPS=1
get_tcr: Bit 23:    TCR_EPD1_DISABLE=1
get_tcr: Bit 00-05: TCR_T0SZ=0x20
get_tcr: Bit 08-09: TCR_IRGN_WBWA=1
get_tcr: Bit 10-11: TCR_ORGN_WBWA=1
get_tcr: Bit 12-13: TCR_SHARED_INNER=3
get_tcr: Bit 14-15: TCR_TG0_4K=0
get_tcr: Bit 30-31: TCR_TG1_4K=2
get_tcr: Bit 37-38: TCR_TBI_FLAGS=0

enable_mmu_el1: tcr_el1   = 0x1_8080_3520
enable_mmu_el1: mair_el1  = 0xFF_440C_0400
enable_mmu_el1: ttbr0_el1 = 0x402B_2000

(See the QEMU Log)

NuttX now enables MMU successfully in Unicorn yay! (Pic above)

hook_block:  address=0x402805a4, size=08, setup_page_tables, arch/arm64/src/common/arm64_mmu.c:547:29
call_graph:  enable_mmu_el1 --> setup_page_tables
call_graph:  click enable_mmu_el1 href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L616" "arch/arm64/src/common/arm64_mmu.c " _blank
hook_block:  address=0x40280614, size=16, enable_mmu_el1, arch/arm64/src/common/arm64_mmu.c:608:3
call_graph:  setup_page_tables --> enable_mmu_el1
call_graph:  click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
hook_block:  address=0x4028062c, size=04, enable_mmu_el1, arch/arm64/src/common/arm64_mmu.c:617:3
hook_block:  address=0x40280380, size=88, arm64_boot_el1_init, arch/arm64/src/common/arm64_boot.c:215:1
call_graph:  enable_mmu_el1 --> arm64_boot_el1_init

(See the Unicorn Log)

Arm64 Page Tables

Reducing Virtual Addresses from 36 Bits to 32 Bits: Why did it work?

Needs More Investigation: Maybe NuttX didn’t populate the Page Tables completely for 36 Bits? (Something about 0x5027_FFFF?)

For Now: 32-bit Virtual Addresses are totally sufficient. And NuttX boots OK on Unicorn!

Why are we doing all this: NuttX on Unicorn?

We’re about to create a NuttX Emulator for Avaota-A1 Arm64 SBC (Allwinner A527), based on Unicorn Emulator. So that we can Build and Test NuttX on the Avaota-A1 Emulator, without requiring the Actual Hardware. (NuttX Boot Flow for Avaota-A1)

After switching to 32-bit Virtual Address: Any change to the Page Tables?

The Page Tables are identical. Thanks to Unicorn, we learnt so much about arm64_mmu.c! One more fun thing to do…

NuttX Boot Flow

§12 NuttX Boot Flow

Inside the Unicorn Log: Why the funny arrows?

call_graph:  enable_mmu_el1 --> setup_page_tables
call_graph:  click enable_mmu_el1 href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L616" "arch/arm64/src/common/arm64_mmu.c " _blank
call_graph:  setup_page_tables --> enable_mmu_el1
call_graph:  click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
call_graph:  enable_mmu_el1 --> arm64_boot_el1_init

That’s because our Unicorn Emulator renders the NuttX Boot Flow (pic above) as a Clickable Mermaid Flowchart. It describes how NuttX boots on Arm64…

Here are the steps to produce the Mermaid Flowchart

## Boot NuttX in Unicorn Emulator. Capture the Mermaid Flowchart.
git clone https://github.com/lupyuen/nuttx-arm64-emulator --branch qemu \
  $HOME/nuttx-arm64-emulator
cd $HOME/nuttx-arm64-emulator
cargo run | grep call_graph | colrm 1 13 \
  >$HOME/nuttx-arm64-emulator/nuttx-boot-flow.mmd

## Omitted: Clean up the bad syntax in nuttx-boot-flow.mmd
vi $HOME/nuttx-arm64-emulator/nuttx-boot-flow.mmd

## Convert the Mermaid Flowchart to PDF
sudo docker pull minlag/mermaid-cli
sudo docker run \
  --rm -u `id -u`:`id -g` -v \
  $HOME/nuttx-arm64-emulator:/data minlag/mermaid-cli \
  --configFile="mermaidRenderConfig.json" \
  -i nuttx-boot-flow.mmd \
  -o nuttx-boot-flow.pdf

## Then change ".pdf" above to ".png" or ".svg"
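What do `grep call_graph | colrm 1 13` actually do? They keep the `call_graph` lines and chop off the `call_graph:  ` label, leaving pure Mermaid syntax. Here's a rough Rust equivalent, written for illustration only (it is not part of the emulator). Unlike `grep`, it only matches lines that begin with the label.

```rust
/// Keep only the `call_graph:` lines of the Unicorn Log
/// and strip the label, leaving the Mermaid Flowchart lines
fn extract_call_graph(log: &str) -> String {
    log.lines()
        .filter_map(|line| line.strip_prefix("call_graph:"))
        .map(str::trim_start)
        .collect::<Vec<_>>()
        .join("\n")
}

/// A few lines copied from the Unicorn Log
const SAMPLE_LOG: &str = "\
hook_block:  address=0x40280614, size=16, enable_mmu_el1, arch/arm64/src/common/arm64_mmu.c:608:3
call_graph:  setup_page_tables --> enable_mmu_el1
hook_block:  address=0x40280380, size=88, arm64_boot_el1_init, arch/arm64/src/common/arm64_boot.c:215:1
call_graph:  enable_mmu_el1 --> arm64_boot_el1_init";

fn main() {
    println!("{}", extract_call_graph(SAMPLE_LOG));
}
```

The output still needs the manual cleanup mentioned above before Mermaid will accept it.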

(nuttx-boot-flow.mmd is here)

How did we create the Mermaid Flowchart? Check the details here…

Unicorn is stuck forever in PL011 UART Driver

Why won’t Unicorn boot to NSH Shell?

We haven’t emulated the PL011 UART Hardware, that’s why Unicorn is looping forever while printing System Messages. Hope to fix it someday! (Pic above)

That should keep us busy for a loooong while?

One Last Thing: Suppose we’re in some Wacky Alternate Universe in which Rust was invented before C. What would arm64_mmu.c look like? Might be fun to take a peek at the Alternate Version of arm64_mmu.c 🤔

Unicorn Emulator for Avaota-A1 SBC


§13 What’s Next

Special Thanks to My Sponsors for supporting my writing. Your support means so much to me 🙏

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…

lupyuen.org/src/unicorn3.md

§14 Appendix: Simplified NuttX for QEMU

In this article we took NuttX for QEMU Arm64 (Kernel Build) and made it smaller and simpler.

Why did we Simplify NuttX? So we can be as close to MMU Demo as possible, and isolate the crashing problem. This is how we Build and Test our simpler version of NuttX for QEMU Arm64 (Kernel Build)…

## Before Fixing: Compile Simplified NuttX for QEMU Arm64 (Kernel Build)
git clone https://github.com/lupyuen2/wip-nuttx nuttx \
  --branch unicorn-qemu-before
git clone https://github.com/lupyuen2/wip-nuttx-apps apps \
  --branch unicorn-qemu
cd nuttx
tools/configure.sh qemu-armv8a:knsh
make -j

## Dump the disassembly to nuttx.S
aarch64-none-elf-objdump \
  --syms --source --reloc --demangle --line-numbers --wide --debugging \
  nuttx \
  >nuttx.S \
  2>&1

## NuttX boots OK on QEMU.
## NSH Shell won't appear yet because we haven't compiled the NuttX Apps.
qemu-system-aarch64 \
  -semihosting \
  -cpu cortex-a53 \
  -nographic \
  -machine virt,virtualization=on,gic-version=3 \
  -net none \
  -chardev stdio,id=con,mux=on \
  -serial chardev:con \
  -mon chardev=con,mode=readline \
  -kernel ./nuttx

## But NuttX crashes in Unicorn Emulator.
## Remember to Disable MMU Logging.
git clone https://github.com/lupyuen/nuttx-arm64-emulator --branch qemu \
  $HOME/nuttx-arm64-emulator
cp nuttx nuttx.bin nuttx.S \
  $HOME/nuttx-arm64-emulator/nuttx/
cd $HOME/nuttx-arm64-emulator
cargo run

## err=Err(EXCEPTION)
## PC=0x402805f0
## call_graph:  setup_page_tables --> ***_HALT_***
## call_graph:  click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
## env.exception={syndrome:2248146949, fsr:517, vaddress:1344798719, target_el:1}

(Before Fix: Unicorn Log)

(Before Fix: QEMU Log)

To fix the crashing bug, we reduced the Virtual Address Size

The Fixed Version (that won’t crash in Unicorn) is here…

## After Fixing: Simplified NuttX for QEMU Arm64 (Kernel Build)
git clone https://github.com/lupyuen2/wip-nuttx nuttx \
  --branch unicorn-qemu-after
git clone https://github.com/lupyuen2/wip-nuttx-apps apps \
  --branch unicorn-qemu

(After Fix: Unicorn Log)

(After Fix: QEMU Log)

For QEMU Testing: Enable MMU Logging by uncommenting the lines below.

For Unicorn Emulator: Don’t enable MMU Logging, because the PL011 UART Driver will get stuck. Comment out the lines below.

From arch/arm64/src/common/arm64_mmu.c:

// Enable MMU Logging
#define CONFIG_MMU_ASSERT   1
#define CONFIG_MMU_DEBUG    1
#define CONFIG_MMU_DUMP_PTE 1
#define trace_printf _info
#undef sinfo
#define sinfo _info

Here’s the Complete List of Changes for our Simplified NuttX. Below are the highlights…

  1. Remove the MMU Regions: PCI*, nx*

    (Simplify the Memory Map)

  2. Set the RAM Size to 8 MB

    (Simplify the Page Tables)

  3. Enable the Data Cache and Instruction Cache

    (Sync with MMU Demo)

  4. Add TCR_TG1_4K

    (Missing from NuttX. Should this be fixed?)

  5. Change Physical Address from 48 to 36 bits

    (Sync with MMU Demo)

  6. Reduce MMU Translation Tables from 10 to 1

    (Simplify the Page Tables)

  7. Disable Device Tree

    (Unicorn won’t boot with Device Tree)

  8. Disable PSCI

    (Unicorn won’t boot with PSCI)

  9. Add MMU Logging

    (Lotsa logs in arch/arm64/src/common/arm64_mmu.c)

  10. The changes above: Could they contribute to NuttX booting successfully on Unicorn? It’s possible, we might have missed something.

    (Before Fix: See the Modified Files)

    (After Fix: See the Modified Files)

Update: Unicorn definitely needs TCR_TG1_4K, otherwise MMU will fail. We verified with Avaota-A1 Emulator on Unicorn. Which means we should patch NuttX too?

§15 Appendix: Decoding the Bits with JavaScript

Here’s a nifty trick to Decode The Bits for our Arm64 MMU Registers…

  1. In our Web Browser, launch the JavaScript Console

    Click Menu > More Tools > Developer Tools

  2. To decode 0x1_8080_3F20 for MMU Demo, we enter this…

    a=0x180803F20n
    for (i = 0n; i < 64n; i++) { if (a & (1n << i)) { console.log(`Bit ${i}`); } }

  3. We’ll see the Decoded Bits

    Bit 5
    Bit 8
    Bit 9
    Bit 10
    Bit 11
    Bit 12
    Bit 13
    Bit 23
    Bit 31
    Bit 32
  4. To decode 0x1_8080_351C for NuttX QEMU, we enter this…

    a=0x18080351Cn
    for (i = 0n; i < 64n; i++) { if (a & (1n << i)) { console.log(`Bit ${i}`); } }

  5. And we’ll see the Decoded Bits

    Bit 2
    Bit 3
    Bit 4
    Bit 8
    Bit 10
    Bit 12
    Bit 13
    Bit 23
    Bit 31
    Bit 32

Why the “n”?

The n suffix creates a BigInt in JavaScript. Without it, JavaScript’s bitwise operators work on 32-bit integers, so Bits 32 and above would be decoded incorrectly.
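No browser handy? The same trick works in Rust, which is what our Unicorn host program is written in anyway. A `u64` already holds all 64 bits, so there's nothing like the BigInt suffix to remember. `set_bits` is just an illustrative helper name.

```rust
/// List the positions of all bits that are set in a 64-bit register value
fn set_bits(value: u64) -> Vec<u32> {
    (0..64).filter(|&bit| (value >> bit) & 1 == 1).collect()
}

fn main() {
    // TCR_EL1 for MMU Demo and for NuttX QEMU
    for tcr in [0x1_8080_3F20u64, 0x1_8080_351C] {
        println!("{tcr:#x}: Bits {:?}", set_bits(tcr));
    }
}
```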