📝 30 Mar 2025
Spotted in Unicorn Emulator: A Demo of Arm64 Memory Management Unit (MMU)… in 18 Lines of Arm64 Assembly! (Pic above)
Today we decipher the code inside the Arm64 MMU Demo and figure out how it works. Which turns out to be surprisingly helpful for emulating Apache NuttX RTOS, compiled for Arm64 SBCs…
We look inside the Page Tables and Control Registers for MMU Demo
Study a mysterious bug that crashes NuttX on Unicorn Emulator
Somehow Unicorn won’t Enable the MMU for NuttX?
We simplify NuttX Kernel for QEMU and isolate the problem
Aha it’s a problem with the VM Addressable Size!
Thanks to Unicorn: We render a detailed NuttX Boot Flow
Soon we’ll have a Unicorn Emulator for Avaota-A1 Arm64 SBC
What’s this MMU again? (Pic above)
We need the Arm64 Memory Management Unit for…
Memory Protection: Prevent Applications (and Kernel) from meddling with things (in System Memory) that they’re not supposed to
Virtual Memory: Allow Applications to access chunks of “Imaginary Memory” at Exotic Addresses (0x8000_0000!)
But in reality: They’re System RAM recycled from boring old addresses (like 0x40A0_4000)
If we don’t configure MMU correctly…
NuttX Kernel won’t boot: “Help! I can’t access my Kernel Code and Data!”
NuttX Apps won’t run: “Whoops where’s the App Code and Data that Kernel promised?”
We dive deeper inside MMU…
Ah so MMU will allow this switcheroo business?
MMU is Disabled initially
We read from Physical Address 0x4000_0000
Enable the MMU: Map Virtual Address 0x8000_0000 to Physical Address 0x4000_0000
We read from Virtual Address 0x8000_0000
Both reads produce the same value
Indeed! That’s precisely what our MMU Demo above shall do…
Read from Physical Address 0x4000_0000
// Read data from physical address
// Into Register X1
ldr X0, =0x40000000
ldr X1, [X0]
Map Virtual Address to Physical Address:
0x8000_0000 becomes 0x4000_0000
// Init the MMU Registers
ldr X0, =0x180803F20
msr TCR_EL1, X0
ldr X0, =0xFFFFFFFF
msr MAIR_EL1, X0
// Set the MMU Page Table
adr X0, ttb0_base
msr TTBR0_EL1, X0
(We’ll explain this)
Enable the MMU
// Enable Caches and the MMU
mrs X0, SCTLR_EL1
orr X0, X0, #0x1 // M bit (MMU)
orr X0, X0, #(0x1 << 2) // C bit (data cache)
orr X0, X0, #(0x1 << 12) // I bit (instruction cache)
msr SCTLR_EL1, X0
dsb SY
isb
(We’ll explain this)
Read from Virtual Address 0x8000_0000
// Read the same Memory Area through Virtual Address
// Into Register X2
ldr X0, =0x80000000
ldr X2, [X0]
Assuming that Physical Address 0x4000_0000 is filled with 44 44 44 44 …
Both reads will produce the same value…
// Register X1 == Register X2
x1 = 0x4444_4444_4444_4444
x2 = 0x4444_4444_4444_4444
Yeah the steps for “Map Virtual Address” and “Enable The MMU” are extremely cryptic. We break them down…
What’s this mystery code from above?
// Init the MMU Registers:
// TCR_EL1 becomes 0x1_8080_3F20
ldr X0, =0x180803F20 // Load 0x1_8080_3F20 into Register X0
msr TCR_EL1, X0 // Write X0 into System Register TCR_EL1
// MAIR_EL1 becomes 0xFFFF_FFFF
ldr X0, =0xFFFFFFFF // Load 0xFFFF_FFFF into Register X0
msr MAIR_EL1, X0 // Write X0 into System Register MAIR_EL1
// Set the MMU Page Table:
// TTBR0_EL1 becomes ttb0_base
adr X0, ttb0_base // Load ttb0_base into Register X0
msr TTBR0_EL1, X0 // Write X0 into System Register TTBR0_EL1
This code will Map Virtual Addresses to Physical Addresses, so that 0x8000_0000 (virtually) becomes 0x4000_0000.
Later we’ll explain TCR and MAIR, but first…
What’s TTBR0_EL1? Why set it to ttb0_base?
That’s the Translation Table Base Register 0 for Exception Level 1.
It points to the Level 1 Page Table, telling MMU our Virtual-to-Physical Mapping. Suppose we’re mapping Four Chunks of 1 GB…
Virtual Address | Physical Address | Size |
---|---|---|
0x0000_0000 | 0x0000_0000 | 1 GB |
0x4000_0000 | 0xC000_0000 | 1 GB |
0x8000_0000 | 0x4000_0000 | 1 GB |
0xC000_0000 | 0x8000_0000 | 1 GB |
Our Level 1 Page Table (TTBR0_EL1) will be this…
Which we Store in RAM (ttb0_base) as…
Address | Value | Because |
---|---|---|
0x1000 | 0x0000_0741 | Page Table Entry #0 |
0x1008 | 0xC000_0741 | Page Table Entry #1 |
0x1010 | 0x4000_0741 | Page Table Entry #2 |
0x1018 | 0x8000_0741 | Page Table Entry #3 |
What if we read from 0x4000_0000 AFTER enabling MMU? (Physical Address 0xC000_0000)
We’ll see CC CC CC CC… because that’s how we populated Physical Address 0xC000_0000. Yep our MMU can remap memory in fun convoluted ways.
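To make the switcheroo concrete, here's a minimal Rust sketch of the Level 1 lookup for our Four Chunks of 1 GB. It's a simulation under stated assumptions, not how Unicorn implements it: `LEVEL1_TABLE` and `translate` are names we made up for illustration, and we assume 32-bit Virtual Addresses with a 4 KB Granule (so Bits 30-31 of the Virtual Address select the Page Table Entry).

```rust
/// Level 1 Page Table from the table above: four Block Descriptors,
/// each mapping 1 GB. (Hypothetical constant, for illustration only)
const LEVEL1_TABLE: [u64; 4] = [
    0x0000_0741, // Entry #0: VA 0x0000_0000 -> PA 0x0000_0000
    0xC000_0741, // Entry #1: VA 0x4000_0000 -> PA 0xC000_0000
    0x4000_0741, // Entry #2: VA 0x8000_0000 -> PA 0x4000_0000
    0x8000_0741, // Entry #3: VA 0xC000_0000 -> PA 0x8000_0000
];

/// Bits 30-47 of a Level 1 Block Descriptor hold the Output Address
const BLOCK_ADDR_MASK: u64 = 0x0000_FFFF_C000_0000;

/// Bits 0-29 of the Virtual Address are the offset inside the 1 GB Block
const BLOCK_OFFSET_MASK: u64 = 0x3FFF_FFFF;

/// Simulate the Level 1 lookup: 32-bit Virtual Address -> Physical Address
pub fn translate(virtual_addr: u32) -> u64 {
    let va = virtual_addr as u64;
    let index = (va >> 30) as usize; // Bits 30-31 select Entry 0 to 3
    let descriptor = LEVEL1_TABLE[index];
    (descriptor & BLOCK_ADDR_MASK) | (va & BLOCK_OFFSET_MASK)
}
```

Calling `translate(0x8000_0000)` returns `0x4000_0000`, and `translate(0x4000_0000)` returns `0xC000_0000`, matching the table.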
Why map 0x0000_0000 to itself?
Our code runs at 0x0000_0000. If we don’t map 0x0000_0000 to itself, there won’t be any runway for our demo.
For TTBR0_EL1: Why Exception Level 1?
Our code (NuttX Kernel) runs at Exception Level 1. Later we’ll run NuttX Apps at Exception Level 0, which has Less Privilege. That’s how we protect NuttX Kernel from getting messed up by NuttX Apps.
In the Page Table Entries above: Why 741?
We decode the Page Table Entry based on VMSAv8-64 Block Descriptors (Page D8-6491). 0x741 says…
Bits 00-01: BLOCK_DESC = 1
This Page Table Entry describes a Block, not a Page
Bits 06-07: BLOCK_DESC_AP_USER = 1
This Block is Read-Writeable by Kernel, Read-Writeable by Apps
Bits 08-09: BLOCK_DESC_INNER_SHARE = 3
This Block is Inner Shareable (see below)
Bits 10-10: BLOCK_DESC_AF = 1
Access Flag is set, so this Virtual-to-Physical Mapping may be cached (and accessing it won’t trigger an Access Flag Fault)
Which means each chunk of Virtual Memory (like 0x4000_0000) is a Memory Block that’s accessible by Kernel and Apps
NuttX defines the whole list here: arm64_mmu.h
// PTE descriptor can be Block descriptor or Table descriptor or Page descriptor
#define PTE_BLOCK_DESC 1U
#define PTE_TABLE_DESC 3U
// Block and Page descriptor attributes fields
#define PTE_BLOCK_DESC_MEMTYPE(x) ((x) << 2)
#define PTE_BLOCK_DESC_NS (1ULL << 5) // Non-Secure
#define PTE_BLOCK_DESC_AP_USER (1ULL << 6) // User Read-Write
#define PTE_BLOCK_DESC_AP_RO (1ULL << 7) // Kernel Read-Only
#define PTE_BLOCK_DESC_AP_RW (0ULL << 7) // Kernel Read-Write
#define PTE_BLOCK_DESC_AP_MASK (3ULL << 6)
#define PTE_BLOCK_DESC_NON_SHARE (0ULL << 8)
#define PTE_BLOCK_DESC_OUTER_SHARE (2ULL << 8)
#define PTE_BLOCK_DESC_INNER_SHARE (3ULL << 8)
#define PTE_BLOCK_DESC_AF (1ULL << 10) // A Flag
#define PTE_BLOCK_DESC_NG (1ULL << 11) // Non-Global
#define PTE_BLOCK_DESC_DIRTY (1ULL << 51) // D Flag
#define PTE_BLOCK_DESC_PXN (1ULL << 53) // Kernel Execute Never
#define PTE_BLOCK_DESC_UXN (1ULL << 54) // User Execute Never
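Here's a small Rust sketch that rebuilds 0x741 from the same bit positions as the macros above. The constants mirror the NuttX names, but the functions `block_attributes` and `block_descriptor` are hypothetical helpers, just for illustration.

```rust
// Same bit positions as the NuttX macros in arm64_mmu.h
const PTE_BLOCK_DESC: u64 = 1; // Bits 00-01: Block Descriptor
const PTE_BLOCK_DESC_AP_USER: u64 = 1 << 6; // Bit 06: User Read-Write
const PTE_BLOCK_DESC_INNER_SHARE: u64 = 3 << 8; // Bits 08-09: Inner Shareable
const PTE_BLOCK_DESC_AF: u64 = 1 << 10; // Bit 10: Access Flag

/// Attribute bits shared by all four Page Table Entries in the MMU Demo
pub fn block_attributes() -> u64 {
    PTE_BLOCK_DESC
        | PTE_BLOCK_DESC_AP_USER
        | PTE_BLOCK_DESC_INNER_SHARE
        | PTE_BLOCK_DESC_AF
}

/// Build a Block Descriptor for a Physical Address that's aligned to 1 GB
pub fn block_descriptor(physical_addr: u64) -> u64 {
    physical_addr | block_attributes()
}
```

So `block_attributes()` gives `0x741`, and `block_descriptor(0x4000_0000)` gives `0x4000_0741`, the Page Table Entry #2 that we saw earlier.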
Why Stage 1? Not Stage 2?
We’re doing Stage 1 Only: Single-Stage Translation from Virtual Address (VA) to Physical Address (PA). No need for Stage 2 and Intermediate Physical Address (IPA) (Page D8-6448)
Why Inner vs Outer Shareable? Something about “Severance”?
Inner / Outer Sharing is for Multiple CPU Cores, which we’ll ignore for now (Page B2-293)
(PE = Processing Element = One Arm64 Core)
What’s TCR_EL1? Why set it to 0x1_8080_3F20?
// Init the MMU Registers:
// TCR_EL1 becomes 0x1_8080_3F20
ldr X0, =0x180803F20 // Load 0x1_8080_3F20 into Register X0
msr TCR_EL1, X0 // Write X0 into System Register TCR_EL1
// MAIR_EL1 becomes 0xFFFF_FFFF
ldr X0, =0xFFFFFFFF // Load 0xFFFF_FFFF into Register X0
msr MAIR_EL1, X0 // Write X0 into System Register MAIR_EL1
That’s the Translation Control Register for Exception Level 1. According to TCR_EL1 Doc, 0x1_8080_3F20 decodes as…
Bits 00-05: T0SZ = 0x20
32 bits of Virtual Address Space
Bits 08-09: IRGN0_WBNWA = 3
Normal memory, Inner Write-Back Read-Allocate No Write-Allocate Cacheable
Bits 10-11: ORGN0_WBNWA = 3
Normal memory, Outer Write-Back Read-Allocate No Write-Allocate Cacheable
Bits 12-13: SH0_SHARED_INNER = 3
Inner Shareable for TTBR0_EL1
Bits 14-15: TG0_4K = 0
EL1 Granule Size (Page Size) is 4 KB for TTBR0_EL1
Bits 23-23: EPD1_DISABLE = 1
Disable translation table walks using TTBR1_EL1 (we only use TTBR0_EL1)
Bits 30-31: TG1_4K = 2
EL1 Granule Size (Page Size) is 4 KB for TTBR1_EL1
Bits 32-34: EL1_IPS = 1
36 bits (64 GB) of Physical Address Space
Thus our MMU shall map 32-bit Virtual Addresses into 36-bit Physical Addresses, with a Translation Granule of 4 KB. (Our demo maps 1 GB Blocks at Level 1, so it never needs the smaller 4 KB Pages)
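The decoding above can be sketched in Rust as a handful of shifts and masks. This is only an illustration: the struct `TcrFields` and the functions `decode_tcr` and `va_bits` are names we invented, and we extract only the fields discussed above.

```rust
/// Selected fields of TCR_EL1. (Struct and field names are our own,
/// chosen for illustration)
#[derive(Debug, PartialEq)]
pub struct TcrFields {
    pub t0sz: u64,  // Bits 00-05: Size Offset for TTBR0_EL1
    pub irgn0: u64, // Bits 08-09: Inner Cacheability
    pub orgn0: u64, // Bits 10-11: Outer Cacheability
    pub sh0: u64,   // Bits 12-13: Shareability
    pub tg0: u64,   // Bits 14-15: Granule Size for TTBR0_EL1
    pub epd1: u64,  // Bit 23: Disable table walks for TTBR1_EL1
    pub tg1: u64,   // Bits 30-31: Granule Size for TTBR1_EL1
    pub ips: u64,   // Bits 32-34: Physical Address Size
}

/// Extract `width` bits, starting at bit `lsb`
fn bits(value: u64, lsb: u32, width: u32) -> u64 {
    (value >> lsb) & ((1u64 << width) - 1)
}

/// Split a TCR_EL1 value into the fields above
pub fn decode_tcr(value: u64) -> TcrFields {
    TcrFields {
        t0sz: bits(value, 0, 6),
        irgn0: bits(value, 8, 2),
        orgn0: bits(value, 10, 2),
        sh0: bits(value, 12, 2),
        tg0: bits(value, 14, 2),
        epd1: bits(value, 23, 1),
        tg1: bits(value, 30, 2),
        ips: bits(value, 32, 3),
    }
}

/// Virtual Address Bits for TTBR0_EL1 = 64 - T0SZ
pub fn va_bits(fields: &TcrFields) -> u64 {
    64 - fields.t0sz
}
```

For `decode_tcr(0x1_8080_3F20)` we get T0SZ = 0x20, and `va_bits` returns 32.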
What about MAIR?
// MAIR_EL1 becomes 0xFFFF_FFFF
ldr X0, =0xFFFFFFFF // Load 0xFFFF_FFFF into Register X0
msr MAIR_EL1, X0 // Write X0 into System Register MAIR_EL1
Hmmm 0xFFFF_FFFF looks kinda fake? Unicorn Emulator probably ignores the MAIR Bits. We’ll see a Real MAIR in a while.
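For the curious: MAIR_EL1 holds eight Memory Attributes of 8 bits each (Attr0 to Attr7), and Bits 2-4 of a Block Descriptor (the `PTE_BLOCK_DESC_MEMTYPE` field in NuttX) select one of them. In the Arm64 encoding, 0xFF means Normal Memory, Write-Back Cacheable. The Rust sketch below shows which attribute our 0x741 entries would select. The function names are hypothetical, and whether Unicorn actually honours the attribute is a separate question.

```rust
/// Split MAIR_EL1 into Attr0 to Attr7 (Attr0 is the lowest byte)
pub fn mair_attrs(mair: u64) -> [u8; 8] {
    mair.to_le_bytes()
}

/// Bits 2-4 of a Block Descriptor select the Memory Attribute
pub fn attr_index(descriptor: u64) -> usize {
    ((descriptor >> 2) & 0b111) as usize
}

/// Memory Attribute that a Block Descriptor selects from MAIR_EL1
pub fn memory_attr(mair: u64, descriptor: u64) -> u8 {
    mair_attrs(mair)[attr_index(descriptor)]
}
```

With MAIR_EL1 set to 0xFFFF_FFFF, Attr0 to Attr3 are all 0xFF. Our entries (like 0x4000_0741) have Attribute Index 0, so they select Attr0.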
Wrapping up our Mystery Code: This is how we Enable the MMU…
// Read System Register SCTLR_EL1 into X0
mrs X0, SCTLR_EL1
// In X0: Set the bits to Enable MMU, Data Cache and Instruction Cache
orr X0, X0, #0x1 // M bit (MMU)
orr X0, X0, #(0x1 << 2) // C bit (Data Cache)
orr X0, X0, #(0x1 << 12) // I bit (Instruction Cache)
// Write X0 into System Register SCTLR_EL1
msr SCTLR_EL1, X0
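// Data Synchronization Barrier, then Instruction Synchronization Barrier:
// Make sure the SCTLR_EL1 update takes effect before the next instruction
dsb SY ; isb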
SCTLR_EL1 is for?
The System Control Register in Exception Level 1. We set these bits to Enable the MMU with Caching…
Bit 0: M = 1
Enable MMU for Address Translation
Bit 2: C = 1
Enable the Data Cache
Bit 12: I = 1
Enable the Instruction Cache
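The three `orr` instructions boil down to one bitwise OR. Here's a tiny Rust sketch of that arithmetic (the constant and function names are ours, purely for illustration)…

```rust
const SCTLR_M_BIT: u64 = 1 << 0; // Bit 0: Enable MMU
const SCTLR_C_BIT: u64 = 1 << 2; // Bit 2: Enable Data Cache
const SCTLR_I_BIT: u64 = 1 << 12; // Bit 12: Enable Instruction Cache

/// Return the SCTLR_EL1 value with MMU, Data Cache and Instruction Cache
/// enabled. All other bits are preserved.
pub fn enable_mmu_and_caches(sctlr: u64) -> u64 {
    sctlr | SCTLR_M_BIT | SCTLR_C_BIT | SCTLR_I_BIT
}
```

Starting from zero, the three bits add up to 0x1005. Bits that were already set in SCTLR_EL1 stay untouched.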
We’re ready to run the demo…
This is how we run the MMU Demo in Unicorn Emulator: main.rs
// Arm64 Machine Code for our MMU Demo, based on https://github.com/unicorn-engine/unicorn/blob/master/tests/unit/test_arm64.c#L378-L486
// Disassembly: https://github.com/lupyuen/nuttx-arm64-emulator/blob/qemu/src/main.rs#L556-L583
let arm64_code = [
0x00, 0x81, 0x00, 0x58, 0x01, 0x00, 0x40, 0xf9, 0x00, 0x81, 0x00, 0x58, 0x40, 0x20, 0x18,
0xd5, 0x00, 0x81, 0x00, 0x58, 0x00, 0xa2, 0x18, 0xd5, 0x40, 0x7f, 0x00, 0x10, 0x00, 0x20,
0x18, 0xd5, 0x00, 0x10, 0x38, 0xd5, 0x00, 0x00, 0x7e, 0xb2, 0x00, 0x00, 0x74, 0xb2, 0x00,
0x00, 0x40, 0xb2, 0x00, 0x10, 0x18, 0xd5, 0x9f, 0x3f, 0x03, 0xd5, 0xdf, 0x3f, 0x03, 0xd5,
0xe0, 0x7f, 0x00, 0x58, 0x02, 0x00, 0x40, 0xf9, 0x00, 0x00, 0x00, 0x14, 0x1f, 0x20, 0x03,
0xd5, 0x1f, 0x20, 0x03, 0xd5, 0x1F, 0x20, 0x03, 0xD5, 0x1F, 0x20, 0x03, 0xD5,
];
// Init the Emulator in Arm64 mode
let mut unicorn = Unicorn::new(
Arch::ARM64,
Mode::LITTLE_ENDIAN
).expect("failed to init Unicorn");
// Enable the MMU Translation
let emu = &mut unicorn;
emu.ctl_tlb_type(unicorn_engine::TlbType::CPU).unwrap();
// Map the Read/Write/Execute Memory at 0x0000 0000
emu.mem_map(
0, // Address
0x2000, // Size
Permission::ALL // Read/Write/Execute Access
).expect("failed to map memory");
// Write the Arm64 Machine Code to the emulated Executable Memory
const ADDRESS: u64 = 0;
emu.mem_write(
ADDRESS,
&arm64_code
).expect("failed to write instructions");
We populate the Level 1 Page Table from earlier: main.rs
// Generate the Page Table Entries...
// Page Table Entry @ 0x1000: 0x0000_0741
// Physical Address: 0x0000_0000
// Bit 00-01: PTE_BLOCK_DESC=1
// Bit 06-07: PTE_BLOCK_DESC_AP_USER=1
// Bit 08-09: PTE_BLOCK_DESC_INNER_SHARE=3
// Bit 10: PTE_BLOCK_DESC_AF=1
let mut tlbe: [u8; 8] = [0; 8];
tlbe[0..2].copy_from_slice(&[0x41, 0x07]);
emu.mem_write(0x1000, &tlbe).unwrap();
// Page Table Entry @ 0x1008: 0xC000_0741
// Page Table Entry @ 0x1010: 0x4000_0741
// Page Table Entry @ 0x1018: 0x8000_0741
...
// Not the Page Table, but
// Data Referenced by our Assembly Code:
// Data @ 0x1020: 0x4000_0000
tlbe[0..4].copy_from_slice(&[0x00, 0x00, 0x00, 0x40]);
emu.mem_write(0x1020, &tlbe).unwrap();
// Data @ 0x1028: 0x1_8080_3F20
// Data @ 0x1030: 0xFFFF_FFFF
// Data @ 0x1038: 0x8000_0000
...
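Each Page Table Entry is a 64-bit value stored in Little Endian order, which is why 0x0000_0741 is written as the bytes `41 07 00 …`. Below is a rough Rust sketch that generates all four entries as (Address, Bytes) pairs. It assumes the Page Table sits at 0x1000 as in the comments above. `PAGE_TABLE_BASE`, `BLOCK_ATTRS` and `level1_entries` are hypothetical names, not taken from main.rs.

```rust
/// Where the Level 1 Page Table lives in emulated memory (from the demo)
const PAGE_TABLE_BASE: u64 = 0x1000;

/// Attribute bits shared by every entry: Block, User Read-Write,
/// Inner Shareable, Access Flag
const BLOCK_ATTRS: u64 = 0x741;

/// Generate (Address, Little Endian Bytes) for the four Page Table Entries
pub fn level1_entries() -> Vec<(u64, [u8; 8])> {
    // Physical Address for each 1 GB chunk of Virtual Memory
    let physical_addrs: [u64; 4] =
        [0x0000_0000, 0xC000_0000, 0x4000_0000, 0x8000_0000];
    physical_addrs
        .iter()
        .enumerate()
        .map(|(i, &pa)| {
            let address = PAGE_TABLE_BASE + 8 * i as u64; // 8 bytes per entry
            let entry = pa | BLOCK_ATTRS;
            (address, entry.to_le_bytes())
        })
        .collect()
}
```

Each pair could then be passed to `emu.mem_write(address, &bytes)`, the same call we used for the first entry.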
To verify that it works: We Fill the Physical Memory with 0x44 then 0x88 then 0xCC: main.rs
// 3 Chunks of Data filled with 0x44, 0x88, 0xCC respectively
let mut data: [u8; 0x1000] = [0x44; 0x1000];
let mut data2: [u8; 0x1000] = [0x88; 0x1000];
let mut data3: [u8; 0x1000] = [0xcc; 0x1000];
// 0x4000_0000 becomes 0x44 44 44 44...
// 0x8000_0000 becomes 0x88 88 88 88...
// 0xC000_0000 becomes 0xCC CC CC CC...
emu.mem_map_ptr(0x40000000, 0x1000, Permission::READ,
data.as_mut_ptr() as _).unwrap();
emu.mem_map_ptr(0x80000000, 0x1000, Permission::READ,
data2.as_mut_ptr() as _).unwrap();
emu.mem_map_ptr(0xc0000000, 0x1000, Permission::READ,
data3.as_mut_ptr() as _).unwrap();
Finally we Start the Emulator: main.rs
// Start the Unicorn Emulator
let err = emu.emu_start(0, 0x44, 0, 0);
println!("err={:?}", err);
// Read registers X0, X1, X2
let x0 = emu.reg_read(RegisterARM64::X0).unwrap();
let x1 = emu.reg_read(RegisterARM64::X1).unwrap();
let x2 = emu.reg_read(RegisterARM64::X2).unwrap();
// Check the values
assert!(x0 == 0x80000000);
assert!(x1 == 0x4444444444444444);
assert!(x2 == 0x4444444444444444);
And it works!
## Here are Registers X0, X1 and X2
err = Ok(())
x0 = 0x8000_0000
x1 = 0x4444_4444_4444_4444
x2 = 0x4444_4444_4444_4444
What’s Unicorn Emulator got to do with Apache NuttX RTOS?
Two Years Ago: We tried creating a PinePhone Emulator with NuttX and Unicorn. But NuttX kept crashing on Unicorn…
## Compile Simplified NuttX for QEMU Arm64 (Kernel Build)
git clone https://github.com/lupyuen2/wip-nuttx nuttx --branch unicorn-qemu-before
git clone https://github.com/lupyuen2/wip-nuttx-apps apps --branch unicorn-qemu
cd nuttx
tools/configure.sh qemu-armv8a:knsh
make -j
## Dump the disassembly to nuttx.S
aarch64-none-elf-objdump \
--syms --source --reloc --demangle --line-numbers --wide --debugging \
nuttx \
>nuttx.S \
2>&1
## NuttX boots OK on QEMU.
## NSH Shell won't appear yet because we haven't compiled the NuttX Apps.
qemu-system-aarch64 \
-semihosting \
-cpu cortex-a53 \
-nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none \
-chardev stdio,id=con,mux=on \
-serial chardev:con \
-mon chardev=con,mode=readline \
-kernel ./nuttx
## But NuttX crashes in Unicorn Emulator (Remember to Disable MMU Logging)
## Here's the funny thing: Unicorn is actually based on QEMU!
git clone https://github.com/lupyuen/nuttx-arm64-emulator --branch qemu \
$HOME/nuttx-arm64-emulator
cp nuttx nuttx.bin nuttx.S \
$HOME/nuttx-arm64-emulator/nuttx/
cd $HOME/nuttx-arm64-emulator
cargo run
## err=Err(EXCEPTION)
## PC=0x402805f0
## call_graph: setup_page_tables --> ***_HALT_***
## call_graph: click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
## env.exception = { syndrome:2248146949, fsr:517, vaddress:1344798719, target_el:1 }
Two Years Later: The bug stops here! Let’s fix it today.
Where does it crash?
According to Unicorn Log: Our Simplified NuttX crashes here in Unicorn Emulator: arm64_mmu.c
// NuttX enables the MMU for Exception Level 1
static void enable_mmu_el1(unsigned int flags) {
// Set the MAIR, TCR and TTBR registers
write_sysreg(MEMORY_ATTRIBUTES, mair_el1);
write_sysreg(get_tcr(1), tcr_el1);
write_sysreg(base_xlat_table, ttbr0_el1);
// Ensure the above updates are committed
// before we enable the MMU: `dsb sy ; isb`
UP_MB();
// Read the System Control Register (Exception Level 1)
uint64_t value = read_sysreg(sctlr_el1);
// Update the System Control Register (Exception Level 1)
// Enable the MMU, Data Cache and Instruction Cache
write_sysreg(
value
| (1 << 0) // Set Bit 00: M_BIT (Enable MMU)
| (1 << 2) // Set Bit 02: C_BIT (Enable Data Cache)
| (1 << 12), // Set Bit 12: I_BIT (Enable Instruction Cache)
sctlr_el1
);
// Oops! Unicorn Emulator fails with an Arm64 Exception
// syndrome = 2248146949, fsr = 517, vaddress = 1344798719, target_el = 1
(NuttX defines SCTLR_EL1 in arm64_arch.h)
Which is mighty similar to the MMU Demo that we saw earlier…
// MMU Demo Works OK:
// Read System Register SCTLR_EL1 into X0
mrs X0, SCTLR_EL1
// In X0: Set the bits to Enable MMU, Data Cache and Instruction Cache
orr X0, X0, #0x1 // M bit (MMU)
orr X0, X0, #(0x1 << 2) // C bit (Data Cache)
orr X0, X0, #(0x1 << 12) // I bit (Instruction Cache)
// Write X0 into System Register SCTLR_EL1
msr SCTLR_EL1, X0
Maybe our Page Tables are bad? Or Translation Control Register? We investigate…
NuttX on Unicorn Emulator will fail with this Arm64 Exception…
env.exception =
Syndrome: 0x8600_0005
FSR: 0x0000_0205
Virtual Address: 0x5027_ffff (Why?)
Target Exception Level: 1
Which means: “Oops! Can’t enable MMU”
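We can squeeze a little more out of the Syndrome. Assuming that Unicorn's `syndrome` follows the Arm64 ESR_ELx layout (Unicorn is based on QEMU, so this seems likely, but we haven't verified it), the Rust sketch below splits it into Exception Class, Instruction Length and Fault Status Code. The names `Syndrome`, `decode_syndrome` and `describe` are our own, and only a few Exception Classes are listed.

```rust
/// Fields of an Arm64 Exception Syndrome (ESR_ELx layout, assumed)
#[derive(Debug, PartialEq)]
pub struct Syndrome {
    pub ec: u32,   // Bits 26-31: Exception Class
    pub il: u32,   // Bit 25: Instruction Length (1 = 32-bit instruction)
    pub ifsc: u32, // Bits 00-05: Fault Status Code (for Aborts)
}

/// Split the Syndrome into its fields
pub fn decode_syndrome(esr: u32) -> Syndrome {
    Syndrome {
        ec: esr >> 26,
        il: (esr >> 25) & 1,
        ifsc: esr & 0x3F,
    }
}

/// Describe the Syndrome in words (only a few classes are covered)
pub fn describe(s: &Syndrome) -> String {
    let class = match s.ec {
        0x20 => "Instruction Abort from a lower Exception Level",
        0x21 => "Instruction Abort at the same Exception Level",
        0x24 => "Data Abort from a lower Exception Level",
        0x25 => "Data Abort at the same Exception Level",
        _ => "Other Exception Class",
    };
    let fault = match s.ifsc {
        // Fault Status Codes 4 to 7 are Translation Faults at Level 0 to 3
        0x04..=0x07 => format!("Translation Fault, Level {}", s.ifsc & 3),
        _ => format!("Fault Status Code {:#04x}", s.ifsc),
    };
    format!("{}: {}", class, fault)
}
```

Under that assumption, 0x8600_0005 reads as an Instruction Abort at the same Exception Level, caused by a Translation Fault at Level 1. Which hints that the Page Table walk itself is failing, not the write to SCTLR_EL1.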
To troubleshoot, we enable MMU Logging: arm64_mmu.c
// Enable MMU Logging
#define CONFIG_MMU_ASSERT 1
#define CONFIG_MMU_DEBUG 1
#define CONFIG_MMU_DUMP_PTE 1
#define trace_printf _info
#undef sinfo
#define sinfo _info
We simplify the Memory Regions: qemu_boot.c
Virtual Address | Physical Address | Size |
---|---|---|
0x0000_0000 | 0x0000_0000 | 1 GB |
0x4000_0000 | 0x4000_0000 | 8 MB |
// NuttX Memory Regions for Arm64 MMU (Simplified)
struct arm_mmu_region g_mmu_regions[] = {
// Memory Region for I/O Memory
MMU_REGION_FLAT_ENTRY(
"DEVICE_REGION", // Name
0x00000000, // Start Address
0x40000000, // Size: 1 GB
MT_DEVICE_NGNRNE | MT_RW), // Read-Write I/O Memory
// Memory Region for RAM
MMU_REGION_FLAT_ENTRY(
"DRAM0_S0", // Name
0x40000000, // Start Address
0x00800000, // Size: 8 MB
MT_NORMAL | MT_RW | MT_EXECUTE), // Allow Read, Write and Execute
}; // Other Memory Regions? We removed them all
According to NuttX QEMU Log: NuttX creates a Two-Level Page Table…
(PXN / UXN = Privileged / User Never-Execute)
Why Two Levels? Because we’re mapping 8 MB of RAM, instead of a Complete 1 GB Chunk. Thus we break up into Level 2 with Smaller 2 MB Chunks…
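The arithmetic behind the Two Levels can be sketched in Rust. We assume a 4 KB Granule, where Bits 30-38 of the Virtual Address index the Level 1 Table (1 GB per entry) and Bits 21-29 index the Level 2 Table (2 MB per entry). The function names are hypothetical.

```rust
/// Size of one Level 2 Block: 2 MB
const LEVEL2_BLOCK_SIZE: u64 = 0x20_0000;

/// Index into the Level 1 Table: Bits 30-38 of the Virtual Address
pub fn level1_index(va: u64) -> u64 {
    (va >> 30) & 0x1FF
}

/// Index into the Level 2 Table: Bits 21-29 of the Virtual Address
pub fn level2_index(va: u64) -> u64 {
    (va >> 21) & 0x1FF
}

/// Number of 2 MB Blocks needed to cover `size` bytes (rounded up)
pub fn level2_blocks(size: u64) -> u64 {
    (size + LEVEL2_BLOCK_SIZE - 1) / LEVEL2_BLOCK_SIZE
}
```

For our RAM at 0x4000_0000: The Level 1 Index is 1, and the 8 MB region needs 4 Level 2 Blocks (Level 2 Index 0 to 3).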
Looks legit, we move on…
What about the Translation Control Register?
We check the NuttX QEMU Log, with MMU Logging Enabled…
get_tcr: Virtual Address Bits: 36
get_tcr: Bit 32-33: TCR_EL1_IPS=1
get_tcr: Bit 23: TCR_EPD1_DISABLE=1
get_tcr: Bit 00-05: TCR_T0SZ=0x1c
get_tcr: Bit 08-09: TCR_IRGN_WBWA=1
get_tcr: Bit 10-11: TCR_ORGN_WBWA=1
get_tcr: Bit 12-13: TCR_SHARED_INNER=3
get_tcr: Bit 14-15: TCR_TG0_4K=0
get_tcr: Bit 30-31: TCR_TG1_4K=2
get_tcr: Bit 37-38: TCR_TBI_FLAGS=0
enable_mmu_el1: tcr_el1 = 0x1_8080_351C
enable_mmu_el1: mair_el1 = 0xFF_440C_0400
enable_mmu_el1: ttbr0_el1 = 0x402B_2000
According to TCR_EL1 Doc, 0x1_8080_351C decodes as…
Bits 00-05: T0SZ = 0x1C
36 bits of Virtual Address Space
Bits 08-09: IRGN0_WBWA = 1
Normal memory, Inner Write-Back Read-Allocate Write-Allocate Cacheable
Bits 10-11: ORGN0_WBWA = 1
Normal memory, Outer Write-Back Read-Allocate Write-Allocate Cacheable
Bits 12-13: SH0_SHARED_INNER = 3
Inner Shareable for TTBR0_EL1
Bits 14-15: TG0_4K = 0
EL1 Granule Size (Page Size) is 4 KB for TTBR0_EL1
Bits 23-23: EPD1_DISABLE = 1
Disable translation table walks using TTBR1_EL1 (we only use TTBR0_EL1)
Bits 30-31: TG1_4K = 2
EL1 Granule Size (Page Size) is 4 KB for TTBR1_EL1
Bits 32-34: EL1_IPS = 1
36 bits (64 GB) of Physical Address Space
Hmmm something looks different…
(We spoke about Innies and Outies earlier)
(Decoding the Bits with JavaScript)
MMU Demo works OK, but NuttX doesn’t. How are they different?
Based on the info above, we compare NuttX vs MMU Demo for the Translation Control Register…
NuttX QEMU | MMU Demo |
---|---|
T0SZ = 0x1C 36 bits of Virtual Address Space | T0SZ = 0x20 32 bits of Virtual Address Space |
IRGN0_WBWA = 1 Write-Allocate Cacheable (Inner) | IRGN0_WBNWA = 3 No Write-Allocate Cacheable (Inner) |
ORGN0_WBWA = 1 Write-Allocate Cacheable (Outer) | ORGN0_WBNWA = 3 No Write-Allocate Cacheable (Outer) |
Won’t Boot On Unicorn | Works OK On Unicorn |
Ah we see a major discrepancy…
Virtual Address: NuttX uses 36 Bits, MMU Demo uses 32 Bits
Inner / Outer Caching? Probably won’t matter for our Unicorn Emulator
Though truthfully: We already made plenty of fixes
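Rather than eyeballing the two values, we could let a few lines of Rust point out which TCR_EL1 fields differ. This is only a sketch: the table `TCR_FIELDS` covers just the fields discussed in this article, and the names are our own.

```rust
/// (Name, Lowest Bit, Width) for the TCR_EL1 fields discussed above
const TCR_FIELDS: [(&str, u32, u32); 8] = [
    ("T0SZ", 0, 6),
    ("IRGN0", 8, 2),
    ("ORGN0", 10, 2),
    ("SH0", 12, 2),
    ("TG0", 14, 2),
    ("EPD1", 23, 1),
    ("TG1", 30, 2),
    ("IPS", 32, 3),
];

/// Extract `width` bits, starting at bit `lsb`
fn field(value: u64, lsb: u32, width: u32) -> u64 {
    (value >> lsb) & ((1u64 << width) - 1)
}

/// Names of the fields that differ between two TCR_EL1 values
pub fn differing_fields(a: u64, b: u64) -> Vec<&'static str> {
    TCR_FIELDS
        .iter()
        .filter(|&&(_, lsb, width)| field(a, lsb, width) != field(b, lsb, width))
        .map(|&(name, _, _)| name)
        .collect()
}
```

Comparing NuttX (0x1_8080_351C) with MMU Demo (0x1_8080_3F20) yields T0SZ, IRGN0 and ORGN0: the same three rows as the table above.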
We fix the Virtual Addresses…
Remember NuttX was using 36 Bits for Virtual Address Space? We cut down to 32 Bits: knsh/defconfig
## Set the Virtual Address Space to 32 bits
CONFIG_ARM64_VA_BITS=32
## Previously: Virtual Address Space was 36 bits
## CONFIG_ARM64_VA_BITS=36
Inside Translation Control Register (TCR_EL1): T0SZ becomes 0x20, which means 64 - 32 = 32 bits of Virtual Address Space…
get_tcr: Virtual Address Bits: 32
get_tcr: Bit 32-33: TCR_EL1_IPS=1
get_tcr: Bit 23: TCR_EPD1_DISABLE=1
get_tcr: Bit 00-05: TCR_T0SZ=0x20
get_tcr: Bit 08-09: TCR_IRGN_WBWA=1
get_tcr: Bit 10-11: TCR_ORGN_WBWA=1
get_tcr: Bit 12-13: TCR_SHARED_INNER=3
get_tcr: Bit 14-15: TCR_TG0_4K=0
get_tcr: Bit 30-31: TCR_TG1_4K=2
get_tcr: Bit 37-38: TCR_TBI_FLAGS=0
enable_mmu_el1: tcr_el1 = 0x1_8080_3520
enable_mmu_el1: mair_el1 = 0xFF_440C_0400
enable_mmu_el1: ttbr0_el1 = 0x402B_2000
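How does the config change turn 0x1_8080_351C into 0x1_8080_3520? T0SZ is simply 64 minus the Virtual Address Bits, stored in Bits 0-5 of TCR_EL1. A quick Rust sketch of that calculation (the function names are ours, NuttX computes this inside `get_tcr`)…

```rust
/// T0SZ occupies Bits 0-5 of TCR_EL1
const TCR_T0SZ_MASK: u64 = 0x3F;

/// T0SZ = 64 - Virtual Address Bits
pub fn t0sz(va_bits: u64) -> u64 {
    64 - va_bits
}

/// Replace the T0SZ field of a TCR_EL1 value, keeping all other bits
pub fn with_va_bits(tcr: u64, va_bits: u64) -> u64 {
    (tcr & !TCR_T0SZ_MASK) | t0sz(va_bits)
}
```

With 36 bits we get T0SZ = 0x1C, with 32 bits we get T0SZ = 0x20. And `with_va_bits(0x1_8080_351C, 32)` produces `0x1_8080_3520`, matching the log above.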
NuttX now enables MMU successfully in Unicorn yay! (Pic above)
hook_block: address=0x402805a4, size=08, setup_page_tables, arch/arm64/src/common/arm64_mmu.c:547:29
call_graph: enable_mmu_el1 --> setup_page_tables
call_graph: click enable_mmu_el1 href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L616" "arch/arm64/src/common/arm64_mmu.c " _blank
hook_block: address=0x40280614, size=16, enable_mmu_el1, arch/arm64/src/common/arm64_mmu.c:608:3
call_graph: setup_page_tables --> enable_mmu_el1
call_graph: click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
hook_block: address=0x4028062c, size=04, enable_mmu_el1, arch/arm64/src/common/arm64_mmu.c:617:3
hook_block: address=0x40280380, size=88, arm64_boot_el1_init, arch/arm64/src/common/arm64_boot.c:215:1
call_graph: enable_mmu_el1 --> arm64_boot_el1_init
Reducing Virtual Addresses from 36 Bits to 32 Bits: Why did it work?
Needs More Investigation: Maybe NuttX didn’t populate the Page Tables completely for 36 Bits? (Something about 0x5027_FFFF?)
For Now: 32-bit Virtual Addresses are totally sufficient. And NuttX boots OK on Unicorn!
Why are we doing all this: NuttX on Unicorn?
We’re about to create a NuttX Emulator for Avaota-A1 Arm64 SBC (Allwinner A527), based on Unicorn Emulator. So that we can Build and Test NuttX on the Avaota-A1 Emulator, without requiring the Actual Hardware. (NuttX Boot Flow for Avaota-A1)
After switching to 32-bit Virtual Address: Any change to the Page Tables?
The Page Tables are identical. Thanks to Unicorn, we learnt so much about arm64_mmu.c! One more fun thing to do…
Inside the Unicorn Log: Why the funny arrows?
call_graph: enable_mmu_el1 --> setup_page_tables
call_graph: click enable_mmu_el1 href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L616" "arch/arm64/src/common/arm64_mmu.c " _blank
call_graph: setup_page_tables --> enable_mmu_el1
call_graph: click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
call_graph: enable_mmu_el1 --> arm64_boot_el1_init
That’s because our Unicorn Emulator renders the NuttX Boot Flow (pic above) as a Clickable Mermaid Flowchart. It describes how NuttX boots on Arm64…
Here are the steps to produce the Mermaid Flowchart…
## Boot NuttX in Unicorn Emulator. Capture the Mermaid Flowchart.
git clone https://github.com/lupyuen/nuttx-arm64-emulator --branch qemu \
$HOME/nuttx-arm64-emulator
cd $HOME/nuttx-arm64-emulator
cargo run | grep call_graph | colrm 1 13 \
>$HOME/nuttx-arm64-emulator/nuttx-boot-flow.mmd
## Omitted: Clean up the bad syntax in nuttx-boot-flow.mmd
vi $HOME/nuttx-arm64-emulator/nuttx-boot-flow.mmd
## Convert the Mermaid Flowchart to PDF
sudo docker pull minlag/mermaid-cli
sudo docker run \
--rm -u `id -u`:`id -g` -v \
$HOME/nuttx-arm64-emulator:/data minlag/mermaid-cli \
--configFile="mermaidRenderConfig.json" \
-i nuttx-boot-flow.mmd \
-o nuttx-boot-flow.pdf
## Then change ".pdf" above to ".png" or ".svg"
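If we'd rather not depend on `grep` and `colrm`, the filtering step might be sketched in Rust like this. We assume that every Call Graph line in the Unicorn Log begins with `call_graph:`, and `extract_call_graph` is a hypothetical helper (it's not part of nuttx-arm64-emulator). The result still needs the same manual cleanup before Mermaid will accept it.

```rust
/// Keep only the `call_graph:` lines from the Unicorn Log,
/// with the prefix and surrounding whitespace removed
pub fn extract_call_graph(log: &str) -> Vec<String> {
    log.lines()
        .filter_map(|line| line.strip_prefix("call_graph:"))
        .map(|rest| rest.trim().to_string())
        .collect()
}
```

Joining the returned lines with newlines gives us the body of nuttx-boot-flow.mmd.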
How did we create the Mermaid Flowchart? Check the details here…
Why won’t Unicorn boot to NSH Shell?
We haven’t emulated the PL011 UART Hardware, that’s why Unicorn is looping forever while printing System Messages. Hope to fix it someday! (Pic above)
That should keep us busy for a loooong while?
One Last Thing: Suppose we’re in some Wacky Alternate Universe in which Rust was invented before C. What would arm64_mmu.c look like? Might be fun to take a peek at the Alternate Version of arm64_mmu.c 🤔
Unicorn Emulator for Avaota-A1 SBC
Special Thanks to My Sponsors for supporting my writing. Your support means so much to me 🙏
In this article we took NuttX for QEMU Arm64 (Kernel Build) and made it smaller and simpler.
Why did we Simplify NuttX? So we can be as close to MMU Demo as possible, and isolate the crashing problem. This is how we Build and Test our simpler version of NuttX for QEMU Arm64 (Kernel Build)…
## Before Fixing: Compile Simplified NuttX for QEMU Arm64 (Kernel Build)
git clone https://github.com/lupyuen2/wip-nuttx nuttx \
--branch unicorn-qemu-before
git clone https://github.com/lupyuen2/wip-nuttx-apps apps \
--branch unicorn-qemu
cd nuttx
tools/configure.sh qemu-armv8a:knsh
make -j
## Dump the disassembly to nuttx.S
aarch64-none-elf-objdump \
--syms --source --reloc --demangle --line-numbers --wide --debugging \
nuttx \
>nuttx.S \
2>&1
## NuttX boots OK on QEMU.
## NSH Shell won't appear yet because we haven't compiled the NuttX Apps.
qemu-system-aarch64 \
-semihosting \
-cpu cortex-a53 \
-nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none \
-chardev stdio,id=con,mux=on \
-serial chardev:con \
-mon chardev=con,mode=readline \
-kernel ./nuttx
## But NuttX crashes in Unicorn Emulator.
## Remember to Disable MMU Logging.
git clone https://github.com/lupyuen/nuttx-arm64-emulator --branch qemu \
$HOME/nuttx-arm64-emulator
cp nuttx nuttx.bin nuttx.S \
$HOME/nuttx-arm64-emulator/nuttx/
cd $HOME/nuttx-arm64-emulator
cargo run
## err=Err(EXCEPTION)
## PC=0x402805f0
## call_graph: setup_page_tables --> ***_HALT_***
## call_graph: click setup_page_tables href "https://github.com/apache/nuttx/blob/master/arch/arm64/src/common/arm64_mmu.c#L546" "arch/arm64/src/common/arm64_mmu.c " _blank
## env.exception={syndrome:2248146949, fsr:517, vaddress:1344798719, target_el:1}
To fix the crashing bug, we reduced the Virtual Address Size…
The Fixed Version (that won’t crash in Unicorn) is here…
## After Fixing: Simplified NuttX for QEMU Arm64 (Kernel Build)
git clone https://github.com/lupyuen2/wip-nuttx nuttx \
--branch unicorn-qemu-after
git clone https://github.com/lupyuen2/wip-nuttx-apps apps \
--branch unicorn-qemu
For QEMU Testing: Enable MMU Logging by uncommenting the lines below.
For Unicorn Emulator: Don’t enable MMU Logging, because the PL011 UART Driver will get stuck. Comment out the lines below.
From arch/arm64/src/common/arm64_mmu.c:
// Enable MMU Logging
#define CONFIG_MMU_ASSERT 1
#define CONFIG_MMU_DEBUG 1
#define CONFIG_MMU_DUMP_PTE 1
#define trace_printf _info
#undef sinfo
#define sinfo _info
Here’s the Complete List of Changes for our Simplified NuttX. Below are the highlights…
Remove the MMU Regions: PCI*, nx*
(Simplify the Memory Map)
(Simplify the Page Tables)
Enable the Data Cache and Instruction Cache
(Sync with MMU Demo)
(Missing from NuttX. Should this be fixed?)
Change Physical Address from 48 to 36 bits
(Sync with MMU Demo)
Reduce MMU Translation Tables from 10 to 1
(Simplify the Page Tables)
(Unicorn won’t boot with Device Tree)
(Unicorn won’t boot with PSCI)
(Lotsa logs in arch/arm64/src/common/arm64_mmu.c)
The changes above: Could they contribute to NuttX booting successfully on Unicorn? It’s possible, we might have missed something.
Update: Unicorn definitely needs TCR_TG1_4K, otherwise MMU will fail. We verified with Avaota-A1 Emulator on Unicorn. Which means we should patch NuttX too?
Here’s a nifty trick to Decode The Bits for our Arm64 MMU Registers…
In our Web Browser, launch the JavaScript Console…
Click Menu > More Tools > Developer Tools
To decode 0x1_8080_3F20 for MMU Demo, we enter this…
a=0x180803F20n
for (let i = 0n; i < 64n; i++) { if (a & (1n << i)) { console.log(`Bit ${i}`); } }
We’ll see the Decoded Bits…
Bit 5
Bit 8
Bit 9
Bit 10
Bit 11
Bit 12
Bit 13
Bit 23
Bit 31
Bit 32
To decode 0x1_8080_351C for NuttX QEMU, we enter this…
a=0x18080351Cn
for (let i = 0n; i < 64n; i++) { if (a & (1n << i)) { console.log(`Bit ${i}`); } }
And we’ll see the Decoded Bits…
Bit 2
Bit 3
Bit 4
Bit 8
Bit 10
Bit 12
Bit 13
Bit 23
Bit 31
Bit 32
Why the “n”?
The “n” suffix will enable BigInt Support in JavaScript. Without this, JavaScript Bitwise Operators will truncate our values to 32 bits, and our Decoded Bits will be wrong.
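Prefer Rust over the JavaScript Console? The same trick fits in a few lines, since `u64` already holds all 64 bits. (`set_bits` is a name we made up for this sketch)

```rust
/// List the positions of all bits that are set in a 64-bit value
pub fn set_bits(value: u64) -> Vec<u32> {
    (0..64)
        .filter(|&i| (value & (1u64 << i)) != 0)
        .collect()
}
```

Printing `set_bits(0x1_8080_3F20)` with `{:?}` shows `[5, 8, 9, 10, 11, 12, 13, 23, 31, 32]`, the same Decoded Bits as above.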