Fixed the UART Interrupt and Platform-Level Interrupt Controller (Ox64 BL808)

📝 10 Dec 2023

UART Input and Platform-Level Interrupt Controller are finally OK on Apache NuttX RTOS and Ox64 BL808 RISC-V SBC!

Last week we walked through the Serial Console for Pine64 Ox64 BL808 64-bit RISC-V Single-Board Computer (pic below)…

And we hit some illogical impossible problems on Apache NuttX RTOS (Real-Time Operating System)…

Today we discover the One Single Culprit behind all this rowdy mischief…

Weak Ordering in the MMU! (Memory Management Unit)

Here’s how we solved the baffling mystery…

(Watch the Demo on YouTube)

Pine64 Ox64 64-bit RISC-V SBC (Sorry for my substandard soldering)

§1 UART Interrupt

Sorry TLDR: What’s this PLIC? What’s Serial Console gotta do with it?

Platform-Level Interrupt Controller (PLIC) is the hardware inside our SBC that controls the forwarding of Peripheral Interrupts to our 64-bit RISC-V CPU.

(Like Interrupts for UART, I2C, SPI, …)

BL808 Platform-Level Interrupt Controller

Why should we bother with PLIC?

Suppose we’re typing something in the Serial Console on Ox64 SBC…

Without the PLIC, it’s impossible to enter commands in the Serial Console!

Tell me more…

Let’s run through the steps to handle a UART Interrupt on a RISC-V SBC…

Platform-Level Interrupt Controller for Pine64 Ox64 64-bit RISC-V SBC (Bouffalo Lab BL808)

  1. At Startup: We set Interrupt Priority to 1.

    (Lowest Priority)

  2. And Interrupt Threshold to 0.

    (Allow all Interrupts to fire later)

  3. We flip Bit 20 of Interrupt Enable Register to 1.

    (To enable RISC-V IRQ 20 for UART3)

  4. Suppose we press a key on the Serial Console…

    Our UART Controller will fire an Interrupt for IRQ 20.

    (IRQ means Interrupt Request Number)

  5. Our Interrupt Handler will read the Interrupt Number (20) from the Interrupt Claim Register…

    Call the UART Driver to read the keypress…

    Then write the Interrupt Number (20) back into the same old Interrupt Claim Register…

    Which will Complete the Interrupt.

  6. Non-Essential But Useful: Interrupt Pending Register says which Interrupts are awaiting Claiming and Completion.

    (We’ll use it for troubleshooting)

That’s the Textbook Recipe for PLIC, according to the Official RISC-V PLIC Spec. (If Julia Child wrote a PLIC Textbook)
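The steps above can be sketched in C. This is a host-side model, not the NuttX driver: `fake_plic` and the `plic_*` helpers are hypothetical names, and the registers live in an ordinary struct so the sketch runs anywhere. On real hardware they would be memory-mapped PLIC registers, and Claim and Complete would be a read and a write of the same Interrupt Claim Register.

```c
#include <assert.h>
#include <stdint.h>

#define UART3_IRQ 20  // RISC-V IRQ for UART3 (from the steps above)

// Hypothetical stand-in for the PLIC registers (IRQs 1 to 31 only).
// IRQ 0 means "No Interrupt", so priority[0] must stay 0.
struct fake_plic {
  uint32_t priority[32];  // Interrupt Priority, one per IRQ
  uint32_t enable;        // Interrupt Enable, one bit per IRQ
  uint32_t threshold;     // Interrupt Threshold
  uint32_t pending;       // Interrupt Pending, one bit per IRQ
  uint32_t in_service;    // Model only: claimed but not yet completed
};

// Steps 1 to 3: Set the Priority, Threshold and Enable Bit
void plic_init(struct fake_plic *plic, int irq) {
  assert(irq > 0 && irq < 32);
  plic->priority[irq] = 1;      // Step 1: Lowest Priority
  plic->threshold = 0;          // Step 2: Allow all Interrupts
  plic->enable |= (1u << irq);  // Step 3: Enable the IRQ
}

// Step 4 (simulated): A peripheral fires an Interrupt
void plic_raise(struct fake_plic *plic, int irq) {
  assert(irq > 0 && irq < 32);
  plic->pending |= (1u << irq);
}

// Step 5a: Claim the Interrupt. Returns the IRQ, or 0 if there's none.
// Picks the highest Priority above the Threshold (lowest IRQ wins a tie).
int plic_claim(struct fake_plic *plic) {
  int best = 0;
  for (int irq = 1; irq < 32; irq++) {
    uint32_t bit = 1u << irq;
    if ((plic->pending & plic->enable & bit) &&
        plic->priority[irq] > plic->threshold &&
        plic->priority[irq] > plic->priority[best]) {
      best = irq;
    }
  }
  if (best != 0) {
    plic->pending &= ~(1u << best);    // Claiming clears the Pending Bit
    plic->in_service |= (1u << best);
  }
  return best;
}

// Step 5b: Complete the Interrupt, after the driver has handled it
void plic_complete(struct fake_plic *plic, int irq) {
  assert(irq > 0 && irq < 32);
  plic->in_service &= ~(1u << irq);
}
```

In this model a keypress is `plic_raise(&plic, UART3_IRQ)`, and the Interrupt Handler is a `plic_claim`, a call to the UART Driver, then a `plic_complete`. Step 6 is just a peek at `plic.pending`.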

But it doesn’t work on Ox64 BL808 SBC and T-Head C906 Core…

UART and PLIC Troubles on Ox64

§2 UART and PLIC Troubles

What happens when we run the PLIC Recipe on Ox64?

Absolute Disaster! (Pic above)

Our troubles are all Seemingly Unrelated. However there’s actually only One Sinister Culprit causing all these headaches…

BL808 UART Receive Status (Page 405)


§3 Leaky Reads in UART

How to track down the culprit?

We begin with the simplest bug: UART Input is always Empty.

In our UART Driver, this is how we read the UART Input: bl808_serial.c

// Receive one character from the UART Port.
// Called (indirectly) by the UART Interrupt Handler: __uart_interrupt
int bl808_receive(...) {
  ...
  // If there's Pending UART Input...
  // (FIFO_CONFIG_1 is 0x30002084)
  if (getreg32(BL808_UART_FIFO_CONFIG_1(uart_idx)) & UART_FIFO_CONFIG_1_RX_CNT_MASK) {

    // Then read the Actual UART Input
    // (FIFO_RDATA is 0x3000208c)
    rxdata = getreg32(BL808_UART_FIFO_RDATA(uart_idx)) & UART_FIFO_RDATA_MASK;

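Which says that we…

  1. Check if there’s any Pending UART Input (FIFO_CONFIG_1 at 0x30002084)

  2. Then read the Actual UART Input (FIFO_RDATA at 0x3000208c)

Or simply…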

// Check for Pending UART Input
uintptr_t pending = getreg32(0x30002084);

// Read the Actual UART Input
uintptr_t rx = getreg32(0x3000208c);

// Dump the values
_info("pending=%p, rx=%p\n", pending, rx);

What happens when we run this?

Something strange happens…

// Yep there's Pending UART Input...
pending=0x7070120

// But Actual UART Input is empty!
rx=0

UART Controller says there’s UART Input to be read… And it’s totally empty!
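To see why `pending=0x7070120` means “there’s input”, we can pull the RX FIFO Count out of that value. This is a minimal sketch, assuming the RX Count sits in Bits 8 to 13 of FIFO_CONFIG_1. The names `RX_CNT_SHIFT`, `RX_CNT_MASK` and `rx_fifo_count` are hypothetical, so check them against `UART_FIFO_CONFIG_1_RX_CNT_MASK` in the driver before relying on them.

```c
#include <stdint.h>

// Assumed layout of FIFO_CONFIG_1: RX FIFO Count in Bits 8 to 13.
// (Hypothetical names, not copied from the NuttX driver)
#define RX_CNT_SHIFT 8
#define RX_CNT_MASK  (0x3fu << RX_CNT_SHIFT)

// Return the number of bytes waiting in the RX FIFO
uint32_t rx_fifo_count(uint32_t fifo_config_1) {
  return (fifo_config_1 & RX_CNT_MASK) >> RX_CNT_SHIFT;
}
```

Under this assumed layout, `rx_fifo_count(0x7070120)` works out to 1: exactly one keypress is waiting, yet the data register read back as 0.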

How is that possible?

The only logical explanation: Someone has already read the UART Input!

UART Input gets Auto-Reset to 0, right after it’s read. Someone must have read it, unintentionally.

Hmmm this sounds like a Leaky Read…

Exactly! (Pic below)

Yep indeed we have Leaky Read + Leaky Write that are causing all our UART + PLIC woes.
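Here’s one way the strange log could come about, as a tiny host-side model. It assumes (as described above) that the UART Input resets to 0 once it’s read, and that the stray read happens as a side effect of checking for Pending Input. `fake_uart` and its helpers are hypothetical names for illustration; this isn’t how the real hardware is implemented.

```c
#include <stdint.h>

// Hypothetical model of a UART with a 1-byte receive register
struct fake_uart {
  uint32_t rx_count;  // Bytes waiting (what FIFO_CONFIG_1 reports)
  uint32_t rx_data;   // The byte itself (what FIFO_RDATA returns)
};

// Simulate a keypress arriving at the UART
void uart_keypress(struct fake_uart *uart, uint8_t ch) {
  uart->rx_data = ch;
  uart->rx_count = 1;
}

// Reading the data register consumes the byte: It reads as 0 afterwards
uint32_t uart_read_rdata(struct fake_uart *uart) {
  uint32_t data = uart->rx_data;
  uart->rx_data = 0;
  uart->rx_count = 0;
  return data;
}

// Hypothetical "Leaky Read" of the count: Returns the count, but also
// touches the data register as an unintended side effect
uint32_t uart_leaky_read_count(struct fake_uart *uart) {
  uint32_t count = uart->rx_count;
  (void)uart_read_rdata(uart);  // The stray read consumes the byte
  return count;
}
```

After a keypress, `uart_leaky_read_count` still reports 1 byte pending, but the `uart_read_rdata` that follows returns 0. Same symptom as our log: Pending Input, yet nothing to read.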

Things are looking mighty illogical and incoherent. Why oh why?

Leaky Reads in UART

§4 T-Head Errata

But Linux runs OK on Ox64 BL808…

Something special about Linux on T-Head C906?

We search for “T-Head” in the Linux Kernel Repo. And we see this vital clue: errata_list.h

// T-Head Errata for Linux
#ifdef CONFIG_ERRATA_THEAD_PBMT
  // IO/NOCACHE memory types are handled together with svpbmt,
  // so on T-Head chips, check if no other memory type is set,
  // and set the non-0 PMA type if applicable.
  ...
  asm volatile(... _PAGE_MTMASK_THEAD ...)

(Svpbmt Extension defines Page-Based Memory Types)

Aha! A Linux Errata for T-Head CPU!

We track down _PAGE_MTMASK_THEAD: pgtable-64.h

// T-Head Memory Type Definitions in Linux
#define _PAGE_PMA_THEAD     ((1UL << 62) | (1UL << 61) | (1UL << 60))
#define _PAGE_NOCACHE_THEAD ((1UL < 61) | (1UL << 60))
#define _PAGE_IO_THEAD      ((1UL << 63) | (1UL << 60))
#define _PAGE_MTMASK_THEAD  (_PAGE_PMA_THEAD | _PAGE_IO_THEAD | (1UL << 59))

(Spot the Typo!)

Which is annotated with…

[63:59] T-Head Memory Type definitions:
Bit[63] SO  - Strong Order
Bit[62] C   - Cacheable
Bit[61] B   - Bufferable
Bit[60] SH  - Shareable
Bit[59] Sec - Trustable

00110 - NC:  Weakly-Ordered, Non-Cacheable, Bufferable, Shareable, Non-Trustable
01110 - PMA: Weakly-Ordered, Cacheable, Bufferable, Shareable, Non-Trustable
10010 - IO:  Strongly-Ordered, Non-Cacheable, Non-Bufferable, Shareable, Non-Trustable

(Source)

Something sus about I/O Memory?

The last line suggests we should configure the T-Head Memory Type specifically to support I/O Memory: _PAGE_IO_THEAD

Memory Attribute    Page Table Entry
Strongly-Ordered    Bit 63 is 1
Non-Cacheable       Bit 62 is 0 (Default)
Non-Bufferable      Bit 61 is 0 (Default)
Shareable           Bit 60 is 1
Non-Trustable       Bit 59 is 0 (Default)

With the above evidence, we deduce that “Strong Order” is the Magical Bit that we need for UART and PLIC!
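The three Memory Types from the Linux annotation can be checked with a few lines of C. This is a minimal sketch: The `THEAD_*` names and `thead_memtype` are hypothetical (Linux calls them `_PAGE_*_THEAD`), and we use `1ULL` so the shifts stay 64-bit on any host.

```c
#include <stdint.h>

// T-Head Memory Type lives in Bits 59 to 63 of the Page Table Entry
#define THEAD_MT_SHIFT 59
#define THEAD_SO  (1ULL << 63)  // Strong Order
#define THEAD_C   (1ULL << 62)  // Cacheable
#define THEAD_B   (1ULL << 61)  // Bufferable
#define THEAD_SH  (1ULL << 60)  // Shareable
#define THEAD_SEC (1ULL << 59)  // Trustable

#define THEAD_NC  (THEAD_B | THEAD_SH)            // 00110
#define THEAD_PMA (THEAD_C | THEAD_B | THEAD_SH)  // 01110
#define THEAD_IO  (THEAD_SO | THEAD_SH)           // 10010

// Extract the 5-bit Memory Type from a Page Table Entry
unsigned thead_memtype(uint64_t pte) {
  return (unsigned)(pte >> THEAD_MT_SHIFT) & 0x1f;
}
```

`thead_memtype(THEAD_IO)` gives `0x12` (binary 10010), the only one of the three patterns with the Strong Order Bit set.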

What’s “Strong Order”?

“Strong Order” means “All Reads and All Writes are In-Order”.

Apparently T-Head C906 will (by default) Disable Strong Order and read / write memory Out-of-Sequence. (So that it performs better)

Which will surely mess up our UART and PLIC Registers!

They should’ve warned us about Strong Order and I/O Memory!

Ahem they did…

“A Device Driver written to rely on I/O Strong Ordering rules will not operate correctly if the Address Range is mapped with PBMT=NC [Weakly Ordered]”

“As such, this configuration is discouraged”

Though that warning comes from the New Svpbmt Extension. Which isn’t supported by T-Head C906.

(Svpbmt Bits 61~62 will conflict with T-Head Bits 59~63. Oh boy)

How to enable Strong Order?

We do it in the T-Head C906 MMU…

(Strong Order appears briefly in C906 User Manual, Pages 24 & 53)

(What’s “Shareable”? It’s not documented)

UPDATE: Shareable might support Strong Ordering across Multiple Cores

Level 1 Page Table for Ox64 MMU


§5 Memory Management Unit

Wow the soup gets too salty. What’s MMU?

Memory Management Unit (MMU) is the hardware inside our SBC that does…

For Ox64: We switched on the MMU to protect the Kernel Memory from the Apps. And to protect the Apps from each other.

How does it work?

The pic above shows the Level 1 Page Table that we configured for our MMU. The Page Table has a Page Table Entry that says…

What about _PAGE_IO_THEAD and Strong Order?

Memory Attribute        Page Table Entry
SO: Strongly-Ordered    Bit 63 is 1
SH: Shareable           Bit 60 is 1

We’ll set the SO and SH Bits in our Page Table Entries. Hopefully UART and PLIC won’t get mushed up no more…

Enable Strong Order in Ox64 MMU

§6 Enable Strong Order

We need to set the Strong Order Bit…

How will we enable it in our Page Table Entry?

Memory Attribute        Page Table Entry
SO: Strongly-Ordered    Bit 63 is 1
SH: Shareable           Bit 60 is 1

For testing, we patched our MMU Code to set the Strong Order Bit in our Page Table Entries (pic above): riscv_mmu.c

// Set a Page Table Entry in a Page Table for the MMU
void mmu_ln_setentry(
  uint32_t ptlevel,   // Level of Page Table: 1, 2 or 3 
  uintptr_t lntable,  // Page Table Address
  uintptr_t paddr,    // Physical Address
  uintptr_t vaddr,    // Virtual Address (For Kernel: Same as Physical Address)
  uint32_t mmuflags   // MMU Flags (V / G / R / W)
) {
  ...
  // Set the Page Table Entry:
  // Physical Page Number and MMU Flags (V / G / R / W)
  lntable[index] = (paddr | mmuflags);

  // Now we set the T-Head Memory Type in Bits 59 to 63.
  // For I/O and PLIC Memory, we set...
  // SO (Bit 63): Strong Order
  // SH (Bit 60): Shareable
  #define _PAGE_IO_THEAD ((1UL << 63) | (1UL << 60))

  // If this is a Leaf Page Table Entry
  // for I/O Memory or PLIC Memory...
  if ((mmuflags & PTE_R) &&    // Leaf Page Table Entry
    (vaddr < 0x40000000UL ||   // I/O Memory
    vaddr >= 0xe0000000UL)) {  // PLIC Memory

    // Then set the Strong Order
    // and Shareable Bits
    lntable[index] = lntable[index]
      | _PAGE_IO_THEAD;
  }

(Moved here)

(And here)

The code above will set the Strong Order and Shareable Bits for I/O Memory and PLIC Memory, as the MMU log shows…

map I/O regions
  vaddr=0, lntable[index]=0x90000000000000e7
  // "0x9000..." means Strong Order (Bit 63) and Shareable (Bit 60) are set

map PLIC as Interrupt L2
  vaddr=0xe0000000, lntable[index]=0x90000000380000e7
  vaddr=0xe0200000, lntable[index]=0x90000000380800e7
  vaddr=0xe0400000, lntable[index]=0x90000000381000e7
  vaddr=0xe0600000, lntable[index]=0x90000000381800e7
  ...
  vaddr=0xefc00000, lntable[index]=0x900000003bf000e7
  vaddr=0xefe00000, lntable[index]=0x900000003bf800e7
  // "0x9000..." means Strong Order (Bit 63) and Shareable (Bit 60) are set
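We can reproduce the log values above by hand. This is a minimal sketch, assuming the standard Sv39 layout (4 KB pages, Physical Page Number in Bits 10 to 53). `make_io_pte`, `pte_to_paddr` and `THEAD_IO_FLAGS` are hypothetical names, not the NuttX functions.

```c
#include <stdint.h>

#define PAGE_SHIFT     12  // Sv39 pages are 4 KB
#define PTE_PPN_SHIFT  10  // Physical Page Number starts at Bit 10
#define PTE_PPN_MASK   (((1ULL << 44) - 1) << PTE_PPN_SHIFT)  // Bits 10 to 53

// Strong Order (Bit 63) and Shareable (Bit 60)
#define THEAD_IO_FLAGS ((1ULL << 63) | (1ULL << 60))

// Build a Leaf Page Table Entry for I/O Memory or PLIC Memory
uint64_t make_io_pte(uint64_t paddr, uint64_t mmuflags) {
  uint64_t ppn = paddr >> PAGE_SHIFT;
  return (ppn << PTE_PPN_SHIFT) | mmuflags | THEAD_IO_FLAGS;
}

// Recover the Physical Address from a Page Table Entry
uint64_t pte_to_paddr(uint64_t pte) {
  return ((pte & PTE_PPN_MASK) >> PTE_PPN_SHIFT) << PAGE_SHIFT;
}
```

With the flags byte `0xe7` seen in the log, `make_io_pte(0xe0000000, 0xe7)` gives `0x90000000380000e7`, matching the first PLIC entry. Going the other way, `pte_to_paddr(0x900000003bf800e7)` returns `0xefe00000`, the last one.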

If we don’t specify MMU Caching for T-Head C906… Is MMU Caching enabled by default?

Nope, we need to explicitly enable MMU Caching ourselves! Otherwise Memory Accesses (Kernel and Apps) will become really slooooow…

We test our patched code…

NOTE: T-Head MMU Flags (Strong Order / Shareable) are available only if OpenSBI has set the MAEE Bit in the MXSTATUS Register to 1. Otherwise the MMU will crash when we set the flags!

UPDATE: NuttX Mainline now supports T-Head C906 Memory Types

(See the Complete Log)

(Shareable Bit doesn’t affect anything. We’re keeping it to be consistent with Linux)

UART Input and Platform-Level Interrupt Controller are finally OK on Apache NuttX RTOS and Ox64 BL808 RISC-V SBC!

§7 It Works!

What happens when we run our patched MMU code?

Our UART and PLIC Troubles are finally over!

Is NuttX usable on Ox64?

Yep! NuttX RTOS on Ox64 now boots OK to the NuttX Shell (NSH).

And happily accepts commands through the Serial Console yay! (Pic above)

NuttShell (NSH) NuttX-12.0.3
nsh> uname -a
NuttX 12.0.3 fd05b07 Nov 24 2023 07:42:54 risc-v star64

nsh> ls /dev
/dev:
 console
 null
 ram0
 zero

nsh> hello
Hello, World!!

(Watch the Demo on YouTube)

(See the Complete Log)

We are hunky dory with Ox64 BL808 and T-Head C906 👍

§8 Lessons Learnt

Phew that was some quick intense debugging…

Yeah we’re really fortunate to get NuttX RTOS running OK on Ox64. Couple of things that might have helped…

  1. Write up Everything about our troubles

    (And share them publicly)

  2. Read the Comments

    (They might inspire the solution!)

  3. Re-Read and Re-Think everything we wrote

    (Challenge all our Assumptions)

  4. Head to the Beach. Have a Picnic.

    (Never know when the solution might pop up!)

  5. Sounds like an Agatha Christie Mystery…

    But sometimes it’s indeed One Single Culprit (Weak Ordering) behind all the Seemingly Unrelated Problems!

Will NuttX officially support Ox64?

We plan to…

And Apache NuttX RTOS shall officially support Ox64 BL808 SBC real soon!

UPDATE: NuttX officially supports Ox64 BL808 SBC!

Are we hunky dory with Ox64 BL808 and T-Head C906?

We said this last time…

“If RISC-V ain’t RISC-V on SiFive vs T-Head: We’ll find out!”

As of Today: Yep RISC-V is indeed RISC-V on SiFive vs T-Head… Just beware of C906 MMU, C906 PLIC and T-Head Errata!

(New T-Head Cores will probably migrate to Svpbmt Extension)

Quick dip in the sea + Picnic on the beach … Really helps with NuttX + Ox64 troubleshooting! 👍

§9 What’s Next

Thank you so much for reading my adventures of NuttX on Ox64… You’re my inspiration for solving this sticky mystery! 🙏

Apache NuttX RTOS for Ox64 BL808 shall be Upstreamed to Mainline real soon. Stay tuned for updates!

UPDATE: NuttX officially supports Ox64 BL808 SBC!

Many Thanks to my GitHub Sponsors (and the awesome NuttX Community) for supporting my work! This article wouldn’t have been possible without your support.

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…

lupyuen.github.io/src/plic3.md

§10 Appendix: MMU Caching for T-Head C906

If we don’t specify MMU Caching for T-Head C906… Is MMU Caching enabled by default?

Nope, we need to explicitly enable MMU Caching ourselves! Otherwise Memory Accesses (Kernel and Apps) will become really slooooow.

According to Linux Kernel, this is how we define the Cache Flags for T-Head C906: bl808_mm_init.c

// T-Head C906 MMU Extensions
#define MMU_THEAD_SHAREABLE  (1ul << 60)
#define MMU_THEAD_BUFFERABLE (1ul << 61)
#define MMU_THEAD_CACHEABLE  (1ul << 62)

// T-Head C906 MMU requires Kernel Memory
// to be explicitly cached with these flags
#define MMU_THEAD_PMA_FLAGS \
  (MMU_THEAD_SHAREABLE | \
   MMU_THEAD_BUFFERABLE | \
   MMU_THEAD_CACHEABLE)

Then we cache the Kernel Text, Data and Heap, by passing MMU_THEAD_PMA_FLAGS: bl808_mm_init.c

// Cache the Kernel Text, Data and Page Pool
map_region(KFLASH_START, KFLASH_START, KFLASH_SIZE,
  MMU_KTEXT_FLAGS | MMU_THEAD_PMA_FLAGS);

map_region(KSRAM_START, KSRAM_START, KSRAM_SIZE,
  MMU_KDATA_FLAGS | MMU_THEAD_PMA_FLAGS);

mmu_ln_map_region(2, PGT_L2_VBASE, PGPOOL_START, PGPOOL_START, PGPOOL_SIZE,
  MMU_KDATA_FLAGS | MMU_THEAD_PMA_FLAGS);

(See the Pull Request for Ox64 and SG2000)

What about User Text and Data? For NuttX Apps?

Yep they need to be explicitly cached too!

This is how we cache the User Text and Data, by setting the Extra MMU Flags: arch/risc-v/src/common/riscv_mmu.h

// T-Head MMU needs Text and Data to be Shareable, Bufferable, Cacheable
#ifdef CONFIG_ARCH_MMU_EXT_THEAD
#  define PTE_SEC         (1UL << 59) /* Security */
#  define PTE_SHARE       (1UL << 60) /* Shareable */
#  define PTE_BUF         (1UL << 61) /* Bufferable */
#  define PTE_CACHE       (1UL << 62) /* Cacheable */
#  define PTE_SO          (1UL << 63) /* Strong Order */

#  define EXT_UTEXT_FLAGS (PTE_SHARE | PTE_BUF | PTE_CACHE)
#  define EXT_UDATA_FLAGS (PTE_SHARE | PTE_BUF | PTE_CACHE)
#else
#  define EXT_UTEXT_FLAGS (0)
#  define EXT_UDATA_FLAGS (0)
#endif

// Flags for user FLASH (RX) and user RAM (RW)
#define MMU_UTEXT_FLAGS (PTE_R | PTE_X | PTE_U | EXT_UTEXT_FLAGS)
#define MMU_UDATA_FLAGS (PTE_R | PTE_W | PTE_U | EXT_UDATA_FLAGS)

(MMU_UTEXT_FLAGS and MMU_UDATA_FLAGS are used by up_addrenv_create to configure the User Text, Data and Heap)

Then we enable ARCH_MMU_EXT_THEAD for SG2000 and BL808: arch/risc-v/Kconfig

config ARCH_CHIP_SG2000
	select ARCH_MMU_TYPE_SV39
	select ARCH_MMU_EXT_THEAD
	...
config ARCH_CHIP_BL808
	select ARCH_MMU_TYPE_SV39
	select ARCH_MMU_EXT_THEAD

(See the Pull Request for SG2000)

(See the Pull Request for BL808)

Does MMU Caching affect NuttX Performance?

Really it does!

Will we have issues with MMU Flags: T-Head vs Svpbmt?

Well eventually we need to handle (non-standard) T-Head MMU Flags and (standard) Svpbmt MMU Flags. According to Linux Kernel…

T-Head and Svpbmt disagree on the MMU Bits. (And we may have more MMU Bits in future)

Thankfully Svpbmt already caches by default (because PMA=0). So we can ignore Svpbmt for now.
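Here’s a sketch of how one helper could serve both schemes. The T-Head bits come from the Linux definitions quoted earlier. The Svpbmt encoding (a 2-bit field in Bits 61 to 62, with 0 for PMA, 1 for NC, 2 for IO) is our reading of the RISC-V Privileged Spec. `mmu_scheme`, `mem_type` and `memtype_bits` are hypothetical names; NuttX doesn’t have them.

```c
#include <stdint.h>

enum mmu_scheme { MMU_SVPBMT, MMU_THEAD };
enum mem_type   { MT_PMA, MT_NC, MT_IO };

// Return the Memory Type bits to be OR-ed into a Page Table Entry
uint64_t memtype_bits(enum mmu_scheme scheme, enum mem_type type) {
  if (scheme == MMU_SVPBMT) {
    // Svpbmt: 2-bit field in Bits 61 to 62 (0 = PMA, 1 = NC, 2 = IO)
    switch (type) {
      case MT_NC: return 1ULL << 61;
      case MT_IO: return 1ULL << 62;
      default:    return 0;  // PMA: Cached by default
    }
  }

  // T-Head: 5-bit field in Bits 59 to 63
  switch (type) {
    case MT_NC: return (1ULL << 61) | (1ULL << 60);
    case MT_IO: return (1ULL << 63) | (1ULL << 60);
    default:    return (1ULL << 62) | (1ULL << 61) | (1ULL << 60);  // PMA
  }
}
```

For Normal Memory, Svpbmt needs no extra bits at all (`memtype_bits(MMU_SVPBMT, MT_PMA)` is 0), while T-Head needs `0x7000000000000000`. That’s why we only worry about the T-Head flags for now.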

UART Input and Platform-Level Interrupt Controller are finally OK on Apache NuttX RTOS and Ox64 BL808 RISC-V SBC!

§11 Appendix: Build and Run NuttX

In this article, we ran a Work-In-Progress Version of Apache NuttX RTOS for Ox64, with PLIC and Console Input working OK.

This is how we download and build NuttX for Ox64 BL808 SBC…

## Download the WIP NuttX Source Code
git clone \
  --branch ox64c \
  https://github.com/lupyuen2/wip-nuttx \
  nuttx
git clone \
  --branch ox64c \
  https://github.com/lupyuen2/wip-nuttx-apps \
  apps

## Build NuttX
cd nuttx
tools/configure.sh star64:nsh
make

## Export the NuttX Kernel
## to `nuttx.bin`
riscv64-unknown-elf-objcopy \
  -O binary \
  nuttx \
  nuttx.bin

## Dump the disassembly to nuttx.S
riscv64-unknown-elf-objdump \
  --syms --source --reloc --demangle --line-numbers --wide \
  --debugging \
  nuttx \
  >nuttx.S \
  2>&1

(Remember to install the Build Prerequisites and Toolchain)

Then we build the Initial RAM Disk that contains NuttX Shell and NuttX Apps…

## Build the Apps Filesystem
make -j 8 export
pushd ../apps
./tools/mkimport.sh -z -x ../nuttx/nuttx-export-*.tar.gz
make -j 8 import
popd

## Generate the Initial RAM Disk `initrd`
## in ROMFS Filesystem Format
## from the Apps Filesystem `../apps/bin`
## and label it `NuttXBootVol`
genromfs \
  -f initrd \
  -d ../apps/bin \
  -V "NuttXBootVol"

## Prepare a Padding with 64 KB of zeroes
head -c 65536 /dev/zero >/tmp/nuttx.pad

## Append Padding and Initial RAM Disk to NuttX Kernel
cat nuttx.bin /tmp/nuttx.pad initrd \
  >Image

(See the Build Script)

(See the Build Outputs)

(Why the 64 KB Padding)

Next we prepare a Linux microSD for Ox64 as described in the previous article.

(Remember to flash OpenSBI and U-Boot Bootloader)

Then we do the Linux-To-NuttX Switcheroo: Overwrite the microSD Linux Image by the NuttX Kernel…

## Overwrite the Linux Image
## on Ox64 microSD
cp Image \
  "/Volumes/NO NAME/Image"
diskutil unmountDisk /dev/disk2

Insert the microSD into Ox64 and power up Ox64.

Ox64 boots OpenSBI, which starts U-Boot Bootloader, which starts NuttX Kernel and the NuttX Shell (NSH).

NuttX Commands will run OK in NuttX Shell. (Pic above)

(See the NuttX Log)

(Watch the Demo on YouTube)

(See the Build Outputs)

Quick dip in the sea + Picnic on the beach … Really helps with NuttX + Ox64 troubleshooting! 👍
