📝 22 Dec 2024
Every Day: Our Dashboard for Apache NuttX RTOS will flag this bothersome bug, since a month ago…
Which says that NuttX on QEMU RISC-V Emulator (32-bit) has failed our Continuous Integration Test. Again and again (like Kafka on Groundhog Day)
Configuration/Tool: rv-virt/citest
test_cmocka PASSED
test_hello PASSED
test_helloxx FAILED
test_pipe FAILED
test_usrsocktest FAILED
[...Failing all the way...]
The Bug Stops Here! In this article: We study the internals of the NuttX CI Test (for Continuous Integration) as we stomp the bug…
We run the CI Test on Docker
Snoop and Study the CI Test Internals
Locate and Dump the CI Log File
That was generated by Pytest and Pexpect
Then we isolate the NuttX Crash with QEMU RISC-V
Maybe CI Troubleshooting could be simpler?
Everything above works on macOS Arm64 too!
We begin with Docker…
CI Test runs a bunch of Test Apps and fails. Why?
Thanks to Docker: We can run CI Test risc-v-05 on our Ubuntu PC. And figure out why it fails rv-virt:citest…
## TODO: Install Docker Engine
## https://docs.docker.com/engine/install/ubuntu/
## TODO: For WSL, we may need to install Docker on Native Windows
## https://github.com/apache/nuttx/issues/14601#issuecomment-2453595402
## Start the NuttX Docker Image
## Name the Docker Container as `nuttx`
sudo docker run \
-it \
--name nuttx \
ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
/bin/bash
## Inside our Docker Container:
## Checkout the NuttX Repo and NuttX Apps
cd
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
pushd nuttx ; echo NuttX Source: https://github.com/apache/nuttx/tree/$(git rev-parse HEAD) ; popd
pushd apps ; echo NuttX Apps: https://github.com/apache/nuttx-apps/tree/$(git rev-parse HEAD) ; popd
## Run CI Test risc-v-05 inside our Docker Container
cd nuttx/tools/ci
./cibuild.sh \
-c -A -N -R \
testlist/risc-v-05.dat
(More about NuttX Docker Build)
Docker will build rv-virt:citest and start the CI Test…
Configuration/Tool: rv-virt/citest
python3 -m pytest -m 'qemu or rv_virt' ./ -B rv-virt -P /root/nuttx -L /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu -R qemu -C --json=/root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/pytest.json
test_cmocka PASSED
test_hello PASSED
test_helloxx FAILED
test_pipe FAILED
test_usrsocktest FAILED
[...Failing all the way...]
Which is totally unhelpful. Why is it failing?
To understand what went wrong: We connect to the Docker Container. And snoop the Background Processes…
## Connect to our Running Docker Container
sudo docker exec \
-it \
nuttx \
/bin/bash
## What's running now?
## `more` will show the Entire Command-Line
ps aux | more
A-ha! We see NuttX running on (32-bit) QEMU RISC-V Emulator…
## We started this...
## https://github.com/apache/nuttx/blob/master/tools/ci/cibuild.sh
/root/nuttx/tools/ci/cibuild.sh -c -A -N -R testlist/risc-v-05.dat
## Which calls testbuild.sh...
## https://github.com/apache/nuttx/blob/master/tools/testbuild.sh
/root/nuttx/tools/testbuild.sh -A -N -R -j 24 -e -W
## Which calls citest/run...
## https://github.com/apache/nuttx/blob/master/boards/risc-v/qemu-rv/rv-virt/configs/citest/run
/root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/run
## Which is sym-linked to cirun.sh...
## https://github.com/apache/nuttx/blob/master/tools/ci/cirun.sh
/root/nuttx/tools/ci/cirun.sh
## Which calls pytest...
## python3 -m pytest -m "${mark}" ./ -B ${BOARD} -P ${path} -L ${logs}/${BOARD}/${core} -R ${target} -C --json=${logs}/${BOARD}/${core}/pytest.json
python3 -m pytest -m 'qemu or rv_virt' ./ -B rv-virt -P /root/nuttx -L /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu -R qemu -C --json=/root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/pytest.json
## Which (probably) calls testrun...
## https://github.com/apache/nuttx/blob/master/tools/ci/testrun
/root/nuttx/tree/master/tools/ci/testrun
## Which boots NuttX on QEMU RISC-V...
qemu-system-riscv32 -M virt -bios ./nuttx -nographic -drive index=0,id=userdata,if=none,format=raw,file=./fatfs.img -device virtio-blk-device,bus=virtio-mmio-bus.0,drive=userdata
| tee /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/rv-virt_20241211_063532.log
## And tees the output to the above Log File
Let’s inspect the Log File…
From the Background Processes: We see that everything goes into this CI Test Log File…
## Dump the CI Test Log File (e.g. rv-virt_20241211_063532.log)
$ cat /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/*
NuttShell (NSH) NuttX-12.7.0
nsh> cmocka --skip test_case_posix_timer|test_case_oneshot|write_default|read_defaulCmocka Test Start.
Cmocka Test Completed.
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 0001952 0000844 43.2% Idle_Task
1 0 224 RR Kthread - Waiting Semaphore 0000000000000000 0001904 0000524 27.5% hpwork 0x8014b1f4 0x8014b220
2 0 100 RR Kthread - Waiting Semaphore 0000000000000000 0001896 0000508 26.7% lpwork 0x8014b1b0 0x8014b1dc
riscv_exception: EXCEPTION: Load access fault. MCAUSE: 00000005, EPC: 80008bfe, MTVAL: 01473e00
riscv_exception: PANIC!!! Exception = 00000005
dump_assert_info: Current Version: NuttX 12.7.0 5607eece84 Dec 11 2024 06:34:00 risc-v
dump_assert_info: Assertion failed panic: at file: common/riscv_exception.c:131 task: nsh_main process: nsh_main 0x8000a806
up_dump_register: EPC: 80008bfe
Hmmm this looks super interesting…
CI Test sends NSH Commands to NuttX
CI Test validates the NSH Response
But NuttX “ps
” crashes inside QEMU!
That’s why CI Test hangs, refuses to accept NSH Test Commands, and fails all subsequent tests!
What if we test rv-virt:citest on QEMU?
For simpler troubleshooting: This is how we compile rv-virt:citest inside Docker…
## Start Docker Container for NuttX
sudo docker run \
-it \
ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
/bin/bash
## Inside Docker:
## We compile rv-virt:citest
cd
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
pushd nuttx ; echo NuttX Source: https://github.com/apache/nuttx/tree/$(git rev-parse HEAD) ; popd
pushd apps ; echo NuttX Apps: https://github.com/apache/nuttx-apps/tree/$(git rev-parse HEAD) ; popd
cd nuttx
tools/configure.sh rv-virt:citest
make -j
Now we boot NuttX on QEMU…
$ qemu-system-riscv32 \
-M virt \
-bios ./nuttx \
-nographic
NuttShell (NSH) NuttX-12.7.0
nsh> uname -a
NuttX 12.7.0 5607eece84 Dec 11 2024 07:05:48 risc-v rv-virt
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 0001952 0000908 46.5% Idle_Task
1 0 224 RR Kthread - Waiting Semaphore 0000000000000000 0001904 0000508 26.6% hpwork 0x8014b1e4 0x8014b210
2 0 100 RR Kthread - Waiting Semaphore 0000000000000000 0001896 0000508 26.7% lpwork 0x8014b1a0 0x8014b1cc
riscv_exception: EXCEPTION: Load access fault. MCAUSE: 00000005, EPC: 80008bfe, MTVAL: 01473e00
riscv_exception: PANIC!!! Exception = 00000005
dump_assert_info: Current Version: NuttX 12.7.0 5607eece84 Dec 11 2024 07:05:48 risc-v
dump_assert_info: Assertion failed panic: at file: common/riscv_exception.c:131 task: nsh_main process: nsh_main 0x8000a806
up_dump_register: EPC: 80008bfe
Yep we can easily reproduce (and fix) the “ps
” crash using plain old Make and QEMU. No need for CI Test Script!
How will we fix the bug?
We’ll probably Rewind The Build and retest rv-virt:citest on QEMU RISC-V.
UPDATE: Tiago Medicci Serrano has just fixed the “ps
” crash yay!
(But another bug appears. Exactly like Kafka on Groundhog Day sigh)
What’s this Pytest we saw earlier?
Filipe Cavalcanti wrote an excellent article on Pytest in NuttX…
“Testing applications with Pytest and NuttX”
Pytest is a Python Testing Framework. Though it gets messy because NuttX CI isn’t actually testing Python Code with Pytest…
It’s testing External Programs in QEMU. (Because NuttX boots inside QEMU)
How does Pytest control QEMU?
Remember test_helloxx that failed earlier? It calls…
send_command to send an NSH Command, which calls…
Pexpect to talk to QEMU
Why Pexpect?
That’s how our CI Test spawns the QEMU Process and controls it…
“Pexpect is a pure Python module for spawning child applications; controlling them; and responding to expected patterns in their output. Pexpect works like Don Libes’ Expect. Pexpect allows your script to spawn a child application and control it as if a human were typing commands.”
What about Don Libes’ Expect?
Oh yes we use Plain Old Expect for Daily-Testing SG2000 NuttX and Ox64 NuttX. It terminates reliably, and it won’t hang forever, unlike Pexpect.
Why so hard to extract the CI Test Log?
Yeah we should probably add the CI Test Log to GitHub Actions Artifacts for easier downloading. Then NuttX Dashboard can ingest the CI Test Log and render it sensibly.
(Also: Extract the Leftover Files that fail the defconfig check)
Can we do more with Pytest?
Someday we could call Pytest to do Test-Driven Development of NuttX Apps and NuttX Drivers. Which means we can write the Test Cases in Pytest, before writing a Single Line of NuttX Code!
Next Article: We’ll chat about an Experimental Mastodon Server for NuttX Continuous Integration.
Then Later: Let’s talk about Git Bisect and how we auto-magically discover a Breaking Commit in NuttX.
After That: What would NuttX Life be like without GitHub? We try out (self-hosted open-source) Forgejo Git Forge with NuttX.
Many Thanks to the awesome NuttX Admins and NuttX Devs! And my GitHub Sponsors, for sticking with me all these years.
Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…
macOS won’t work with the NuttX Docker Image. How to run the CI Test on macOS?
In the previous article we spoke about Compiling Everything NuttX on macOS Arm64…
Today we apply the same steps to replicate the CI Test on macOS Apple Silicon…
## From https://lupyuen.codeberg.page/articles/ci5
git clone https://github.com/lupyuen/nuttx-build-farm
cd nuttx-build-farm
## Set the GitHub Token: (Should have Gist Permission)
## export GITHUB_TOKEN=...
. $HOME/github-token.sh
brew install neofetch gh glab
## Run One Single NuttX CI Job on macOS
./run-job-macos.sh risc-v-05
(Also works for GitLab Snippets)
We wait for the CI Test to fail. Then we snoop the Background Processes…
$ ps aux | grep qemu
## citest/run calls pytest...
## python3 -m pytest -m "${mark}" ./ -B ${BOARD} -P ${path} -L ${logs}/${BOARD}/${core} -R ${target} -C --json=${logs}/${BOARD}/${core}/pytest.json
Python -m pytest -m 'qemu or rv_virt' ./ -B rv-virt -P /private/tmp/run-job-macos/nuttx -L /private/tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu -R qemu -C --json=/private/tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/pytest.json
## Which boots NuttX on QEMU RISC-V...
qemu-system-riscv32 -M virt -bios ./nuttx -nographic -drive index=0,id=userdata,if=none,format=raw,file=./fatfs.img -device virtio-blk-device,bus=virtio-mmio-bus.0,drive=userdata
## And tees the output to this Log File
tee /private/tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/rv-virt_20241210_184405.log
Which says that the CI Test Logs are here (revealing the NuttX ps
crash)
$ cat /tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/*
NuttShell (NSH) NuttX-12.7.0
nsh> cmocka --list
Cmocka Test Start ...
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 0001952 0000828 42.4% Idle_Task
1 0 224 RR Kthread - Waiting Semaphore 0000000000000000 0001896 0000524 27.6% hpwork 0x8014b258 0x8014b27c
2 0 100 RR Kthread - Waiting Semaphore 0000000000000000 0001896 0000508 26.7% lpwork 0x8014b220 0x8014b244
riscv_exception: EXCEPTION: Load access fault. MCAUSE: 00000005, EPC: 80022d50, MTVAL: 00000048
riscv_exception: PANIC!!! Exception = 00000005
dump_assert_info: Current Version: NuttX 12.7.0 1c7d81881c Dec 10 2024 19:01:06 risc-v
dump_assert_info: Assertion failed panic: at file: common/riscv_exception.c:131 task: nsh_main process: nsh_main 0x8000a7a0
up_dump_register: EPC: 80022d50