March 2026
Fixing projectaria_tools on NVIDIA GH200 (aarch64 + GCC)
How 59 patches across 6 files unblock Meta's Aria toolkit on ARM servers
If you're building projectaria_tools from source on an NVIDIA GH200 or any aarch64 Linux server with GCC, the C++ build fails deep inside Meta's Ocean library (github.com/facebookresearch/ocean, fetched via CMake FetchContent). As of March 2026, this is completely undocumented. Here's the problem and the fix.
Why It Breaks
The GH200 pairs an ARM CPU (Grace) with an H100 GPU. The GPU side works fine. The CPU side runs aarch64 Linux with GCC. Ocean contains ARM NEON SIMD code written for Aria glasses, originally tested with Clang on Android and iOS. The ML community validated x86 builds with GCC, which skips the NEON code paths entirely. Nobody tested ARM + GCC — the combination the GH200 exposes.
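To check whether a given build actually hits this combination, the compiler's predefined macros are enough. The following is a minimal sketch, not code from Ocean or projectaria_tools: `describeToolchain()` and `isAarch64Gcc()` are hypothetical helper names, and the only inputs are the predefined macros `__aarch64__`, `__clang__`, `__GNUC__`, and `__ARM_NEON`.

```cpp
#include <string>

// Hypothetical helper: describes the architecture, compiler, and NEON
// availability this translation unit was built with.
inline std::string describeToolchain() {
#if defined(__aarch64__)
    std::string arch = "aarch64";
#elif defined(__x86_64__) || defined(_M_X64)
    std::string arch = "x86_64";
#else
    std::string arch = "other";
#endif

    // Clang also defines __GNUC__, so it has to be tested first.
#if defined(__clang__)
    std::string compiler = "clang";
#elif defined(__GNUC__)
    std::string compiler = "gcc";
#else
    std::string compiler = "other";
#endif

#if defined(__ARM_NEON)
    std::string simd = "neon";
#else
    std::string simd = "no-neon";
#endif

    return arch + "+" + compiler + "+" + simd;
}

// Hypothetical helper: true only for aarch64 built with GCC proper,
// the combination this post is about.
inline bool isAarch64Gcc() {
#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
    return true;
#else
    return false;
#endif
}
```

Printing `describeToolchain()` from a small test program on the target machine is a quick way to confirm that the NEON code paths will be compiled by GCC rather than skipped or handled by Clang.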
Bug 1: constexpr on NEON vector types (47 fixes)
GCC rejects constexpr on NEON vector variables such as uint8x16_t when they are initialized from intrinsics like vdupq_n_u8(). Clang accepts this as an extension; GCC never supported it. Every occurrence across Ocean's NEON intrinsic files is therefore a hard compile error on GCC.
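As a self-contained illustration of the `const` form, here is a minimal sketch that builds on both NEON and non-NEON targets. The function `lowNibbles16()` is a hypothetical example and does not come from Ocean; the scalar fallback is an addition so the snippet can be compiled and tested on x86 as well.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from Ocean): keeps the low nibble of 16 bytes.
inline void lowNibbles16(const std::uint8_t* in, std::uint8_t* out) {
#if defined(__ARM_NEON)
    // 'static const' instead of 'static constexpr': the vector is
    // initialized at run time by calling the intrinsic, so the compiler
    // does not need to evaluate the intrinsic in a constant expression.
    static const uint8x16_t mask = vdupq_n_u8(0x0F);
    vst1q_u8(out, vandq_u8(vld1q_u8(in), mask));
#else
    // Scalar fallback so the sketch also builds on non-NEON targets.
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = static_cast<std::uint8_t>(in[i] & 0x0F);
    }
#endif
}
```

A plain local `const` (without `static`) would work equally well at function scope and avoids the one-time initialization guard; which form fits best depends on where the constant lives in the original code.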
Before (rejected by GCC):
static constexpr uint8x16_t mask = vdupq_n_u8(0x0F);
static constexpr uint16x8_t coeffs = vdupq_n_u16(256);
After:
static const uint8x16_t mask = vdupq_n_u8(0x0F);
static const uint16x8_t coeffs = vdupq_n_u16(256);
Bug 2: Signed/unsigned shift mismatch (6 fixes)
vrshrq_n_s16() (the signed rounding shift) is called on unsigned data produced by vrhaddq_u16(). GCC catches the type mismatch; Clang was lenient. Fix: use the unsigned variant.
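The unsigned variant matters for more than type checking: for lane values of 0x8000 and above, a signed rounding shift would interpret the data as negative. The sketch below shows the unsigned pairing in a complete function. `avgThenHalve8()` is a hypothetical name, not an Ocean function, and the scalar branch is an assumed reference implementation of the same arithmetic, added so the snippet builds without NEON.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from Ocean): rounding average of two 8-lane
// u16 vectors, followed by a rounding shift right by one bit.
inline void avgThenHalve8(const std::uint16_t* a, const std::uint16_t* b,
                          std::uint16_t* out) {
#if defined(__ARM_NEON)
    // vrhaddq_u16 yields uint16x8_t, so the shift must be the _u16 variant.
    const uint16x8_t avg = vrhaddq_u16(vld1q_u16(a), vld1q_u16(b));
    vst1q_u16(out, vrshrq_n_u16(avg, 1));
#else
    // Scalar fallback: same arithmetic, widened to 32 bits to avoid overflow.
    for (std::size_t i = 0; i < 8; ++i) {
        const std::uint32_t avg =
            (static_cast<std::uint32_t>(a[i]) + b[i] + 1u) >> 1;
        out[i] = static_cast<std::uint16_t>((avg + 1u) >> 1);
    }
#endif
}
```

With both inputs at 65535, the unsigned path produces 32768; treating the same bits as a signed value of -1 would round to 0 instead.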
Before (rejected by GCC):
uint16x8_t avg = vrhaddq_u16(a, b);
int16x8_t result = vrshrq_n_s16(avg, 1); // signed shift on unsigned data
After:
uint16x8_t avg = vrhaddq_u16(a, b);
uint16x8_t result = vrshrq_n_u16(avg, 1); // correct unsigned variant
Bug 3: Wrong lane accessor types (6 fixes)
vget_low_u8() is called on a uint16x8_t: the accessor, which returns the low 64 bits of a uint8x16_t, is applied to a vector with the wrong element width. GCC rejects this. Fix: match the accessor to the input type.
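A minimal sketch of the matching accessor in context follows. `addLowHalf()` is a hypothetical function written for illustration, not taken from Ocean, and the scalar branch is an assumed equivalent so the snippet compiles on non-NEON targets.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from Ocean): adds two 8-lane u16 vectors and
// stores the low four lanes of the sum.
inline void addLowHalf(const std::uint16_t* x, const std::uint16_t* y,
                       std::uint16_t* out4) {
#if defined(__ARM_NEON)
    const uint16x8_t wide = vaddq_u16(vld1q_u16(x), vld1q_u16(y));
    // The accessor's element type matches the vector: u16 in, u16 out.
    const uint16x4_t low = vget_low_u16(wide);
    vst1_u16(out4, low);
#else
    // Scalar fallback: lane-wise add with the same 16-bit wraparound.
    for (std::size_t i = 0; i < 4; ++i) {
        out4[i] = static_cast<std::uint16_t>(x[i] + y[i]);
    }
#endif
}
```

If the surrounding code really needs the low half as bytes, an explicit `vreinterpretq_u8_u16()` before `vget_low_u8()` states that intent instead of relying on an implicit vector conversion.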
Before (rejected by GCC):
uint16x8_t wide = vaddq_u16(x, y);
uint8x8_t low = vget_low_u8(wide); // u8 accessor on u16 vector
After:
uint16x8_t wide = vaddq_u16(x, y);
uint16x4_t low = vget_low_u16(wide); // correct accessor
Summary
| Bug | Root Cause | Count | Fix |
|---|---|---|---|
| constexpr NEON types | Clang extension, GCC rejects | 47 | constexpr → const |
| Signed/unsigned shift | Wrong intrinsic variant | 6 | _s16 → _u16 |
| Wrong lane accessor | Element width mismatch | 6 | vget_low_u8 → vget_low_u16 |
Who This Affects
Anyone running egocentric vision pipelines on ARM servers: EgoLifter, HaWoR, EgoGaussian, or anything that pulls projectaria_tools as a dependency. The affected hardware includes the NVIDIA GH200, AWS Graviton, and Ampere Altra. If your CI or inference cluster has shifted to ARM for cost or efficiency reasons, this is an undocumented blocker.
A PR has been submitted to facebookresearch/ocean to fix this upstream. The issue was discovered while building an ego-video decomposition pipeline on an NVIDIA GH200.