Varun Nair

March 2026

Fixing projectaria_tools on NVIDIA GH200 (aarch64 + GCC)

How 59 patches across 6 files unblock Meta's Aria toolkit on ARM servers


If you're building projectaria_tools from source on an NVIDIA GH200 or any aarch64 Linux server with GCC, the C++ build fails deep inside Meta's Ocean library (github.com/facebookresearch/ocean, fetched via CMake FetchContent). As of March 2026, this is completely undocumented. Here's the problem and the fix.

Why It Breaks

The GH200 pairs an ARM CPU (Grace) with an H100 GPU. The GPU side works fine. The CPU side runs aarch64 Linux with GCC. Ocean contains ARM NEON SIMD code written for Aria glasses, originally tested with Clang on Android and iOS. The ML community validated GCC builds on x86, where the NEON code paths are never compiled. Nobody tested ARM + GCC, which is exactly the combination the GH200 exposes.
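To check whether a given build machine lands in that untested combination, the compiler and target can be read from predefined macros. The sketch below uses only standard predefined macros (`__clang__`, `__GNUC__`, `__aarch64__`, `__x86_64__`, `__ARM_NEON`). The function names are hypothetical, and Ocean's own feature-detection macros may differ.

```cpp
#include <string>

// Compiler family. Clang also defines __GNUC__, so __clang__ is tested first.
inline std::string compiler_name() {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "other";
#endif
}

// Target CPU architecture as seen by the compiler.
inline std::string target_arch() {
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "other";
#endif
}

// True when the compiler advertises NEON support (always the case on aarch64).
inline bool neon_available() {
#if defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}

// The combination this article is about: GCC + aarch64 + NEON code compiled in.
inline bool is_affected_configuration() {
    return compiler_name() == "gcc" && target_arch() == "aarch64" && neon_available();
}
```

On a GH200 with the system GCC, `is_affected_configuration()` should return true; on an x86 workstation it returns false regardless of compiler, which is why x86 validation never reached the NEON code.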

Bug 1: constexpr on NEON vector types (47 fixes)

GCC rejects constexpr on NEON vector types such as uint8x16_t. Clang accepts it as an extension; GCC never has. Every occurrence across Ocean's NEON intrinsic files is therefore a hard compile error on GCC.

before
static constexpr uint8x16_t mask = vdupq_n_u8(0x0F);
static constexpr uint16x8_t coeffs = vdupq_n_u16(256);
after
static const uint8x16_t mask = vdupq_n_u8(0x0F);
static const uint16x8_t coeffs = vdupq_n_u16(256);
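As a minimal sketch of the patched pattern in context, the function below masks the low nibble of each byte using a `const` NEON constant. The function name `low_nibbles` is hypothetical and not taken from Ocean, and the scalar loop is included only so the example compiles on machines without NEON.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Keep the low nibble of each byte. `n` is the byte count of both buffers.
void low_nibbles(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // `const` instead of `constexpr`: accepted by both GCC and Clang.
    const uint8x16_t mask = vdupq_n_u8(0x0F);
    for (; i + 16 <= n; i += 16) {
        vst1q_u8(dst + i, vandq_u8(vld1q_u8(src + i), mask));
    }
#endif
    // Scalar tail on NEON targets, and the whole loop everywhere else.
    for (; i < n; ++i) {
        dst[i] = static_cast<std::uint8_t>(src[i] & 0x0F);
    }
}
```

This sketch uses a plain local `const` rather than `static const`. Inside a function, a `static` whose initializer is not a constant expression may get a thread-safe initialization guard, so a non-static local can be the cheaper choice; whether that matters for a particular Ocean call site is not something this example establishes.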

Bug 2: Signed/unsigned shift mismatch (6 fixes)

vrshrq_n_s16() (the signed rounding shift) is called on unsigned data produced by vrhaddq_u16(). GCC catches the type mismatch; Clang was lenient. Fix: use the unsigned variant.

before
uint16x8_t avg = vrhaddq_u16(a, b);
int16x8_t result = vrshrq_n_s16(avg, 1);  // signed shift on unsigned data
after
uint16x8_t avg = vrhaddq_u16(a, b);
uint16x8_t result = vrshrq_n_u16(avg, 1);  // correct unsigned variant
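The signed and unsigned variants are not interchangeable once the top bit of a lane is set. The sketch below models one 16-bit lane of each intrinsic in scalar code to make the difference visible. The scalar helper names are hypothetical, and the NEON wrapper is only compiled where `__ARM_NEON` is defined.

```cpp
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// One lane of vrhaddq_u16: rounding halving add, (a + b + 1) >> 1 without overflow.
inline std::uint16_t rhadd_u16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(a) + b + 1u) >> 1);
}

// One lane of vrshrq_n_u16(x, 1): unsigned rounding shift right by one.
inline std::uint16_t rshr1_u16(std::uint16_t x) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(x) + 1u) >> 1);
}

// One lane of vrshrq_n_s16(x, 1): signed rounding shift right by one.
// Relies on arithmetic right shift of negative values, which GCC and Clang
// provide and C++20 requires.
inline std::int16_t rshr1_s16(std::int16_t x) {
    return static_cast<std::int16_t>((static_cast<std::int32_t>(x) + 1) >> 1);
}

#if defined(__ARM_NEON)
// The patched vector form: unsigned in, unsigned out.
inline uint16x8_t average_then_halve(uint16x8_t a, uint16x8_t b) {
    const uint16x8_t avg = vrhaddq_u16(a, b);
    return vrshrq_n_u16(avg, 1);
}
#endif
```

For a lane holding 0xFFFF, the unsigned shift yields 0x8000, while the signed shift sees the same bits as -1 and yields 0. For small values such as 100 both return 50, which is one way a mismatch like this can go unnoticed under a lenient compiler.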

Bug 3: Wrong lane accessor types (6 fixes)

vget_low_u8() expects a uint8x16_t and returns its low 64-bit half as a uint8x8_t, but here it is called on a uint16x8_t, so the element width does not match. GCC rejects this. Fix: match the accessor to the input type.

before
uint16x8_t wide = vaddq_u16(x, y);
uint8x8_t low = vget_low_u8(wide);   // u8 accessor on u16 vector
after
uint16x8_t wide = vaddq_u16(x, y);
uint16x4_t low = vget_low_u16(wide);  // correct accessor
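It helps to be explicit about what the corrected accessor returns. The sketch below models `vget_low_u16` with `std::array` stand-ins and contrasts it with `vmovn_u16`, the intrinsic that actually narrows 16-bit lanes to 8 bits. The type aliases and helper names are hypothetical; the patches described here use `vget_low_u16`, and the narrowing variant is shown only for comparison.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

using U16x8 = std::array<std::uint16_t, 8>;  // stands in for uint16x8_t
using U16x4 = std::array<std::uint16_t, 4>;  // stands in for uint16x4_t
using U8x8  = std::array<std::uint8_t, 8>;   // stands in for uint8x8_t

// Model of vget_low_u16: lanes 0..3, element width unchanged.
inline U16x4 get_low_u16(const U16x8& v) {
    return {v[0], v[1], v[2], v[3]};
}

// Model of vmovn_u16: all 8 lanes, each truncated to its low 8 bits.
inline U8x8 movn_u16(const U16x8& v) {
    U8x8 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(v[i] & 0xFF);
    }
    return out;
}

#if defined(__ARM_NEON)
// The same two operations with real intrinsics.
inline uint16x4_t low_half_neon(uint16x8_t wide) { return vget_low_u16(wide); }
inline uint8x8_t narrow_neon(uint16x8_t wide) { return vmovn_u16(wide); }
#endif
```

Both operations produce a 64-bit result, but `vget_low_u16` keeps four full-width lanes while `vmovn_u16` keeps eight truncated ones, so downstream code has to agree with whichever one is chosen.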

Summary

| Bug | Root Cause | Count | Fix |
|---|---|---|---|
| constexpr NEON types | Clang extension, GCC rejects | 47 | constexpr → const |
| Signed/unsigned shift | Wrong intrinsic variant | 6 | _s16 → _u16 |
| Wrong lane accessor | Element width mismatch | 6 | vget_low_u8 → vget_low_u16 |

Who This Affects

Anyone running egocentric vision pipelines on ARM servers: EgoLifter, HaWoR, EgoGaussian, or anything that pulls projectaria_tools as a dependency. The affected hardware includes the NVIDIA GH200, AWS Graviton, and Ampere Altra. If your CI or inference cluster has shifted to ARM for cost or efficiency reasons, this is an undocumented blocker: the build fails outright on GCC.

A PR has been submitted to facebookresearch/ocean to fix this upstream. Discovered while building an ego-video decomposition pipeline on NVIDIA GH200.