March 2026

Fixing projectaria_tools on NVIDIA GH200 (aarch64 + GCC)

How 59 patches across 6 files unblock Meta's Aria toolkit on ARM servers

If you're building projectaria_tools from source on an NVIDIA GH200 or any aarch64 Linux server with GCC, the C++ build fails deep inside Meta's Ocean library (github.com/facebookresearch/ocean, fetched via CMake FetchContent). As of March 2026, this is completely undocumented. Here's the problem and the fix.

Why It Breaks

The GH200 pairs an ARM CPU (Grace) with an H100 GPU. The GPU side works fine. The CPU side runs aarch64 Linux with GCC. Ocean contains ARM NEON SIMD code written for Aria glasses, originally tested with Clang on Android and iOS. The ML community validated x86 builds with GCC, which skips the NEON code paths entirely. Nobody tested ARM + GCC — the combination the GH200 exposes.
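To check whether a given build falls into the affected combination, the compiler's predefined macros are enough. The sketch below is illustrative only: the helper names `is_aarch64_gcc` and `toolchain_summary` are hypothetical and not part of projectaria_tools or Ocean. It relies on the standard predefined macros `__aarch64__`, `__GNUC__`, and `__clang__`.

```cpp
#include <string>

// Hypothetical helper: true only for an aarch64 target compiled by GCC proper.
// Clang also defines __GNUC__, so it has to be excluded explicitly.
constexpr bool is_aarch64_gcc() {
#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: short "arch+compiler" label for build logs.
inline std::string toolchain_summary() {
#if defined(__aarch64__)
    const std::string arch = "aarch64";
#else
    const std::string arch = "other-arch";
#endif
#if defined(__clang__)
    const std::string compiler = "clang";
#elif defined(__GNUC__)
    const std::string compiler = "gcc";
#else
    const std::string compiler = "other-compiler";
#endif
    return arch + "+" + compiler;
}
```

On a GH200 with the system GCC, `toolchain_summary()` should report `aarch64+gcc`, which is the configuration where the three bugs below surface.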

Bug 1 — `constexpr` on NEON vector types (47 fixes)

GCC rejects constexpr on variables of NEON vector types like uint8x16_t when they are initialized from intrinsics such as vdupq_n_u8(). Clang accepts this as an extension; GCC never supported it. Every occurrence across Ocean's NEON intrinsic files is a hard compile error on GCC.

before

static constexpr uint8x16_t mask = vdupq_n_u8(0x0F);
static constexpr uint16x8_t coeffs = vdupq_n_u16(256);

after

static const uint8x16_t mask = vdupq_n_u8(0x0F);
static const uint16x8_t coeffs = vdupq_n_u16(256);
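If the compile-time constant still matters, one option is to keep the scalar `constexpr` and build the vector from it at run time. The following is a minimal sketch under that assumption, not code from Ocean: `kNibbleMask` and `low_nibbles` are hypothetical names, and the scalar fallback exists only so the example also compiles on non-NEON machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// The scalar stays constexpr on every compiler; only the vector is demoted to const.
static constexpr std::uint8_t kNibbleMask = 0x0F;

// Hypothetical helper (not from Ocean): keeps the low 4 bits of each byte, in place.
inline void low_nibbles(std::uint8_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Initialized at run time on first use; accepted by both GCC and Clang.
    static const uint8x16_t mask = vdupq_n_u8(kNibbleMask);
    for (; i + 16 <= count; i += 16) {
        vst1q_u8(data + i, vandq_u8(vld1q_u8(data + i), mask));
    }
#endif
    // Scalar tail on NEON targets, and the whole loop everywhere else.
    for (; i < count; ++i) {
        data[i] = static_cast<std::uint8_t>(data[i] & kNibbleMask);
    }
}
```

The only change relative to the failing pattern is `constexpr` to `const` on the vector; the value itself is still fixed at compile time through the scalar.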

Bug 2 — Signed/unsigned shift mismatch (6 fixes)

vrshrq_n_s16() (the signed rounding shift) is called on unsigned data produced by vrhaddq_u16(). GCC catches the type mismatch; Clang was lenient. Fix: use the unsigned variant.

before

uint16x8_t avg = vrhaddq_u16(a, b);
int16x8_t result = vrshrq_n_s16(avg, 1);  // signed shift on unsigned data

after

uint16x8_t avg = vrhaddq_u16(a, b);
uint16x8_t result = vrshrq_n_u16(avg, 1);  // correct unsigned variant
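The two variants are not interchangeable even when a compiler lets the types through. The scalar model below is a sketch of what a single 16-bit lane computes, written from the documented semantics of the intrinsics (rounding halving add, rounding shift right); the function names `rhadd_u16`, `rshr_u16`, and `rshr_s16` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// One lane of vrhaddq_u16: (a + b + 1) >> 1, widened so the sum cannot overflow.
constexpr std::uint16_t rhadd_u16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(a) + b + 1u) >> 1);
}

// One lane of vrshrq_n_u16: rounding logical shift right by n (1..16).
constexpr std::uint16_t rshr_u16(std::uint16_t x, int n) {
    return static_cast<std::uint16_t>(
        (static_cast<std::uint32_t>(x) + (1u << (n - 1))) >> n);
}

// One lane of vrshrq_n_s16: rounding arithmetic shift right by n (1..16).
// Assumes >> on a negative value is an arithmetic shift, which C++20 guarantees
// and GCC and Clang implement in earlier standards as well.
constexpr std::int16_t rshr_s16(std::int16_t x, int n) {
    return static_cast<std::int16_t>(
        (static_cast<std::int32_t>(x) + (1 << (n - 1))) >> n);
}
```

For lanes below 0x8000 both shifts agree: `rshr_u16(100, 1)` and `rshr_s16(100, 1)` are both 50. Once the top bit is set they diverge: the bit pattern 0xFFFF shifts to 0x8000 as unsigned, but as signed it is -1 and shifts to 0.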

Bug 3 — Wrong lane accessor types (6 fixes)

vget_low_u8() is called on a uint16x8_t — a 64-bit lane accessor applied to the wrong element width. GCC rejects this. Fix: match the accessor to the input type.

before

uint16x8_t wide = vaddq_u16(x, y);
uint8x8_t low = vget_low_u8(wide);   // u8 accessor on u16 vector

after

uint16x8_t wide = vaddq_u16(x, y);
uint16x4_t low = vget_low_u16(wide);  // correct accessor
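The rule GCC enforces here is that a half-register accessor keeps the element type and only halves the lane count. A rough scalar model makes that explicit; `Vec` and `get_low` below are hypothetical stand-ins built on `std::array`, not NEON or Ocean APIs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a NEON register holding N lanes of type T.
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

// Models the vget_low_* family: returns the low half of the lanes, same element type.
template <typename T, std::size_t N>
Vec<T, N / 2> get_low(const Vec<T, N>& v) {
    static_assert(N % 2 == 0, "lane count must be even");
    Vec<T, N / 2> out{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        out[i] = v[i];
    }
    return out;
}

// Eight u16 lanes in, four u16 lanes out: the analogue of uint16x8_t -> uint16x4_t.
static_assert(
    std::is_same<decltype(get_low(std::declval<Vec<std::uint16_t, 8>>())),
                 Vec<std::uint16_t, 4>>::value,
    "low half keeps the element type");
```

In this model there is no way to ask for the low half of a u16 vector and get u8 lanes back, which mirrors why `vget_low_u8` on a `uint16x8_t` is rejected.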

Summary

| Bug | Root Cause | Count | Fix |
| --- | --- | --- | --- |
| `constexpr` NEON types | Clang extension, GCC rejects | 47 | `constexpr` → `const` |
| Signed/unsigned shift | Wrong intrinsic variant | 6 | `_s16` → `_u16` |
| Wrong lane accessor | Element width mismatch | 6 | `vget_low_u8` → `vget_low_u16` |

Who This Affects

Anyone running egocentric vision pipelines on ARM servers: EgoLifter, HaWoR, EgoGaussian, or anything that pulls projectaria_tools as a dependency. The affected hardware includes the NVIDIA GH200, AWS Graviton, and Ampere Altra. If your CI or inference cluster has shifted to ARM for cost or efficiency reasons, this is an undocumented blocker: the source build stops with hard compile errors.

A PR has been submitted to facebookresearch/ocean to fix this upstream. Discovered while building an ego-video decomposition pipeline on NVIDIA GH200.
