TSC Meeting Notes 2024-10-31

Attendance:

Cary Phillips
Christina Tempelaar-Lietz
John Mertic
Joseph Goldstone
Kimball Thurston
Larry Gritz
Nick Porcino
Peter Hillman
Rod Bogart

Guests:

  • Li Ji, ILM

  • Alex Wells, Intel

  • Pierre-Anthony Lemieux, Sandflow Consulting

  • Michael Smith, Intel

Discussion:

  • 3.3.2 release

    • Kimball: no outstanding issues

    • Cary will stage a release this weekend

    • Build test suite, compile against new version, running dynamically linking against old version, if test suite passes, then say it’s ABI compatible

  • Security policy

    • What is our stated policy about how many back versions we are willing to patch?

    • Kimball: 3 vfx platforms worth, if platform skips release, would have to support the in-between releases as well.

    • Website incorrect, change wording to patch “only upon request” for other versions

    • RH8 still alive, not sure what version is on their system

    • May need to help the long-lived RHEL releases too.

    • Save more conversation on this topic to include Larry.

  • Imath::Vec::operator[] undefined behavior, cast nameless union trick

    • https://github.com/AcademySoftwareFoundation/Imath/issues/446

    • See below for ChatGPT summary and suggestions related to the following discussion

    • Kimball: want to understand support of nameless union construct but it’s not part of C++. Not til C++23, we won’t be able to support for a while.

    • Alex W: Dynamic indexing: what is happening today - implementation takes address of a float, then uses array index. Only legal to dereference within size of object for which you took an address. Anything else is UB.

    • Alex: Examples using Compiler Explorer to look at generated code:

    • Example: Take 2 vec3s shove into box, asking for center of box, invokes operator, called Vec3Fix, derived from Vec3. Supposed to be taking 2 vec3s adding together multiply by 5 but only see 1 multiply rather than 2.

    • Change implementation slightly : take the ‘this’ pointer instead of address of vec3. reinterpret cast to larger type (missed this… 12 bytes instead of 3) now it isn’t UB. reinterpret_cast brings up aliasing issues.

    • Dynamic array address must be in memory or on stack, greatly affects code gen.

    • Extra code being generated to store it, then have dynamic access.

    • Move this code to a GPU or in a vectorized loop SIMD4 wide, setup to avoid SOA: when it’s loading box doing nice wide moves, 4 items at a time, multiplies 4 at a time. But the array access needs a float pointer … has to be pushed back to the stack so we can use the accessor on it. Then has to do array indexing 4 separate times . All because it needed to be stored back on stack.

    • Impact on GPU - it requires private memory to store it. It won’t be stored in the register file - BIG HIT. Want to see all my loads and stores on global memory pointers, in register file.

    • “Select implementation” is another option: uses select_when template to only use the vec3 directly. 6 floats read in, writing out 1 float. Select_when gets compiled away. Supporting a dynamic index at all, leaves open to some horrible code gen when you go to SIMD. On GPU requires private memory for dynamic array access - chunks of your variables not stored in register file as you hoped.

    • With select_when approach, have masking support.

    • Even if you can make dynamics well-defined behavior, might want to support select instead and not even support dynamic access.

    • Kimball: is there a way of saying a parameter argument must be a literal constant, e.g. a 0, 1, or 2.

    • Unfortunately language doesn’t give that to you.

    • Can declare a const but it makes the compiler do work to optimize away. Relying on the optimizer.

    • Kimball: big performance degradation in debug mode then.

    • More examples: intersection box and vector. Box has generic implementation using dynamic array access. Generates a lot of code, not optimized well. Could change these to constant indices and that will give a better debug experience. Could keep generic algorithm if you got it to statically unroll.

    • Kimball: challenge because we don’t know what people are doing with it in the real world

    • Kimball in chat: My real fear is that someone has something like

      Vec3<float> mat3x3[3];
      Vec3<float> &first = mat3x3[0];

      first[7];

    • Li Ji in chat: So basically in a very crude way:
      #Marco: [0] := this.x; [1] := this.y; [2] := this.z
      So nothing is dynamic and the index has to be resolved at compile time

    • Kimball: wondering if we should have a .@ semantics instead.

    • Cary: in terms of where we are now, we have problematic code that’s been out there 20 years, but recently ran into real world case where it fails. Presumably luck has run out and we need to do something about this. There’s a compiler that gives bad results with this. Doing nothing not a viable option.

    • Cary: are we committed to coming up with solution that requires no end user changes.

    • Kimball: union approach

    • Alex: might depend on rules around nameless unions. Previously what last data member you wrote to was the last data member.

    • Kimball: issue is requires C++23 to be a viable solution

    • Alex: way better code gen when dealing with direct data members. Refactored code to get rid of array subscript as much as possible. Select_when approach will get rid of other code paths. Adding support for const indices - consteval might be able to use with array subscripts so you know at compile time. consteval is front end of the compiler. Code elimination will get rid of the code on the front end. Will also happen in debug mode, done by compiler. Great way to allow array subscripts access data member directly. Non-const eval, can use selection instead.

    • Alex: might consider short-term correction, access the ‘this’ pointer instead of the ‘float’ pointer to get rid of one undefined access.

    • Cary: that will give us correctness at the cost of performance.

    • Kimball: aliasing issue… because it’s in the lifetime, can we not use ‘restrict’?

    • Alex: aliasing situation: restrict tells compiler this won’t interact with anybody else. This is the opposite problem - going to access some element of it, that’s array-based. Compiler may think distinct ways of access y element, e.g. tv.y and tv[axis] , are 2 different accesses so it can’t optimize. Compiler, based on types, could think these 2 parts of memory are unrelated. You want the opposite of restrict.

    • Alex: tv is non-const so it’s going to return a reference. It’s an aliasing issue you might run into, possibly only under SIMD gen. Creating a situation where things can go awry. More correct use of unions you’re not using it to switch types. If you are switching types, putting it through memory. With “select” you can get the best of both worlds.

    • Alex happy to review a PR, too busy to create the PR.

    • Kimball: Will check with Larry, need to be sure select works well with gcc.

    • Cary: continue this discussion next week.

    • Do reinterpret cast, commit that, correctness fix. Then do select fix… or do we go straight to select fix?

    • Alex: could have a build option.

    • Kimball: could do reinterpret cast for this release, then ‘select’ for next release.

    • Peter: could wrap it in an ifdef for a patch as well.

    • Cary: make reinterpret cast in release, add select change to main.

    • Peter: option to go back to ‘this’ option if there is a performance, all inline code, don’t need to reinstall any libraries with the change

    • Cary: should eradicate all dynamic indexing.

    • Peter: put comments around the operator with warning

  • Pierre-Anthony Lemieux

    • Will send the JPEG 2000 slides to the Slack in preparation for the next call

  • Security issue - null dereference write, deep scanline

 



ChatGPT summary and suggestions - undefined behavior in Vec::operator[] discussion
(thanks to Nick Porcino)

Could you summarize the following and attempt to discover if there is a consensus or plan of action in it?

In summary, the discussion centers on resolving undefined behavior and performance issues in Vec::operator[] due to dynamic indexing, which often results in inefficient code generation, especially in GPU or SIMD environments. The current implementation relies on reinterpret_cast to handle array indexing, which may inadvertently introduce aliasing issues and other unpredictable behavior across different compilers.
Key Points:

  1. Current Issues with Dynamic Indexing: Dynamic array indexing in Vec::operator[] produces suboptimal code on both CPU and GPU, requiring data to be stored in private memory on GPU, which affects register efficiency. This slows down performance, particularly when optimized SIMD operations could instead access elements directly in registers.

  2. Proposed Solutions:

    • Short-Term Fix (reinterpret_cast): Implement reinterpret_cast for the current release to maintain correctness but with a possible performance hit.

    • Longer-Term Approach (select-when): Transition to a select_when approach, which optimizes for scenarios where constant indices can be used, thus enabling static unrolling and eliminating undefined behavior. This method also helps avoid complex and excessive code generation, particularly in debug mode.

    • Conditional Compilation: Wrap the reinterpret_cast fix within an #ifdef guard as a temporary solution, allowing flexibility for testing and user-configurable options.

  3. Compiler and Code Generation Considerations:

    • Const Indexing: Allowing only constant indices would enable the compiler to optimize aggressively and prevent undefined behavior from dynamic indexing. This is expected to work well with both GCC and Clang, though further testing is needed to confirm.

    • Nameless Unions: While a union approach (allowing seamless access without indexing) was considered, it depends on C++23 for safe usage. As the language standard update is not universally supported yet, this option was deprioritized.

    • Aliasing Complexity: Misinterpreted aliasing arises when the compiler perceives indexed and named accesses (like vec.y vs. vec[axis]) as unrelated, hindering optimization. Solutions that remove such ambiguity, like direct member access or the proposed select_when approach, are preferable.

  4. Consensus and Action Plan:

    • Apply reinterpret_cast as a temporary fix for the immediate release, addressing correctness with some performance cost.

    • Test and plan for a select_when based fix for future releases, potentially eradicating dynamic indexing entirely if performance on critical platforms (e.g., GCC) is satisfactory.

    • Continue discussion, with a follow-up to review implementation results and examine further compiler compatibility.

The conversation highlights a common ground on immediate corrective measures, balanced with a clear direction toward a more performant, constant-indexed approach (select_when). This layered plan aligns with performance requirements while allowing gradual adaptation to newer C++ standards.


Can you think of other solutions, or have they pretty much nailed it?

They've covered the primary approaches for resolving this issue, but a few additional alternatives might be worth exploring, particularly if they’re looking to enhance performance in both immediate and long-term code updates:

  1. Tag Dispatching for Compile-Time Indexing: Instead of only using reinterpret_cast or select_when, they could use tag dispatching to enforce compile-time constants. For example, different overloads of operator[]could use tags for indices 0, 1, and 2. This would effectively prevent dynamic indexing and could simplify code paths for frequently used cases without relying on C++23’s consteval. Although somewhat verbose, this approach can make the constant indexing intent explicit and enable straightforward SIMD-friendly optimizations.

  2. Static Data Layout with Inline Accessors: If access patterns allow, reworking Vec to use a fixed-layout structure with direct member access (e.g., x, y, z) and inlined methods for mathematical operations could eliminate the need for operator[]. This would require adjusting how Vec instances are used throughout the codebase, but it can lead to more consistent in-register performance. To handle cases where dynamic indexing is unavoidable, this approach could still offer the main performance gains while falling back to an index-based accessor with a well-marked warning or fallback.

  3. Fixed-Size Array Wrappers: For cases where Vec types are commonly used with fixed sizes, they could consider adding explicit methods like getX(), getY(), and getZ() instead of operator[], ensuring the compiler doesn’t generate additional array bounds checks or stack-to-register moves that dynamic indices cause. This approach could also reduce debug-mode overhead and help future-proof the API if they decide to deprecate dynamic indexing.

  4. Compile-Time constexpr Unrolling via std::array: If a fixed-size layout like std::array<float, 3>replaces the underlying data structure, they could leverage constexpr functions that index statically. When compiled with high optimization levels, constexpr can often eliminate unnecessary bounds checks or dynamic array overhead. This technique could even be applied in debug builds with C++20 constexpr capabilities and offer predictable, compiler-enforced constant access.

  5. Dynamic Indexing with Guaranteed Performance Penalty Warning: If they choose to retain dynamic indexing as an optional fallback, a clear interface layer marking functions that use dynamic access as "non-performant" could provide feedback to developers using the library. For example, Vec::operator[](size_t)could be explicitly documented with comments and diagnostic pragmas where supported to caution about the associated overhead, encouraging alternatives when performance matters.

  6. Proxy Accessor Object with Custom Aliasing Rules: They could experiment with a proxy object that wraps access to elements and explicitly marks aliasing rules via restrict. This technique would allow them to define specific access patterns (e.g., element-by-element or vector-wide) without compromising performance. It’s trickier to implement but might allow dynamic indexing while maintaining compiler optimizations.

In conclusion, their current approach is sound—prioritizing correctness now with an eye on constant indexing via select_when is a safe strategy. However, adding features like tag dispatching or fixed-layout structures could further enhance performance in both debug and optimized builds, especially if dynamic indexing can be progressively phased out or deprecated over time.