TSC Meeting Notes 2024-10-31
Attendance:
Guests:
Li Ji, ILM
Alex Wells, Intel
Pierre-Anthony Lemieux, Sandflow Consulting
Michael Smith, Intel
Discussion:
3.3.2 release
Kimball: no outstanding issues
Cary will stage a release this weekend
Build test suite, compile against new version, running dynamically linking against old version, if test suite passes, then say it’s ABI compatible
Security policy
What is our stated policy about how many back versions we are willing to patch?
Kimball: 3 vfx platforms worth, if platform skips release, would have to support the in-between releases as well.
Website incorrect, change wording to patch “only upon request” for other versions
RH8 still alive, not sure what version is on their system
May need to help the long-lived RHEL releases too.
Save more conversation on this topic to include Larry.
Imath::Vec::operator[] undefined behavior, cast nameless union trick
https://github.com/AcademySoftwareFoundation/Imath/issues/446
See below for ChatGPT summary and suggestions related to the following discussion
Kimball: want to understand support of nameless union construct but it’s not part of C++. Not til C++23, we won’t be able to support for a while.
Alex W: Dynamic indexing: what is happening today - implementation takes address of a float, then uses array index. Only legal to dereference within size of object for which you took an address. Anything else is UB.
Alex: Examples using Compiler Explorer to look at generated code:
Example: Take 2 vec3s shove into box, asking for center of box, invokes operator, called Vec3Fix, derived from Vec3. Supposed to be taking 2 vec3s adding together multiply by 5 but only see 1 multiply rather than 2.
Change implementation slightly : take the ‘this’ pointer instead of address of vec3. reinterpret cast to larger type (missed this… 12 bytes instead of 3) now it isn’t UB. reinterpret_cast brings up aliasing issues.
Dynamic array address must be in memory or on stack, greatly affects code gen.
Extra code being generated to store it, then have dynamic access.
Move this code to a GPU or in a vectorized loop SIMD4 wide, setup to avoid SOA: when it’s loading box doing nice wide moves, 4 items at a time, multiplies 4 at a time. But the array access needs a float pointer … has to be pushed back to the stack so we can use the accessor on it. Then has to do array indexing 4 separate times . All because it needed to be stored back on stack.
Impact on GPU - it requires private memory to store it. It won’t be stored in the register file - BIG HIT. Want to see all my loads and stores on global memory pointers, in register file.
“Select implementation” is another option: uses select_when template to only use the vec3 directly. 6 floats read in, writing out 1 float. Select_when gets compiled away. Supporting a dynamic index at all, leaves open to some horrible code gen when you go to SIMD. On GPU requires private memory for dynamic array access - chunks of your variables not stored in register file as you hoped.
With select_when approach, have masking support.
Even if you can make dynamics well-defined behavior, might want to support select instead and not even support dynamic access.
Kimball: is there a way of saying a parameter argument must be a literal constant, e.g. a 0, 1, or 2.
Unfortunately language doesn’t give that to you.
Can declare a const but it makes the compiler do work to optimize away. Relying on the optimizer.
Kimball: big performance degradation in debug mode then.
More examples: intersection box and vector. Box has generic implementation using dynamic array access. Generates a lot of code, not optimized well. Could change these to constant indices and that will give a better debug experience. Could keep generic algorithm if you got it to statically unroll.
Kimball: challenge because we don’t know what people are doing with it in the real world
Kimball in chat: My real fear is that someone has something like
Vec3<float> mat3x3[3];
Vec3<float> &first = mat3x3[0];first[7];
Li Ji in chat: So basically in a very crude way:
#Marco: [0] := this.x; [1] := this.y; [2] := this.z
So nothing is dynamic and the index has to be resolved at compile timeKimball: wondering if we should have a .@ semantics instead.
Cary: in terms of where we are now, we have problematic code that’s been out there 20 years, but recently ran into real world case where it fails. Presumably luck has run out and we need to do something about this. There’s a compiler that gives bad results with this. Doing nothing not a viable option.
Cary: are we committed to coming up with solution that requires no end user changes.
Kimball: union approach
Alex: might depend on rules around nameless unions. Previously what last data member you wrote to was the last data member.
Kimball: issue is requires C++23 to be a viable solution
Alex: way better code gen when dealing with direct data members. Refactored code to get rid of array subscript as much as possible. Select_when approach will get rid of other code paths. Adding support for const indices - consteval might be able to use with array subscripts so you know at compile time. consteval is front end of the compiler. Code elimination will get rid of the code on the front end. Will also happen in debug mode, done by compiler. Great way to allow array subscripts access data member directly. Non-const eval, can use selection instead.
Alex: might consider short-term correction, access the ‘this’ pointer instead of the ‘float’ pointer to get rid of one undefined access.
Cary: that will give us correctness at the cost of performance.
Kimball: aliasing issue… because it’s in the lifetime, can we not use ‘restrict’?
Alex: aliasing situation: restrict tells compiler this won’t interact with anybody else. This is the opposite problem - going to access some element of it, that’s array-based. Compiler may think distinct ways of access y element, e.g. tv.y and tv[axis] , are 2 different accesses so it can’t optimize. Compiler, based on types, could think these 2 parts of memory are unrelated. You want the opposite of restrict.
Alex: tv is non-const so it’s going to return a reference. It’s an aliasing issue you might run into, possibly only under SIMD gen. Creating a situation where things can go awry. More correct use of unions you’re not using it to switch types. If you are switching types, putting it through memory. With “select” you can get the best of both worlds.
Alex happy to review a PR, too busy to create the PR.
Kimball: Will check with Larry, need to be sure select works well with gcc.
Cary: continue this discussion next week.
Do reinterpret cast, commit that, correctness fix. Then do select fix… or do we go straight to select fix?
Alex: could have a build option.
Kimball: could do reinterpret cast for this release, then ‘select’ for next release.
Peter: could wrap it in an ifdef for a patch as well.
Cary: make reinterpret cast in release, add select change to main.
Peter: option to go back to ‘this’ option if there is a performance, all inline code, don’t need to reinstall any libraries with the change
Cary: should eradicate all dynamic indexing.
Peter: put comments around the operator with warning
Pierre-Anthony Lemieux
Will send the JPEG 2000 slides to the Slack in preparation for the next call
Security issue - null dereference write, deep scanline
Peter will send to Kimball
hold off on patch release until we get to this issue
Kimball fixed in scanline probably but not deep scanline, will try to fix tomorrow.
ChatGPT summary and suggestions - undefined behavior in Vec::operator[] discussion
(thanks to Nick Porcino)Could you summarize the following and attempt to discover if there is a consensus or plan of action in it?
In summary, the discussion centers on resolving undefined behavior and performance issues in
Vec::operator[]
due to dynamic indexing, which often results in inefficient code generation, especially in GPU or SIMD environments. The current implementation relies onreinterpret_cast
to handle array indexing, which may inadvertently introduce aliasing issues and other unpredictable behavior across different compilers.
Key Points:
Current Issues with Dynamic Indexing: Dynamic array indexing in
Vec::operator[]
produces suboptimal code on both CPU and GPU, requiring data to be stored in private memory on GPU, which affects register efficiency. This slows down performance, particularly when optimized SIMD operations could instead access elements directly in registers.Proposed Solutions:
Short-Term Fix (reinterpret_cast): Implement
reinterpret_cast
for the current release to maintain correctness but with a possible performance hit.Longer-Term Approach (select-when): Transition to a
select_when
approach, which optimizes for scenarios where constant indices can be used, thus enabling static unrolling and eliminating undefined behavior. This method also helps avoid complex and excessive code generation, particularly in debug mode.Conditional Compilation: Wrap the
reinterpret_cast
fix within an#ifdef
guard as a temporary solution, allowing flexibility for testing and user-configurable options.
Compiler and Code Generation Considerations:
Const Indexing: Allowing only constant indices would enable the compiler to optimize aggressively and prevent undefined behavior from dynamic indexing. This is expected to work well with both GCC and Clang, though further testing is needed to confirm.
Nameless Unions: While a union approach (allowing seamless access without indexing) was considered, it depends on C++23 for safe usage. As the language standard update is not universally supported yet, this option was deprioritized.
Aliasing Complexity: Misinterpreted aliasing arises when the compiler perceives indexed and named accesses (like
vec.y
vs.vec[axis]
) as unrelated, hindering optimization. Solutions that remove such ambiguity, like direct member access or the proposedselect_when
approach, are preferable.
Consensus and Action Plan:
Apply
reinterpret_cast
as a temporary fix for the immediate release, addressing correctness with some performance cost.Test and plan for a
select_when
based fix for future releases, potentially eradicating dynamic indexing entirely if performance on critical platforms (e.g., GCC) is satisfactory.Continue discussion, with a follow-up to review implementation results and examine further compiler compatibility.
The conversation highlights a common ground on immediate corrective measures, balanced with a clear direction toward a more performant, constant-indexed approach (
select_when
). This layered plan aligns with performance requirements while allowing gradual adaptation to newer C++ standards.
Can you think of other solutions, or have they pretty much nailed it?
They've covered the primary approaches for resolving this issue, but a few additional alternatives might be worth exploring, particularly if they’re looking to enhance performance in both immediate and long-term code updates:
Tag Dispatching for Compile-Time Indexing: Instead of only using
reinterpret_cast
orselect_when
, they could use tag dispatching to enforce compile-time constants. For example, different overloads ofoperator[]
could use tags for indices0
,1
, and2
. This would effectively prevent dynamic indexing and could simplify code paths for frequently used cases without relying on C++23’sconsteval
. Although somewhat verbose, this approach can make the constant indexing intent explicit and enable straightforward SIMD-friendly optimizations.Static Data Layout with Inline Accessors: If access patterns allow, reworking
Vec
to use a fixed-layout structure with direct member access (e.g.,x
,y
,z
) and inlined methods for mathematical operations could eliminate the need foroperator[]
. This would require adjusting howVec
instances are used throughout the codebase, but it can lead to more consistent in-register performance. To handle cases where dynamic indexing is unavoidable, this approach could still offer the main performance gains while falling back to an index-based accessor with a well-marked warning or fallback.Fixed-Size Array Wrappers: For cases where
Vec
types are commonly used with fixed sizes, they could consider adding explicit methods likegetX()
,getY()
, andgetZ()
instead ofoperator[]
, ensuring the compiler doesn’t generate additional array bounds checks or stack-to-register moves that dynamic indices cause. This approach could also reduce debug-mode overhead and help future-proof the API if they decide to deprecate dynamic indexing.Compile-Time
constexpr
Unrolling viastd::array
: If a fixed-size layout likestd::array<float, 3>
replaces the underlying data structure, they could leverageconstexpr
functions that index statically. When compiled with high optimization levels,constexpr
can often eliminate unnecessary bounds checks or dynamic array overhead. This technique could even be applied in debug builds with C++20constexpr
capabilities and offer predictable, compiler-enforced constant access.Dynamic Indexing with Guaranteed Performance Penalty Warning: If they choose to retain dynamic indexing as an optional fallback, a clear interface layer marking functions that use dynamic access as "non-performant" could provide feedback to developers using the library. For example,
Vec::operator[](size_t)
could be explicitly documented with comments and diagnostic pragmas where supported to caution about the associated overhead, encouraging alternatives when performance matters.Proxy Accessor Object with Custom Aliasing Rules: They could experiment with a proxy object that wraps access to elements and explicitly marks aliasing rules via
restrict
. This technique would allow them to define specific access patterns (e.g., element-by-element or vector-wide) without compromising performance. It’s trickier to implement but might allow dynamic indexing while maintaining compiler optimizations.
In conclusion, their current approach is sound—prioritizing correctness now with an eye on constant indexing via
select_when
is a safe strategy. However, adding features like tag dispatching or fixed-layout structures could further enhance performance in both debug and optimized builds, especially if dynamic indexing can be progressively phased out or deprecated over time.