TSC Meeting Notes 2022-09-08
Attendance:
Others:
Darby Johnston
Barry Dempsey
Lutz Latta
Mark Leone
Discussion:
GPU decompression:
Mark Leone (NVIDIA): Steve Parker outlined three options for GPU streaming:
Decompress existing zip data on the GPU. The Pdeflate library is a proof of concept and has not been distributed. It does better with some metadata indicating where the compressed lines begin and end, which lets it parallelize better. This option would be backwards compatible.
New technology: GDeflate. The decompressor is open source; the compressor is closed source. 50-60 Gb/sec.
DMA between storage and GPU memory: data can be read directly into GPU memory, bypassing CPU memory. GMA.
GDeflate ships in nvCOMP. It is Linux only; DMA is difficult to accomplish under Windows.
Peter: Isn't the metadata that says where each scanline is already there? Why does it need more than what it already has?
Lutz: Our idealized pipeline: all block compression on the GPU, native GPU decompression throughout, with an option for the CPU.
Mark: Open the file on the CPU, then read the compressed tiles into GPU memory.
Peter: Kimball already has that: not the GPU decompression, but the hooks to specify the memory.
Lutz: Need a legacy fallback.
Mark: The GDeflate library will ship with an open-source CPU decompressor based on libdeflate.
Mark: I'm not sure of the plans, but hopefully there will be a reference implementation of the compressor, though not necessarily a performant one.
Mark: Keeping the compressor closed may be a way to reserve the right to move the implementation into hardware.
Larry: Do the block compression schemes rely on a color space?
Lutz: I don't think so.
Larry: All along, the assumption has been that data in EXR is linear HDR. But EXR is the only format that supports multipart and other convenient features. Lots of things would fall out of using EXR as the native texture format.
Peter: The concern with non-HDR data is about color. People are already abusing the format; this is an excuse to abuse it further.
Peter: Add new channel types. We should invent a compression scheme that handles different channels differently.
Enabling SSE instructions:
Darby: When profiling on Windows, I found that SSE doesn’t get turned on. SSE is not being tested in the test suite.
Larry: SSE2 is, but SSE4 isn’t, and there’s no easy way to turn it on.
Darby: We could use the CMake code from OSL. I can add it to the CI as well.
Larry: In the OSL CI matrix, we sparsely sample the options, making sure that each SIMD option is tested everywhere. A worthy project. Don’t take the OSL approach too literally.
Larry: Don’t bake it in at compile time.
Darby: Has anyone used SSE4?
Larry: Is there no way to turn it on? Was there a way with the autoconf build?
Cary: I doubt it.
Darby: I was just playing back some files and saw maybe a 20-30% speedup.
Larry: The code is so old there’s probably no AVX in it.
Darby: If you get it wrong, the compiler will catch it, right?
Larry: No, because you might be compiling for a different architecture.
Peter: There’s a flag for “I promise to only run this on the machine I compile it on.”
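Larry's point about not baking the SIMD level in at compile time is usually handled with runtime dispatch: compile a baseline path plus an SSE4 path, then pick one based on what the running CPU reports. The sketch below is a minimal illustration of that pattern, assuming GCC or Clang on x86 (it relies on their `__builtin_cpu_supports` builtin and the `target` function attribute; MSVC would need `__cpuid` instead). The functions `sum_scalar`, `sum_sse41`, and `select_sum` are hypothetical examples, not OpenEXR code.

```c
#include <stddef.h>

/* Baseline scalar path: always compiled, always safe to run. */
static float sum_scalar(const float *v, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
/* Only this function is compiled with SSE4.1 enabled (GCC/Clang target
   attribute), so the rest of the binary stays at the baseline ISA. */
__attribute__((target("sse4.1")))
static float sum_sse41(const float *v, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(v + i));
    /* SSE4.1 dot product with all-ones: sums the four lanes into lane 0. */
    float s = _mm_cvtss_f32(_mm_dp_ps(acc, _mm_set1_ps(1.0f), 0xF1));
    for (; i < n; ++i) s += v[i]; /* scalar tail */
    return s;
}
#define HAVE_X86_DISPATCH 1
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Choose an implementation at runtime from the CPU's reported features. */
static sum_fn select_sum(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1")) return sum_sse41;
#endif
    return sum_scalar;
}
```

A caller would resolve the function pointer once (for example at library initialization) and reuse it, so a single binary built for a baseline target still uses SSE4 where the hardware has it. This also sidesteps the "compile only for the machine I build on" flag Peter mentioned, at the cost of an indirect call. Note that the SIMD path adds in a different order, so floating-point results may differ slightly from the scalar path.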
Darby: Another PR: a bug in the DWA reader. I have a patch.
Peter: ImfCheckFile has read-from-memory and read-from-disk, but not read-from-memory-map. Put that in there, and then the fuzz test will catch it.
Peter: ImfCheckFile does brute force testing of the API.
Joseph: Have we ever asked the Apple Silicon group for help with optimizing? I think they have their own fork.
Nick: They have diverged significantly.
Joseph: Contact Michael Johnson.