Discussion
- Kimball: this is exactly how OpenEXR works. Did you get to use the decoding pipeline in the core library? Both tiles and scanlines are called chunks. This was precisely the idea behind having the decoding pipeline. The local header was the only thing where I didn’t know whether we needed to do something custom. Depending on ZIPS or ZIP compression, a chunk is 1 vs. 16 scanlines of data. Maybe too large for DS. Maybe we have a zipg for a custom number? Do we need to skip the local header in the EXR file? It is mostly sanity checking in the local header. Could elide some of that. Have you played with that?
- Rus: that was the plan but never got to it. Even if it was 16 scanlines, it’s not a constant size. Needs to be a constant size. Example 8k texture vs 4k texture.
- Kimball: We can pad it to be a constant size if that helps.
- Rus: so instead of 16 scanlines, needs to be 10mb so it’s a constant size.
- Kimball: for direct storage
- Rus: yes. When I compress an entire file as one chunk, it is inefficient and becomes slower than the uncompressed approach. Chunking is important to DS performance.
- Kimball: really about how to get GDeflate to give you a consistent size? It’s in each fragment correct?
- Rus: Yes the individual part is what has been compressed. All 64kb when uncompressed. That header generated by DS GDeflate when compressed on GPU.
- Rod: to what degree do we want to consider making legal EXR files that happen to have this data in it? And what are the extension roots you are considering?
- Kimball: is direct storage header public information?
- Rus: somewhat; it is not an official guideline. It is available as an example in their repository. Fragment size is “unlikely to change” but the header could be changed by MS. If we support DS 1.2, we have to keep up to date with how they do their headers.
- Rod: not a change MS would make lightly. But if they do update it, we would have to know.
- Kimball: have we talked to MS to formalize this header definition?
- Rus: no, that would be the next step.
- Nick: Each fragment independent or share a dictionary?
- Rus: completely independent. Only common thing is the header.
- Kimball: Mark Liani has created a branch where he wired in GDeflate as an option in EXR. Have not looked at it yet.
- Rus: can run on whole data but not as efficient.
- Kimball: if we define a zipg, we would have to work through the ability to break the data into chunks.
- Rod: also have to add the writing code. Not really MS code as long as the header doesn’t change.
- Rod: Larry, have you worked on this yet?
- Larry: no but I’d like to use. But it needs to have a plausible path on all platforms, GPU, CPU.
- Rod: my understanding is that it is working.
- Rus: we will be able to take advantage of this and provide each individual chunk to the GPU, skip system memory and CPU side.
- Larry: I like it a lot.
- Rus: from point of view of OpenEXR it’s a different way of storing things. For us to use GDeflate would just be a juggling of how data is stored. Lose half a sec on decompression on the CPU side.
- Larry: my thoughts are: what new APIs do we need in OpenEXR to support this? We haven’t talked about giving a GPU-side destination for things. Do you want uncompressed pixels? Do you want to just get the compressed chunk? No notion of async calls yet. None of it is extremely difficult, but we need to think about the overall implications and how it interacts with the other APIs. Haven’t yet gone as far as mapping the core work back to the C++ API to support threading. Timing is good to think this stuff through.
- Cary: have also asked the question of how beholden we are to the current C++ API. Could create something new and fresh potentially.
- Larry: switch from 2 to 3 was very disruptive. We had discussed keeping APIs working and adding a few new things. Significant overhaul is possible but it’s a big ask for everyone downstream.
- Nick: have had good luck building on top of the C core.
- Larry: but you put a lot of Nick-time into that work. A lesser mortal might have struggled. Not easy to figure out the C calls.
- Rod: do we have a commitment to sharing those nicely wrapped things?
- Cary: what is state of your code currently? How far away is it from adding to the library?
- Rus: almost nothing of the existing library.
- Rod: we read and disassemble by ourselves since we know how they are laid out.
- Cary: informative for you to take a close look at what Kimball has done with the core library. Exactly the intent.
- Nick: was struck that it would fit elegantly into the pipeline he has made.
- Rod: this was a prototype to see if it works at all. Yeah now feels like doing it EXR C-core style would be good next move. If you’re reading the whole picture all at once. We are also reading mipmaps and tiles and being very selective, other thing we’ve already done in uncompressed case. But ensuring the right amount of compressed data is there and you don’t have a lot of waste is going to make these tiles a little trickier.
- Larry: concerned about use case of using them as textures, virtual texturing. Need to worry about using them as tiles and worry about what we pull in.
- Rod: in that example, chance to introspect on the scene and figure out what you need. But we are doing this realtime. We have things that are maps, e.g. dome, so if isn’t visible / projected into frustum, don’t read it.
- Rus: tiles are our main use case. Load lower mip levels when we don’t see the dome. Areas closer to the screen load higher mip levels. For the tile use case, the whole situation becomes simple because you already have a chunk. In the case of tiles, this chunk becomes a tile.
- Rod: size is equitable to DS.
- Rus: for a 400x256 tile it becomes 393 KB, smaller than 640 KB but still a good size for compression/decompression. In the tile case, fairly straightforward.
- Rod: Cary’s right we should try to adapt it using the C core.
- Lutz: how married are you to DS? When we looked at it, we had lots of Linux use cases and were not happy with how they conflate the copy from the disk to the GPU. Have you looked at a way to decouple the storage aspect from the GPU aspect?
- Rus: DS has automatic GDeflate decompression along the pipeline. No other has these concepts married.
- Rod: our main use case is windows machines on ICVFX stages.
- Rus: current implementation utilizes shared memory buffer. Only just now available on Vulkan, keep mapped buffers and read and write without locking. Constantly locked structured buffer then read and write like regular memory but available through GPU almost immediately. Only available on DX 12 not sure how well it works on Vulkan.
- Lutz: On Vulkan we’ve been doing (?) for a long time.
- Rus: some synchronization but available to CPU and GPU without locks.
- Nick: equivalent thought around RTX (?)
- Rus: direct storage is GPU Direct. NVIDIA themselves mention
- Lutz: advantage when it’s separate. GDeflate better controlled in the format. Loader just reads in the raw data, issue command on the buffer to decompress.
- Rus: could read compressed data with the Cu (CUDA?) file. Then call decompression on the buffer. But there was some trick to it, not ideal to his recollection. GPU Direct is cross-platform so could do this on Linux. I should try this.
- Cary: Virtual Town Hall would you be willing to give an abbreviated version of this? We’re going to tell people you’re doing it. Along the lines of plans for the future. August 2, before SIGGRAPH. How public are you willing to be?
- Rod: we need to check our permissions. Epic is on holiday for the next few weeks, so we will get back to you on what we can present. Would prefer to have an experiment with the C core in advance.
- Cary: helpful for community to know what’s going on.
- Larry: one of the looming things is figuring out GPU decompression options and how to change the format and API to support those.
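
The fragment scheme discussed above (fixed-size uncompressed fragments, each compressed independently with no shared dictionary) can be sketched roughly as follows. This is only an illustration under stated assumptions: `zlib` stands in for GDeflate, which is not available in the Python standard library; the function names are hypothetical; and zero-padding the last fragment is one possible reading of the "pad it to be a constant size" suggestion, not an agreed design. No DirectStorage header is modeled.

```python
import zlib

# Uncompressed fragment size mentioned in the discussion (64 KB).
FRAGMENT_SIZE = 64 * 1024


def split_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Cut data into fixed-size pieces, zero-padding the last one (assumption)."""
    fragments = []
    for start in range(0, len(data), size):
        piece = data[start:start + size]
        fragments.append(piece.ljust(size, b"\0"))
    return fragments


def compress_fragments(data: bytes) -> list[bytes]:
    """Compress every fragment on its own; zlib is a stand-in for GDeflate."""
    return [zlib.compress(fragment) for fragment in split_fragments(data)]


def decompress_fragments(blobs: list[bytes], original_length: int) -> bytes:
    """Decode fragments independently, then drop the padding at the end."""
    joined = b"".join(zlib.decompress(blob) for blob in blobs)
    return joined[:original_length]
```

Because each fragment decodes without reference to any other, a reader could hand individual fragments to separate workers. Note that only the uncompressed size is constant here; the compressed size of each fragment still varies with content.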

3.1.9 DWA read issues discussion
Related to this fix: PR #1439 https://github.com/AcademySoftwareFoundation/openexr/pull/1439
- There was a Slack discussion regarding broken DWA file read introduced in 3.1.9 by PR #1439. Nick has a fix that works, shared on the channel.
- Larry: are we convinced that is the right solution?
- Nick: I didn’t exhaustively check each comparison, just changed all of them and now I can load the files. But hoping someone who wrote the code might have an opinion as to whether the change is legitimate. Planning to submit a PR soon.
- The comparison was added to catch an overrun found in fuzzing. It solves the CVE by preventing writing past the end of the buffer. Think it is fine, but it highlights that we might need more checking on DWA-compressed files in general. The unit test didn’t catch that we couldn’t open those files.
- Larry: spuriously failing by thinking it was corrupt. But difficult to tell that deep in the code whether it was correct.
- Peter: there were 2 buffers it was checking. If it is at the end of both, it’s ok, but if it is not at the end of both, the file is not ok. Maybe make the change, put in a PR, have Kimball review it, then re-fuzz it.
- Larry: reveals gap in our testing policy.
- Peter: may have worked with some DWA compressed files and not others.
- Nick: may be rare but the file Larry shared triggered it.
- Peter: the test suite tests that the compression works, but not with a real file.
- Nick: we have 6 images but only one per settings permutation. Not sufficient. Not sure if it’s adequate to just generate more compressed images.
- Cary: fuzzing is a more realistic approach.
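
One common way a bounds check ends up rejecting valid files is an off-by-one in an end-of-buffer comparison. The sketch below is a generic illustration of that pattern and of the "at the end of both buffers" condition described above. It is not the code touched by PR #1439, all names are hypothetical, and whether this is the actual cause of the 3.1.9 regression is an assumption.

```python
def take_strict(buf: bytes, pos: int, n: int) -> tuple[bytes, int]:
    """Overly strict check: also rejects a read that ends exactly at the end."""
    if pos + n >= len(buf):
        raise ValueError("buffer overrun")
    return buf[pos:pos + n], pos + n


def take(buf: bytes, pos: int, n: int) -> tuple[bytes, int]:
    """Rejects only reads that would go past the end of the buffer."""
    if pos + n > len(buf):
        raise ValueError("buffer overrun")
    return buf[pos:pos + n], pos + n


def finished_cleanly(pos_a: int, buf_a: bytes, pos_b: int, buf_b: bytes) -> bool:
    """After decoding, both cursors are expected to sit exactly at the end."""
    return pos_a == len(buf_a) and pos_b == len(buf_b)
```

With `take_strict`, a well-formed input whose last read consumes the final byte is reported as corrupt, which is the kind of spurious failure a test with real files (or fuzzing that starts from valid files) would be expected to expose.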
