TSC Meeting Notes 2020-10-22

Attending:

  • Cary Phillips
  • Christina Tempelaar-Lietz
  • Eric Enderton
  • Joseph Goldstone
  • Kimball Thurston
  • Larry Gritz
  • Nick Porcino
  • Owen Thompson
  • Peter Hillman
  • Steve Parker

Discussion:

  • ASWF project survey: should OpenEXR participate?

    • Larry: we struggle to enumerate what questions to ask.

    • Cary: the generic questions that Larry contributed to the survey are interesting to us, but the survey is not likely to reach the larger OpenEXR community.

    • Rod: We've asked questions before in email and GitHub issues and not gotten much response, on questions like "should we support autotools?"

  • GPU acceleration of EXR reading:

    • Steve Parker of NVIDIA, Rod, and Kimball met to discuss how to efficiently read EXRs directly to the GPU.

    • Steve: when reading compressed files, decompress on the GPU. The files are not necessarily tiled and mipmapped already. The right thing to do is to read off of disk minimally, only as needed.

    • Steve: The motivation is that compressed files exist.

    • Reading the data is all well and good until you get to the point where you need to operate on it; then you need to decompress it. We haven’t touched format conversions.

    • The upshot is that a month ago RTX IO was announced: a combination of technologies for decompression and GPU direct storage, built on CUDA, for a Windows and DirectX ecosystem.

    • The end goal is to have all these pieces available as part of the architecture. We’re working with Microsoft on DirectStorage. Watch this space to hear more about it.

    • A lot of the foundations are there in GPU direct storage. For EXR, the host would open a file and read its header and offset table. This gives the host enough information to deal with it. Processing of the remainder of the file would move to GPU memory. This is what we’re in the middle of.

    • Eric: The host reads the header, then never reads the file again.

    • Larry: what’s the programming model? Steve: a combination of a memcpy-style interface and POSIX pread semantics: start at this offset, read this number of bytes.
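
      A minimal sketch of that model (names here are hypothetical, not the RTX IO API): once the host has the header and offset table, each chunk can be fetched with a single offset/size read, and the destination buffer could just as well be pinned or device memory.

        #include <cstddef>
        #include <cstdint>
        #include <fcntl.h>
        #include <unistd.h>

        // Hypothetical pread-style chunk reader: open once, then issue
        // "start at this offset, read this number of bytes" requests.
        struct ChunkReader
        {
            int fd = -1;

            bool openFile (const char* path)
            {
                fd = ::open (path, O_RDONLY);
                return fd >= 0;
            }

            // POSIX pread semantics; returns bytes read or -1 on error.
            ssize_t readChunk (uint64_t offset, void* dst, size_t nbytes) const
            {
                return ::pread (fd, dst, nbytes, static_cast<off_t> (offset));
            }

            ~ChunkReader () { if (fd >= 0) ::close (fd); }
        };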

    • Larry: a new library, or built into the CUDA drivers? Steve: the bulk of it is built into the drivers.

    • Larry: what if you don’t have the right kind of network hardware? The situation is that we don’t know where people will draw the files from, so it should just work transparently in the fallback case.

    • Steve: the library needs to separate the header reading from the file reading.

    • Larry: a tangential thing: we have control over what compression methods are supported in the library and used out in the world. If there were a different lossless compression, we could add a new method.

    • Need two options:

      1. The GPU should be able to handle the files that are already out there.

      2. We need something that a GPU could process more efficiently.

    • Larry: Nobody wants uncompressed files.

    • Steve: we’re primarily interested in decompressing. We can decompress fast enough on the GPU to saturate the I/O, so it’s faster to move the compressed data.

    • Rod: we’re in the process of thinking about changes to the library; we should consider these now and find a way to incorporate them.

    • Steve: the offset table doesn’t really have enough information. If you only have it on the host, you have to make some assumptions. It would be nice if the offset table had both the offset and the size.
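
      A sketch of the kind of assumption Steve means (illustrative only): with offsets but no sizes, the best a host can do up front is bound each chunk by the distance to the next offset (or the end of the file), which only works if chunks are laid out back to back.

        #include <algorithm>
        #include <cstdint>
        #include <vector>

        // Upper-bound each chunk's size from the offset table alone.
        std::vector<uint64_t>
        estimateChunkSizes (const std::vector<uint64_t>& offsets, uint64_t fileSize)
        {
            std::vector<uint64_t> sorted = offsets;
            std::sort (sorted.begin (), sorted.end ());

            std::vector<uint64_t> sizes (offsets.size ());
            for (size_t i = 0; i < offsets.size (); ++i)
            {
                // Everything up to the next offset (or end of file) could
                // belong to this chunk, so treat that gap as an upper bound.
                auto next = std::upper_bound (sorted.begin (), sorted.end (), offsets[i]);
                uint64_t end = (next == sorted.end ()) ? fileSize : *next;
                sizes[i] = end - offsets[i];
            }
            return sizes;
        }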

    • A surprise from reading the spec: some tiles can be stored uncompressed. If a tile is larger after compression, the original data is stored instead.
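
      A tiny sketch of that rule as a writer would apply it (illustrative, not the library's internal code): the compressed bytes are kept only if compression actually made the chunk smaller, so a reader cannot assume every chunk is compressed.

        #include <vector>

        // Store the compressed chunk only when it is strictly smaller;
        // otherwise the original (uncompressed) bytes go into the file.
        std::vector<char>
        bytesToStore (const std::vector<char>& raw, const std::vector<char>& compressed)
        {
            return (compressed.size () < raw.size ()) ? compressed : raw;
        }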

    • Eric: with backwards-compatible metadata, you could start decompression at an offset.

    • Kimball: Should look at encoding some other information about compression types in offset tables.

    • Rod: The header requires parsing to find the data you want. The trick at Epic is that we do that parsing on the header and just stream the pixel data to the GPU. You have to go looking for the pixel data; it should be easy to skip to the pixel data quickly, skipping over attributes you don’t need.
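
      A sketch of the skipping Rod describes (not library code): each header attribute is stored as a null-terminated name, a null-terminated type, a 4-byte size, and then the value, so a reader can hop over attributes it does not need without decoding them.

        #include <cstdint>
        #include <istream>
        #include <string>

        static std::string readCString (std::istream& in)
        {
            std::string s;
            std::getline (in, s, '\0');
            return s;
        }

        // Skip one attribute; returns false at the empty name that
        // terminates the header's attribute list.
        bool skipAttribute (std::istream& in)
        {
            std::string name = readCString (in);
            if (name.empty ()) return false;                          // end of header
            readCString (in);                                         // attribute type, unused here
            int32_t size = 0;
            in.read (reinterpret_cast<char*> (&size), sizeof (size)); // file is little-endian; assumes a little-endian host
            in.seekg (size, std::ios::cur);                           // jump over the value
            return in.good ();
        }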

    • Steve: Some numbers: Multiple GB of data/second. One tile per microsecond. You have one millisecond to pull a thousand tiles out of a file.

    • Rod: getting a sense for what image size and what frame rate. Zip compression is 2:1. An 8K latlong, 4K wide.

    • Kimball: the IO core in the writer is built around pread. So that should be ready to go, in terms of hooking it into the CUDA IO thing.

    • Two sides to the work:

      1. The general EXR library should be better about supporting this.

      2. A separate add-on that knows about the GPU. The OpenEXR library should not have GPU-isms in it; that has to go somewhere else, and ideally the library supports that “somewhere else.”

    • Kimball: we have an example at Weta, a custom stream.
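
      For reference, the existing C++ API already allows this by subclassing Imf::IStream; the sketch below is a made-up in-memory example, not the Weta code. Signatures here match recent releases; older releases use Imf::Int64 rather than uint64_t.

        #include <ImfIO.h>
        #include <cstdint>
        #include <cstring>
        #include <stdexcept>
        #include <vector>

        // A custom stream that serves an EXR from a memory buffer; an
        // Imf::InputFile (or tiled/multipart variant) can be constructed from it.
        class MemoryIStream : public Imf::IStream
        {
          public:
            MemoryIStream (const char* name, const std::vector<char>& data)
                : Imf::IStream (name), _data (data), _pos (0) {}

            bool read (char c[], int n) override
            {
                if (_pos + n > _data.size ())
                    throw std::runtime_error ("unexpected end of stream");
                std::memcpy (c, _data.data () + _pos, n);
                _pos += n;
                return _pos < _data.size ();   // false once the last byte is consumed
            }

            uint64_t tellg () override { return _pos; }
            void     seekg (uint64_t pos) override { _pos = pos; }

          private:
            const std::vector<char>& _data;
            uint64_t                 _pos;
        };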

    • Steve: an 8K image is about 200 MB of RGB data; with 2:1 compression, you could read about 200 per second.

    • Rod: At Epic, our experience is getting less than 20.

    • Steve: PCIe Gen4 gives another factor of 4.

    • Should be able to get 25 Gb/second out of them. We still have room on the GPU to do other things than just read OpenEXR files, and we can read multiple streams simultaneously. Most of the bottlenecks are on the Windows side.
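
      A back-of-envelope check of the figures being thrown around; every input here is an assumption (8192x4096 pixels, RGB half, zip at roughly 2:1), not a measurement from the meeting.

        #include <cstdio>

        int main ()
        {
            const double width = 8192, height = 4096;        // assumed 8K latlong
            const double channels = 3, bytesPerSample = 2;   // RGB, 16-bit half
            const double compression = 2.0;                  // assumed zip ratio
            const double fps = 200.0;                        // figure quoted above

            double rawBytes        = width * height * channels * bytesPerSample;   // ~201 MB
            double compressedBytes = rawBytes / compression;                       // ~100 MB
            double bandwidth       = fps * compressedBytes;                        // ~20 GB/s

            std::printf ("raw %.0f MB, compressed %.0f MB, %.0f fps needs %.1f GB/s\n",
                         rawBytes / 1e6, compressedBytes / 1e6, fps, bandwidth / 1e9);
            return 0;
        }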

    • Rod: we can read only int data, but OpenEXR stores shorts, so we have to swizzle the data and reinterpret it as fp16. We should be able to swizzle two shorts simultaneously.
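
      A sketch of the reinterpretation Rod describes (illustrative only): each 32-bit word delivered by the transfer path carries two packed 16-bit halves whose raw bits can be reinterpreted as fp16. Here Imath's half type stands in for whatever the GPU side would actually use.

        #include <half.h>      // Imath's 16-bit float; header location varies by version
        #include <cstdint>
        #include <utility>

        // Split one 32-bit word into its two packed half values.
        std::pair<half, half> unpackTwoHalves (uint32_t word)
        {
            half a, b;
            a.setBits (static_cast<unsigned short> (word & 0xFFFFu));   // low 16 bits
            b.setBits (static_cast<unsigned short> (word >> 16));       // high 16 bits
            return { a, b };
        }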

    • Kimball: I hadn’t considered changes to the chunk table. That would make the CPU side faster, but it would sacrifice some of the handling of corrupt file data.

    • Peter: each chunk has the same information in its header; we could change the chunk table so the chunk headers are duplicated there.
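
      A purely hypothetical sketch of what such a duplicated or extended chunk-table entry could carry, just to make the discussion concrete; none of these fields are a format proposal.

        #include <cstdint>

        // Hypothetical chunk-table entry that repeats the per-chunk header info
        // so a reader can plan all its reads without touching the chunks themselves.
        struct ExtendedChunkEntry
        {
            uint64_t offset;        // byte offset of the chunk in the file
            uint64_t packedSize;    // stored (possibly compressed) size in bytes
            uint64_t unpackedSize;  // size after decompression
            uint8_t  compression;   // method actually used for this chunk
            uint8_t  flags;         // e.g. stored-uncompressed, alignment/padding hints
        };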

    • Peter: A compression format could put padding in it, so the data is aligned and easy to skip over.

    • Kimball: I started work on the core because of texture access. I don’t know what the sparse texture architecture would be. Should we allow alternate ordering in an EXR file?

    • Larry: we depend on the fact that it’s tiled as well as mipmapped.

    • Steve: with a huge image, I want to chop out a portion of it. Seek time is not a big deal any more.
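
      For context, the existing C++ API already supports exactly that access pattern on the CPU side; a minimal sketch (file name and tile indices are made up):

        #include <ImfTiledRgbaFile.h>
        #include <ImfArray.h>
        #include <ImathBox.h>

        // Read just one tile at one mip level, without touching the rest of the file.
        void readOneTile (const char* fileName, int tileX, int tileY, int level)
        {
            Imf::TiledRgbaInputFile in (fileName);

            // Pixel region covered by this tile at this level.
            Imath::Box2i dw = in.dataWindowForTile (tileX, tileY, level);
            int w = dw.max.x - dw.min.x + 1;
            int h = dw.max.y - dw.min.y + 1;

            Imf::Array2D<Imf::Rgba> pixels (h, w);

            // Address the frame buffer so pixel (dw.min.x, dw.min.y) lands at pixels[0][0].
            in.setFrameBuffer (&pixels[0][0] - dw.min.x - dw.min.y * w, 1, w);
            in.readTile (tileX, tileY, level);
        }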

    • Peter: you can store tiles in any order.

    • Steve: a small optimization for the MIP tail: tiles that are 1x1 or 4x4 could be stored contiguously.

    • Larry: But once you’re using solid state storage, there’s no such thing as seek time.

    • Steve: technically there is overhead, but it’s small. But with pread, you have only one system call.

  • Action items?

    • Kimball: I didn’t finish cleaning up IlmBase out of the RC3 library. As soon as that’s done, I’ll push up my C library.