Currently, most archives that have adopted the video codec FFV1 use it as a preservation format for tape digitisation. I plan on documenting my tests regarding the transcoding of RGB DPX and TIFF scans to FFV1 in a Matroska container. I recently presented some preliminary findings with Reto Kromer at the ‘No Time To Wait’ Symposium in Berlin. The topic has gained momentum since Michael Niedermayer (sponsored by reto.ch) added 16-bit RGB support to FFV1. This post gives an overview of FFV1 and Matroska as a format for the long-term preservation of film scans. Many thanks to Ashley Blewer, Peter Bubestinger, Columb Gilna, Reto Kromer, Dave Rice and Erwin Verbruggen for their feedback/advice/corrections.
One quick disclaimer: FFV1 is becoming increasingly associated with the Matroska/MKV container, but there is nothing stopping you from using containers like MOV and AVI with FFV1. This blog is split into five sections:
- Benefits of losslessly compressed DPX and TIFF scans with FFV1 and Matroska
- The status of resolving current limitations
- Some test results
- Preservation versus Mezzanine (FFV1 editing/grading support)
- Aren’t image sequences and uncompressed files safer than single files?
1. Why losslessly compress DPX and TIFF scans?
The IFI Irish Film Archive are not digitising tape or film at a high volume yet, but from October 2016, this will change. We are a small archive with a limited budget and a small amount of staff, so here’s why FFV1 makes sense for us:
Fig.1 Little-endian binary output of a 16-bit TIFF. Note the four highlighted zero bits of constant padding in each sample (Thanks to Dave Rice for the cmd: ffmpeg -i 16bitfile -c:v rawvideo -f rawvideo - | xxd -b -c 2).
- Our scanner, the P&S Techniks Steadyframe, generously donated to us recently by the Imperial War Museum, has a 12-bit sensor, with three file format options: 10-bit linear DPX, 10-bit Printing Density/log DPX and 16-bit linear TIFF. In order to get access to the 12 bits of data that the sensor produces, you have to store those 12 bits in a 16-bit TIFF. The remaining 4 bits of every 16-bit sample are padding, so every terabyte of data will contain 250 gigabytes of padded zeroes, which is a lot of redundant data (see fig.1). It is a bit frustrating knowing that a quarter of each LTO-6 tape will be filled with zeroes, even more so when each tape has two backups! Lossless compression offsets this redundancy, making it more financially viable to preserve the full output of the sensor for the long term. Even if your scans do not have this kind of extra padding, the storage savings may be significant.
- FFV1 version 3 is capable of storing CRC32 checksums for every slice of a frame. Neither DPX nor TIFF contains any embedded fixity information. With FFV1, you simply need to decode the video with ffmpeg in order to perform a fixity check: a command like `ffmpeg -i input.mkv -f null -` will display any CRC mismatches in the terminal window. As of today (2016-10-07), ffmpeg also writes CRC checksums for Top Level Elements at the container level, so both codec and container have built-in fixity from the point of creation.
Fig.2 An example of a fixity check using embedded CRC32 checksums on an intentionally damaged file.
- FFV1 and Matroska are open formats that are being standardised openly by the CELLAR working group within the IETF. The continued evolution of the codec has largely been driven by the advocacy of archivists who have put preservation-friendly features at the top of the agenda. Anyone can join the CELLAR mailing list and contribute to the discussion on how these formats are documented and how they will evolve.
- In the workflow we are currently testing, we capture 16-bit scans, and these untouched files eventually form an Archival Information Package (AIP). It’s important to us that we can preserve the original scans prior to any intervention. The image and audio are restored by Gavin Martin and Brian Cash, and these files are exported as DPX and form another AIP. Factoring in the two backups of each AIP, we end up with 6 different 16-bit RGB image sequences per film asset. Losslessly compressing the files allows us to consider such a workflow.
- It is relatively simple to turn your image sequence into a single FFV1/Matroska file via ffmpeg. We have found that one large single file has less file system overhead than a large sequence of small files. This also results in much quicker fixity checks. Peter Bubestinger has investigated this much more thoroughly, and you can see a summary of his findings here.
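To make the last few points a little more concrete, here is a minimal sketch of how the encode and the decode-based fixity check could be expressed as ffmpeg command lines built from Python. This is not our production script. The file naming pattern, the frame rate of 24, and the choice of 16 slices are assumptions for illustration, so adjust them to suit your own scans.

```python
import shlex


def ffv1_encode_cmd(pattern, output, framerate=24, slices=16):
    """Build an ffmpeg command that wraps an image sequence as FFV1 in Matroska."""
    return [
        "ffmpeg",
        "-framerate", str(framerate),  # assumed frame rate of the scanned film
        "-i", pattern,                 # e.g. scan_%06d.tif (hypothetical naming)
        "-c:v", "ffv1",
        "-level", "3",                 # FFV1 version 3
        "-g", "1",                     # intra-only: every frame is a keyframe
        "-slices", str(slices),        # assumed slice count
        "-slicecrc", "1",              # embed a CRC32 for every slice
        output,
    ]


def fixity_check_cmd(path):
    """Build the decode-to-null command; ffmpeg reports CRC mismatches as it decodes."""
    return ["ffmpeg", "-i", path, "-f", "null", "-"]


# Print the commands rather than running them, so they can be inspected first.
print(shlex.join(ffv1_encode_cmd("scan_%06d.tif", "film.mkv")))
print(shlex.join(fixity_check_cmd("film.mkv")))
```

The commands are built as lists so that they can be passed straight to `subprocess.run` once you are happy with them.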
2. The status of resolving current limitations
There have been some limitations in both the encoding and ffmpeg that have led to low adoption up until now. Some of these have recently been resolved, or will be soon.
The limitations and their current status are:
- Lack of 16-bit RGB support. Although 16-bit YUV pixel formats have been supported in FFV1 from its very beginning, several 16-bit RGB pixel formats have only recently been added (thanks to the generous sponsorship of Reto Kromer/reto.ch and the actual code of Michael Niedermayer): GBRP16 and RGB48. This has led the IFI Irish Film Archive to pursue the format with much more urgency. Initial tests have been successful: 16-bit TIFF and DPX have been transcoded to ffv1.mkv, and transcoded back again to their source format with matching framemd5 checksums.
- Log/Linear support within FFV1. This is an issue that is documented in several ffmpeg-user threads, as well as the CELLAR mailing list. FFV1 does not hold these colour primary/transfer characteristic values. There has been some discussion on the CELLAR list about this, and if this is meaningful to you, please post to the thread. Matroska appears to have some support for log/linear colour values, but not every DPX value is represented, such as Printing Density. From my real-world testing, this is actually less of an issue than it initially appears, but it’s still an issue. The actual image data is not affected, but inaccurate colour metadata can result in incorrect rendering by a decoder. A possible workaround could be that the correct values are stored elsewhere, and these are specified when decoding back to DPX, which leads to the next point.
- Log/linear in the ffmpeg DPX encoder. When transcoding your ffv1.mkv files back to DPX, ffmpeg will always write Linear values, because this value is hardcoded in the source code. Progress is being made on this issue, but we have a workaround that involves custom versions of ffmpeg that write log/linear/printing density values.
- Bayer/Raw support. Currently FFV1 does not support Bayer formats. This feature could be added if development is supported.
3. Some Test Results
I have transcoded some RGB film scans to FFV1/Matroska. This has mostly been to get a sense of compression ratios, encoding times, and losslessness. I can share some data, but I’d like to provide some context.
The sequences were a mixture of 12-bit DPX (2048×1556) and 16-bit TIFF (2350×1800). The compression ratios have averaged 2.3:1 for the TIFF scans that our scanner produces. However, I encoded some black and white 10-bit DPX to FFV1 and saw ratios closer to 3:1. It really is worth testing with your own collections, as your mileage may vary with regard to compression ratios. For example, some scans that were desaturated completely ended up with a 7:1 compression ratio.
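To put the figures quoted in this post into storage terms, here is a small back-of-the-envelope sketch. The function names are made up for illustration, and the 2.3:1 ratio is simply the average I saw on our own TIFF scans, so substitute whatever your own tests produce.

```python
def padding_fraction(sensor_bits, container_bits):
    """Fraction of each stored sample that is padding rather than sensor data."""
    return (container_bits - sensor_bits) / container_bits


def compressed_size(raw_size, ratio):
    """Size after lossless compression at ratio:1, in the same unit as raw_size."""
    return raw_size / ratio


# 12-bit sensor data stored in 16-bit samples: a quarter of the file is padding.
print(padding_fraction(12, 16))  # 0.25

# One terabyte of 16-bit TIFF at an assumed 2.3:1 ratio, in terabytes.
print(round(compressed_size(1.0, 2.3), 3))  # 0.435
```

Multiply the result by the number of copies you keep to see how quickly the savings add up.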
I wrote a Python script that automates the process of encoding a sequence to a single FFV1/MKV file, while also performing lossless verification via framemd5 checks. The script is here, and it generates a CSV on your desktop with various benchmark headings. The test results are here: https://gist.github.com/kieranjol/8846fbef6fee82c0c3a7a106481e44bf
MacPro6,1 — Processor Name: 8-Core Intel Xeon E5 — Processor Speed: 3 GHz
Number of Processors: 1 — Total Number of Cores: 8
L2 Cache (per Core): 256 KB — L3 Cache: 25 MB — Memory: 32 GB
Storage – Thunderbolt attached RAID-5.
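For anyone curious about what a framemd5-based lossless check involves, here is a simplified sketch of the idea. It is not the script linked above, and the function names are hypothetical. It assumes that a framemd5 report has been generated for both the source sequence and the FFV1 file with ffmpeg's framemd5 muxer, that both reports were made with the same pixel format, and that the hash is the last comma-separated field on each non-comment line.

```python
def framemd5_cmd(source, report):
    """Build an ffmpeg command that writes one MD5 per decoded frame to a report."""
    return ["ffmpeg", "-i", source, "-f", "framemd5", report]


def frame_hashes(framemd5_text):
    """Extract the per-frame hashes, skipping blank lines and '#' header lines."""
    hashes = []
    for line in framemd5_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed layout: the hash is the final comma-separated field.
        hashes.append(line.split(",")[-1].strip())
    return hashes


def is_lossless(source_report, ffv1_report):
    """True when both reports contain the same non-empty list of frame hashes."""
    source = frame_hashes(source_report)
    ffv1 = frame_hashes(ffv1_report)
    return len(source) > 0 and source == ffv1
```

Only the hashes are compared, because the header lines can legitimately differ between the two reports.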
4. Preservation versus Mezzanine (FFV1 editing/grading support)
If you’re wondering if Avid/FCP/Premiere/Resolve reads and writes ffv1.mkv, then the answer is no. Adobe Premiere could possibly get a plugin if enough people can financially support the development of such a tool. Lack of NLE support might be a deal-breaker for some institutions, but it has never really been an issue for us. There are, however, cross-platform NLE tools such as Shotcut that natively support FFV1 and Matroska.
We classify FFV1 and Matroska as a preservation file format. They contain features that make them very favourable to long-term preservation, and we will never have to worry about having to buy into any proprietary software or hardware in order to access the media. We never really plan on accessing these files directly, as we make derivative, mezzanine copies at the point of archiving that can be used for access purposes. If we really need to return the preservation FFV1/Matroska file to a production workflow, then we can very easily reconstruct a DPX or TIFF sequence with the same RGB image data as was originally scanned. That’s the beauty of losslessness.
5. Aren’t image sequences and uncompressed files safer than single files?
I’ve heard this said quite a few times: Image sequences are safer, because if some files get damaged, the rest of the image sequence is untouched. A similar argument is used for uncompressed data: that the visual impact of the potential damage is not as severe with uncompressed files.
Fig.3 Glitch in a 10-bit DPX file that occurred during file movement.
In both of these scenarios, the damage caused is unacceptable to an archive that intends to preserve these assets. The solution to both scenarios is that the damaged file is replaced by one of the backups. It will take a little less time to do so in the case of an image sequence, as one need only replace the affected files, but this will hopefully happen so rarely in a repository that the time savings are negligible. Try to have as many copies as possible! Yes, this is expensive, but lossless compression makes it cheaper!
FFV1 for film scans is gaining momentum and the inhibiting issues are actively being resolved. It would be great if more people were interested in testing and posting their findings.
If you wish to discuss further, the CELLAR mailing list is a great place to discuss FFV1 and Matroska. Feel free to get in touch with me via the contact info in the ‘about’ page of this blog.