Learning through fixity errors!

I have rarely seen fixity failures in my time working at the Irish Film Institute. I have only ever seen them during some sort of file transfer, and thankfully we have spotted the issue straight away. It’s actually quite reassuring when an error like this pops up, as it lets you know that your systems and safeguards are working. We had such an incident today at the Irish Film Institute, so I thought I’d write up a few words about how we analysed the error and took advantage of the embedded checksums within FFV1 streams.

^^ This was the glitch produced by the last fixity error I saw during a file transfer. The file was an uncompressed TIFF from a 16mm film scan.

My colleague Brian Cash migrated a few Betacam SP tapes during the day, and was performing a backup to a networked Hierarchical Storage Management (HSM) device that would then write two copies to LTO. We use a lot of custom Python scripts to facilitate our workflows, and we use a script called copyit.py to copy files.

Copyit.py generates a source MD5 checksum manifest, if one does not already exist. If a manifest sidecar does exist, it skips straight to the file copy stage, using rsync or robocopy, depending on the operating system. When the files hit the destination directory, a new checksum manifest is generated for the destination files, and this manifest is compared to the source manifest. The user is notified of any discrepancies, which are also stored in a log. For larger jobs, we have a helper batch script called masscopy.py.
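For anyone curious about the general shape of such a tool, here is a minimal sketch of the manifest-and-compare idea. This is an illustration only, not the actual copyit.py; the function names are my own:

```python
import hashlib
import os

def md5_of_file(path, blocksize=1024 * 1024):
    """Return the hex MD5 of a file, read in chunks to keep memory use low."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(blocksize), b''):
            md5.update(chunk)
    return md5.hexdigest()

def manifest_for_tree(root):
    """Map relative path -> MD5 for every file under root."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest

def compare_manifests(source, destination):
    """Return the relative paths whose checksums differ or are missing."""
    return sorted(path for path in source
                  if destination.get(path) != source[path])
```

The real script does more (sidecar files, logging, rsync/robocopy invocation), but the pass/fail decision boils down to a comparison like `compare_manifests()` above.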

When I came in to work this morning, I was greeted by this image:

So while everything else was a success, the oe6241 transfer was a failure.

I scrolled up (I left that screenshot at work, unfortunately!) to see that it was indeed the FFV1/Matroska file that had the checksum error. I was quite surprised by this, as rsync has never given us any issues before. I suspect the cause was most likely a network connectivity problem.


This really was no big deal to rectify. I used our script validate.py to run a fixity check on the source package to ensure that it was still in good health, which it was. Then we just ran another copyit.py job to our HSM and all was well.

However, a fixity error that wasn’t forced through testing is a rare opportunity, so I wanted to use our resources to dig a bit deeper.

The FFV1 version 3 codec has built-in CRC32 checksums per frame (per slice of a frame, to be more specific), so it’s possible to use these embedded checksums to perform a fixity check. I used the command from ffmprovisr.

This was the result:

Decoding the file with ffmpeg gave a very clear sense of where the damage actually was (at 632 and 805 seconds).
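If you want to script this rather than eyeball the terminal, the check boils down to decoding the file to null output and scanning ffmpeg’s stderr. A rough sketch (the exact wording of the decoder’s error message may vary between ffmpeg versions, so I match loosely, and the helper names are my own):

```python
import subprocess

def crc_errors(stderr_text):
    """Pull out any lines where the FFV1 decoder reports a slice CRC mismatch."""
    return [line for line in stderr_text.splitlines()
            if 'crc mismatch' in line.lower()]

def check_embedded_crcs(path):
    """Decode the whole file to null output, which forces every
    embedded slice CRC to be verified along the way."""
    result = subprocess.run(['ffmpeg', '-i', path, '-f', 'null', '-'],
                            stderr=subprocess.PIPE, universal_newlines=True)
    return crc_errors(result.stderr)
```

An empty list back from `check_embedded_crcs()` would mean every slice decoded with a matching checksum.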

We also had some framemd5s stored (checksums for each decoded frame of video), so we were able to use those to get a better sense of the damage. This article by Dave Rice on framemd5 is essential reading for any AV archivist; it had a profound impact on me!
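For reference, a framemd5 report can be generated with something like `ffmpeg -i input.mkv -f framemd5 input.framemd5`, and comparing two reports frame by frame tells you exactly which frames decode differently. A small sketch of that comparison (the parsing is deliberately naive and assumes both reports cover the same stream; the function names are my own):

```python
def frame_lines(framemd5_text):
    """Strip the '#' comment header, leaving one line per decoded frame."""
    return [line for line in framemd5_text.splitlines()
            if line and not line.startswith('#')]

def differing_frames(report_a, report_b):
    """Return the indexes of frames whose decoded MD5s differ between reports."""
    pairs = zip(frame_lines(report_a), frame_lines(report_b))
    return [i for i, (a, b) in enumerate(pairs) if a != b]
```

Those frame indexes are what let me jump straight to the damaged frames in a viewer.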

Using this information, I was able to navigate to the exact frame in question to see what kinds of glitches were present… and I found nothing! Nothing to the naked eye, anyhow. You see, FFV1 is also kind enough to provide error concealment: when the file is decoded, the decoder knows where the errors are and does its best to hide them. This was so successful that I could not see where the errors were without using specialist tools like QCTools. I got this idea from a blog written by Dinah Handel, more essential reading!

Notice that while the actual image on the left looks perfectly intact, the temporal difference filter on the right is showing a square grey block. This particular QCTools filter shows the difference between successive frames. This is great for spotting concealed errors: in this case, the FFV1 decoder figured out that the corrupted area was the third slice in the top row, and so it just repeated the slice from the previous frame. The square block indicates that there is literally no difference between that portion of the frame and the corresponding portion of the previous frame. The other parts of the image give a sense of the noise and grain present in the tape, which was an analog telecine of a 16mm film. When there is very little movement in an image, this kind of concealment can easily fool the naked eye.

I never got a chance to perform a diff using a hex editor, and I haven’t looked at any audio corruption either. I’ll get to that after the Easter holidays 🙂

Figuring all this out didn’t really change the outcome. We just ran the job again and got the backups as intended. It is important to take advantage of moments like this in order to learn more about the file formats in your collection, and to re-assess some of your assumptions. Interestingly, the HSM never picked up that there was a corrupted transfer. It generated two LTO copies and a fresh checksum (reflecting the corrupted file, not the source) in its database, so without our own fixity check, we’d never have noticed that the file that would be the focus of our preservation activities was corrupted!


Quick notes on No Time To Wait 2!

** Written in Vienna International Airport while waiting for my flight back to Dublin **

I thought I’d jot down some notes about the No Time To Wait conference that finished yesterday in Vienna. It was a two-day event loosely based around open source software, AV preservation and FFV1/Matroska. The scope was quite broad, and it included JPEG2000/MXF as part of Kate Murray’s update on AS-07.

You can catch the recorded livestreams here; the schedule and more information are over here.

All of the talks are available online, so I won’t go through each one, but I thought I’d jot down some of the things that resonated with me. I don’t have any photos so it’s a barrage of text, unfortunately. So, in no particular order (sorry if I left anyone out, this is more of a quick brainstorm than anything exhaustive):


  • The importance of financially supporting open source projects. Several participants, particularly Jerome Martinez and Peter Bubestinger, made the point that open source software does not mean freeware. Software requires a considerable level of skill and time commitment, and archives must rethink the way that we fund software and services. We seem to be a lot more comfortable buying licenses for proprietary software, rather than funding a much more sustainable open source project, or buying service contracts for free software.
  • RAWcooked – DPX to FFV1 with metadata!! – It looks like Jerome has a strategy for the lossless process of migrating DPX to FFV1/MKV and back to DPX again. He requires funding, and his strategy should hopefully attract attention as it is the kind of tool that could ultimately save an archive a lot of money. https://avpres.net/RAWcooked/
  • Open source support. The misconception of ‘Open source software means you have no support’ was mentioned a few times. While this might be true for some projects, there were countless examples of open source tools that also offer professional service contracts as part of their business model.
  • Agathe Jarczyk of University of the Arts Bern presented on her wish for ‘the ideal video player’. This might have been my favourite presentation. I really appreciated all the real world tests and the sharing of knowledge. She highlighted the inconsistencies in all the video players that she tried, and explained why QuickTime Player 7 still has a lot of features that are particularly useful for her workflows. It was wonderful to see other archivists in the audience add to the wish list, followed by Steve Lhomme of VLC saying that a lot of these features are in fact going to be part of the upcoming VLC 3.0! For any other missing features, he encouraged Agathe to raise tickets on VLC’s issue tracker, as they look at absolutely everything.
  • Specifications. The presentations from Jimi Jones, Ashley Blewer, Steve Lhomme and Kate Murray were all fascinating in different ways. Ashley’s presentation made me feel a little less intimidated about trying to contribute to the actual Matroska and FFV1 specifications. I apologise yet again for messing up the timing of Jimi Jones’s presentation, he was totally cool about it but it’s my biggest regret!!
  • Carl Eugen Hoyos. A lot of us were particularly excited that we would have a somewhat unexpected guest: Carl Eugen Hoyos from FFmpeg. One of the goals of No Time To Wait was to get archivists, developers and specification writers together, so it was important to have an FFmpeg representative present. FFmpeg is very frequently used by moving image archivists, and those of us who request help on their mailing list or track FFmpeg development are very familiar with Carl. Pretty much no one aside from Peter Bubestinger had ever actually met him, or even knew what he looked like. He is a frequent contributor to the project, and his insistence on particular posting styles is a running joke among archivists. Anyhow, Carl was one of the stars of the symposium, and pretty much everyone loved him. It seemed like he really enjoyed himself as well. He gave a great presentation on FFmpeg, attended a brilliant panel hosted by Alessandra Luciano, and raised many points and questions throughout the conference. And he gave a few of us a historical tour of Vienna.
  • Alessandra/open source strategies. Speaking of which, Alessandra Luciano chaired that panel on attitudes to open source and strategies for bringing open source into your institution. Alessandra’s employer, CNA Luxembourg, were also so kind as to fund the attendance of two participants.
  • Dave Rice and Jerome Martinez. They both put a phenomenal amount of work into every aspect of the conference. I don’t know where they found the energy. As always, they are really friendly, inclusive, approachable, and absolutely essential to the field of digital moving image preservation. And they gave excellent presentations. It was great to meet Guillaume Roques and his wife Marie-Laure as well. I’d seen Guillaume pop up a lot on the MediaArea github pages, so yet again, it was nice to put a face to a name.
  • Michael Loebenstein & Austrian Film Museum: Michael Loebenstein was incredibly kind to host us in the beautiful Austrian Film Museum. All of the staff there were so friendly and helpful with everything from start to finish. It’s an excellent event space, everything went off without a hitch!
  • Volunteering. I was one of the members of the organising committee, but to be honest, I really did very little in the lead-up. I did put myself forward for a lot of volunteer work during the conference itself, including being the MC/stage co-ordinator for the opening morning, as well as helping with the live stream on the last day. It was all great to do and I’d encourage everyone to get involved in some small way at any future conferences. You get so much experience, meet way more people, and it really gives you a greater appreciation for those who run these events.
  • Reto Kromer made YCoCg a little more understandable for me, especially in terms of why the simpler transformation to RGB resulted in speed increases compared to YCbCr, as well as the colour space allowing for some improved digital restoration workflows for some use cases. Listening to his attitude to research, development and implementation is always inspiring. Reto is wonderful! I must rewatch his presentation though and email him about a million things.
  • Ffmprovisr – Maybe I have one more regret – I wish that Ashley, Reto or I had done a quick lightning talk/demo on ffmprovisr, considering that FFmpeg was mentioned every thirty minutes or so. Also it was nice to have three of the four maintainers in the same place, we all missed Katherine Frances Nagels though!
  • NYPL. I was delighted that Ben Turkus and Genevieve Havemeyer-King from NYPL were there to present on their transition to FFV1/Matroska from uncompressed video. It was quite inspiring and it’s always interesting to see a much larger institution, working on a vastly bigger scale, still have a lot of similarities to my own institution.
  • NYU – It was great that Ethan Gates was able to come. It was lovely to finally meet him, and his talk on advocacy, education and open source was a perfect way to kick off the event.
  • Jonas Svatos/Czech Film Archive – I really respect his work, and thought he chaired a fine panel on film preservation, and I loved that he carried on the previous conversation on databases. It was great to hear honest accounts of database woes from the panel. I think most people in the room could relate. It was great to meet Fumiko from the Austrian Film Archive, who stepped in as a late replacement on the panel. She had some very interesting takes on film scanning, and on the Blackmagic Cintel in particular.
  • FIAF – Speaking of which – Jonas was hyping the next FIAF congress that they are hosting in the Czech Film Archive. It is on the theme of sharing, and he encouraged the attendees to enter proposals. I should probably get going on that, hopefully in collaboration with another archive?
  • Validation – I must follow up with Merle Friedrichsen from the German National Library of Science and Technology, as their research into using open source tools to validate deposits at the point of submission is something that we are interested in too.
  • Normalisation – Peter Bubestinger gave a wonderful talk on normalisation strategies – codec only, container only, and both codec and container. I need to rewatch his presentation as I need to think over this topic a lot more, especially the concept of developing a ‘whitelist’ of formats that will not be normalised.
  • EN 15907 – I was really interested in Peter Bubestinger and Christian Widerstrom’s talk on their upcoming database that uses the EN 15907 cataloguing standard. The IFI is investigating this standard at the moment so we will follow this project closely.
  • RTV Slovenia. It was wonderful to hear from Bojan Kosi about the work of RTV Slovenia and their mass digitisation projects. I really respected their approach and how open they were with their workflows, decision making and execution. I wasn’t familiar with their work before, but they are doing excellent work and I think we can learn a lot from them.
  • Livestreaming: Seeing the setup for live OBS streaming was such a great experience. It was great to see how the issues were resolved, such as an HDMI cable that was too long for 1080p, which then had a knock-on effect on the presenters’ laptop resolutions. Dave Rice and Jerome Martinez provided most of the hardware, and everything seemed to work really well anyhow. Thankfully, Lukas Oberbichler was there, and he really stepped up and ensured that the live stream ran successfully from start to finish. I don’t think he left the livestreaming setup the entire time. If you enjoyed the stream from home, then Lukas, as well as the other volunteer livestream assistants, are to thank.
  • Film scanners. I was interested in the test patterns that Dirk Hildebrandt (Wavelet Beam) and Adrian Bull (Cinelab London) presented. The idea is that you scan their test reel with your film scanner in order to discover the performance and potential flaws of the sensor. We will definitely have to look into this in the Irish Film Institute.
  • Irish Film Institute. I spoke on our experience in the IFI with Matroska/FFV1/Mediaconch and other tools. Check it out on the live stream!
  • Vienna – It is beautiful, particularly at night. Every bar or restaurant seems to find some way to be interesting. Vienna seems super safe and calm as well. I never felt unsafe walking around by myself at night.

That’s roughly it! It was great, and you really should check out the videos, you’ll learn a lot!

Introduction to FFV1 and Matroska for Film Scans

Currently, most archives that have adopted the video codec FFV1 use it as a preservation format for tape digitisation. I plan on documenting my tests regarding the transcoding of RGB DPX and TIFF scans to FFV1 in a Matroska container. I recently presented some preliminary findings with Reto Kromer at the ‘No Time To Wait‘ Symposium in Berlin. The topic has gained momentum since Michael Niedermayer (sponsored by reto.ch) added 16-bit RGB support to FFV1. This post gives an overview of FFV1 and Matroska as a format for the long term preservation of film scans. Many thanks to Ashley Blewer, Peter Bubestinger, Columb Gilna, Reto Kromer, Dave Rice and Erwin Verbruggen for their feedback/advice/corrections.

One quick disclaimer: FFV1 is becoming increasingly associated with the Matroska/MKV container, but there is nothing stopping you from using containers like MOV and AVI with FFV1. This blog is split into five sections:

  1. Benefits of losslessly compressed DPX and TIFF scans with FFV1 and Matroska
  2. The status of resolving current limitations
  3. Some test results
  4. Preservation versus Mezzanine (FFV1 editing/grading support)
  5. Aren’t image sequences safer than a single file?


1. Why losslessly compress DPX and TIFF scans?

The IFI Irish Film Archive are not digitising tape or film at a high volume yet, but from October 2016 this will change. We are a small archive with a limited budget and a small staff, so here’s why FFV1 makes sense for us:


Fig.1 Little endian binary output of 16-bit TIFF. Note the highlighted constant 4 zeroes of padding (thanks to Dave Rice for the cmd: ffmpeg -i 16bitfile -c:v rawvideo -f rawvideo - | xxd -b -c 2).

  1. Our scanner, the P&S Techniks Steadyframe, generously donated to us recently by the Imperial War Museum, has a 12-bit sensor, with three file format options: 10-bit linear DPX, 10-bit Printing Density/log DPX and 16-bit linear TIFF. In order to get access to the 12 bits of data that the sensor produces, you have to store those 12 bits in a 16-bit TIFF. As such, every terabyte of data will contain 250 gigabytes of padded zeroes, which is a lot of redundant data (see fig.1). It is a bit frustrating knowing that a quarter of each LTO-6 tape will be filled with zeroes, even more so when each tape has two backups! Lossless compression offsets this redundancy, making it more financially viable to preserve the full output of the sensor for the long term. Even if your scans do not have this kind of extra padding, the storage savings may be significant.
  2. FFV1 version 3 is capable of storing CRC32 checksums for every slice of a frame. You simply need to decode the video with ffmpeg in order to perform a fixity check; DPX and TIFF do not contain any embedded fixity information. A command like `ffmpeg -i input.mkv -f null -` will display any CRC mismatches in the terminal window. As of today (2016-10-07), ffmpeg writes CRC checksums for Top Level Elements at the container level as well, so both codec and container contain built-in fixity from the point of creation. Fig 2. An example of a fixity check using embedded CRC32 checksums on an intentionally damaged file.
  3. FFV1 and Matroska are open formats that are being standardised openly by the CELLAR working group within the IETF. The continued evolution of the codec has largely been driven by the advocacy of archivists who have put preservation-friendly features to the top of the agenda. Anyone can join the CELLAR mailing list and contribute to the discussion on how these formats are documented and how they will evolve.
  4. In our current workflow that we are testing, we capture 16-bit scans, and these untouched files eventually form an Archival Information Package (AIP). It’s important to us that we can preserve the original scans prior to any intervention. The image and audio are restored by Gavin Martin and Brian Cash, and these files are exported as DPX and form another AIP.  Factoring in the two backups, we end up with 6 different 16-bit RGB image sequences per film asset. Losslessly compressing the files allows us to consider such a workflow.
  5. It is relatively simple to turn your image sequence into a single FFV1 in Matroska file via ffmpeg. We have found that one large single file has less file system overhead than a large sequence of small files. This also results in much quicker fixity checks. Peter Bubestinger has investigated this in a much more thorough way, and you can see a summary of his findings here.
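As an aside, the padding described in point 1 is easy to verify yourself: with 12-bit data left-justified in 16-bit samples (as in fig.1), the low four bits of every sample should be zero, i.e. 4 of every 16 bits, or 25% of the raw data. A quick sketch, assuming MSB-aligned samples (the function name is my own):

```python
import struct

def padded_fraction(raw_bytes, little_endian=True):
    """Fraction of 16-bit samples whose low 4 bits are zero.
    For 12-bit data left-justified in 16 bits this should come out
    at 1.0, meaning 4 of every 16 bits (25%) are pure padding."""
    endian = '<' if little_endian else '>'
    count = len(raw_bytes) // 2
    samples = struct.unpack(endian + str(count) + 'H', raw_bytes[:count * 2])
    return sum(1 for s in samples if s & 0x000F == 0) / float(count)
```

Feeding this the rawvideo output from the xxd command in fig.1 should return 1.0, which is exactly the constant padding visible in that figure.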

2. The status of resolving current limitations

There have been some limitations in both the format and in ffmpeg that have led to low adoption up until now. Some of these have recently been resolved, or soon will be.

The limitations and their current status are:

  1. Lack of 16-bit RGB support. Although 16-bit YUV pixel formats have been supported in FFV1 since its very beginning, several 16-bit RGB pixel formats have only recently been added (thanks to the generous sponsorship of Reto Kromer/reto.ch and the actual code of Michael Niedermayer): GBRP16 and RGB48. This has led the IFI Irish Film Archive to pursue the format with much more urgency. Initial tests have been successful: 16-bit TIFF and DPX have been transcoded to ffv1.mkv, and transcoded back again to their source format with matching framemd5 checksums.
  2. Log/Linear support within FFV1. This is an issue that is documented in several ffmpeg-user threads, as well as on the CELLAR mailing list. FFV1 does not hold these colour primary/transfer characteristic values. There has been some discussion on the CELLAR list about this, and if this is meaningful to you, please post to the thread. Matroska appears to have some support for log/linear colour values, but not every DPX value is represented, such as Printing Density. From my real world testing, this is actually less of an issue than it initially appears, but it’s still an issue. The actual image data is not affected, but inaccurate colour metadata can result in incorrect rendering by a decoder. A possible workaround could be to store the correct values elsewhere and specify them when decoding back to DPX, which leads to point 3.
  3. Log/linear in the ffmpeg DPX encoder – When transcoding your ffv1.mkv files back to DPX, ffmpeg will always write Linear values. This is due to this value being hardcoded in the source code. Progress is being made on this issue, but we have a workaround that involves custom versions of ffmpeg that write log/linear/printing density values.
  4. Bayer/raw support. Currently FFV1 does not support Bayer formats. This feature could be added if development is supported.

3. Some Test Results

I have transcoded some RGB film scans to FFV1/Matroska. This has mostly been to get a sense of compression ratios, encoding times, and losslessness. I can share some data, but I’d like to provide some context.

The sequences were a mixture of 12-bit DPX (2048×1556) and 16-bit TIFF (2350×1800). Compression ratios have generally averaged 2.3:1 for the TIFF scans that our scanner produces. However, I encoded some black and white 10-bit DPX to FFV1 and saw ratios closer to 3:1. It really is worth testing with your own collections, as your mileage may vary with regard to compression ratios. For example, some scans that were completely desaturated ended up with a 7:1 compression ratio.
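For anyone wanting to reproduce this kind of test, the encode itself is a single ffmpeg call. Here is a sketch of the arguments I would use, wrapped as a small Python helper; the sequence pattern and framerate are placeholders, and `-slicecrc 1` is what gives you the embedded checksums discussed earlier:

```python
def ffv1_encode_cmd(seq_pattern, framerate, out_mkv):
    """Build the ffmpeg arguments for wrapping an image sequence
    as FFV1 version 3 in Matroska."""
    return ['ffmpeg',
            '-framerate', str(framerate),
            '-i', seq_pattern,     # e.g. scan_%06d.dpx
            '-c:v', 'ffv1',
            '-level', '3',         # FFV1 version 3
            '-g', '1',             # intra-only: every frame standalone
            '-slicecrc', '1',      # embedded CRC32 per slice
            out_mkv]
```

Run it with something like `subprocess.call(ffv1_encode_cmd('scan_%06d.dpx', 24, 'out.mkv'))`, then verify losslessness by comparing framemd5 reports of the source sequence and the Matroska file.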

I wrote a Python script that automates the process of encoding a sequence to a single FFV1/MKV file, while also performing lossless verification via framemd5 checks. The script is here, and it generates a CSV on your desktop with various benchmark headings. The test results are here: https://gist.github.com/kieranjol/8846fbef6fee82c0c3a7a106481e44bf

System specs: 

MacPro6,1 — Processor Name: 8-Core Intel Xeon E5 — Processor Speed: 3 GHz

Number of Processors: 1 — Total Number of Cores: 8

L2 Cache (per Core): 256 KB — L3 Cache: 25 MB — Memory: 32 GB

Storage – Thunderbolt attached RAID-5.


4. Preservation versus Mezzanine (FFV1 editing/grading support)

If you’re wondering whether Avid/FCP/Premiere/Resolve reads and writes FFV1.mkv, the answer is no. Adobe Premiere could possibly get a plugin if enough people financially support the development of such a tool. Lack of NLE support might be a deal-breaker for some institutions, but it has never really been an issue for us. There are, however, cross-platform NLE tools such as Shotcut that natively support FFV1 and Matroska.

We classify FFV1 and Matroska as a preservation file format. They contain features that make them very favourable to long term preservation, and we will never have to worry about having to buy into any proprietary software or hardware in order to access the media. We never really plan on accessing these files directly, as we make derivative, mezzanine copies at the point of archiving that can be used for access purposes. If we really need to return the preservation FFV1/Matroska file to a production workflow, then we can very easily reconstruct a DPX or TIFF sequence with the same RGB image data as what was originally scanned. That’s the beauty of losslessness.
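That reconstruction is the same trick in reverse. A hedged sketch of the decode arguments, in the same helper style (the output pattern is a placeholder, and as noted in section 2, log/linear DPX header values may need the workaround described there):

```python
def dpx_restore_cmd(in_mkv, out_pattern):
    """Build the ffmpeg arguments for unwrapping FFV1/Matroska
    back out to a DPX image sequence."""
    return ['ffmpeg',
            '-i', in_mkv,
            '-c:v', 'dpx',       # or 'tiff', to match the original scan
            out_pattern]         # e.g. restored_%06d.dpx
```

Comparing framemd5 reports of the restored sequence against the original scan confirms that the RGB image data made the round trip untouched.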

5. Aren’t image sequences and uncompressed files safer than single files?

I’ve heard this said quite a few times: Image sequences are safer, because if some files get damaged, the rest of the image sequence is untouched. A similar argument is used for uncompressed data: that the visual impact of the potential damage is not as severe with uncompressed files.


Fig.3 Glitch in a 10-bit DPX file that occurred during file movement.

In both of these scenarios, the damage caused is unacceptable to an archive that intends to preserve these assets. The solution to both scenarios is that the damaged file is replaced by one of the backups. It will take a little less time to do so in the case of an image sequence, as one need only replace the affected files, but this will hopefully happen so rarely in a repository that the time savings are negligible. Try to have as many copies as possible! Yes, this is expensive, but lossless compression makes it cheaper!


FFV1 for film scans is gaining momentum and the inhibiting issues are actively being resolved. It would be great if more people were interested in testing and posting their findings.

If you wish to discuss further, the CELLAR mailing list is a great place to discuss FFV1 and Matroska. Feel free to get in touch with me via the contact info in the ‘about’ page of this blog.