I have rarely seen fixity failures in my time working at the Irish Film Institute. The few I have seen all occurred during some sort of file transfer, and thankfully we spotted the issue straight away each time. It’s actually quite reassuring when an error like this pops up, as it lets you know that your systems and safeguards are working. We had such an incident today, so I thought I’d write up a few words about how we analysed the error and took advantage of the embedded checksums within FFV1 streams.
^^ This was the glitch produced by the last fixity error I saw during a file transfer. This was an uncompressed TIFF from a 16mm film scan.
My colleague Brian Cash migrated a few Betacam SP tapes during the day, and was performing a backup to a networked Hierarchical Storage Management (HSM) device that would then write two copies to LTO. We use a lot of custom Python scripts to facilitate our workflows, and we use a script called copyit.py to copy files.
Copyit.py generates a source MD5 checksum manifest if one does not already exist. If a manifest sidecar does exist, it skips straight to the file copy stage, using rsync or robocopy, depending on the operating system. When the files hit the destination directory, a new checksum manifest is generated for the destination files, and this manifest is compared to the source manifest. Any discrepancies are reported to the user and stored in a log. For larger jobs, we have a helper batch script called masscopy.py.
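The manifest-and-compare idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the actual copyit.py code: the function names (md5_of_file, build_manifest, compare_manifests) and the in-memory dictionary format are assumptions made for the example, and the copy step itself (rsync or robocopy) is left out.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root, forward slashes) to its MD5."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[relative] = md5_of_file(full_path)
    return manifest


def compare_manifests(source, destination):
    """Return (missing, extra, mismatched) lists of relative paths."""
    missing = sorted(set(source) - set(destination))
    extra = sorted(set(destination) - set(source))
    mismatched = sorted(
        path for path in source
        if path in destination and source[path] != destination[path]
    )
    return missing, extra, mismatched
```

Calling compare_manifests(build_manifest(source_dir), build_manifest(destination_dir)) after a copy would return three empty lists for a clean transfer. A transfer like the one described in this post would show the damaged file in the mismatched list.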
When I came in to work this morning, I was greeted by this image:
So while everything else was a success, the oe6241 transfer was a failure.
I scrolled up (I left that screenshot at work unfortunately!) to see that it was indeed the FFV1/Matroska file that had the checksum error. I was quite surprised by this, as rsync has never given us any issues before. I suspect the most likely cause was a network connectivity problem.
This really was no big deal to rectify. I used our script validate.py to run a fixity check on the source package to ensure that this was still in good health, which it was. Then we just ran another copyit.py job to our HSM and all was well.
However, a fixity error that wasn’t forced through testing is such a rare thing to encounter that I wanted to use our resources to dig a bit deeper.
The FFV1 version 3 codec has built-in CRC-32 checksums per frame (per slice of a frame, to be more specific), so it’s possible to use these embedded checksums to perform a fixity check. I used the command from ffmprovisr.
This was the result:
Decoding the file with ffmpeg gave a very clear sense of where the damage actually was (at 632 and 805 seconds).
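For anyone who wants to script this kind of check, here is a minimal Python sketch of the general approach: decode the whole file to ffmpeg's null muxer and scan the error log for CRC complaints. It is not necessarily the exact ffmprovisr command. The function names are hypothetical, and the log wording the regular expression looks for ("CRC mismatch ... at N seconds") is an assumption that may vary between FFmpeg versions, so treat it as a starting point only.

```python
import re
import subprocess

# Assumed shape of the decoder message, e.g. "CRC mismatch <hex>!at <time> seconds".
# The wording can differ between FFmpeg versions, so adjust the pattern as needed.
CRC_ERROR = re.compile(
    r"CRC mismatch\s+[0-9A-Fa-f]+!\s*(?:at\s+([0-9.]+)\s+seconds)?"
)


def decode_and_capture_log(path):
    """Decode every frame to the null muxer and return ffmpeg's stderr text."""
    result = subprocess.run(
        ["ffmpeg", "-nostdin", "-v", "error", "-i", path, "-f", "null", "-"],
        stderr=subprocess.PIPE,
        text=True,
        check=False,  # a damaged file may still decode with a zero exit code
    )
    return result.stderr


def crc_error_times(log_text):
    """Return the timestamps (in seconds) attached to CRC mismatch messages."""
    timestamps = []
    for match in CRC_ERROR.finditer(log_text):
        if match.group(1) is not None:
            timestamps.append(float(match.group(1)))
    return timestamps
```

Running crc_error_times(decode_and_capture_log("file.mkv")) on a healthy file should give an empty list. Several slices in one frame can fail at once, so the same timestamp may appear more than once.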
We also had some framemd5s stored (checksums for each decoded frame of video), so we were also able to use those in order to get a better sense of the damage. This article by Dave Rice on framemd5 is essential reading for any AV archivist; it had a profound impact on me!
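Comparing two framemd5 reports is easy to script. The sketch below assumes the usual layout written by ffmpeg's framemd5 muxer: header lines starting with "#", then comma-separated rows with the stream index first, the pts third and the hash last. The function names are hypothetical and this is not the script we used.

```python
def parse_framemd5(text):
    """Map (stream index, pts) to the frame hash, ignoring '#' header lines."""
    frames = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(",")]
        stream_index = int(fields[0])
        pts = int(fields[2])
        frames[(stream_index, pts)] = fields[-1]
    return frames


def changed_frames(reference_text, candidate_text):
    """List (stream index, pts) keys whose hash differs or is missing."""
    reference = parse_framemd5(reference_text)
    candidate = parse_framemd5(candidate_text)
    return sorted(
        key for key in reference
        if candidate.get(key) != reference[key]
    )
```

Feeding it the stored framemd5 and a fresh one generated from the suspect copy gives a list of pts values, which can be converted to seconds using the time base given in the report header.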
Using this information, I was able to navigate to the exact frame in question in order to see what kinds of glitches were present… and I found nothing! Nothing to the naked eye anyhow. You see, the FFV1 decoder is also kind enough to conceal errors: when the file is decoded, it knows where the errors are and does its best to hide them. This was so successful that I could not see where the errors were without using specialist tools like QCTools. I got this idea from a blog post written by Dinah Handel, more essential reading!
Notice that while the actual image on the left looks perfectly intact, the temporal difference filter on the right is showing a square grey block. This particular QCTools filter shows the difference between successive frames, which makes it great for spotting error concealment. In this case, the FFV1 decoder figured out that the corrupted area was the third slice in the top row, and so it just repeated the slice from the previous frame. The square block indicates that there is literally no difference between that portion of the frame and the corresponding portion of the previous frame. The other parts of the image give a sense of the noise and grain present in the tape, which was an analog telecine of a 16mm film. When there is very little movement in an image, this kind of slice repetition can easily fool the naked eye.
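The same idea can be expressed as a toy Python sketch: split two successive frames into a grid and flag any cell that is bit-for-bit identical to the previous frame. This is not how QCTools works internally. It assumes frames are plain nested lists of pixel values and that slices form an even grid, whereas real FFV1 slice boundaries are chosen by the encoder. The function name and parameters are hypothetical.

```python
def repeated_slices(previous, current, slice_rows, slice_cols):
    """Return (row, col) grid cells where current exactly equals previous."""
    height, width = len(current), len(current[0])
    repeated = []
    for grid_row in range(slice_rows):
        top = grid_row * height // slice_rows
        bottom = (grid_row + 1) * height // slice_rows
        for grid_col in range(slice_cols):
            left = grid_col * width // slice_cols
            right = (grid_col + 1) * width // slice_cols
            # A cell counts as repeated only if every pixel row in it matches.
            identical = all(
                current[y][left:right] == previous[y][left:right]
                for y in range(top, bottom)
            )
            if identical:
                repeated.append((grid_row, grid_col))
    return repeated
```

On noisy analog-sourced material, a genuinely unchanged region of that size is very unlikely, which is why an exactly repeated slice is such a strong hint of concealment. On clean digital sources (black frames, static graphics), this check would give false positives.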
I never got a chance to perform a diff using a hex editor, and I haven’t looked at any audio corruption either. I’ll get to that after the Easter holidays 🙂
Figuring all this out didn’t really change the outcome of things. We just ran the job again and we got the backups as intended. Still, it is important to take advantage of moments like this in order to learn more about the file formats in your collection, and to re-assess some of your assumptions. Interestingly, the HSM never picked up that there was a corrupted transfer. It generated two LTO copies and stored a fresh checksum in its database, one that reflected the corrupted file rather than the source. Without our fixity check, we’d never have noticed that the file that would be the focus of our preservation activities was corrupted!