windows 7 - Successful data recovery but most files are corrupted

Sunday, 20 January 2019

I used Recuva and EASEUS Data Recovery Wizard to see if there's any difference in the files they recover. Both programs were able to recover the files that I want; however, most of the files were no longer usable. For example, images are just black when you view them, and MS Word can no longer open the .doc files. They appear to be corrupted. This happens with both Recuva and EASEUS.

Is there a way I can fix the recovered files or, is there a better recovery software that doesn't "corrupt" files?

Answer

Sadly no.

What happened was that those files were fragmented, and once they were deleted, the cluster chain was removed, so when the programs "recovered" them, what they did was to look at the starting location (which is still present) and the size of the file (which is also still present) and simply copied that many clusters in a row from the start.

This works fine if the files are stored in a single, contiguous block (i.e., defragmented), but if they were fragmented, then their blocks are spread out around the disk and the program has absolutely no way to know where/which ones to use; that's why most of the corrupted recovered files will have at least one cluster's worth of correct data, but then contain whatever happened to be in the subsequent clusters that used to belong to other files.
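To make that concrete, here is a minimal sketch of the "copy N clusters in a row" approach on a made-up in-memory disk. The tiny cluster size, the disk layout, and the function name naive_recover are all assumptions for illustration; this is not how Recuva or EASEUS are actually implemented.

```python
CLUSTER_SIZE = 4  # bytes per cluster; unrealistically small, for illustration only


def naive_recover(disk, start_cluster, file_size):
    """Copy consecutive clusters from start_cluster until file_size bytes are covered.

    Only the start cluster and the size are known (as in a deleted directory
    entry), so the cluster chain is assumed to be contiguous.
    """
    clusters_needed = -(-file_size // CLUSTER_SIZE)  # ceiling division
    raw = b"".join(disk[start_cluster:start_cluster + clusters_needed])
    return raw[:file_size]


# The 10-byte file "AAAABBBBCC" stored fragmented in clusters 1, 3 and 4.
# Cluster 2 holds data that belongs to some other file.
fragmented_disk = [b"....", b"AAAA", b"XXXX", b"BBBB", b"CC\x00\x00"]
fragmented = naive_recover(fragmented_disk, start_cluster=1, file_size=10)

# The same file stored contiguously in clusters 1, 2 and 3.
contiguous_disk = [b"....", b"AAAA", b"BBBB", b"CC\x00\x00", b"XXXX"]
contiguous = naive_recover(contiguous_disk, start_cluster=1, file_size=10)

print(fragmented)  # first cluster is right, the rest is foreign data
print(contiguous)  # the whole file comes back intact
```

In the fragmented case the first cluster is correct and the rest is whatever happened to sit next to it, which matches the half-black images and unopenable documents described in the question.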

If the files are plain-text, then you could search the drive for unused clusters (which is a nightmare with a giant, nearly empty disk) and manually stitch the file back together (I did this a few times many years ago). But with binary files, this is effectively impossible. In fact, even with plain-text files, it is difficult at best if the file was edited and saved numerous times, because it then becomes hard to identify the clusters that contain blocks of the last version of the file.

As you noticed, PhotoRec seems to recover more (at the cost of lost filenames). I'll explain.

The above explanation is how some data-recovery programs work. That method is generally more reliable because it looks at real files that existed more recently. However (not surprisingly perhaps), it can miss some files. That is why other programs like PhotoRec use a different approach. Instead of looking at a deleted file's information (filename, size, timestamp, starting cluster) in the directory entry and then copying the clusters from the disk, they search the whole disk for lost files.

Most file types have a signature (usually at the start of the file, in the header), which is a sequence of bytes that identifies the file as a certain type. Because of this, programs that open a file can determine if the file is the correct type, and other programs can verify the type of a file.
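As a rough illustration of such signatures, the sketch below checks the first bytes of a buffer against a handful of well-known ones. The table is deliberately tiny, and the names SIGNATURES and identify are just placeholders for this example; real tools know hundreds of formats.

```python
# A few well-known signatures ("magic numbers") that appear at offset 0.
SIGNATURES = {
    b"\xFF\xD8\xFF": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    # OLE2 compound file, the container used by legacy .doc/.xls/.ppt files
    b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1": "ole2",
}


def identify(data):
    """Return the type name whose signature starts the buffer, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


print(identify(b"GIF89a\x0a\x00\x0a\x00"))  # header of a GIF image
print(identify(b"just some text"))          # no known signature
```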

What some data-recovery programs do is search the disk and check each cluster to see if it contains the signature of one of various file types. If a cluster contains a signature, then the program copies that cluster (and more, depending on various factors) to a file.
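The scan described above could be sketched roughly like this, working on a raw disk image held in memory. The 4096-byte cluster size, the function name carve, and the fixed clusters_per_file cutoff are assumptions made for the example; the cutoff stands in for the "various factors" a real tool such as PhotoRec would use, and its actual logic is more sophisticated.

```python
CLUSTER_SIZE = 4096  # assumed cluster size

# Small illustrative signature table: magic bytes -> file extension.
SIGNATURES = {b"\xFF\xD8\xFF": "jpg", b"GIF89a": "gif", b"%PDF-": "pdf"}


def carve(image, clusters_per_file=2):
    """Scan a raw image cluster by cluster and copy out signature matches.

    Returns a list of (offset, extension, data) tuples. No directory entry is
    consulted, so there is no filename, timestamp, or real size available.
    """
    found = []
    for offset in range(0, len(image), CLUSTER_SIZE):
        head = image[offset:offset + 16]
        for magic, ext in SIGNATURES.items():
            if head.startswith(magic):
                # The true length is unknown, so take a fixed run of
                # consecutive clusters (hypothetical stand-in heuristic).
                end = offset + clusters_per_file * CLUSTER_SIZE
                found.append((offset, ext, image[offset:end]))
                break
    return found


# Fake 4-cluster image with a GIF header at the start of cluster 1.
image = bytearray(4 * CLUSTER_SIZE)
image[CLUSTER_SIZE:CLUSTER_SIZE + 6] = b"GIF89a"

results = carve(bytes(image))
for offset, ext, data in results:
    # Generic name derived from the offset, since the original name is gone.
    print(f"recovered_{offset:08d}.{ext}", len(data), "bytes")
```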

This means that it can find some files that are not linked in any directories. That's good, but there are some downsides:

Because it searches the disk directly instead of directory entries, it has no information about the file, so it applies a generic filename and gives it the current date/time for the timestamp instead of the file's original one.

Because it has no information about the file, it does not know how big the file is supposed to be. Only some (few?) file types indicate the exact size in the header, so most of the files that are recovered will, at best, be rounded up to the nearest cluster, while others can end up being ridiculously huge (e.g., a 10x10 GIF file that is 1.7GB!).

Like the other data-recovery method, it has no way of recovering fragmented files and only copies contiguous (unused) clusters regardless of whether they belong to the file or not (check the files that PhotoRec recovered; plenty will be half-corrupt like the ones that Recuva recovered).

Because it is manually scanning the disk, it will "recover" a whole lot more files than programs that use the other method; many of these files are legitimately deleted files that may have been erased a long time ago, and they also come from all over the disk, not just a specific directory. This means a lot more clutter and more files that have to be examined and sorted through. The problem is that

I was in a similar situation to yours last year. I accidentally deleted ~9,000 graphic files from a volume that was nearly full (hence lots of fragmentation). I used a host of recovery programs that gave (sometimes vastly) different results. While I got a lot of files back, not surprisingly, many of them were corrupt, and more than a year later I'm still trying to sort through them and find which ones are bad.

Unfortunately, current file-systems still don't do much to enhance data-recovery, so losing files means a lot of manual work.

It doesn't help after losing files, but for future reference, the best way to increase the chances of a successful recovery is to keep the disk defragmented (have the system automatically defragment when it idles).
