Friday 26 October 2018

linux - Modify a large file, then be able to roll back the changes, doing it (almost) in place


I'm recovering data from a damaged 500GB disk drive by copying its ext4 partition to a 500GB image file. The whole copy is going to take about 3 months (yes, months), so I'm patiently filling the image file with dd: first I dd a chunk to a temp file, then I put that chunk into the image file, and so on.


The problem is that I want to access the partially filled image and recover some data before the copy finishes. I've mounted it read-only and run photorec and testdisk on it, and that works fine. But I also want to run fsck to (try to) repair the partition. After looking at the data, I would like to roll back the fsck changes and resume the copying.


I know of tools like rsync, rdiff and git derivatives (bup, git-annex...) that could help, but I wonder if there is a way to do this in place, without taking up another 500GB for an indexed copy of the original data.


I don't want versioning capabilities. I don't want a backup of my file. The workflow would be something like:



  1. I have original_500GB_file.img -> 500GB of data

  2. I modify 2GB of the file. Say now I have modified_500GB_file.img and other auxiliary files -> less than 600GB of data (500 original + 2 modified + some metadata)

  3. When I'm done making changes, I roll back and am at point 1 again.


How can I achieve this? Would it be possible with BTRFS snapshot capabilities? (Unfortunately, the file is on an NTFS partition.)


Thanks.



Answer



I found a good and easy solution to my problem. Slizzered's last paragraph about virtual machines gave me a hint: you can use the QEMU tools without actually running a virtual machine. I found the relevant information here and here.


First you have to create a copy-on-write (COW) file for your image. It uses your original_500GB_file.img as its base. The big file won't be modified, because it's only used read-only. The COW file starts out tiny and only grows as changes are made. Just what I needed:



$ qemu-img create -f qcow2 -b original_500GB_file.img disposable.qcow2


Formatting 'disposable.qcow2', fmt=qcow2 size=498000000000 backing_file='original_500GB_file.img' encryption=off cluster_size=65536 lazy_refcounts=off


$ ls -l disposable.qcow2


-rw-r--r-- 1 dertalai users 204288 abr 15 20:01 disposable.qcow2
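A note for newer QEMU releases: they expect the backing file's format to be named explicitly with -F when an overlay is created, and may reject the command above without it. The sketch below wraps the same step in two small helper functions. The names create_overlay and show_overlay are made up for this example, and it assumes the base image is a raw dd copy, as in the question.

```shell
#!/bin/sh
# Sketch only: create_overlay and show_overlay are illustrative helper
# names, not part of QEMU.

# Create a qcow2 overlay ($2) on top of a raw base image ($1).
# -F raw names the backing file's format explicitly (assumption: the
# base image is a plain dd copy, so its format is raw).
create_overlay() {
    qemu-img create -f qcow2 -F raw -b "$1" "$2"
}

# Print the overlay's metadata (including its backing file) and the
# disk space it currently occupies.
show_overlay() {
    qemu-img info "$1"
    du -h "$1"
}
```

Calling `create_overlay original_500GB_file.img disposable.qcow2` followed by `show_overlay disposable.qcow2` should report the big image as the backing file, with the overlay itself still only a few hundred KB.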



Now you just have to expose the pair (read-only original + writable COW file) as a single usable block device:



# modprobe nbd


# qemu-nbd -c /dev/nbd0 disposable.qcow2



/dev/nbd0 is now ready for use. You can fsck it or even mount it and do whatever you need. When you are done and want to roll back the changes, just make sure no process is still using the block device, detach it, and delete the COW file if you want:



# qemu-nbd -d /dev/nbd0


# rmmod nbd


$ rm disposable.qcow2
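Put together, one inspect-and-discard round could look roughly like the sketch below. It is only an outline under a few assumptions: the image holds a bare ext4 filesystem (no partition table), the mount point /mnt/recovered is a made-up path that already exists, the function names inspect_start and inspect_end are invented for this example, and everything runs as root.

```shell
#!/bin/sh
# Sketch only: paths and function names are placeholders.
OVERLAY=${OVERLAY:-disposable.qcow2}
DEV=${DEV:-/dev/nbd0}
MNT=${MNT:-/mnt/recovered}   # hypothetical mount point, must exist

# Attach the overlay, let fsck repair the filesystem inside it, then
# mount the result read-only. All writes land in the overlay file.
inspect_start() {
    modprobe nbd || return 1
    qemu-nbd -c "$DEV" "$OVERLAY" || return 1
    # e2fsck exits with 1 or 2 when it corrected errors; 4 and above
    # mean errors were left uncorrected or the run itself failed.
    rc=0
    fsck.ext4 -f -y "$DEV" || rc=$?
    [ "$rc" -lt 4 ] || return 1
    mount -o ro "$DEV" "$MNT"
}

# Unmount if needed, detach the device and throw the overlay away.
# Unloading the nbd module afterwards (rmmod nbd) is optional.
inspect_end() {
    if mountpoint -q "$MNT"; then
        umount "$MNT" || return 1
    fi
    qemu-nbd -d "$DEV" && rm -f "$OVERLAY"
}
```

After `inspect_start`, the repaired view is available under the mount point; `inspect_end` returns everything to the untouched base image. Since the overlay only records differences against the base image, it seems safest to resume the dd copy only after `inspect_end` has removed it.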


