Monday, 28 January 2019

diff ignoring eol and whitespace


I would like to diff two files so that line endings and white space are ignored. Specifically, I would like diff to report no difference between d1.txt and d2.txt:


$ cat d1.txt
test1

test2

test3

test4
$ cat d2.txt
test1test2test3test4

For some reason,



diff -d -w -a --strip-trailing-cr d1.txt d2.txt



does not do the job. Any help is appreciated.



Answer



diff compares files line by line; see man diff:


diff - compare files line by line

Ignoring white space means that foo bar will match foobar only if both are on the same line. Since the text in d1.txt is spread over several lines while d2.txt has it all on one line, the files will always differ. I haven't actually read the source code, but I guess diff works something like:


for each line number X in file1:
    line1 = line X from file1
    line2 = line X from file2
    if line1 is equal to line2 then do something
    else do something else
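
The same-line versus split-line behaviour is easy to check by hand. This is only a small sketch, assuming a diff that supports -w (GNU and BSD diff do) and that mktemp is available; the scratch files a.txt, b.txt and c.txt are made up for the illustration:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
printf 'foo bar\n'  > "$tmp/a.txt"   # one line, with a space
printf 'foobar\n'   > "$tmp/b.txt"   # one line, no space
printf 'foo\nbar\n' > "$tmp/c.txt"   # same text split over two lines

# Same line, different spacing: -w hides the difference.
if diff -w "$tmp/a.txt" "$tmp/b.txt" > /dev/null; then
    same_line=equal
else
    same_line=differ
fi

# Text split across two lines: -w does not help.
if diff -w "$tmp/a.txt" "$tmp/c.txt" > /dev/null; then
    split_line=equal
else
    split_line=differ
fi

echo "same line: $same_line, split across lines: $split_line"
rm -r "$tmp"
```

The first comparison should come out equal and the second should not, which is the situation with d1.txt and d2.txt.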

The first line of your file1 is not the same as the first line of file2 so a difference is reported. If you really want to check that the files contain the exact same non-whitespace characters, you could try something like this:


diff <(perl -ne 's/\s+//g; print' d1.txt) <(perl -ne 's/\s+//g; print' d2.txt)
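
If you only need a yes/no answer rather than diff output, a variant without perl might look like the sketch below. It assumes the files are plain text, and same_ignoring_space is a hypothetical helper name, not an existing command. It avoids the <( ) process substitution, so it should also run in a plain POSIX sh:

```shell
# Hypothetical helper: succeeds when the two files are identical
# once every whitespace character has been removed.
same_ignoring_space() {
    # tr -d '[:space:]' deletes spaces, tabs, newlines and carriage returns.
    [ "$(tr -d '[:space:]' < "$1")" = "$(tr -d '[:space:]' < "$2")" ]
}

# Recreate the two files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'test1\n\ntest2\n\ntest3\n\ntest4\n' > "$tmp/d1.txt"
printf 'test1test2test3test4\n' > "$tmp/d2.txt"

if same_ignoring_space "$tmp/d1.txt" "$tmp/d2.txt"; then
    result=same
else
    result=different
fi
echo "$result"
rm -r "$tmp"
```

Note that this reads both files into memory, so it is only sensible for files of modest size.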
