I am having problems using bdiff on large text files (about 300,000
rows).
I am trying to output only the new or changed lines that appear in today's
file (compared to yesterday's file).
Because of the large file size (over 70MB each), diff only reports that the
files are different, so I have to use bdiff.
Also, comm -13 yesterdayfile todayfile outputs some rows that exist in both
files for some reason (the content is the same, but the rows are located on
different line numbers).
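A possible explanation is that comm expects both inputs to be sorted, and rows that sit on different line numbers suggest the files are not in the same order. A minimal sketch of the sorted variant follows. The two tiny sample files and the *.sorted names are made up for illustration, and the original line order of today's file is lost in the result.

```shell
# Tiny made-up sample files standing in for the real 70MB inputs.
printf 'apple\nbanana\ncherry\n' > yesterdayfile
printf 'cherry\napple\ndate\n' > todayfile

# comm expects both inputs in sorted order, so sort copies of them first.
# LC_ALL=C keeps sort and comm agreeing on the same collation order.
LC_ALL=C sort yesterdayfile > yesterday.sorted
LC_ALL=C sort todayfile > today.sorted

# -13 suppresses columns 1 and 2, leaving lines found only in today's file.
LC_ALL=C comm -13 yesterday.sorted today.sorted > differencefile
cat differencefile
```

With the sample data, only the row that is new in todayfile should remain; rows that merely moved to a different line number are no longer reported.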
I'm using the following command:
bdiff -a yesterdaysfile todaysfile |grep '^> ' > differencefile
The differing records are output correctly; however, each line in the output
file begins with the > symbol and a space before the actual data. The format
of the output is also skewed, because I have more than one record per line in
the difference file.
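The "> " prefix could probably be removed in the same step that selects the lines, by replacing grep with sed. The sketch below uses plain diff on two tiny made-up files as a stand-in for bdiff, on the assumption that bdiff marks added lines with the same "> " prefix. The rawdiff file name is hypothetical. This only deals with the prefix, not with several records ending up on one line.

```shell
# Tiny made-up sample files; plain diff stands in for bdiff here.
printf 'one\ntwo\nthree\n' > yesterdaysfile
printf 'one\ntwo changed\nthree\nfour\n' > todaysfile

# diff exits with status 1 when the files differ, which is expected here,
# so "|| true" keeps that from being treated as a failure.
diff yesterdaysfile todaysfile > rawdiff || true

# sed -n prints only the lines where the substitution matched, so this both
# selects the "> " lines and strips the two-character prefix from them.
sed -n 's/^> //p' rawdiff > differencefile
cat differencefile
```

In a pipeline the equivalent form would be to pipe the diff output straight into sed -n 's/^> //p' in place of the grep.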
There seems to be a problem with the way bdiff handles whitespace.
Any suggestions on how to make sure that only one line from the input files
appears on a line in the difference file?
Perhaps there are ways to split the files, use diff, and then rejoin them?
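An alternative that avoids diff and splitting altogether might be a single awk pass. This is only a sketch on tiny made-up files, and it assumes that all lines of yesterday's file fit in memory. It compares whole lines exactly, so a line that differs only in whitespace counts as changed.

```shell
# Tiny made-up sample files standing in for the real inputs.
printf 'a 1\nb 2\nc 3\n' > yesterdaysfile
printf 'c 3\na 1\nb 9\nd 4\n' > todaysfile

# While reading the first file (NR==FNR), remember every line seen.
# While reading the second file, print each line that was not remembered.
awk 'NR==FNR { seen[$0] = 1; next } !($0 in seen)' yesterdaysfile todaysfile > differencefile
cat differencefile
```

Unlike a sorted comparison, this keeps the new or changed lines in the order they appear in today's file and writes exactly one input line per output line.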
Thanks in advance,
Tom