I would like to compare two text files that have three columns each. One file has 999 rows and the other has 757 rows. I want the 242 rows that differ to be stored in a third file. I created the first file (999 rows) using a random network generator. Each row is an edge: the first and second columns are the source and destination nodes, and the third column is the weight of the edge between them.
File format (files 1 and 2):

    1 3 1
    16 36 1
I have tried the approaches from:
Compare two files line by line and generate the difference in another file
find difference between two text files with one item per line
http://www.daniweb.com/software-development/python/threads/124932/610058#post610058
Neither worked for me.
I think it is a problem of string comparison. I would like to compare the numbers in the first and second columns. If both differ, I want to write the row to a third file.
Any help will be much appreciated!
I am posting the following code that I tried after @MK posted his comment.
    f = open("results.txt","w")

    for line in file("100rwsnMore.txt"):
        rwsncount += 1
        line = line.split()
        src = line[0]
        dest = line[1]

        for row in file("100rwsnDeleted.txt"):
            row = row.split()
            s = row[0]
            d = row[1]
            if(s != src and d != dest):
                f.write(str(s))
                f.write(' ')
                f.write(str(d))
                f.write('\n')

    f.close()
The best general-purpose option if you’re on a *nix system is just to use:
sort filea fileb | uniq -u
But if you need to use Python:
Your code reopens the inner file on every iteration of the outer loop. Open it once, outside the loop.
A nested loop is also less efficient than reading the first file once, storing the values found, and then checking each row of the second file against those stored values.
    def build_set(filename):
        # A set stores a collection of unique items. Both adding items and searching for them
        # are quick, so it's perfect for this application.
        found = set()

        with open(filename) as f:
            for line in f:
                # [:2] gives us the first two elements of the list.
                # Tuples, unlike lists, cannot be changed, which is a requirement for anything
                # being stored in a set.
                found.add(tuple(sorted(line.split()[:2])))

        return found

    set_more = build_set('100rwsnMore.txt')
    set_del = build_set('100rwsnDeleted.txt')

    with open('results.txt', 'w') as out_file:
        # Using with to open files ensures that they are properly closed, even if the code
        # raises an exception.

        for res in (set_more - set_del):
            # The - computes the elements in set_more not in set_del.
            out_file.write(" ".join(res) + "\n")
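The code above writes only the two node columns, and in set order rather than file order. If you also want to keep the weight column and the original row order, a variant along these lines might work. This is only a sketch: `edge_key` and `rows_only_in_first` are hypothetical helper names, and it assumes, like the code above, that an edge `1 3` and an edge `3 1` should count as the same edge.

```python
def edge_key(line):
    """Return the first two columns of a row as an order-independent key."""
    # Assumption: edges are undirected, so "1 3" and "3 1" produce the same key.
    # The values stay strings; sorting is only used to make the key consistent.
    return tuple(sorted(line.split()[:2]))


def rows_only_in_first(more_path, deleted_path, out_path):
    """Write the rows of more_path whose node pair does not occur in deleted_path.

    Returns the number of rows written.
    """
    # Read the smaller file once and remember every node pair it contains.
    with open(deleted_path) as f:
        seen = {edge_key(line) for line in f if line.strip()}

    written = 0
    with open(more_path) as src, open(out_path, "w") as out:
        for line in src:
            # Skip blank lines; keep the full row (including the weight) otherwise.
            if line.strip() and edge_key(line) not in seen:
                out.write(line.rstrip("\n") + "\n")
                written += 1
    return written
```

With the file names from the question, the call would be `rows_only_in_first('100rwsnMore.txt', '100rwsnDeleted.txt', 'results.txt')`. Each file is read only once, so the work grows with the total number of rows rather than with the product of the two row counts.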