How to compare two different files line by line and write the difference in third file?


Question:

I would like to compare two text files that have three columns each. One file has 999 rows and the other has 757 rows. I want the 242 rows that differ to be stored in a separate file. I created the first file (999 rows) using a random network generator: each of the 999 rows is an edge, where the first and second columns are the source and destination nodes and the third column is the weight between them.

File format (files 1 and 2):

1 3 1
16 36 1

I have tried the answers to these questions:

Compare two files line by line and generate the difference in another file
Find difference between two text files with one item per line

Neither worked for me.

I think it is a problem of string comparison. I would like to compare the numbers in the first and second columns. If both are different, I want to write the row to the third file.
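To make the string-comparison problem concrete, here is a minimal sketch (the helper name edge_key is hypothetical) that compares only the first two columns, parsed as integers, so extra spaces or a different weight in the third column do not make two rows look different:

```python
def edge_key(line):
    """Return (source, destination) as integers, ignoring the weight column."""
    fields = line.split()
    return int(fields[0]), int(fields[1])

# As raw strings these two rows differ (extra space, different weight)...
a = "16 36 1\n"
b = "16  36 2\n"
print(a == b)                      # False
# ...but they describe the same edge.
print(edge_key(a) == edge_key(b))  # True
```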

Any help will be much appreciated!


I am posting the following code that I tried after @MK posted his comment.

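rwsncount = 0  # number of rows read from the larger file
f = open("results.txt", "w")

for line in open("100rwsnMore.txt"):
    rwsncount += 1
    line = line.split()
    src = line[0]
    dest = line[1]
    # The second file is reopened for every row of the first file
    for row in open("100rwsnDeleted.txt"):
        row = row.split()
        s = row[0]
        d = row[1]
        # True for every pair of rows whose source AND destination both differ,
        # so one row can be written many times (or skipped when only one matches)
        if s != src and d != dest:
            f.write(' '.join(line) + '\n')

f.close()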

Asked By: learner


Answer #1:

If you’re on a *nix system, the best general-purpose option is simply:

sort filea fileb | uniq -u
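For comparison, a rough pure-Python equivalent of that pipeline (a sketch; the function name is hypothetical) counts every line across both files and keeps those seen exactly once, which is what uniq -u reports after sort has grouped identical lines together. Like the shell version, it compares whole lines, so it assumes each row is formatted identically in both files and appears at most once per file; the result is sorted by Python's string order rather than the locale's collation.

```python
from collections import Counter

def lines_unique_to_one_file(path_a, path_b):
    """Return the lines that occur exactly once across both files, sorted,
    roughly mirroring `sort path_a path_b | uniq -u`."""
    counts = Counter()
    for path in (path_a, path_b):
        with open(path) as f:
            for line in f:
                # Strip the newline so a last line without one still matches
                counts[line.rstrip("\n")] += 1
    return sorted(line for line, n in counts.items() if n == 1)
```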

But if you need to use Python:

Your code reopens the inner file in every iteration of the outer file. Open it outside the loop.
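Applied to the question's code, a minimal sketch of that fix (function name hypothetical; file names taken from the question) reads the second file once into a list of (source, destination) pairs and writes a row only when its pair is absent from that list. This also changes the test from "differs from one particular row" to "matches no row", which is what the question asks for:

```python
def write_missing_rows(more_path, deleted_path, out_path):
    # Read the second file once, outside the loop, keeping only the
    # (source, destination) columns of each non-blank row.
    with open(deleted_path) as f:
        deleted_pairs = [tuple(row.split()[:2]) for row in f if row.strip()]

    with open(more_path) as more, open(out_path, "w") as out:
        for line in more:
            pair = tuple(line.split()[:2])
            # Write the row only if NO row in the other file has the same
            # source and destination; `in` on a list still scans it fully.
            if pair and pair not in deleted_pairs:
                out.write(line)
```

Called as write_missing_rows("100rwsnMore.txt", "100rwsnDeleted.txt", "results.txt"). Membership testing on a list is still a linear scan, so the total cost remains proportional to the product of the two file sizes.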

Using a nested loop is less efficient than looping over the first file once, storing the values found, and then comparing the second file against those values.

def build_set(filename):
    # A set stores a collection of unique items.  Both adding items and searching for them
    # are quick, so it's perfect for this application.
    found = set()

    with open(filename) as f:
        for line in f:
            # [:2] gives us the first two elements of the list.
            # Tuples, unlike lists, cannot be changed, which is a requirement for anything
            # being stored in a set.
            found.add(tuple(line.split()[:2]))
    return found

set_more = build_set('100rwsnMore.txt')
set_del = build_set('100rwsnDeleted.txt')

with open('results.txt', 'w') as out_file:
    # Using with to open files ensures that they are properly closed, even if the code
    # raises an exception.

    for res in (set_more - set_del):
        # The - computes the elements in set_more not in set_del.

        out_file.write(" ".join(res) + "\n")
Answered By: Zack Bloom
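Note that the answer writes back only the first two columns, in arbitrary set order, so the weight is dropped. If the full rows are needed, one possible variant (a sketch; the function name is hypothetical, and it assumes the larger file is the one to filter) builds the set from the smaller file only and copies the lines of the larger file whose pair is missing, unchanged and in their original order:

```python
def write_rows_missing_from(more_path, deleted_path, out_path):
    """Copy rows of more_path whose (source, destination) pair does not
    appear in deleted_path to out_path; return how many rows were written."""
    # Build the lookup set once from the smaller file (pairs compared as strings).
    with open(deleted_path) as f:
        deleted = {tuple(line.split()[:2]) for line in f if line.strip()}

    written = 0
    with open(more_path) as more, open(out_path, "w") as out:
        for line in more:
            # Skip blank lines; compare only the first two columns.
            if line.strip() and tuple(line.split()[:2]) not in deleted:
                out.write(line)  # the full original row, weight included
                written += 1
    return written
```

If every row of 100rwsnDeleted.txt also occurs in 100rwsnMore.txt and no (source, destination) pair repeats, the returned count should be 999 - 757 = 242.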
