Compare two different files line by line in python

Question :

I have two different files and I want to compare their contents line by line and write their common contents to a third file. Note that both of them contain some blank lines.
Here is my pseudo code:

file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')

for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" %(line1))


However, by doing this, I get lots of blank lines in my FO file. It seems that the common blank lines are also written. I want to write only the text part. Can somebody please help me?

For example: my first file (file1) contains data:

Hostname = TUVALU

TS_Ball_Update_Threshold = 0.2

TS_Player_Search_Radius = 4

Ball_Template_Update = 0

while second file (file2) contains data:

Pole_ID      = 2
Width        = 1280
Height       = 1024
Color_Mode   = 0
Sensor_Scale = 1

Tracking_ROI_Size = 4
Ball_Template_Update = 0

If you notice, the last two lines of each file are the same, so I want to write these lines to my FO file. The problem with my approach is that it also writes the common blank lines. Should I use regex for this problem? I do not have experience with regex.

Asked By: Sanchit


Answer #1:

This solution reads both files in one pass, excludes blank lines, and prints common lines regardless of their position in the file:

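with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        # Lines present in both files, regardless of position
        same = set(file1).intersection(file2)

# Drop the blank line, if it is among the common lines
same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

Answered By: Sanchit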

Answer #2:

Yet another example…

from __future__ import print_function #Only for Python2

with open('file1.txt') as f1, open('file2.txt') as f2, open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(f1, f2):
        if line1 == line2:
            print(line1, end='', file=outfile)

And if you want to eliminate common blank lines, just change the if statement to:

if line1.strip() and line1 == line2:

.strip() removes all leading and trailing whitespace, so if that’s all that’s on a line, it will become an empty string "", which is considered false.
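
As a small illustration of that truthiness check, here is a minimal sketch that works on in-memory lists instead of files. The helper name common_nonblank and the sample lines are made up for this example; they are not part of the answer above.

```python
def common_nonblank(lines1, lines2):
    """Return lines that are equal at the same position, skipping blank ones."""
    # a.strip() is "" (falsy) for lines that contain only whitespace
    return [a for a, b in zip(lines1, lines2) if a.strip() and a == b]


# Hypothetical sample data, loosely modelled on the question's files
lines1 = ["Hostname = TUVALU\n", "\n", "Ball_Template_Update = 0\n"]
lines2 = ["Pole_ID = 2\n", "\n", "Ball_Template_Update = 0\n"]

result = common_nonblank(lines1, lines2)
print(result)
```

Because zip pairs lines by position, this only finds lines that sit on the same line number in both inputs.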

Answered By: Rob?

Answer #3:

If you are specifically looking for getting the difference between two files, then this might help:

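with open('first_file', 'r') as file1:
    with open('second_file', 'r') as file2:
        # Lines that appear in the first file but not in the second
        difference = set(file1).difference(file2)

# Drop the blank line, if it is among the differing lines
difference.discard('\n')

with open('diff.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)

Answered By: Wayne Werner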

Answer #4:

If order is preserved between the files, you might also prefer difflib. Although Rob?’s result is the bona-fide standard for intersections, you might actually be looking for a rough diff-like comparison:

from difflib import Differ

with open('cfg1.txt') as f1, open('cfg2.txt') as f2:
    differ = Differ()

    for line in differ.compare(f1.readlines(), f2.readlines()):
        if line.startswith(" "):
            print(line[2:], end="")

That said, this has a different behaviour to what you asked for (order is important) even though in this instance the same output is produced.
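
To see what the two-character prefixes look like without touching any files, here is a minimal sketch using in-memory lists. The helper name common_lines and the sample data are assumptions for illustration; the extra strip() check for blank lines is an addition that the answer above does not include.

```python
from difflib import Differ


def common_lines(lines1, lines2):
    """Return the non-blank lines that Differ marks as common to both inputs."""
    kept = []
    for line in Differ().compare(lines1, lines2):
        # Differ prefixes common lines with two spaces,
        # and uses "- ", "+ " and "? " for everything else.
        if line.startswith("  ") and line[2:].strip():
            kept.append(line[2:])
    return kept


# Hypothetical sample data, loosely modelled on the question's files
a = ["Hostname = TUVALU\n", "\n", "Ball_Template_Update = 0\n"]
b = ["Pole_ID = 2\n", "\n", "Ball_Template_Update = 0\n"]

common = common_lines(a, b)
print(common)
```

Since Differ aligns the two sequences in order, two lines that appear in both inputs but in swapped order cannot both be reported as common.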

Answered By: itzmeesuvm

Answer #5:

Once the file object is iterated, it is exhausted.

>>> f = open('1.txt', 'w')
>>> f.write('1\n2\n3\n')
6
>>> f.close()
>>> f = open('1.txt', 'r')
>>> for line in f: print(line, end='')
...
1
2
3
>>> # exhausted, another iteration does not produce anything.
>>> for line in f: print(line, end='')
...
>>>

Use f.seek(0) (or close/open the file) to rewind the file:

>>> f.seek(0)
0
>>> for line in f: print(line, end='')
...
1
2
3
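
The same behaviour can be checked in a script. This is a minimal sketch that assumes a throwaway temporary file; the variable names are made up for the example. It also explains the question's nested loops: the inner loop over file2 only yields lines during the first pass of the outer loop.

```python
import os
import tempfile

# Create a small temporary file (hypothetical content, for illustration only)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n2\n3\n")
    path = tmp.name

with open(path) as f:
    first_pass = list(f)   # consumes every line
    second_pass = list(f)  # the file object is exhausted, so this is empty
    f.seek(0)              # rewind to the start of the file
    third_pass = list(f)   # the lines are available again

os.remove(path)

print(first_pass, second_pass, third_pass)
```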


Answered By: Veedrac

Answer #6:

Try this:

from __future__ import with_statement  # only needed on Python 2.5

# Raw strings keep the backslashes in the Windows paths literal
filename1 = r"G:\test1.TXT"
filename2 = r"G:\test2.TXT"

with open(filename1) as f1:
    with open(filename2) as f2:
        # Read each file into a list of lines without trailing newlines
        file1list = f1.read().splitlines()
        file2list = f2.read().splitlines()
        list1length = len(file1list)
        list2length = len(file2list)
        if list1length == list2length:
            for index in range(len(file1list)):
                if file1list[index] == file2list[index]:
                    print(file1list[index] + "==" + file2list[index])
                else:
                    print(file1list[index] + "!=" + file2list[index] + " Not-Equal")
        else:
            print("difference in the size of the file and number of lines")

Answered By: falsetru

Answer #7:

I have just been faced with the same challenge, but I thought, “Why program this in Python if you can solve it with a simple grep?”, which led to the following Python code:

import subprocess
from subprocess import PIPE

try:
  # Raw strings keep the backslashes in the Windows paths literal
  output1, errors1 = subprocess.Popen([r"c:\cygwin\bin\grep", "-Fvf", r"c:\file1.txt", r"c:\file2.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate()
  output2, errors2 = subprocess.Popen([r"c:\cygwin\bin\grep", "-Fvf", r"c:\file2.txt", r"c:\file1.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate()
  if len(output1) + len(output2) + len(errors1) + len(errors2) > 0:
    print("Compare result : There are differences:")
    if len(output1) + len(output2) > 0:
      print("  Output differences : ")
      print(output1)
      print(output2)
    if len(errors1) + len(errors2) > 0:
      print(" Errors : ")
      print(errors1)
      print(errors2)
  else:
    print("Compare result : Both files are equal")
except Exception as ex:
  print("Compare result : Exception during comparison")

The trick behind this is the following:
grep -Fvf file1.txt file2.txt prints every line of file2.txt that matches no line of file1.txt, so empty output means that all entries in file2.txt are present in file1.txt. By doing this in both directions we can see whether the contents of both files are “equal”. I put “equal” between quotes because duplicate lines are disregarded in this way of working.

Obviously, this is just an example: you can replace grep by any commandline file comparison tool.
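
For comparison, a similar two-way check can be sketched in plain Python with sets. This is only a rough analogue under stated assumptions, not an exact equivalent of the grep call: it compares whole lines, whereas grep -F matches fixed strings anywhere in a line, and it skips blank lines. The function name same_line_sets is made up for this example.

```python
def same_line_sets(lines1, lines2):
    """Compare two line sequences as sets of non-blank lines.

    Returns (equal, only_in_1, only_in_2). Duplicates and ordering are
    ignored, much like in the two-way grep check.
    """
    set1 = {line.rstrip("\n") for line in lines1 if line.strip()}
    set2 = {line.rstrip("\n") for line in lines2 if line.strip()}
    only_in_1 = sorted(set1 - set2)
    only_in_2 = sorted(set2 - set1)
    return (not only_in_1 and not only_in_2), only_in_1, only_in_2


# Hypothetical sample data
equal, missing_from_2, missing_from_1 = same_line_sets(
    ["a\n", "b\n", "b\n", "\n"],
    ["b\n", "a\n", "c\n"],
)
print(equal, missing_from_2, missing_from_1)
```

To use it on real files, pass two open file objects, since iterating a file yields its lines.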

Answered By: Prashanth Babu
