web123456

Python: removing duplicate lines from a CSV file

with open('', 'r') as in_file, open('', 'w') as out_file:
    s = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        if line in s:
            continue  # skip duplicate
        s.add(line)
        out_file.write(line)

There is a bug in this code: it sometimes leaves one duplicate behind. Iterating over a file yields each line *with* its trailing newline, but the very last line of a file often has no newline. So if the last line is a duplicate, it reads as 'abc' while its earlier copy read as 'abc\n', the set lookup misses it, and two copies survive in the output. (The bug only bites occasionally, depending on the input — it took me a whole night of fiddling to track down. /dog-head/)
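One way to fix it is to strip the trailing newline before doing the set lookup, so a final line without '\n' still matches its earlier copies. A minimal sketch (the function name and file paths are mine, not from the original post):

```python
def dedupe_lines(in_path, out_path):
    seen = set()
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            key = line.rstrip('\n')  # compare without the trailing newline
            if key in seen:
                continue  # skip duplicate, even an unterminated last line
            seen.add(key)
            out_file.write(line)  # write the line as it appeared in the input
```

Note this treats two lines as duplicates even if one ends in '\n' and the other does not, which is exactly the case the original code missed.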