Just noticed an actual "stuttering" repeat upload caused by the flickr/flickd processing mishandling a duplicated file (one md5sum, 3 "file" names, and it would re-upload the last one but tag the first one as flickd...). After fixing the immediate problem by hand (deleting the duplicates on Flickr itself, then moving the flickd tags around with emacs), I tried to measure the scope of the problem:
import kimdaba_album

album = kimdaba_album.parse(kimdaba_album.kimdaba_default_album())
# Map each md5sum to every "file" path recorded with it.
km = {}
for img in album.findall("images/image"):
    mm = img.get("md5sum")
    ff = img.get("file")
    km.setdefault(mm, []).append(ff)
# Count the md5sums that appear under more than one path.
print(len([k for k, v in km.items() if len(v) > 1]))
Turns out there are 289 distinct md5sums that each have multiple paths. One of them is the unsurprising group of 18 zero-length files (all empty files share a single md5sum), primarily from old cellphones; one is the 3-way duplicate that caused the problem; and the rest are duplicates of various kinds. Not much of a mess given that there are 131446 entries total, but it still caused a visible problem and needs fixing.
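To see how those duplicate groups break down by size (how many 2-way, 3-way, 18-way and so on), something like the following rough sketch might help. It is not the real tooling: it uses the standard library's xml.etree.ElementTree instead of kimdaba_album, and the SAMPLE document, its root tag, and the helper names duplicate_groups and group_size_histogram are all made up for illustration. The only thing taken from the snippet above is the "images/image" path with its "md5sum" and "file" attributes.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical stand-in for the real album index; only the
# images/image layout and the two attributes come from the post.
SAMPLE = """<album>
  <images>
    <image file="a/one.jpg" md5sum="aaa"/>
    <image file="b/one-copy.jpg" md5sum="aaa"/>
    <image file="c/one-again.jpg" md5sum="aaa"/>
    <image file="d/two.jpg" md5sum="bbb"/>
    <image file="e/three.jpg" md5sum="ccc"/>
    <image file="f/three-dup.jpg" md5sum="ccc"/>
  </images>
</album>"""


def duplicate_groups(root):
    """Return {md5sum: [paths]} for every md5sum with more than one path."""
    paths_by_md5 = defaultdict(list)
    for img in root.findall("images/image"):
        paths_by_md5[img.get("md5sum")].append(img.get("file"))
    return {md5: paths for md5, paths in paths_by_md5.items() if len(paths) > 1}


def group_size_histogram(groups):
    """Count how many duplicate groups there are of each size."""
    return Counter(len(paths) for paths in groups.values())


root = ET.fromstring(SAMPLE)
groups = duplicate_groups(root)
histogram = group_size_histogram(groups)
for size, count in sorted(histogram.items()):
    print("%d md5sum(s) with %d paths each" % (count, size))
```

Run against the real album, the same histogram would presumably show one entry at size 18 for the zero-length files and one at size 3 for the upload that stuttered, which makes the odd cases easy to pick out from the plain 2-way duplicates.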