IMAP de-duplicator - IMAPdedup
From time to time I end up with duplicate email messages in an IMAP mailbox, often as part of copying and archiving mail, or sometimes simply as a result of being on the recipient list for messages more than once.
Deleting these by hand is a nuisance, so I wrote a Python script to do it for me.
The result, imapdedup.py, is a command-line utility that, if given the details of an IMAP mailbox, will go through it looking for messages with a duplicate Message-ID header. The Message-ID should normally be unique for any given message, so unless you've got some very unusual mailboxes, it's a pretty safe choice, but let me know if you'd like a more thorough check. (If you have messages without a Message-ID header, there's an option to use a checksum of the To, From, Subject and Date fields as a fallback.)
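To make the idea concrete, here is a minimal sketch of that kind of check. It is not the code from imapdedup.py; the function names message_key and find_duplicates, the choice of SHA-1 and the way the four fields are joined are all assumptions made for illustration.

```python
import hashlib
from email.parser import HeaderParser


def message_key(raw_headers: str) -> str:
    """Return an identity key for a message, given its raw header text.

    Hypothetical helper: prefers Message-ID and falls back to a checksum
    of the To, From, Subject and Date fields when that header is missing.
    """
    headers = HeaderParser().parsestr(raw_headers)
    msg_id = headers.get("Message-ID")
    if msg_id and msg_id.strip():
        return "id:" + msg_id.strip()
    # Fallback: hash the four fields (SHA-1 is an assumption of this sketch).
    parts = [headers.get(name) or "" for name in ("To", "From", "Subject", "Date")]
    digest = hashlib.sha1("\n".join(parts).encode("utf-8", "replace")).hexdigest()
    return "sum:" + digest


def find_duplicates(raw_header_list):
    """Return the positions of the second and later copies of each message."""
    seen = set()
    duplicates = []
    for index, raw in enumerate(raw_header_list):
        key = message_key(raw)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates
```

The first copy of each message is always kept; only later copies are reported.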
By default, it marks the second and later occurrences of a message as 'deleted' on the server. Some mail clients will let you still view such messages, so you can take a look at what's happened before compacting the mailbox. This is now becoming rarer, though - most modern clients prefer to move deleted messages straight to a 'Trash' folder. You can typically still recover them from there for a little while afterwards in case of accidents.
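For the curious, marking a message as deleted over IMAP amounts to setting the \Deleted flag on it. The sketch below shows one way to do that with Python's standard imaplib module. It is an illustration rather than the script's own code: the function name mark_duplicates_deleted and its dry_run parameter are made up here, and conn is assumed to be an already logged-in connection with the mailbox selected.

```python
import imaplib


def mark_duplicates_deleted(conn: imaplib.IMAP4, message_numbers, dry_run=True):
    """Flag the given message sequence numbers as \\Deleted.

    Hypothetical helper. Returns the IMAP message set that was (or, in a
    dry run, would have been) flagged, or None if there was nothing to do.
    """
    if not message_numbers:
        return None
    # IMAP accepts a comma-separated list of message numbers, e.g. "3,7".
    message_set = ",".join(str(n) for n in message_numbers)
    if dry_run:
        # Report only; make no changes on the server.
        return message_set
    status, _ = conn.store(message_set, "+FLAGS", r"(\Deleted)")
    if status != "OK":
        raise RuntimeError("server refused to flag messages: " + message_set)
    return message_set
```

Nothing is removed until the mailbox is expunged (compacted), which is why flagged messages can still be inspected first in clients that show them.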
There are various options, including one for doing a dry run that makes no changes.
You can get IMAPdedup from GitHub.
You can list the syntax by running:
./imapdedup.py -h
You can then do something like:
./imapdedup.py -s imap.myisp.com -u myuserid -l
to list the mailboxes on the server, and
./imapdedup.py -s imap.myisp.com -u myuserid -n INBOX.Notes
to tell you what it would do to your INBOX/Notes folder. The -n option stops it from actually making any changes. If you leave that out, it will mark duplicate messages as deleted. The process can take some time on large folders or slow connections, so you may want to add the -v option so you can see how it's progressing.
It's Open Source and comes with no warranties, etc. Make sure you have backups. Hope it's useful!