Cleaning up your Contacts with Contact Cleaner

By | February 8, 2008

Anyone who’s had an iPhone, BlackBerry, or Windows Mobile phone knows that, depending on how many accounts you have with different e-mail providers, you can end up with 5 different contact entries for one person. Raja, an engineer here at Zimbra, wrote this cool Contact Cleaner Zimlet, which is included in Zimbra 5.0. I asked him to write up a blog post and make a video. Enjoy!

What is the 'Contact Cleaner' Zimlet?
It's a Zimlet that deletes and merges duplicate contacts. It goes beyond identifying contacts whose fields all match (duplicates with a perfect match): it is also smart enough to identify "most-likely" duplicates, where most of the fields (firstname, lastname, email, etc.) are the same but a few differ.

Why might you end up with duplicate contacts?

As you know, Zimbra's address book can sync contacts from various sources, such as BlackBerry, iSync, and Outlook. That flexibility has a cost: if any of these sync paths breaks and fails to recognize pre-existing contacts, you might end up with duplicate contacts.
This kind of failure usually happens when you upgrade one of the pieces of software that participates in syncing. That could be BlackBerry, Zimbra-outlook-connector, Outlook itself, iSyncConnector, MacOS, Zimbra-server, etc.
Apart from this, you might manually import a .csv file from a colleague or friend, or drag and drop contacts from shared contacts, and end up with duplicates.
Finally, since virtually everyone uses multiple email addresses, this also contributes to duplicate contacts for the same person.

How does the Zimlet solve the problem?

This Zimlet scans the address book for contacts that are duplicates, or most probably duplicates, of another contact. It then classifies all such duplicates into 3 broad categories: duplicates with a perfect match (actual duplicates), duplicates with a partial match (most-likely duplicates), and duplicates with conflicts (a 50/50 chance that it's a duplicate).
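To make the three categories concrete, here is a minimal Python sketch of how a pair of contacts could be classified. It is not the Zimlet's actual code. The post does not say how many fields must match before two contacts count as duplicates at all, so the `KEY_FIELDS` rule below (first name, last name, and primary email must be equal) is purely an assumption, as are the function and field names.

```python
from typing import Dict

Contact = Dict[str, str]

# Hypothetical threshold: a pair is only considered at all if these fields
# are non-empty and equal. The Zimlet's real rule is not given in the post.
KEY_FIELDS = ("firstname", "lastname", "email")


def _value(contact: Contact, field: str) -> str:
    """Return a field's value with whitespace trimmed ('' if missing)."""
    return contact.get(field, "").strip()


def classify(a: Contact, b: Contact) -> str:
    """Return 'perfect', 'partial', 'conflict', or 'distinct' for a pair."""
    for field in KEY_FIELDS:
        if not _value(a, field) or _value(a, field) != _value(b, field):
            return "distinct"

    has_extra = False
    for field in set(a) | set(b):
        va, vb = _value(a, field), _value(b, field)
        if va == vb:
            continue
        if va and vb:
            # Same field, two different non-empty values.
            return "conflict"
        # One side is empty, the other carries extra information.
        has_extra = True
    return "partial" if has_extra else "perfect"
```

A pair that fails the assumed key-field check is reported as 'distinct' and would never be offered for merging.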

1. Duplicates with perfect match (the simplest form; most duplicates usually fall into this category)

These are the duplicates where every field matches. Obviously, these are safe to delete. In this case, Contact Cleaner simply moves all the duplicates to Trash while keeping only one of them.

Duplicate1:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Duplicate2:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Merged or Resulting:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
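The perfect-match case can be sketched in a few lines of Python. This is an illustration only: the post does not say which copy Contact Cleaner keeps, so keeping the first one seen is an assumption, and `split_perfect_duplicates` is a made-up name. Empty fields are ignored when comparing, which is also an assumption.

```python
from typing import Dict, List, Tuple

Contact = Dict[str, str]


def split_perfect_duplicates(
    contacts: List[Contact],
) -> Tuple[List[Contact], List[Contact]]:
    """Return (kept, trashed); the first contact of each identical group is kept."""
    seen = set()
    kept: List[Contact] = []
    trashed: List[Contact] = []
    for contact in contacts:
        # Build a hashable fingerprint of all non-empty fields, so that an
        # empty "email3" is treated the same as a missing "email3".
        key = frozenset((k, v.strip()) for k, v in contact.items() if v.strip())
        if key in seen:
            trashed.append(contact)  # assumed: later copies go to Trash
        else:
            seen.add(key)
            kept.append(contact)
    return kept, trashed
```

Run on the two John Doe entries above, this keeps one contact and sends the other to the trashed list.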

2. Duplicates with Partial Match: most-likely duplicates (not all fields have matched)

These are the duplicates where one of the duplicate contacts has some extra information (for example, an email2 value the other lacks). These are usually safe to merge, and the merged or resulting contact will be a superset of both duplicates.

Duplicate1:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Email3:  
Duplicate2:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2:  
Email3: john.doe@foo.com
Merged or Resulting:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Email3: john.doe@foo.com
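The superset merge shown above can be sketched as follows. This is a minimal Python illustration rather than the Zimlet's implementation; `merge_partial` and the lowercase field names are assumptions. It refuses to merge when a field holds two different values, since that is the conflict case handled separately.

```python
from typing import Dict

Contact = Dict[str, str]


def merge_partial(a: Contact, b: Contact) -> Contact:
    """Merge two partial-match duplicates into a superset of both."""
    merged: Contact = {}
    # Keep a's field order, then append any fields that only b has.
    for field in list(a) + [f for f in b if f not in a]:
        va = a.get(field, "").strip()
        vb = b.get(field, "").strip()
        if va and vb and va != vb:
            # Two different non-empty values: not a partial match.
            raise ValueError(f"conflict in field {field!r}")
        merged[field] = va or vb
    return merged
```

Because each field takes whichever side is non-empty, no information from either duplicate is dropped.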

3. Duplicates with Conflicts (50/50 chance that it's a duplicate; automatic merging will lose data, so these need the user's attention):

These are contacts where the duplicates have different values in the same field (although they had enough matches to be considered duplicates).
E.g. you have one contact with 4 fields (firstname, lastname, email, email2) and another contact with the same 4 fields.
If email2 has two different values in this case, then they become duplicates with conflicts.

E.g. duplicates with 3 conflicting fields (email2, phone, and city):

Duplicate1:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Phone: 650-123-4567
City: San Mateo
Duplicate2:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: john.doe@foo.com
Phone: 888-888-8888
City: Sunnyvale

… in this case, you have 3 options to fix:

OPTION 3.1: Ignore merging:

You might have two people with the same first name and last name, or you may have decided not to merge for some other reason. Selecting this option skips the merge.

OPTION 3.2 (Automatic): Add conflicting email info to the 'email2' and 'email3' fields and the rest of the conflicts to the 'Notes' section:

In this case, the Zimlet treats one of the duplicate contacts as the original and automatically adds all the conflicting info from the other (say, a conflicting phone number or a conflicting city) to the Notes section.

Secondly, conflicting email fields get special consideration: the Zimlet tries to squeeze the conflicting emails into an empty email field (say, the email3 field). if we have email1@foo.com and email2@foo.com
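A rough Python sketch of the automatic option, applied to the John Doe conflict example earlier in this section. It is an approximation under stated assumptions, not the Zimlet's code: the field names, the "field: value" format used in Notes, and the fallback of writing an email to Notes when no email slot is free are all hypothetical.

```python
from typing import Dict, List

Contact = Dict[str, str]

EMAIL_FIELDS = ("email", "email2", "email3")


def merge_with_conflicts(original: Contact, other: Contact) -> Contact:
    """Keep `original`'s values; park conflicting values from `other`."""
    merged = {k: v.strip() for k, v in original.items()}
    existing_notes = merged.pop("notes", "")
    notes: List[str] = [existing_notes] if existing_notes else []
    known_emails = {merged.get(f, "").lower() for f in EMAIL_FIELDS} - {""}

    for field, raw in other.items():
        value = raw.strip()
        if field == "notes":
            if value and value not in notes:
                notes.append(value)
            continue
        if not value or value == merged.get(field, ""):
            continue  # nothing new in this field
        if not merged.get(field):
            merged[field] = value  # no conflict: original was empty here
            if field in EMAIL_FIELDS:
                known_emails.add(value.lower())
            continue
        if field in EMAIL_FIELDS:
            if value.lower() in known_emails:
                continue  # address already stored in another email field
            slot = next((f for f in EMAIL_FIELDS if not merged.get(f)), None)
            if slot is not None:
                merged[slot] = value  # squeeze into the first empty email field
                known_emails.add(value.lower())
                continue
        # Assumed format for conflicts that end up in Notes.
        notes.append(f"{field}: {value}")

    if notes:
        merged["notes"] = "\n".join(notes)
    return merged
```

With Duplicate1 as the original, the conflicting email2 from Duplicate2 lands in email3, and the conflicting phone and city are written to Notes, so nothing from either contact is lost.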



Gallery Download Link (also available on your server in /opt/zimbra/zimlets-extra)

Updated link: http://gallery.zimbra.com/type/zimlet/contact-cleaner
-Mike


Comments

  • I use the contact cleaner because of a synchronization issue (contacts get duplicated), however I find contact cleaner only works in the trivial cases and in frequent cases it is unusable.

    Example, only first name matches: contact cleaner suggests to merge them even if last name and email differs. When I have 10 Daniel Xyz and 1 Daniel Abc, I cannot merge the 10 Daniel Xyz and just exclude Daniel Abc. (Actually contact cleaner shouldn’t list Daniel Abc at all, as common first names among different contacts is, well, quite common).

    Is contact cleaner actively maintained? Or alternatively, is it possible to develop my own extension based on the current contact cleaner? I wouldn’t want to start from scratch, but to make it a bit smarter shouldn’t be too difficult.

    Commented on December 19, 2011 at 12:42 pm