Clustering is the process of grouping matched records into a single group.
Clustering & Merging :
A group is a set of records identified as pertaining to the same entity (the entity could be an individual or a household).
When a batch of records is de-duplicated, every record in the batch is compared with every other record, and the results are analyzed to form groups (clusters).
The results may be interpreted using one of two grouping approaches.
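The all-pairs comparison described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the record fields and the `is_match` rule are hypothetical, chosen only to make the example runnable.

```python
from itertools import combinations

def is_match(a, b):
    # Hypothetical matching rule (an assumption for illustration):
    # same name after trimming and lower-casing, and same postcode.
    return (a["name"].strip().lower() == b["name"].strip().lower()
            and a["postcode"] == b["postcode"])

def matched_pairs(records):
    """Compare every record with every other record exactly once."""
    return [(a["id"], b["id"])
            for a, b in combinations(records, 2)
            if is_match(a, b)]

# Hypothetical sample batch.
records = [
    {"id": 1, "name": "John Smith", "postcode": "560001"},
    {"id": 2, "name": "john smith ", "postcode": "560001"},
    {"id": 3, "name": "Jane Doe", "postcode": "560002"},
]
pairs = matched_pairs(records)
```

For a batch of n records this performs n(n-1)/2 comparisons, and the resulting list of matched pairs is the input to the grouping step.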
Open Group Policy: Under this policy, records within a group are related to the other records in the same group either 'directly' or 'indirectly'. The relation is treated as transitive within the group.
- Example: If A~B, B~C and C~D, then A, B, C and D are placed in the same group.
Closed Group Policy: Under this policy, each record within a group is 'directly' related to every other record in the same group. Transitivity is not assumed, so an indirect relation alone is not enough for membership.
- Example: If A~B, A~C, A~D, B~C, B~D and C~D, then A, B, C and D are placed in the same group.
Here ~ means that the two records it joins share a relation (that is, they match).
To adopt a closed group policy, each open group has to be examined and broken up into closed groups. This breakup may not be unique, so resolving it involves an arbitrary choice.
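Under the open group policy, groups correspond to the connected components of the graph whose edges are the matched pairs. One common way to compute them is a union-find structure. The sketch below assumes that approach. The source does not say which algorithm the tool uses, and the function name `open_groups` is hypothetical.

```python
def open_groups(records, pairs):
    """Group records so that direct or indirect relations share a group."""
    parent = {r: r for r in records}

    def find(x):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for r in records:
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())

# The open group example: A~B, B~C and C~D; E matches nothing.
groups = open_groups(["A", "B", "C", "D", "E"],
                     [("A", "B"), ("B", "C"), ("C", "D")])
```

Here A, B, C and D end up in one group even though A and D were never matched directly, while E forms a group of its own.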
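The non-uniqueness can be seen on a small case. The sketch below uses a hypothetical helper, `is_closed_group`, which checks that every pair of records in a group is directly related. For the chain A~B, B~C, the open group {A, B, C} is not closed, and it can be split into closed groups in more than one way.

```python
from itertools import combinations

def is_closed_group(group, pairs):
    """True if every two records in the group are directly related."""
    related = {frozenset(p) for p in pairs}
    return all(frozenset((a, b)) in related
               for a, b in combinations(group, 2))

# Chain of matches: A~B and B~C, but A and C are not directly related.
pairs = [("A", "B"), ("B", "C")]
whole_group_is_closed = is_closed_group(["A", "B", "C"], pairs)

# Two different, equally valid splits into closed groups.
split_1 = [["A", "B"], ["C"]]
split_2 = [["A"], ["B", "C"]]
split_1_valid = all(is_closed_group(g, pairs) for g in split_1)
split_2_valid = all(is_closed_group(g, pairs) for g in split_2)
```

Both splits satisfy the closed group rule, so choosing between them needs an extra tie-breaking rule that the matches themselves do not supply.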
The tool uses the open group policy.
Dual Clustering :
The tool provides two levels of clustering: one based on very stringent (tight) rules and the other based on somewhat more liberal (loose) rules. The former is referred to as the Confident cluster (formed from confident matches) and the latter as the Probable cluster (formed from probable matches). Every record in the customer master carries both cluster IDs. Confident clusters are used for the automated day-to-day process. Probable clusters may be reviewed manually at any time.
Thus:
- A Confident cluster can at most be equal to a Probable cluster (it is usually smaller). A Probable cluster is a superset of the Confident clusters it contains.
- A Probable cluster may contain one or more Confident clusters.
- Where a record is unique, its cluster contains that single record.
- The customer master is built from Confident clusters only, and integration with other systems happens through the master. Probable clusters are only indicative and are meant for review at any convenient time.
- Probable clusters, as and when they are reviewed, may lead to the realignment of a few Confident clusters.
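The relationship between the two cluster levels can be illustrated with a short sketch. It assumes an open group policy computed with union-find and assumes that every confident match also counts as a probable match. The record names, the pair lists and the helpers `cluster_ids` and `is_nested` are hypothetical.

```python
def cluster_ids(records, pairs):
    """Assign each record a cluster ID under the open group policy."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    roots, ids = {}, {}
    for r in records:
        root = find(r)
        roots.setdefault(root, len(roots) + 1)  # number clusters 1, 2, 3, ...
        ids[r] = roots[root]
    return ids

def is_nested(confident, probable):
    """True if each Confident cluster lies inside a single Probable cluster."""
    seen = {}
    for record, c_id in confident.items():
        if seen.setdefault(c_id, probable[record]) != probable[record]:
            return False
    return True

records = ["R1", "R2", "R3", "R4", "R5"]
confident_pairs = [("R1", "R2")]
# Assumption: the loose rules accept everything the tight rules accept.
probable_pairs = confident_pairs + [("R2", "R3"), ("R4", "R5")]

confident = cluster_ids(records, confident_pairs)
probable = cluster_ids(records, probable_pairs)
# Each master record carries both IDs: (Confident cluster, Probable cluster).
master = {r: (confident[r], probable[r]) for r in records}
```

In this example R3 is a single-record Confident cluster, yet it shares a Probable cluster with R1 and R2, which is the kind of case a manual review of Probable clusters would examine.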