Crowdsourcing classification

Different types of human-generated metadata can be used to define content. Classification schemes such as the International Patent Classification and MEDLINE’s MeSH are structured and controlled, but require trained experts and central management to restrict ambiguity. Ambiguity is the poison of classification.

Elsewhere, collaboratively generated tags can be aggregated into folksonomies to categorize content; however, the greater freedom enjoyed by a crowd typically results in less precision.

DZ effectively separates the ‘what is what’ and the ‘what goes where’ of traditional classification schemes. ‘What is what’ is controlled centrally. ‘What goes where’ is open to distributed development.

In DZ, Zootags are created that define entities in science and technology. There is a single meaning for each Zootag and a single Zootag for each meaning. Registrars control the acceptance of new Zootags. DZ implements fuzzy classification, rather than simple ‘class or no class’ classification. As a result, Zootags may be similar to one another, but each must remain distinct and unambiguous.
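
The one-to-one rule between Zootags and meanings can be pictured as a registry that rejects duplicates in either direction. The following is only a minimal sketch of that idea: the class name `ZootagRegistry`, its methods and the text normalisation are assumptions made for illustration, not a description of how DZ registrars actually work.

```python
class ZootagRegistry:
    """Toy registry: one meaning per Zootag and one Zootag per meaning.

    Hypothetical illustration only; names and behaviour are assumed.
    """

    def __init__(self):
        self._meaning_by_tag = {}
        self._tag_by_meaning = {}

    @staticmethod
    def _normalise(text):
        # Ignore case and repeated whitespace when comparing entries.
        return " ".join(text.lower().split())

    def register(self, tag, meaning):
        """Accept a new Zootag only if both tag and meaning are unused."""
        tag_key = self._normalise(tag)
        meaning_key = self._normalise(meaning)
        if tag_key in self._meaning_by_tag:
            raise ValueError(f"Zootag already registered: {tag!r}")
        if meaning_key in self._tag_by_meaning:
            existing = self._tag_by_meaning[meaning_key]
            raise ValueError(f"meaning already covered by Zootag {existing!r}")
        self._meaning_by_tag[tag_key] = meaning
        self._tag_by_meaning[meaning_key] = tag

    def meaning_of(self, tag):
        """Return the single meaning registered for a Zootag."""
        return self._meaning_by_tag[self._normalise(tag)]
```

An exact-text comparison like this only catches literal duplicates. Deciding whether two differently worded definitions describe the same entity remains a judgement for the registrars.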

Similar entities are organised on a sliding scale of likeness on graphical representations called Zootag Steering Diagrams (ZSDs). The work is performed by experts from the different technical fields. Individual experts, or groups of experts, are free to work independently of one another.

Concepts are defined by 5 Zootags: one each from a category of Application, Technology, Operation, Problem and Solution. Together the 5 Zootags can effectively ‘triangulate’ a concept.
The crowd are free to classify disclosures as they see fit, using a hyperlinked library of ZSDs to find the correct Zootags. We use a ‘fractal search’ algorithm to link the work of the different experts and classifiers, either to find the closest prior art or to search for a novel solution. The DZ design thus limits ambiguity whilst facilitating a crowdsourcing effort.
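
As a rough illustration of the five-Zootag structure, a concept can be modelled as a record with one slot per category. The sketch below is an assumption-laden example: the `Concept` class, the `shared_zootags` helper and the sample Zootags are all hypothetical. The exact-match count is a crude stand-in for comparison and does not represent the ‘fractal search’ algorithm or the graded likeness of the ZSDs.

```python
from dataclasses import astuple, dataclass

# The five categories named in the text, in a fixed order.
CATEGORIES = ("application", "technology", "operation", "problem", "solution")


@dataclass(frozen=True)
class Concept:
    """A concept described by exactly one Zootag per category."""

    application: str
    technology: str
    operation: str
    problem: str
    solution: str

    def __post_init__(self):
        # Every category must carry a Zootag; an empty slot is rejected.
        for category in CATEGORIES:
            if not getattr(self, category):
                raise ValueError(f"missing Zootag for category {category!r}")


def shared_zootags(a, b):
    """Count the categories in which two concepts use the same Zootag."""
    return sum(x == y for x, y in zip(astuple(a), astuple(b)))


# Hypothetical Zootags, invented purely for this example.
disclosure = Concept("bicycle", "disc brake", "braking",
                     "overheating", "ventilated rotor")
candidate = Concept("motorcycle", "disc brake", "braking",
                    "overheating", "ceramic rotor")
print(shared_zootags(disclosure, candidate))  # 3
```

Here the two records agree on technology, operation and problem, which is the kind of partial overlap a prior-art search would want to surface.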

The translation of the classification codes into different languages will support multilingual search without the need for document translation. Holding the classification information in a central index can support search across different formats and across copyright-protected literature.
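
The multilingual idea can be sketched as a language-neutral code with one label per language, plus a central index that maps documents to codes. Everything in the example below is assumed for illustration: the code format (`Z0001`), the labels, the document identifiers and the function names are hypothetical, and a real index would be far larger.

```python
# Hypothetical language-neutral codes with translated labels.
LABELS = {
    "Z0001": {"en": "disc brake", "fr": "frein à disque", "de": "Scheibenbremse"},
    "Z0002": {"en": "heat pipe", "fr": "caloduc", "de": "Wärmerohr"},
}

# Hypothetical central index: document identifier -> codes assigned to it.
# Only the codes are stored, not the document text itself.
INDEX = {
    "doc-en-1": {"Z0001"},
    "doc-de-7": {"Z0001", "Z0002"},
    "doc-fr-3": {"Z0002"},
}


def code_for(label, lang):
    """Translate a label in the searcher's language into its code."""
    wanted = label.casefold()
    for code, names in LABELS.items():
        if names.get(lang, "").casefold() == wanted:
            return code
    return None


def search(label, lang):
    """Return every indexed document carrying the code for this label."""
    code = code_for(label, lang)
    if code is None:
        return []
    return sorted(doc for doc, codes in INDEX.items() if code in codes)


print(search("caloduc", "fr"))  # ['doc-de-7', 'doc-fr-3']
```

A French query finds a German document here because only the labels are translated; the documents are matched by code and never need translating.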