Saturday, January 30, 2010

Class, label, sensu

In the past couple of weeks we've been actively debating what concepts like synonyms and homonyms mean in the context of the HAO. In particular, how do they relate to our concepts of classes (the "real" things at the core of our ontology) and the labels we use to reference those classes? We had been using tags to indicate synonyms, but our tag data model wasn't as elegant as we needed it to be: too many hacks were being added to the code base to perform various calculations on our data. After much discussion and debate about the meaning of our various tags in general, we realized that we could use a new, simple model that nicely encapsulates what we wanted to capture with respect to synonyms and homonyms, classes and labels. We've called it a "Sensu".

The sensu model is simply a pointer to a label, a class, and a reference (a table with three columns). It states that so-and-so (a reference) used a label (basiparamere) for a class (the sclerite that is connected proximally with the cupula, distally with the harpe, ventrolaterally with the parossiculus). The Sensu model provides the basic functionality of linking labels to classes. In addition, from this simple table we can derive nearly everything we wanted to in the past with respect to synonymy and homonymy (and the "acts" of these). For example, if two sensus share the same class but have different labels, then those labels are synonyms, and if two sensus share the same label but have different classes, then that label is homonymous. If one person (reference) used two labels for the same class, then that person performed the "act" of synonymy; if they used the same label for two classes, then they indicated homonymy.
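
As a rough illustration of how these derivations fall out of a three-column table, here is a minimal Python sketch. It is not our actual implementation: the `Sensu` tuple, the function names, and the placeholder values (`class_A`, `label_x`, `ref_1`, and so on) are all assumptions made up for the example; only the labels "basiparamere" and "process" come from this post.

```python
from collections import defaultdict
from typing import NamedTuple


class Sensu(NamedTuple):
    """One row of the hypothetical three-column sensu table."""
    label: str
    cls: str
    reference: str


def synonyms(sensus):
    """Classes that carry more than one label, mapped to those labels."""
    labels_by_class = defaultdict(set)
    for s in sensus:
        labels_by_class[s.cls].add(s.label)
    return {c: labels for c, labels in labels_by_class.items() if len(labels) > 1}


def homonyms(sensus):
    """Labels that point at more than one class, mapped to those classes."""
    classes_by_label = defaultdict(set)
    for s in sensus:
        classes_by_label[s.label].add(s.cls)
    return {l: classes for l, classes in classes_by_label.items() if len(classes) > 1}


def acts_of_synonymy(sensus):
    """(reference, class) pairs where one reference used several labels."""
    labels = defaultdict(set)
    for s in sensus:
        labels[(s.reference, s.cls)].add(s.label)
    return {key: used for key, used in labels.items() if len(used) > 1}


# Placeholder data: class and reference identifiers are invented.
sensus = [
    Sensu("basiparamere", "class_A", "ref_1"),
    Sensu("label_x", "class_A", "ref_2"),
    Sensu("label_y", "class_A", "ref_2"),
    Sensu("process", "class_B", "ref_1"),
    Sensu("process", "class_C", "ref_3"),
]

print(synonyms(sensus))
print(homonyms(sensus))
print(acts_of_synonymy(sensus))
```

Note that nothing in the table marks a row as "a synonym" or "a homonym"; both are computed by grouping rows on one column and looking at the other.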

There are some really nice consequences of this approach. For instance, we don't have to specifically identify a "preferred" label or a senior synonym: if we want, we can calculate these based on some arbitrary metric (e.g. first usage, most used, most voted for, etc.). As long as people can reference the class (with some XML or similar markup), they can use whatever label (including labels in other languages) they want for that class. We can also algorithmically detect all cases of synonymy and homonymy without specifically looking for them. That is, if someone records that so-and-so used a label for a class, they need not be intending to synonymize that label, but we can still calculate that a synonym is implied.
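
To make the "arbitrary metric" idea concrete, here is a small Python sketch that picks a preferred label by the "most used" metric. It is only an assumption about how such a calculation might look: the function name `preferred_label`, the plain `(label, class, reference)` tuples, and the identifiers `class_A`, `label_x`, and `ref_1` to `ref_3` are invented for the example.

```python
from collections import Counter


def preferred_label(sensus, cls):
    """Return the label used most often for cls, or None if cls is unused.

    Each sensu is a (label, class, reference) tuple. Ties go to the label
    that appears first in the input, since Counter.most_common keeps
    first-encountered order for equal counts.
    """
    counts = Counter(label for label, c, _ref in sensus if c == cls)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Placeholder rows: class and reference identifiers are invented.
sensus = [
    ("basiparamere", "class_A", "ref_1"),
    ("label_x", "class_A", "ref_2"),
    ("basiparamere", "class_A", "ref_3"),
]

print(preferred_label(sensus, "class_A"))
```

Swapping in a different metric (first usage, votes) would only change how the rows are counted or ordered; the table itself stays the same.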

In moving to this model we've also realized what appears to be a shortcoming of the OBO format, in which a singular class-label construct is the core unit (synonyms can be captured, but each class must have a unique label). The more we work on this project, the more we doubt that we can (or should) force labels to have just one meaning (for instance, the classic example, "process", is both a time-based and a morphological concept), and we further doubt that this is necessary for us to do some really cool things with the ontology.

2 comments:

  1. Interesting post.

    I think the OBO way to do this would be to record multiple synonyms with attributions:
    synonym: "basiparamere" EXACT [ref_id:0123456, ref_id:0123457]

    I see the synonym field as mapping to language from defined classes in the ontology, but you could also think of EXACT synonyms as asserting equivalence between the referent of the ontology term and that of some term as used in a specific publication. I see the other standard OBO synonym scopes - narrow, broad, related, as a loose mapping to linguistic usage that is useful for searching. Where there is a particularly confusing tangle of usage, I try to add notes to the OBO comment field describing the different usages.

    If you want to infer equivalence between different classes (as it sounds like you do), you need some way to specify necessary and sufficient conditions for class membership and the confidence to assert that the conditions you specify are not simply necessary conditions. In OBO, you can use the intersection_of tag. In OWL (MS) - EquivalentTo. Support for detecting equivalence is good for standard OWL reasoners, but is completely lacking in OBO-Edit. One more reason to use OWL...

  2. Thanks for the comments.

    We just ran a little test on synonymous labels in OBO files, and you're right: while OBO throws a warning, it's non-critical. It looks like our Sensus will be nicely accommodated here.

    It looks like narrow, broad, and related synonyms are deprecated in 1.2. This makes sense to me: while we could think of examples that fit these categories, it was very hard to define their limits.

    Since all our definitions are in genus-differentia format (or will be), it should be easy to implement intersection_of (see also my recent post on the OBO list).
