Saturday, August 29, 2009

Australia, taxonomy, and integration

I recently traveled to Canberra, Australia, thanks to an invitation from John La Salle at CSIRO. John is one of the driving forces behind the Atlas of Living Australia, an initiative recently funded on a scale comparable to EOL. While there I attended a couple of short workshops on phenomics and biosecurity and gave talks on the HAO. One recurring topic was how ontologies could be used to speed up taxonomy. There are obvious applications, for instance our text markup proofing tool, but clearly a lot more is possible.

La Salle, for example, is pushing the idea of automated character recognition. Imagine an anatomy ontology, such as the HAO, with a large number of associated image annotations: e.g., a light micrograph of an ant head with polygons and/or point markers superimposed on it that indicate which structures are the compound eyes, ocelli, face, antennae, etc. These core data could serve as the basis for algorithmic recognition of the same structures in images of other ants. While this functionality is largely science fiction at the moment, it's possible that the technology will be folded into the taxonomic workflow within my lifetime. We will spend a lot of time over the next three years thinking about this and other ways the HAO can be used to address the taxonomic impediment.
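To make the idea of "annotations on images" a little more concrete, here is a minimal sketch of how a polygon drawn over an image could be tied to an ontology term, and how a pixel could be looked up to find which structure it falls in. Everything here is hypothetical: the `RegionAnnotation` record, the placeholder `EX:` term IDs, and the coordinates are made up for illustration, and the lookup uses the standard ray-casting point-in-polygon test, not any recognition algorithm we or CSIRO actually use.

```python
from dataclasses import dataclass


@dataclass
class RegionAnnotation:
    """Hypothetical record linking an ontology term to a polygon on an image."""
    term_id: str                          # placeholder ID, not a real HAO term
    label: str
    polygon: list[tuple[float, float]]    # (x, y) pixel vertices, in order


def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a horizontal ray from (x, y) to +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def terms_at(x, y, annotations):
    """Labels of every annotated structure whose polygon contains (x, y)."""
    return [a.label for a in annotations if point_in_polygon(x, y, a.polygon)]


if __name__ == "__main__":
    annotations = [
        RegionAnnotation("EX:0000001", "compound eye",
                         [(10, 10), (40, 10), (40, 30), (10, 30)]),
        RegionAnnotation("EX:0000002", "antenna",
                         [(50, 0), (60, 0), (60, 50), (50, 50)]),
    ]
    print(terms_at(20, 20, annotations))
    print(terms_at(55, 25, annotations))
```

A real recognition system would learn from many such annotated images; this only shows the kind of term-to-region data it would be trained on.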

Integrating the HAO with taxonomy may be somewhat abstract, but by virtue of being available to others as an OBO-format file, the HAO is already being integrated into the ontological world (the semantic web?). Richard Cole wrote to point out that the HAO is now visible via the Ontology Lookup Service, which provides a nice graph for visualizing and navigating the ontology. If you know of other useful or cool applications that use OBO ontologies, let us know and we'll point to them.
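Part of what makes this kind of reuse easy is that an OBO file is plain text: a few header lines, then stanzas such as `[Term]` made of `tag: value` lines. The sketch below is a deliberately minimal reader, written only to show that structure. It assumes simple files: it skips headers and non-`[Term]` stanzas, strips trailing `! comment` text naively, and ignores escapes, qualifiers and the other details that real OBO tools handle. The `EX:` IDs in the sample are placeholders, not HAO terms.

```python
def parse_obo_terms(text):
    """Return one dict per [Term] stanza; each tag maps to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header lines, or tags inside non-Term stanzas
        # Split on the first colon only, so IDs like "EX:0000001" stay intact.
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop a trailing "! label"
        current.setdefault(tag.strip(), []).append(value)
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: head

[Term]
id: EX:0000002
name: compound eye
relationship: part_of EX:0000001 ! head
"""

if __name__ == "__main__":
    for term in parse_obo_terms(SAMPLE):
        print(term)
```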

Thursday, August 20, 2009

Come work with us!

Undergraduate Position Available Immediately for Fall Semester

The Hymenoptera Anatomy Ontology (HAO) project is looking for a student interested in learning biodiversity informatics, library science, entomology, imaging and scientific illustrating techniques, modern museum studies and Web design.

Hours are flexible, but will fall between 8am and 5pm.
Salary: $8.50 per hour (10-15 hours a week)

Responsibilities may include:
* extraction of data from historical/scientific texts (primary responsibility)
* testing and design of Web-based tools for outreach (i.e., educating non-scientists about insects, insect anatomy, and biodiversity informatics)
* development of visual/semantic Web interfaces for a novel, rapidly growing dataset
* learning light microscopy imaging techniques
* learning and illustrating Hymenoptera anatomy (i.e., what the body parts of ants, bees, wasps, and sawflies are called)

More information about the HAO project is available:

Contact: Katja Seltmann: katja_seltmann (at) ncsu (dot) edu; Room 3212 Gardner Hall; 5-2833

Wednesday, August 12, 2009

.obo encoding issues

We successfully submitted the HAO to the OBO Foundry last week(!), and we hope to ascend to candidacy after some testing / evaluation / reading / questioning / etc. by the OBO community. If you access our submitted version through SourceForge, you will probably notice some diacritical messiness, mainly this --> �

That's what happens to each letter embellished with diacritics: every instance of Mikó and Pénzes becomes Mik� and P�nzes. This is an issue because our data are UTF-8 encoded in the database, and the accented characters are not surviving the trip into the submitted file.
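For anyone curious how a single accented letter turns into �: the replacement character U+FFFD is what a decoder emits when it meets bytes that are invalid in the encoding it assumes. The sketch below reproduces the symptom with the two names above; it does not claim to identify which step of our own export pipeline is at fault. Writing the text as single-byte Latin-1 and then reading it back as UTF-8 gives exactly one � per accented letter (the pattern we see), while reading UTF-8 bytes with an ASCII-only decoder gives two, because ó and é each take two bytes in UTF-8.

```python
def round_trips(name):
    """Decode a name three ways to show where U+FFFD replacement chars come from."""
    utf8_bytes = name.encode("utf-8")      # how the database stores the text
    latin1_bytes = name.encode("latin-1")  # a single-byte encoding of the same text

    correct = utf8_bytes.decode("utf-8")
    # Latin-1 bytes read back as UTF-8: each accented byte is invalid -> one U+FFFD
    one_per_char = latin1_bytes.decode("utf-8", errors="replace")
    # UTF-8 bytes read by an ASCII-only decoder: each non-ASCII byte -> one U+FFFD
    two_per_char = utf8_bytes.decode("ascii", errors="replace")
    return correct, one_per_char, two_per_char


if __name__ == "__main__":
    for name in ["Mikó", "Pénzes"]:
        print(*round_trips(name))
```

The practical takeaway is that every tool in the chain has to agree on UTF-8; storing it correctly in the database is necessary but not sufficient.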

Why not convert all those references and names to xref IDs? We will, as soon as we know our IDs will be stable. UTF-8 support will remain an important component of our ontology going forward, though. Ultimately we will be attempting to account for ALL commonly used terms in Hymenoptera species descriptions, and many of those are in (accented) French, German, and Spanish: carène intertorulaire, prépectus, écailles, Ringstück, etc.

The "encoding: UTF-8" tag is supposed to be available in the OBO 1.3 specification. I know we aren't the only ones longing for UTF-8 support in OBO-Edit and other tools, so developers out there: take this as a little nudge.

Friday, August 7, 2009

iterating through the low hanging fruit

Perhaps a little behind schedule with the blog post, but we've been busy!

After several weeks of concentrated editing, the first "version" of the HAO is in the hands of the folks at the OBO Foundry, to be included as a candidate (it's now available). It was interesting to learn during this process (not process sensu HAO:0000822, but process sensu evo-devo) that no ontology is actually included in the Foundry; they are all candidates.

The HAO is indeed a candidate in many senses of the word. This first effort is largely meant to get the editing team comfortable with the steps it takes to release versions of the HAO, and to put the basic skeleton logic into the hands of those who can start to give us feedback. That said, we feel pretty good about this initial effort, even though we have perhaps only been gathering the low-hanging fruit. We have a full-fledged ontology output from a web-based application (albeit with a hack or two), and around 90% of the terms have human-written definitions in genus-differentia format. We've also generated HAO IDs for over 1000 terms, an important first step toward letting others reference fixed points in the ontology in meaningful ways. Perhaps most importantly, we have a product that people can begin to critique, with comments like "where's the nervous system?" (our first comment from a non-project member; it's not in there...yet). We're depending on this feedback, both from experts on ontologies in the broader sense and from morphologists with much more experience than us.
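HAO IDs follow the pattern visible in this post (HAO:0000822): a prefix plus a seven-digit, zero-padded number. What makes such IDs useful as "fixed points" is that once minted they are never reused or renumbered. The helper below is a hypothetical sketch of minting under two assumptions, that the seven-digit pattern holds and that new IDs are simply the next unused number; it is not how mx actually allocates IDs.

```python
import re

# Assumed format, inferred from IDs such as HAO:0000822.
ID_PATTERN = re.compile(r"^HAO:(\d{7})$")


def format_id(n: int) -> str:
    """Render an integer as a zero-padded, seven-digit HAO-style ID."""
    return f"HAO:{n:07d}"


def next_id(existing_ids) -> str:
    """Mint the next ID after the highest one already issued.

    Deriving from the maximum (rather than filling gaps) means an ID that
    was retired is never handed out again, which keeps references stable.
    Strings that do not match the pattern are ignored.
    """
    numbers = [
        int(m.group(1))
        for i in existing_ids
        if (m := ID_PATTERN.match(i))
    ]
    return format_id(max(numbers, default=0) + 1)


if __name__ == "__main__":
    print(next_id(["HAO:0000001", "HAO:0000822"]))
```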

Along with work on the HAO itself has come some feature development for handling the ontology. We're using tags to comment on and annotate the HAO. A tag in mx contains a keyword, an optional pointer to a reference, and an optional comment or "value". To make tags more useful on a day-to-day basis, we hacked up a tag browser that lets us quickly return sets of tagged items and then navigate to the results.
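The tag model described above (keyword, optional reference, optional comment or "value") is simple enough to sketch. The code below is a hypothetical illustration, not mx's actual schema: the `Tag` record, the placeholder `EX:` term IDs and the `ref-1` reference key are made up. It indexes tags by keyword and answers the kind of question a tag browser asks, "which terms carry all of these keywords?", with a set intersection.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical tag: a keyword attached to a term, plus optional extras."""
    term_id: str                     # the ontology term being tagged
    keyword: str
    reference: Optional[str] = None  # optional pointer to a publication
    value: Optional[str] = None      # optional comment or "value"


def index_by_keyword(tags):
    """Map each keyword to the set of term IDs tagged with it."""
    index = defaultdict(set)
    for tag in tags:
        index[tag.keyword].add(tag.term_id)
    return index


def browse(index, *keywords):
    """Term IDs carrying every given keyword (empty set if none are given)."""
    sets = [index.get(k, set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    tags = [
        Tag("EX:0000001", "needs definition"),
        Tag("EX:0000002", "needs definition"),
        Tag("EX:0000002", "check figure", reference="ref-1", value="fig. 3?"),
    ]
    index = index_by_keyword(tags)
    print(sorted(browse(index, "needs definition")))
    print(sorted(browse(index, "needs definition", "check figure")))
```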

We also built a quick tree viewer for browsing the ontology; watch for a public version of it to appear on the glossary in the coming months. The tree gives us context, lets us quickly edit definitions, and lets us drag terms to add relationships.
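A tree view like this can be built from nothing more than the ontology's (child, parent) relationships. The sketch below is a hypothetical illustration, not the viewer's code: it treats every edge the same way (a real view would distinguish is_a from part_of), uses made-up `EX:` IDs and labels, and, because an ontology is a graph rather than a strict tree, prints a term with several parents under each of them. In this model, dragging a term onto a new parent amounts to adding one more edge.

```python
from collections import defaultdict


def build_children(edges):
    """From (child, parent) pairs, return a parent->children map and the roots."""
    children = defaultdict(list)
    has_parent = set()
    for child, parent in edges:
        children[parent].append(child)
        has_parent.add(child)
    roots = sorted(set(children) - has_parent)
    return children, roots


def render(children, roots, labels):
    """Indented lines, two spaces per level; unknown IDs are shown as-is."""
    lines = []

    def walk(node, depth, ancestors):
        lines.append("  " * depth + labels.get(node, node))
        if node in ancestors:  # guard against accidental cycles
            return
        for child in sorted(children.get(node, [])):
            walk(child, depth + 1, ancestors | {node})

    for root in roots:
        walk(root, 0, frozenset())
    return lines


if __name__ == "__main__":
    edges = [("EX:2", "EX:1"), ("EX:3", "EX:1"), ("EX:4", "EX:2")]
    labels = {"EX:1": "head", "EX:2": "compound eye",
              "EX:3": "antenna", "EX:4": "ommatidium"}
    children, roots = build_children(edges)
    print("\n".join(render(children, roots, labels)))
```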