I need to develop an ontology for computational biochemistry and molecular dynamics. To do this, I have collected the terms that will be used and tried to reuse existing ontologies by searching for those terms in ontology search services such as EBI-OLS. Some terms are very relevant to import or reuse, but the ontology they come from is intended for a more specific domain, such as the National Cancer Institute Thesaurus (which has 171,081 classes). Besides that one, there are about 10 other source ontologies that I could potentially reuse. Some of them are also huge, such as the EDAM ontology.
1. Is it okay to reuse an ontology that is seemingly intended for a more specific domain, such as cancer? We will use our ontology more generally in the life sciences, not only in cancer-related domains.
2. Is there a general rule of thumb for deciding which of those 10 or so ontologies are suitable for reuse? (For example, the paper describing the ontology should be cited by at least n papers, or the ontology should be compatible with the Open Biological and Biomedical Ontology (OBO) Foundry principles, or it should be backed by a well-known institution and still be maintained.)
3. How do I decide on the sweet spot for the number of source ontologies to build on? We could reuse as many available terms as possible (from many source ontologies, especially in the life science domain), but there is a concern that this would make the resulting knowledge graph representation much more complex.
Thank you for your answers.
Answers to your questions:
1. I would say yes, assuming the terms that you intend to use are indeed a match for your use case. That is, if there is a term you are interested in using, but its definition or its synonyms do not match your needs, then I would probably not use that term.
2. Yes, there are. I really recommend reading the paper "Ten Simple Rules for Selecting a Bio-ontology" and the OBO Tutorial.
3. Try to keep the number of ontologies you use as small as is sensible, that is, the smallest set of ontologies that is well aligned with the needs of your use case. The reason is that you will want to engage with the designers of the ontologies you use in order to extend and amend those ontologies for your use case. The more ontologies you use, the more likely it is that you will need to communicate with a larger community to effect change for your use case, which may increase development time. However, using an ontology that is not well aligned with your use case will also increase communication overhead and timelines. Hence the advice to keep the number of ontologies as small as is sensible, rather than as small as possible.
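One rough way to get a first shortlist for point 3 is to treat it as a set-cover problem: given the terms you need and, for each candidate ontology, the subset of those terms you judged to be a good match, repeatedly pick the ontology that covers the most still-uncovered terms. The sketch below is only an illustration of that idea. The ontology names (`ONTO_A`, `ONTO_B`, `ONTO_C`), the term lists, and the function name `pick_ontologies` are all hypothetical, and a greedy pick is a heuristic that does not replace checking definitions, maintenance status, and community fit by hand.

```python
from typing import Dict, List, Set, Tuple


def pick_ontologies(
    needed: Set[str], coverage: Dict[str, Set[str]]
) -> Tuple[List[str], Set[str]]:
    """Greedy set cover over candidate ontologies.

    needed:   terms your ontology requires.
    coverage: for each candidate ontology, the needed terms it matches well.
    Returns the chosen ontologies (in pick order) and any terms left uncovered.
    """
    uncovered = set(needed)
    chosen: List[str] = []
    while uncovered and coverage:
        # Iterate over sorted names so ties are broken alphabetically.
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # the remaining terms appear in no candidate ontology
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical data: which needed terms each candidate ontology matches.
needed_terms = {
    "molecular dynamics simulation",
    "force field",
    "trajectory",
    "protein",
    "ligand",
}
candidate_coverage = {
    "ONTO_A": {"molecular dynamics simulation", "force field", "trajectory"},
    "ONTO_B": {"protein", "ligand", "force field"},
    "ONTO_C": {"ligand"},
}

chosen, leftover = pick_ontologies(needed_terms, candidate_coverage)
print("Shortlist:", chosen)
print("Terms you would have to define yourself:", sorted(leftover))
```

Terms that end up uncovered are the ones you would define in your own ontology, or raise with the maintainers of one of the shortlisted ontologies.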
As for your concern about importing large ontologies into your ontology, the usual way to deal with this is to extract only the terms you are interested in, using ROBOT, and then import the extracted ontology (rather than the full source ontology) into your own.
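As a minimal sketch of that extraction step, the snippet below assembles a `robot extract` command line from Python. The file names (`source.owl`, `terms.txt`, `module.owl`), the example IRIs, and the helper names are hypothetical placeholders. The choice of the `BOT` module-extraction method is an assumption for illustration, and you should check the ROBOT documentation to see which method suits your case.

```python
from pathlib import Path
from typing import Iterable, List

# Syntactic-locality module extraction methods accepted by `robot extract`.
SLME_METHODS = {"STAR", "BOT", "TOP"}


def write_term_file(path: str, iris: Iterable[str]) -> None:
    """Write one term IRI per line, the format expected by --term-file."""
    Path(path).write_text("\n".join(iris) + "\n", encoding="utf-8")


def build_extract_command(
    source: str, term_file: str, output: str, method: str = "BOT"
) -> List[str]:
    """Build (but do not run) a `robot extract` command as an argument list."""
    if method not in SLME_METHODS:
        raise ValueError(f"unsupported method for this sketch: {method}")
    return [
        "robot", "extract",
        "--method", method,
        "--input", source,        # the large source ontology
        "--term-file", term_file,  # the terms you want to keep
        "--output", output,        # the small module you will import
    ]


# Hypothetical term IRIs; replace with the IRIs of the terms you selected.
example_terms = [
    "http://example.org/onto#ForceField",
    "http://example.org/onto#Trajectory",
]
cmd = build_extract_command("source.owl", "terms.txt", "module.owl")
print(" ".join(cmd))
```

Actually running the command, for example with `write_term_file("terms.txt", example_terms)` followed by `subprocess.run(cmd, check=True)`, requires ROBOT to be installed and on your `PATH`. The resulting `module.owl` is what you would then import into your own ontology.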
In general, I strongly recommend reaching out to the OBO Foundry. They have been developing life science ontologies for a number of years. By working with them, you are likely to avoid many of the typical problems people run into when they start designing ontologies.
I have also written up some general guidelines, from my perspective, on choosing biological ontologies here.