Schema
UnifyBio's core data model consists of an extended and annotated Datomic schema. You should refer to the official Datomic docs to understand the available primitive data types, attribute organization and cardinalities, and how references between entities are handled.
Unify's Metamodel
Unify expects and enforces a convention common in most Datomic schemas in practice:
related attributes are grouped by entity type. Unify calls these entity types kinds, to
distinguish semantic information encoded in the data model from programming language type
systems and their constraints. In this convention, the type information is encoded in the
namespace of the attribute's name. For instance, in the attribute :measurement/fpkm,
the kind information is encoded in the namespace portion, measurement.
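As a rough illustration of the naming convention, the sketch below extracts the kind from an attribute keyword and groups attributes by kind. It is not part of Unify: the function names kind_of and group_by_kind are hypothetical, and every attribute other than :measurement/fpkm is made up for the example.

```python
def kind_of(attribute: str) -> str:
    """Return the namespace (the kind) of a keyword such as ':measurement/fpkm'."""
    if not attribute.startswith(":") or "/" not in attribute:
        raise ValueError(f"not a namespaced keyword: {attribute!r}")
    # Everything between the leading ':' and the first '/' is the namespace.
    namespace, _, _name = attribute[1:].partition("/")
    if not namespace:
        raise ValueError(f"empty namespace in keyword: {attribute!r}")
    return namespace


def group_by_kind(attributes):
    """Group attribute keywords by kind, preserving input order within each kind."""
    groups = {}
    for attribute in attributes:
        groups.setdefault(kind_of(attribute), []).append(attribute)
    return groups


if __name__ == "__main__":
    # Hypothetical attributes, apart from :measurement/fpkm from the text above.
    attrs = [":measurement/fpkm", ":measurement/sample", ":sample/id"]
    print(kind_of(":measurement/fpkm"))
    print(group_by_kind(attrs))
```

Grouping by namespace is all the convention gives you on its own; nothing in plain Datomic checks that an entity only carries attributes of a single kind.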
Reference Attributes
Datomic's data model neither enforces this convention nor provides affordances for it out of the box, so the Unify metamodel adds annotations to the database schema that provide additional constraints and affordances around the representation and use of kinds in the data model. Specifically, the annotations specify that certain reference attributes point from one kind to another, and that certain kinds are children of other kinds (and, inversely, that some kinds are parents of others) in the dataset tree.
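To make the two kinds of annotation concrete, here is a minimal sketch that models them as plain Python data. It does not reflect how Unify actually stores its metamodel: the RefAnnotation class, the REFS and PARENTS tables, and all kind and attribute names in them are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefAnnotation:
    """A reference attribute annotated with the kinds it points from and to."""
    attribute: str
    from_kind: str
    to_kind: str


# Hypothetical reference annotations: attribute -> (from kind, to kind).
REFS = [
    RefAnnotation(":measurement/sample", "measurement", "sample"),
    RefAnnotation(":sample/subject", "sample", "subject"),
]

# Hypothetical parent annotations: child kind -> parent kind in the dataset tree.
PARENTS = {
    "assay": "dataset",
    "measurement-set": "assay",
    "measurement": "measurement-set",
}


def ancestors(kind, parents):
    """Walk parent links upward and return the chain of ancestor kinds."""
    path, seen = [], set()
    while kind in parents:
        if kind in seen:
            raise ValueError(f"cycle in parent annotations at {kind!r}")
        seen.add(kind)
        kind = parents[kind]
        path.append(kind)
    return path


def children_of(kind, parents):
    """Invert the parent mapping: the kinds whose parent is `kind`."""
    return sorted(child for child, parent in parents.items() if parent == kind)


def ref_target_ok(annotation, target_kind):
    """Check that a reference points at an entity of the annotated kind."""
    return annotation.to_kind == target_kind


if __name__ == "__main__":
    print(ancestors("measurement", PARENTS))
    print(children_of("assay", PARENTS))
    print(ref_target_ok(REFS[0], "sample"))
```

Storing only the child-to-parent direction keeps the tree unambiguous, since each kind has at most one parent, and the inverse (parent-to-children) view can be derived when needed.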
Identity
The Unify metamodel also specifies different ways that entities may be uniquely identified.
Either an identification is assumed to be globally unique at import time, whether through
a single attribute or composite ID, or the entity's identity must be scoped to its
context in the dataset. Unfortunately, it is all too common that various public and
proprietary datasets only supply weak identifiers, such as integer keys, or string composites of
type and integer, possibly with small string codes, e.g. "trial-bcc-subject-1". These
keys do not provide the strong uniqueness guarantees that keys like UUIDs do, so on import,
Unify will prefix these with context, forming a tuple of the dataset and the context scope from
the dataset tree, e.g. ["dataset-name" "trial-bcc-subject-1"]. For measurements, the tree
context and synthetic composite IDs are combined, e.g.
["dataset-name" "rna-seq/processing-method-1/sample-id--subject-id--BRCA"]
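The scoping described above can be sketched as follows. This is an illustration only, not Unify's import code: the helper names composite_id and scoped_id are hypothetical, and the "/" and "--" separators are inferred from the example identifiers in the text rather than taken from a documented format.

```python
def composite_id(parts, sep="--"):
    """Join the components of a synthetic composite ID (separator is an assumption)."""
    return sep.join(parts)


def scoped_id(dataset, local_id, context=()):
    """Pair a weak local identifier with its dataset and tree context.

    `context` is the path of scopes from the dataset tree leading to the
    entity; it is empty for entities scoped directly to the dataset.
    """
    scoped = "/".join([*context, local_id])
    return (dataset, scoped)


if __name__ == "__main__":
    # A subject scoped only by its dataset.
    print(scoped_id("dataset-name", "trial-bcc-subject-1"))

    # A measurement: tree context combined with a synthetic composite ID.
    measurement_key = composite_id(["sample-id", "subject-id", "BRCA"])
    print(scoped_id("dataset-name", measurement_key,
                    context=("rna-seq", "processing-method-1")))
```

Because the dataset name is always the first element of the tuple, two datasets that both contain a subject called "trial-bcc-subject-1" still produce distinct identifiers.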