lamindb.core.CanValidate
- class lamindb.core.CanValidate
Bases: object
Base class providing Registry-based validation.
Methods
- add_synonym(synonym, force=False, save=None)
Add synonyms to a record.
- Parameters:
synonym (str | List[str] | Series | array)
force (bool, default: False)
save (bool | None, default: None)
See also
remove_synonym()
Remove synonyms.
Examples
>>> import bionty as bt
>>> bt.CellType.from_public(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.add_synonym("T cells")
>>> record.synonyms
'T cells|T-cell|T-lymphocyte|T lymphocyte'
- classmethod inspect(values, field=None, *, mute=False, organism=None, public_source=None)
Inspect whether values are mappable to a field.
Being mappable means that an exact match exists.
- Parameters:
values (List[str] | Series | array) – Values that will be checked against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's field names.
mute (bool, default: False) – Mute logging.
organism (str | Registry | None, default: None) – An Organism name or record.
public_source (Registry | None, default: None) – A PublicSource record.
- Return type:
See also
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
✅ 2 terms (50.00%) are validated
🔶 2 terms (50.00%) are not validated
🟠 detected synonyms to increase validated terms, standardize them via .standardize()
>>> result.validated
['A1CF', 'A1BG']
>>> result.non_validated
['FANCD1', 'FANCD20']
- classmethod map_synonyms(synonyms, *, return_mapper=False, case_sensitive=False, keep='first', synonyms_field='synonyms', field=None, **kwargs)
- Return type:
list[str] | dict[str, str]
- remove_synonym(synonym)
Remove synonyms from a record.
- Parameters:
synonym (str | List[str] | Series | array) – The synonym value.
See also
add_synonym()
Add synonyms.
Examples
>>> import bionty as bt
>>> bt.CellType.from_public(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.remove_synonym("T-cell")
>>> record.synonyms
'T lymphocyte|T-lymphocyte'
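The add_synonym() and remove_synonym() examples show that synonyms are stored as a single pipe-delimited string. The following is a minimal standalone sketch of that storage convention only. It does not call lamindb, and the helpers `split_synonyms`, `add_to_synonyms`, and `remove_from_synonyms` are hypothetical names introduced here for illustration. The real methods may order entries differently (the add_synonym() output above suggests they do) and apply additional checks.

```python
from __future__ import annotations


def split_synonyms(synonyms: str | None) -> list[str]:
    """Split a pipe-delimited synonyms string into its entries."""
    if not synonyms:
        return []
    return [s for s in synonyms.split("|") if s]


def add_to_synonyms(synonyms: str | None, new: list[str]) -> str:
    """Append entries that are not yet present and re-join with '|'."""
    existing = split_synonyms(synonyms)
    for s in new:
        # Assumption of this sketch: the delimiter cannot be part of a synonym.
        if "|" in s:
            raise ValueError(f"synonym {s!r} must not contain '|'")
        if s not in existing:
            existing.append(s)
    return "|".join(existing)


def remove_from_synonyms(synonyms: str | None, old: list[str]) -> str:
    """Drop the given entries and re-join the remainder with '|'."""
    drop = set(old)
    return "|".join(s for s in split_synonyms(synonyms) if s not in drop)


current = "T-cell|T lymphocyte|T-lymphocyte"
added = add_to_synonyms(current, ["T cells"])
removed = remove_from_synonyms(current, ["T-cell"])
print(added)
print(removed)
```

Keeping the entries in one delimited string makes the field easy to store and search, at the cost of reserving the delimiter character.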
- set_abbr(value)
Set value for abbr field and add to synonyms.
- Parameters:
value (str) – A value for an abbreviation.
See also
add_synonym()
Add synonyms.
Examples
>>> import bionty as bt
>>> bt.ExperimentalFactor.from_public(name="single-cell RNA sequencing").save()
>>> scrna = bt.ExperimentalFactor.filter(name="single-cell RNA sequencing").one()
>>> scrna.abbr
None
>>> scrna.synonyms
'single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing'
>>> scrna.set_abbr("scRNA")
>>> scrna.abbr
'scRNA'
>>> scrna.synonyms
'scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq'
>>> scrna.save()
- classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, public_aware=True, keep='first', synonyms_field='synonyms', organism=None)
Maps input synonyms to standardized names.
- Parameters:
values (Iterable) – Identifiers that will be standardized.
field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.
return_field (str | None, default: None) – The field to return. Defaults to field.
return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.
case_sensitive (bool, default: False) – Whether the mapping is case sensitive.
mute (bool, default: False) – Mute logging.
public_aware (bool, default: True) – Whether to standardize from the Bionty reference. Defaults to True for Bionty registries.
keep (Literal['first', 'last', False], default: 'first') – When a synonym maps to multiple names, determines which of the duplicates to keep, in the same sense as the keep argument of pd.DataFrame.duplicated:
- "first": returns the first mapped standardized name
- "last": returns the last mapped standardized name
- False: returns all mapped standardized names
When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.
When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.
synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.
organism (str | Registry | None, default: None) – An Organism name or record.
- Return type:
list[str] | dict[str, str]
- Returns:
If return_mapper is False, a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
See also
add_synonym()
Add synonyms.
remove_synonym()
Remove synonyms.
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> standardized_names = bt.Gene.standardize(gene_synonyms)
>>> standardized_names
['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
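To make the keep options of standardize() concrete, here is a small standalone sketch of how a synonym that maps to several names could be resolved. It is an illustration under stated assumptions, not lamindb's implementation: the function `standardize_sketch`, the `synonym_table` argument, and the toy names `NAME_A` and `NAME_B` are all hypothetical, matching is case-insensitive (mirroring the documented default case_sensitive=False), and unmapped values are passed through unchanged as in the example above.

```python
from __future__ import annotations


def standardize_sketch(values, synonym_table, keep="first"):
    """Map synonyms to names; synonym_table is a list of (name, 'syn1|syn2')."""
    # Build a lowercase synonym -> [names] lookup, preserving table order.
    mapper: dict[str, list[str]] = {}
    for name, synonyms in synonym_table:
        for syn in synonyms.split("|"):
            if syn:
                names = mapper.setdefault(syn.lower(), [])
                if name not in names:
                    names.append(name)

    result = []
    for value in values:
        names = mapper.get(value.lower())
        if not names:
            result.append(value)  # no synonym match: keep the input as is
        elif keep == "first":
            result.append(names[0])
        elif keep == "last":
            result.append(names[-1])
        elif keep is False:
            # All matches; a nested list only when there are duplicates.
            result.append(names if len(names) > 1 else names[0])
        else:
            raise ValueError("keep must be 'first', 'last', or False")
    return result


# Toy data, made up for illustration: 'syn2' is a synonym of two names.
table = [("NAME_A", "syn1|syn2"), ("NAME_B", "syn2")]
values = ["syn1", "SYN2", "other"]

first = standardize_sketch(values, table, keep="first")
last = standardize_sketch(values, table, keep="last")
everything = standardize_sketch(values, table, keep=False)
print(first)
print(last)
print(everything)
```

With keep=False the caller has to handle a mix of strings and nested lists, which is why 'first' is the more convenient default when a flat result is needed.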
- classmethod validate(values, field=None, *, mute=False, organism=None)
Validate values against existing values of a string field.
Note that this is strict validation: only exact matches are validated.
- Parameters:
values (List[str] | Series | array) – Values that will be validated against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's field names.
mute (bool, default: False) – Mute logging.
- Return type:
ndarray
- Returns:
A vector of booleans indicating if an element is validated.
See also
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
✅ 2 terms (50.00%) are validated
🔶 2 terms (50.00%) are not validated
array([ True, True, False, False])
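The strict, exact-match behavior described for validate() can be pictured with a standalone sketch that needs no database. It is an assumption-laden illustration rather than the library's code: `validate_sketch` is a hypothetical helper, it returns a plain list of booleans instead of an ndarray, and the known values simply mirror the symbols saved in the example above.

```python
def validate_sketch(values, known):
    """Return one boolean per value: True only for an exact match."""
    known_set = set(known)  # set lookup keeps each membership test cheap
    return [v in known_set for v in values]


# Values assumed to exist in the registry field (mirrors the example above).
known = ["A1CF", "A1BG", "BRCA2"]
values = ["A1CF", "A1BG", "FANCD1", "FANCD20"]

mask = validate_sketch(values, known)

# The mask can be used to split the input into the two groups that
# the inspect() example exposes as validated and non_validated.
validated = [v for v, ok in zip(values, mask) if ok]
non_validated = [v for v, ok in zip(values, mask) if not ok]

print(mask)
print(validated)
print(non_validated)
```

Because matching is exact, a synonym such as "FANCD1" is not validated here even though the standardize() example maps it to a known name.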