lamindb.Feature
- class lamindb.Feature(name: str, type: str | list[type[Registry]], unit: str | None, description: str | None, synonyms: str | None)
Bases: Registry, CanValidate, TracksRun, TracksUpdates
Dataset dimensions.
A feature is a random variable or, equivalently, a dimension of a dataset. The Feature registry helps to:
- manage metadata of features
- annotate datasets by whether they measured a feature
Learn more: Tutorial: Features & labels.
- Parameters:
name – str
Name of the feature, typically a column name.
type – str | list[Type[Registry]]
Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
unit – str | None = None
Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.
description – str | None = None
A description.
synonyms – str | None = None
Bar-separated synonyms.
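The categorical notation described for the type parameter, such as cat[ULabel], combines a base type with the registry from which values are sampled. As a rough illustration only, the following plain-Python sketch splits such a string into its two parts. The helper name `parse_dtype` is hypothetical and not part of lamindb, and lamindb's own handling of these strings may differ.

```python
import re
from typing import Optional, Tuple

# Base type names listed in the parameter description.
BASE_TYPES = {"number", "cat", "int", "float", "bool", "datetime"}


def parse_dtype(dtype: str) -> Tuple[str, Optional[str]]:
    """Split a type string like "cat[ULabel]" into (base, registry).

    Hypothetical helper for illustration; not a lamindb function.
    """
    match = re.fullmatch(r"(\w+)(?:\[([\w.]+)\])?", dtype.strip())
    if match is None:
        raise ValueError(f"cannot parse dtype: {dtype!r}")
    base, registry = match.group(1), match.group(2)
    if base not in BASE_TYPES:
        raise ValueError(f"unknown base type: {base!r}")
    # Assumption: only categorical types carry a registry in brackets.
    if registry is not None and base != "cat":
        raise ValueError("only categorical types take a registry")
    return base, registry


print(parse_dtype("cat[bionty.CellType]"))
print(parse_dtype("float"))
```

The sketch rejects a registry on non-categorical types, which is an assumption drawn from the description rather than confirmed behavior.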
Note
For more control, you can use bionty registries to manage basic biological entities like genes, proteins & cell markers. Or you define custom registries to manage high-level derived features like gene sets.
See also
from_df()
Create feature records from DataFrame.
features
Feature manager of an artifact or collection.
ULabel
Universal labels.
FeatureSet
Feature sets.
Example
>>> import lamindb as ln
>>> ln.Feature("cell_type_by_expert", dtype="cat", description="Expert cell type annotation").save()
Hint
Features and labels denote two ways of using entities to organize data:
A feature qualifies what is measured, i.e., a numerical or categorical random variable
A label is a measured value, i.e., a category
Consider annotating a dataset by the fact that it measured the expression of 30k genes: genes relate to the dataset as feature identifiers through a feature set with 30k members. Now consider annotating the artifact by whether it measured the knock-out of 3 genes: here, the 3 genes act as labels of the dataset.
Re-shaping data can introduce ambiguity among features & labels. If this happens, ask yourself what the joint measurement was: a feature qualifies variables in a joint measurement. The canonical data matrix lists jointly measured variables in the columns.
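To make the distinction concrete, here is a small plain-Python sketch of a canonical data matrix stored as a list of records. The column names and values are invented for illustration and no lamindb calls are used: the column names play the role of features, while the distinct values of a categorical column play the role of labels.

```python
# Toy observations: each dict is one row of a canonical data matrix.
# All names and values are invented for illustration.
rows = [
    {"cell_type_by_expert": "T cell", "gene_A_expression": 1.2},
    {"cell_type_by_expert": "B cell", "gene_A_expression": 0.4},
    {"cell_type_by_expert": "T cell", "gene_A_expression": 2.1},
]

# Features: the jointly measured variables, i.e., the columns.
features = sorted({column for row in rows for column in row})

# Labels: the measured categories of one categorical feature.
labels = sorted({row["cell_type_by_expert"] for row in rows})

print(features)
print(labels)
```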
Fields
- created_at DateTimeField
Time of creation of record.
- created_by ForeignKey
Creator of record, a User.
- run ForeignKey
Last run that created or updated the record, a Run.
- updated_at DateTimeField
Time of last update to record.
- id AutoField
Internal id, valid only in one DB instance.
- uid CharField
Universal id, valid across DB instances.
- name CharField
Name of feature (required).
- dtype CharField
Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”).
For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
- unit CharField
Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).
- description TextField
A description.
- synonyms TextField
Bar-separated (|) synonyms (optional).
- previous_runs ManyToManyField
Sequence of runs that created or updated the record.
- feature_sets ManyToManyField
Feature sets linked to this feature.
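The synonyms field stores several alternative names in a single bar-separated string. A minimal sketch of how such a string could be split and used to match a name is given here. Both helpers, `split_synonyms` and `matches_feature`, are hypothetical and not part of lamindb.

```python
def split_synonyms(synonyms):
    """Split a bar-separated synonyms string into a list of names."""
    if not synonyms:
        return []
    return [s.strip() for s in synonyms.split("|") if s.strip()]


def matches_feature(query, name, synonyms=None):
    """Return True if query equals the name or one of its synonyms."""
    return query == name or query in split_synonyms(synonyms)


print(split_synonyms("cell type|celltype"))
print(matches_feature("celltype", "cell_type", "cell type|celltype"))
```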
Methods
- classmethod from_df(df, field=None)
Create Feature records for columns.
- Return type: