lamindb.Artifact

class lamindb.Artifact(data: UPathStr, key: str | None = None, description: str | None = None, is_new_version_of: Artifact | None = None, run: Run | None = None)

Bases: Registry, Data, IsVersioned, TracksRun, TracksUpdates

Artifacts: datasets & models stored as files, folders, or arrays.

Artifacts manage data in local or remote storage.

An artifact stores a dataset or model as either a file or a folder.

Some artifacts are array-like, e.g., when stored as .parquet, .h5ad, .zarr, or .tiledb.

For more info, see Tutorial: Artifacts.

Parameters:

data – UPathStr A path to a local or remote folder or file.
key – str | None = None A relative path within default storage, e.g., "myfolder/myfile.fcs".
description – str | None = None A description.
version – str | None = None A version string.
is_new_version_of – Artifact | None = None A previous version of the artifact.
run – Run | None = None The run that creates the artifact.

See also

Storage: Storage locations for artifacts.
Collection: Collections of artifacts.
from_df(): Create an artifact from a DataFrame.
from_anndata(): Create an artifact from an AnnData.
from_dir(): Bulk create file-like artifacts from a directory.

Examples

Create an artifact from a file in the cloud:

>>> artifact = ln.Artifact("s3://my-bucket/my-folder/my-file.csv", description="My file")
>>> artifact.save()  # only metadata is saved

Create an artifact from a local filepath:

>>> artifact = ln.Artifact("./my_file.jpg", description="My image")
>>> artifact.save()

Make a new version of an artifact:

>>> # a non-versioned artifact
>>> artifact = ln.Artifact(df1, description="My dataframe")
>>> artifact.save()
>>> # version an artifact
>>> new_artifact = ln.Artifact(df2, is_new_version_of=artifact)
>>> assert new_artifact.stem_uid == artifact.stem_uid
>>> assert artifact.version == "1"
>>> assert new_artifact.version == "2"
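The version numbering in this example can be illustrated with a small sketch. It is not lamindb's implementation: the helper `bump_version` is hypothetical and assumes integer version strings, with an unversioned predecessor treated as version "1".

```python
from typing import Optional, Tuple


def bump_version(previous: Optional[str]) -> Tuple[str, str]:
    """Return (version of the previous artifact, version of the new one).

    Hypothetical helper; assumes versions are integer strings.
    """
    # An unversioned artifact becomes the first member of its version family.
    old = "1" if previous is None else previous
    return old, str(int(old) + 1)


print(bump_version(None))  # ('1', '2')
print(bump_version("2"))   # ('2', '3')
```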

Properties

path

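Path to the artifact in storage: a cloud path (e.g., S3Path) for cloud storage, a local path (e.g., PosixPath) otherwise.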

Examples

File in cloud storage:

>>> ln.Artifact("s3://lamindb-ci/lndb-storage/pbmc68k.h5ad").save()
>>> artifact = ln.Artifact.filter(key="lndb-storage/pbmc68k.h5ad").one()
>>> artifact.path
S3Path('s3://lamindb-ci/lndb-storage/pbmc68k.h5ad')

File in local storage:

>>> ln.Artifact("./myfile.csv", description="myfile").save()
>>> artifact = ln.Artifact.filter(description="myfile").one()
>>> artifact.path
PosixPath('/home/runner/work/lamindb/lamindb/docs/guide/mydata/myfile.csv')


Fields

version CharField

Version (default None).

Defines the version of a record within a family of records that share the same stem_uid.

Consider using semantic versioning that is compatible with Python's versioning scheme (PEP 440).

created_at DateTimeField: Time of creation of record.

created_by ForeignKey: Creator of record, a User.

updated_at DateTimeField: Time of last update to record.

id AutoField: Internal id, valid only in one DB instance.

uid CharField: A universal random id (20-char base62 ~ UUID), valid across DB instances.
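A 20-character base62 id drawn uniformly at random carries about 20 × log2(62) ≈ 119 bits of entropy, close to the 122 random bits of a version-4 UUID. A minimal sketch of generating such an id with Python's `secrets` module; the helper `random_base62_uid` is hypothetical and lamindb's own generator may differ.

```python
import math
import secrets
import string

# Base62 alphabet: digits plus lowercase and uppercase ASCII letters.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def random_base62_uid(n_char: int = 20) -> str:
    """Return a cryptographically random base62 string (hypothetical helper)."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


uid = random_base62_uid()
print(len(uid))                       # 20
print(round(20 * math.log2(62), 1))   # 119.1 (bits of entropy)
```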

description CharField: A description.

storage ForeignKey: Storage location (Storage), e.g., an S3 or GCP bucket or a local directory.

key CharField: Storage key, the relative path within the storage location.

suffix CharField

Path suffix or empty string if no canonical suffix exists.

This is either a file suffix (".csv", ".h5ad", etc.) or the empty string "".
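A sketch of deriving a suffix from a key with `pathlib`, combined with the array-like formats named in the class description. The helpers and the `ARRAY_LIKE_SUFFIXES` set are hypothetical; lamindb's own suffix logic may differ, e.g., for compound suffixes.

```python
from pathlib import PurePosixPath

# Array-like formats named in the class description (hypothetical set).
ARRAY_LIKE_SUFFIXES = {".parquet", ".h5ad", ".zarr", ".tiledb"}


def suffix_of(key: str) -> str:
    """Return the last file suffix of a key, or "" if there is none."""
    return PurePosixPath(key).suffix


def is_array_like(key: str) -> bool:
    """Check whether the key's suffix is one of the array-like formats."""
    return suffix_of(key) in ARRAY_LIKE_SUFFIXES


print(suffix_of("myfolder/myfile.fcs"))         # .fcs
print(repr(suffix_of("myfolder/README")))        # ''
print(is_array_like("lndb-storage/pbmc68k.h5ad"))  # True
```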

accessor CharField

Default backed or memory accessor, e.g., DataFrame, AnnData.

Soon, also: SOMA, MuData, zarr.Group, tiledb.Array, etc.

size BigIntegerField

Size in bytes.

Sizes use decimal units: 1 KB is 1e3 bytes, 1 MB is 1e6 bytes, 1 GB is 1e9 bytes, and 1 TB is 1e12 bytes.
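Because sizes use decimal (SI) units, turning the byte count into a human-readable string means dividing by powers of 1000, not 1024. A minimal sketch; the helper `format_size` is hypothetical.

```python
def format_size(n_bytes: int) -> str:
    """Format a byte count with decimal (SI) units, e.g. 1500 -> '1.5 KB'."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(n_bytes)
    # Step up one unit per factor of 1000 until the value drops below 1000.
    for unit in units[:-1]:
        if size < 1000:
            return f"{size:g} {unit}"
        size /= 1000
    return f"{size:g} {units[-1]}"


print(format_size(999))            # 999 B
print(format_size(1500))           # 1.5 KB
print(format_size(2_000_000_000))  # 2 GB
```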

hash CharField

Hash or pseudo-hash of artifact content.

Useful to ascertain integrity and avoid duplication.
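A minimal sketch of content hashing for deduplication: identical bytes produce identical hashes, so a new file whose hash matches an existing record can be flagged as a duplicate. It assumes an MD5 digest encoded as URL-safe base64; treat it as an illustration, not as lamindb's exact hashing scheme. The `known` registry is hypothetical.

```python
import base64
import hashlib


def content_hash(data: bytes) -> str:
    """MD5 digest of the content, URL-safe base64-encoded without padding."""
    digest = hashlib.md5(data).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")


# Hypothetical registry mapping content hashes to existing artifact keys.
known = {content_hash(b"a,b\n1,2\n"): "myfolder/table.csv"}

new_content = b"a,b\n1,2\n"
duplicate_of = known.get(content_hash(new_content))
print(duplicate_of)  # myfolder/table.csv
```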

hash_type CharField: Type of hash.

n_objects BigIntegerField

Number of objects.

Typically, this denotes the number of files in an artifact.

n_observations BigIntegerField

Number of observations.

Typically, this denotes the first array dimension.

transform ForeignKey: Transform whose run created the artifact.

run ForeignKey: Run that created the artifact.

visibility SmallIntegerField: Visibility of artifact record in queries & searches (1 default, 0 hidden, -1 trash).

key_is_virtual BooleanField: Indicates whether key is virtual or part of an actual file path.

ulabels ManyToManyField: The ulabels measured in the artifact (ULabel).

input_of ManyToManyField: Runs that use this artifact as an input.

previous_runs ManyToManyField: Sequence of runs that created or updated the record.

feature_sets ManyToManyField: The feature sets measured in the artifact (FeatureSet).

feature_values ManyToManyField: Non-categorical feature values for annotation.

Methods

backed(is_run_input=None)

Return a cloud-backed data object.

Return type: AnnDataAccessor | BackedAccessor

Notes

For more info, see the tutorial Query arrays.

Examples

Read AnnData in backed mode from cloud:

>>> artifact = ln.Artifact.filter(key="lndb-storage/pbmc68k.h5ad").one()
>>> artifact.backed()
AnnData object with n_obs × n_vars = 70 × 765 backed at 's3://lamindb-ci/lndb-storage/pbmc68k.h5ad'

cache(is_run_input=None)

Download cloud artifact to local cache.

Follows syncing logic: an artifact is only downloaded if it is missing from the local cache or the cached copy is outdated.

Returns a path to a locally cached on-disk object (say, a .jpg file).
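A minimal sketch of the syncing rule, using two local paths to stand in for the remote object and its cached copy. It assumes freshness is judged by modification time, which may not match lamindb's exact criterion; the helper `sync_to_cache` is hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def sync_to_cache(source: Path, cached: Path) -> bool:
    """Copy `source` to `cached` unless an up-to-date copy already exists.

    Returns True if a copy was made.
    """
    if cached.exists() and cached.stat().st_mtime >= source.stat().st_mtime:
        return False  # cached copy is at least as new as the source
    cached.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, cached)  # copy2 also copies the modification time
    return True


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "remote", "pbmc68k.h5ad")
    source.parent.mkdir()
    source.write_bytes(b"v1")
    cached = Path(tmp, "cache", "pbmc68k.h5ad")
    first = sync_to_cache(source, cached)   # True: nothing cached yet
    second = sync_to_cache(source, cached)  # False: cache is current
    # Simulate an update of the source by moving its mtime forward.
    later = source.stat().st_mtime + 60
    os.utime(source, (later, later))
    third = sync_to_cache(source, cached)   # True: cache was outdated
    print(first, second, third)  # True False True
```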

Return type: Path

Examples

Sync file from cloud and return the local path of the cache:

>>> artifact.cache()
PosixPath('/home/runner/work/Caches/lamindb/lamindb-ci/lndb-storage/pbmc68k.h5ad')

delete(permanent=None, storage=None, using_key=None)

Delete the artifact.

A first call to .delete() puts an artifact into the trash (sets visibility to -1).

A second call permanently deletes the artifact.
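The two-step deletion, together with restore(), can be modelled as a small state machine over the visibility value, where -1 marks the trash. The toy class below is hypothetical: it assumes a default visibility of 1, mirrors only the documented behavior, and touches neither a database nor storage.

```python
class ToyArtifact:
    """Hypothetical stand-in that mimics the documented delete/restore flow."""

    DEFAULT, TRASH = 1, -1  # assumed visibility codes

    def __init__(self, key: str):
        self.key = key
        self.visibility = self.DEFAULT
        self.deleted = False  # True once permanently deleted

    def delete(self, permanent: bool = False) -> None:
        if permanent or self.visibility == self.TRASH:
            self.deleted = True  # second call (or permanent=True): remove for good
        else:
            self.visibility = self.TRASH  # first call: move to trash

    def restore(self) -> None:
        if not self.deleted:
            self.visibility = self.DEFAULT  # take it out of the trash


a = ToyArtifact("myfolder/myfile.csv")
a.delete()
print(a.visibility, a.deleted)  # -1 False
a.restore()
print(a.visibility)  # 1
a.delete()
a.delete()
print(a.deleted)  # True
```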

FAQ: Storage FAQ

Parameters:

permanent (bool | None, default: None) – Permanently delete the artifact (skip trash).
storage (bool | None, default: None) – Whether to also delete the artifact's data in storage.

Return type:

None

Examples

For an Artifact object artifact, call:

>>> artifact.delete()

classmethod from_anndata(adata, key=None, description=None, run=None, version=None, is_new_version_of=None, **kwargs)

Create from AnnData, validate & link features.

Parameters:

adata (AnnData | str | Path) – An AnnData object or a path to AnnData-like data.
key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.h5ad".
description (str | None, default: None) – A description.
version (str | None, default: None) – A version string.
is_new_version_of (Artifact | None, default: None) – A previous version of the artifact.
run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

See also

Collection(): Track collections.
Feature: Track features.

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> adata = ln.core.datasets.anndata_with_obs()
>>> artifact = ln.Artifact.from_anndata(adata, description="mini anndata with obs")
>>> artifact.save()


classmethod from_df(df, key=None, description=None, run=None, version=None, is_new_version_of=None, **kwargs)

Create from DataFrame, validate & link features.

For more info, see Tutorial: Artifacts.
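Validating a DataFrame's features boils down to checking its columns against a set of registered feature names. A minimal, standard-library-only sketch: the `registered_features` set and the helper `validate_columns` are hypothetical, and lamindb's own validation may differ.

```python
def validate_columns(columns, registered):
    """Split column names into validated and unvalidated ones."""
    validated = [c for c in columns if c in registered]
    missing = [c for c in columns if c not in registered]
    return validated, missing


# Hypothetical feature registry.
registered_features = {"sepal_length", "sepal_width", "petal_length", "petal_width"}
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_organism_code"]

validated, missing = validate_columns(columns, registered_features)
print(missing)  # ['iris_organism_code']
```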

Parameters:

df (DataFrame) – A DataFrame object.
key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.parquet".
description (str | None, default: None) – A description.
version (str | None, default: None) – A version string.
is_new_version_of (Artifact | None, default: None) – A previous version of the artifact.
run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

See also

Collection(): Track collections.
Feature: Track features.

Examples

>>> df = ln.core.datasets.df_iris_in_meter_batch1()
>>> df.head()
  sepal_length sepal_width petal_length petal_width iris_organism_code
0        0.051       0.035        0.014       0.002                 0
1        0.049       0.030        0.014       0.002                 0
2        0.047       0.032        0.013       0.002                 0
3        0.046       0.031        0.015       0.002                 0
4        0.050       0.036        0.014       0.002                 0
>>> artifact = ln.Artifact.from_df(df, description="Iris flower collection batch1")
>>> artifact.save()


classmethod from_dir(path, key=None, *, run=None)

Create a list of artifact objects from a directory.

Hint

If you have a large number of files (several hundred thousand) and don’t want to track them individually, create a single Artifact for the whole folder via Artifact(path). See, e.g., RxRx: cell imaging.

Parameters:

path (str | Path) – Source path of folder.
key (str | None, default: None) – Key for the storage destination. If None and the directory is in a registered storage location, the inferred key reflects the directory's relative position within that location. If None and the directory is outside any registered storage location, the inferred key defaults to path.name.
run (Run | None, default: None) – A Run object.
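The key-inference rule for key=None can be sketched with `pathlib`. The helper `infer_file_key` is hypothetical; it considers a single registered storage root and assumes each file's key is the folder key joined with the file's path relative to the folder, which may differ in details from lamindb's implementation. The file paths in the usage lines are made up.

```python
from pathlib import PurePosixPath


def infer_file_key(dir_path: str, file_path: str, storage_root: str) -> str:
    """Infer a storage key for a file inside `dir_path` (hypothetical helper)."""
    directory = PurePosixPath(dir_path)
    root = PurePosixPath(storage_root)
    if directory.is_relative_to(root):  # requires Python 3.9+
        # Inside the registered location: keep the folder's relative position.
        folder_key = directory.relative_to(root)
    else:
        # Outside any registered location: fall back to the folder name.
        folder_key = PurePosixPath(directory.name)
    return str(folder_key / PurePosixPath(file_path).relative_to(directory))


print(infer_file_key("/data/runs/sample_001", "/data/runs/sample_001/outs/matrix.mtx", "/data"))
# runs/sample_001/outs/matrix.mtx
print(infer_file_key("/tmp/sample_001", "/tmp/sample_001/outs/matrix.mtx", "/data"))
# sample_001/outs/matrix.mtx
```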

Return type:

list[Artifact]

Examples

>>> dir_path = ln.core.datasets.generate_cell_ranger_files("sample_001", ln.settings.storage)
>>> artifacts = ln.Artifact.from_dir(dir_path)
>>> ln.save(artifacts)


classmethod from_mudata(mdata, key=None, description=None, run=None, version=None, is_new_version_of=None, **kwargs)

Create from MuData, validate & link features.

Parameters:

mdata (MuData) – A MuData object.
key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.h5mu".
description (str | None, default: None) – A description.
version (str | None, default: None) – A version string.
is_new_version_of (Artifact | None, default: None) – A previous version of the artifact.
run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

See also

Collection(): Track collections.
Feature: Track features.

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> mdata = ln.core.datasets.mudata_papalexi21_subset()
>>> artifact = ln.Artifact.from_mudata(mdata, description="a mudata object")
>>> artifact.save()


load(is_run_input=None, stream=False, **kwargs)

Stage and load into memory.

Returns in-memory representation if possible, e.g., an AnnData object for an h5ad file.
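A sketch of suffix-based dispatch with a fallback to the file path. The loaders here are hypothetical stand-ins that use only the standard library (`csv`, `json`); lamindb instead returns, e.g., an AnnData object for an .h5ad file.

```python
import csv
import json
import tempfile
from pathlib import Path


def _load_csv(path: Path):
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def _load_json(path: Path):
    with path.open() as f:
        return json.load(f)


# Hypothetical registry mapping suffixes to in-memory loaders.
LOADERS = {".csv": _load_csv, ".json": _load_json}


def load(path: Path):
    """Return an in-memory object if a loader is registered, else the path."""
    loader = LOADERS.get(path.suffix)
    return loader(path) if loader is not None else path


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp, "iris.csv")
    table.write_text("sepal_length,petal_length\n0.051,0.014\n")
    image = Path(tmp, "nuclei.jpg")
    image.write_bytes(b"\xff\xd8")
    rows = load(table)      # [{'sepal_length': '0.051', 'petal_length': '0.014'}]
    fallback = load(image)  # no loader for .jpg: the path itself
    print(rows, fallback == image)
```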

Return type: Any

Examples

Load as a DataFrame:

>>> df = ln.core.datasets.df_iris_in_meter_batch1()
>>> ln.Artifact.from_df(df, description="iris").save()
>>> artifact = ln.Artifact.filter(description="iris").one()
>>> artifact.load().head()
  sepal_length sepal_width petal_length petal_width iris_organism_code
0        0.051       0.035        0.014       0.002                 0
1        0.049       0.030        0.014       0.002                 0
2        0.047       0.032        0.013       0.002                 0
3        0.046       0.031        0.015       0.002                 0
4        0.050       0.036        0.014       0.002                 0

Load as an AnnData:

>>> artifact = ln.Artifact.filter(key="lndb-storage/pbmc68k.h5ad").one()
>>> artifact.load()
AnnData object with n_obs × n_vars = 70 × 765

Fall back to cache() if no in-memory representation is configured, e.g., for a .jpg file:

>>> artifact.load()
PosixPath('/home/runner/work/lamindb/lamindb/docs/guide/mydata/.lamindb/jb7BY5UJoQVGMUOKiLcn.jpg')

replace(data, run=None, format=None)

Replace artifact content.

Parameters:

data (str | Path) – A file path.
run (Run | None, default: None) – The run that created the artifact gets auto-linked if ln.track() was called.

Return type:

None

Examples

Say we made a change to the content of an artifact, e.g., edited the image paradisi05_laminopathic_nuclei.jpg.

This is how we replace the old file in storage with the new file:

>>> artifact.replace("paradisi05_laminopathic_nuclei.jpg")
>>> artifact.save()

Note that this changes neither the storage key nor the filename.

However, it updates the suffix if the new file has a different one.

restore()

Restore from trash.

Return type: None

Examples

For any Artifact object artifact, call:

>>> artifact.restore()

save(upload=None, **kwargs)

Save to database & storage.

Parameters: upload (bool | None, default: None) – Trigger upload to cloud storage in instances with hybrid storage mode.
Return type: None

Examples

>>> artifact = ln.Artifact("./myfile.csv", description="myfile")
>>> artifact.save()

stage(is_run_input=None)

Download cloud artifact to local cache. Same behavior as cache().

Follows syncing logic: an artifact is only downloaded if it is missing from the local cache or the cached copy is outdated.

Returns a path to a locally cached on-disk object (say, a .jpg file).

Return type: Path

Examples

Sync file from cloud and return the local path of the cache:

>>> artifact.cache()
PosixPath('/home/runner/work/Caches/lamindb/lamindb-ci/lndb-storage/pbmc68k.h5ad')