Model Types
The objects you will use most frequently in dman
are models.
Modelclass
We start with the most flexible model type, the modelclass.
Like all models, it extends a classical Python class to handle storables.
Internally it uses record instances to do so, so reading up
on those in Using Records could be helpful. Defining a modelclass
is similar to defining a dataclass. We will create one to store numpy arrays.
import dman
from dman.numeric import sarray, barray
import numpy as np
dman.log.default_config(level=dman.log.WARNING)
@dman.modelclass
class Container:
    label: str
    points: sarray[int]
    values: barray
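For comparison, here is a minimal sketch of the plain dataclass that the modelclass above resembles. It uses only the standard library. The name PlainContainer and the list-typed fields are illustrative stand-ins for the numpy-backed fields and are not part of dman.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class PlainContainer:
    # Hypothetical stand-in: plain lists replace sarray/barray here.
    label: str
    points: List[int]
    values: List[float]


plain = PlainContainer("experiment", [0, 1, 2, 3, 4], [0.5, 1.5])
# asdict turns the instance into a nested dict, field by field.
print(asdict(plain))
```

A dataclass on its own has no notion of files. The modelclass keeps the same declaration style and adds the handling of storable fields.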
We will be working in a temporary directory:
from tempfile import TemporaryDirectory
base = TemporaryDirectory().name
We can serialize the container like any other serializable type.
container = Container("experiment", np.arange(5), np.random.randn(4))
dman.save("container", container, base=base)
dman.tui.walk_directory(dman.mount("container", base=base), show_content=True)
📁 /tmp/tmpj2qbgsf6/cache/example3_models/container
┣━━ 📄 7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy (160 bytes)
┗━━ 📄 container.json (444 bytes)
──────────────────────────────────────────────────────────────────────────
{
"_ser__type": "Container",
"_ser__content": {
"label": "experiment",
"points": {
"_ser__type": "_num__sarray",
"_ser__content": "[0, 1, 2, 3, 4]"
},
"values": {
"_ser__type": "_ser__record",
"_ser__content": {
"target": "7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy",
"sto_type": "_num__barray"
}
}
}
}
──────────────────────────────────────────────────────────────────────────
Note that the contents of the container are serialized as if it were a dataclass.
However, the barray has been replaced by a record pointing to a file.
container: Container = dman.load("container", base=base)
dman.tui.pprint(dman.record_fields(container))
{
│   'values': Record(UL[_num__barray], target=7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy)
}
This record is not preloaded (the UL prefix in the output marks it as unloaded), so the value of the barray is only loaded when the field is accessed.
print(container.values)
dman.tui.pprint(dman.record_fields(container))
[0.66948844 0.55365967 0.03692028 0.99875648]
{
│   'values': Record(_num__barray, target=7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy)
}
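The load-on-access behavior seen above can be illustrated with a small, self-contained sketch. The class LazyRecord and the function fake_loader are hypothetical names invented for this illustration. They do not reproduce dman's record implementation, only the general idea of deferring a read until the content is requested.

```python
from typing import Any, Callable, List, Optional


class LazyRecord:
    """Hypothetical stand-in for a record: holds a target and loads on demand."""

    def __init__(self, target: str, loader: Callable[[str], Any]):
        self.target = target
        self._loader = loader
        self._content: Optional[Any] = None
        self.loaded = False

    @property
    def content(self) -> Any:
        # Read the file only the first time the content is requested.
        if not self.loaded:
            self._content = self._loader(self.target)
            self.loaded = True
        return self._content

    def __repr__(self) -> str:
        state = "loaded" if self.loaded else "unloaded"
        return f"LazyRecord({state}, target={self.target})"


calls: List[str] = []


def fake_loader(target: str) -> List[float]:
    # Stands in for reading the file from disk; records each call.
    calls.append(target)
    return [1.0, 2.0, 3.0]


rec = LazyRecord("values.npy", fake_loader)
print(rec)          # nothing has been read yet
print(rec.content)  # first access triggers the loader
print(rec)          # the record now reports itself as loaded
```

Later accesses reuse the cached content, so the loader runs at most once per record in this sketch.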
So we know that the modelclass has an internal notion of records.
We can use this to specify the target of the barray.
The most configurable option is to set the record manually:
container.values = dman.record(np.random.randn(5).view(barray), stem="barray")
dman.save("container", container, base=base)
dman.tui.walk_directory(dman.mount("container", base=base), show_content=True)
📁 /tmp/tmpj2qbgsf6/cache/example3_models/container
┣━━ 📄 barray.npy (168 bytes)
┗━━ 📄 container.json (414 bytes)
──────────────────────────────────────────────────────────────────────────
{
"_ser__type": "Container",
"_ser__content": {
"label": "experiment",
"points": {
"_ser__type": "_num__sarray",
"_ser__content": "[0, 1, 2, 3, 4]"
},
"values": {
"_ser__type": "_ser__record",
"_ser__content": {
"target": "barray.npy",
"sto_type": "_num__barray"
}
}
}
}
──────────────────────────────────────────────────────────────────────────
Note
The old file has been removed automatically, since the record
tracking it has been removed. This avoids cluttering your
.dman directory with untracked files. This automatic cleaning
behavior can be turned off as follows:
dman.params.model.auto_clean = False
When specifying the record we had to manually convert a numpy array to
a barray. This happens automatically in the modelclass. You can use
the dman.register_preset function to do this for your own types.
- dman.register_preset(tp: Type, pre: Callable[[Any], Any])
Register a preset method pre for a type tp.
When a field with that type is defined in a modelclass, pre is called before setting the field.
Example
>>> register_preset(barray, lambda arg:
...     arg.view(barray) if isinstance(arg, np.ndarray) else arg
... )
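To make the idea of a preset concrete, the following sketch shows one way a type-to-converter registry could work in plain Python. The names register_preset_sketch, apply_preset, and Celsius are hypothetical and exist only for this illustration. How dman stores and applies presets internally is not shown in this document.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a type to its preset function.
_presets: Dict[type, Callable[[Any], Any]] = {}


def register_preset_sketch(tp: type, pre: Callable[[Any], Any]) -> None:
    """Remember that values assigned to fields of type `tp` pass through `pre`."""
    _presets[tp] = pre


def apply_preset(tp: type, value: Any) -> Any:
    """Convert `value` with the preset for `tp`, if one is registered."""
    pre = _presets.get(tp)
    return value if pre is None else pre(value)


class Celsius(float):
    """Toy field type used only for this illustration."""


# Plain floats are wrapped; anything else is passed through untouched.
register_preset_sketch(
    Celsius, lambda arg: Celsius(arg) if type(arg) is float else arg
)

print(type(apply_preset(Celsius, 21.5)).__name__)
print(apply_preset(Celsius, "n/a"))
print(apply_preset(int, 3))
```

As in the barray example, the preset only converts the inputs it recognizes and returns everything else unchanged.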
It is useful to be able to set the record configuration in other ways.
After all, for most instances of the modelclass we likely want the same
file names. This is where the recordfield comes in.
@dman.modelclass
class Container:
    label: str
    points: sarray[int]
    values: barray = dman.recordfield(stem="barray")
We can see that the stem has been adjusted.
container = Container("experiment", np.arange(5), np.random.randn(4))
dman.save("container", container, base=base)
dman.tui.walk_directory(dman.mount("container", base=base), show_content=True)
📁 /tmp/tmpj2qbgsf6/cache/example3_models/container
┣━━ 📄 barray.npy (160 bytes)
┗━━ 📄 container.json (414 bytes)
──────────────────────────────────────────────────────────────────────────
{
"_ser__type": "Container",
"_ser__content": {
"label": "experiment",
"points": {
"_ser__type": "_num__sarray",
"_ser__content": "[0, 1, 2, 3, 4]"
},
"values": {
"_ser__type": "_ser__record",
"_ser__content": {
"target": "barray.npy",
"sto_type": "_num__barray"
}
}
}
}
──────────────────────────────────────────────────────────────────────────
Specifying stems like this comes with a risk, however. If we save two instances
of Container to the same folder, the barray.npy file will be reused.
c1 = Container("experiment", np.arange(5), np.random.randn(4))
c2 = Container("experiment", np.arange(5), np.random.randn(4))
_ = dman.save("list", [c1, c2], base=base)
[01/04/23 10:12:23] WARNING  [@list.Container.Record | fs]: Overwritten previously stored object at target "barray.npy".    path.py:399
By default, dman gives a warning and then overwrites the file.
Such a warning usually means that you should change your file hierarchy.
Later we will show how to do so correctly. You can also configure
dman to resolve this issue in other ways.
One option is to automatically add an index to the file name whenever this happens.
dman.params.store.on_retouch = "auto"
c1 = Container("experiment", np.arange(5), np.random.randn(4))
c2 = Container("experiment", np.arange(5), np.random.randn(4))
_ = dman.save("list", [c1, c2], base=base)
dman.tui.walk_directory(dman.mount("list", base=base))
📁 /tmp/tmpj2qbgsf6/cache/example3_models/list
┣━━ 📄 barray.npy (160 bytes)
┣━━ 📄 barray0.npy (160 bytes)
┗━━ 📄 list.json (971 bytes)
Other options are:
- 'quit': The serialization process is cancelled.
- 'prompt': Prompt the user for a file name.
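The 'auto' behavior can be pictured with a short sketch that picks the first unused file name. The function resolve_target is a hypothetical helper written for this illustration. Only the first two names (barray.npy, then barray0.npy) are confirmed by the listing above, so the continuation to barray1.npy is an assumption.

```python
from typing import List, Set


def resolve_target(stem: str, suffix: str, taken: Set[str]) -> str:
    """Return a file name not yet in `taken`, appending an index if needed."""
    candidate = f"{stem}{suffix}"
    index = 0
    while candidate in taken:
        # Follows the listing above: barray.npy, then barray0.npy.
        # Counting further (barray1.npy, ...) is an assumption.
        candidate = f"{stem}{index}{suffix}"
        index += 1
    taken.add(candidate)
    return candidate


taken: Set[str] = set()
names: List[str] = [resolve_target("barray", ".npy", taken) for _ in range(3)]
print(names)
```

Each call reserves the name it returns, so repeated saves with the same stem never collide in this sketch.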
The recordfield has all the options of field and record combined.
Feel free to experiment with them. We can also configure stems globally.
@dman.modelclass(store_by_field=True)
class Container:
    label: str
    points: sarray[int]
    values: barray
container = Container("experiment", np.arange(5), np.random.randn(4))
dman.save("fields", container, base=base)
dman.tui.walk_directory(
    dman.mount("fields", base=base),
)
📁 /tmp/tmpj2qbgsf6/cache/example3_models/fields
┣━━ 📄 fields.json (414 bytes)
┗━━ 📄 values.npy (160 bytes)
The modelclass decorator has all the options that dataclass has and some additional ones.
- dman.modelclass(cls=None, /, *, name: Optional[str] = None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, storable: bool = False, compact: bool = False, store_by_field: bool = False, cluster: bool = False, subdir: str = '', template: Optional[Any] = None, **kwargs) -> Callable[[Type[_T]], Type[_T]]
Convert a class to a modelclass.
Returns the same class as was passed in, with dunder methods added based on the fields defined in the class. The class is automatically made serializable by adding __serialize__ and __deserialize__.
If an attribute __no_serialize__ is added, then the field names listed as strings within will not be included in the serialization.
The arguments of the dataclass decorator are provided and some additional arguments are also available.
Parameters:
  cls – Class to convert.
  name (str, optional) – Name of the serializable (and storable if required). Defaults to class name.
  init (bool, optional) – Add an __init__ method. Defaults to True.
  repr (bool, optional) – Add a __repr__ method. Defaults to True.
  eq (bool, optional) – Add a __eq__ method. Defaults to True.
  order (bool, optional) – Add rich comparison methods. Defaults to False.
  unsafe_hash (bool, optional) – Add a __hash__ method. Defaults to False.
  frozen (bool, optional) – Fields may not be assigned after instance creation. Defaults to False.
  storable (bool, optional) – Make the class storable with a __write__ and __read__. Defaults to False.
  compact (bool, optional) – Do not include serializable types during serialization, making the result more compact. Defaults to False.
  store_by_field (bool, optional) – The stem of files is determined by the field name. Defaults to False.
  cluster (bool, optional) – Each file is stored in a subfolder determined by the field name. Defaults to False.
  subdir (str, optional) – Store the files in a common subfolder. Defaults to ''.
  template (Any, optional) – Template for serialization. Defaults to None.
We provide examples of some of the more advanced features below.
1. subdirectories: We showcase how subdirectories are determined in a modelclass.
@dman.modelclass(cluster=True, subdir='data', store_by_field=True)
class Container:
    root: barray = dman.recordfield(default_factory=lambda: np.ones(3))
    inner: barray = dman.recordfield(default_factory=lambda: np.ones(3), subdir='override')
dman.save('subdirectories', Container(), base=base)
dman.tui.walk_directory(dman.mount('subdirectories', base=base))
📁 /tmp/tmpj2qbgsf6/cache/example3_models/subdirectories
┣━━ 📁 data
┃   ┗━━ 📁 root
┃       ┗━━ 📄 root.npy (152 bytes)
┣━━ 📁 override
┃   ┗━━ 📄 inner.npy (152 bytes)
┗━━ 📄 subdirectories.json (477 bytes)
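The layout in the listing above can be reproduced with a small path-building sketch. The function resolve_path and its parameters are hypothetical, and the rules are inferred only from this one listing: the class-level subdir comes first, cluster=True adds a folder per field, and a subdir set on the field replaces both. dman's actual resolution logic may differ in cases not shown here.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_path(
    field: str,
    suffix: str,
    class_subdir: str = "",
    cluster: bool = False,
    field_subdir: Optional[str] = None,
) -> str:
    """Build a relative file path for a field stored under its field name."""
    if field_subdir is not None:
        # Inferred from the listing: a field-level subdir replaces the class layout.
        folder = PurePosixPath(field_subdir)
    else:
        folder = PurePosixPath(class_subdir)
        if cluster:
            # cluster=True adds one more folder named after the field.
            folder = folder / field
    return str(folder / f"{field}{suffix}")


root_path = resolve_path("root", ".npy", class_subdir="data", cluster=True)
inner_path = resolve_path(
    "inner", ".npy", class_subdir="data", cluster=True, field_subdir="override"
)
print(root_path)
print(inner_path)
```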
2. compact: We showcase how compact works. Note how no types are mentioned.
@dman.modelclass(compact=True)
class Person:
    name: str = 'Cave Johnson'
    age: int = 43
    location: sarray = dman.field(default_factory=lambda: np.array([3.0, 5.0, -100.0]))
dman.tui.print_serializable(Person())
{
"_ser__type": "Person",
"_ser__content": {
"name": "Cave Johnson",
"age": 43,
"location": "[3.0, 5.0, -100.0]"
}
}
3. skipping serialization: One can designate certain fields to not be serialized.
@dman.modelclass
class Adder:
    __no_serialize__ = ['ans']
    x: int
    y: int
    ans: int = None

    def eval(self):
        self.ans = self.x + self.y
add = Adder(3.0, 5.0)
add.eval()
dman.tui.print_serializable(add)
{
"_ser__type": "Adder",
"_ser__content": {
"x": 3.0,
"y": 5.0
}
}
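The effect of skipping a field can be mimicked with a plain dataclass and a filtered dictionary. This is a standard-library sketch only. PlainAdder, its skip attribute, and to_dict are hypothetical names, and the sketch does not claim to mirror how dman processes __no_serialize__ internally.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class PlainAdder:
    x: int
    y: int
    ans: Optional[int] = None

    # Hypothetical counterpart of __no_serialize__ for this sketch.
    # It has no annotation, so the dataclass does not treat it as a field.
    skip = ("ans",)

    def eval(self) -> None:
        self.ans = self.x + self.y


def to_dict(obj: Any) -> Dict[str, Any]:
    """Collect dataclass fields into a dict, leaving out the skipped names."""
    skipped = getattr(obj, "skip", ())
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if f.name not in skipped
    }


adder = PlainAdder(3, 5)
adder.eval()
print(to_dict(adder))
```

The computed result stays available on the instance but is left out of the serialized form, which suits values that can be recomputed after loading.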
4. deciding between storing and serializing: Some objects can be both serialized and stored. This is how you can choose which option to use. We also showcase some other advanced features, like storable modelclasses and presets.
# This class is both a storable and a serializable.
@dman.modelclass(storable=True)
class Fragment:
    value: str

# Presets can be used to automatically convert strings to fragments.
dman.register_preset(
    Fragment, lambda obj: Fragment(obj) if isinstance(obj, str) else obj
)

# Specify fragment fields in a variety of ways.
@dman.modelclass(compact=True, store_by_field=True)
class Fragmenter:
    frag0: Fragment = dman.recordfield()
    frag1: Fragment = dman.field()
    frag3: Fragment
    frag4: Fragment = dman.serializefield()
dman.save('fragmenter', Fragmenter('stored', 'also stored', 'stored too', 'serialized'), base=base)
dman.tui.walk_directory(dman.mount('fragmenter', base=base), show_content=True)
📁 /tmp/tmpj2qbgsf6/cache/example3_models/fragmenter
┣━━ 📄 frag0.json (25 bytes)
┃   ──────────────────────────────────────────────────────────────────────
┃   {
┃     "value": "stored"
┃   }
┃   ──────────────────────────────────────────────────────────────────────
┣━━ 📄 frag1.json (30 bytes)
┃   ──────────────────────────────────────────────────────────────────────
┃   {
┃     "value": "also stored"
┃   }
┃   ──────────────────────────────────────────────────────────────────────
┣━━ 📄 frag3.json (29 bytes)
┃   ──────────────────────────────────────────────────────────────────────
┃   {
┃     "value": "stored too"
┃   }
┃   ──────────────────────────────────────────────────────────────────────
┗━━ 📄 fragmenter.json (430 bytes)
──────────────────────────────────────────────────────────────────────────
{
"_ser__type": "Fragmenter",
"_ser__content": {
"frag0": {
"target": "frag0.json",
"sto_type": "Fragment"
},
"frag1": {
"target": "frag1.json",
"sto_type": "Fragment"
},
"frag3": {
"target": "frag3.json",
"sto_type": "Fragment"
},
"frag4": {
"value": "serialized"
}
}
}
──────────────────────────────────────────────────────────────────────────
Model List
After modelclasses we have some model type equivalents of basic Python types.
The first of these is the model list, or mlist. These are lists that
can contain storables using records, as is the case with modelclasses.
They are used automatically by dman.
a = np.ones(3).view(dman.barray)
b = np.zeros(3).view(dman.barray)
c = np.arange(3).view(dman.barray)
lst = [a, b]
dman.save('lst', lst, base=base)
dman.tui.walk_directory(dman.mount('lst', base=base), show_content=True)
📁 /tmp/tmpj2qbgsf6/cache/example3_models/lst
┣━━ 📄 943f15de-c92d-4ef3-b898-71b84268438a.npy (152 bytes)
┣━━ 📄 b05c7c45-dada-497c-891a-8cda62b64098.npy (152 bytes)
┗━━ 📄 lst.json (591 bytes)
──────────────────────────────────────────────────────────────────────────
{
"_ser__type": "_ser__mlist",
"_ser__content": {
"store": [
{
"_ser__type": "_ser__record",
"_ser__content": {
"target": "943f15de-c92d-4ef3-b898-71b84268438a.npy",
"sto_type": "_num__barray"
}
},
{
"_ser__type": "_ser__record",
"_ser__content": {
"target": "b05c7c45-dada-497c-891a-8cda62b64098.npy",
"sto_type": "_num__barray"
}
}
]
}
}
──────────────────────────────────────────────────────────────────────────
If we load the list from disk we can see that its type has changed.
lst: dman.mlist = dman.load('lst', base=base)
print(type(lst))
<class 'dman.model.modelclasses.mlist'>
The internal records can be accessed as follows:
for v in lst.store:
    print(v)
Record(UL[_num__barray], target=943f15de-c92d-4ef3-b898-71b84268438a.npy)
Record(UL[_num__barray], target=b05c7c45-dada-497c-891a-8cda62b64098.npy)
You can directly configure a record using the record method.
lst.record(c, stem='c')
dman.save('lst', lst, base=base)
dman.tui.walk_directory(dman.mount('lst', base=base))
📁 /tmp/tmpj2qbgsf6/cache/example3_models/lst
┣━━ 📄 943f15de-c92d-4ef3-b898-71b84268438a.npy (152 bytes)
┣━━ 📄 b05c7c45-dada-497c-891a-8cda62b64098.npy (152 bytes)
┣━━ 📄 c.npy (152 bytes)
┗━━ 📄 lst.json (805 bytes)
If you want a storable version of an mlist you can use smlist.
Beyond being storable, it behaves identically to mlist.
Often you want to specify file names for the internal records incrementally.
Using the record method each time is inconvenient, however.
Hence mruns (and smruns) are provided, which do so automatically.
runs = dman.mruns([a, b, c], stem='run', store_subdir=False)
dman.save('runs', runs, base=base)
dman.tui.walk_directory(dman.mount('runs', base=base))
📁 /tmp/tmpj2qbgsf6/cache/example3_models/runs
┣━━ 📄 run-0.npy (152 bytes)
┣━━ 📄 run-1.npy (152 bytes)
┣━━ 📄 run-2.npy (152 bytes)
┗━━ 📄 runs.json (825 bytes)
Here store_subdir=False specifies that storables should be stored in
the root directory of the mruns object. Usually, if your storables create
more files, it is better to set store_subdir=True instead. Then each
storable is stored in its own directory.
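The incremental naming seen in the listing (run-0, run-1, run-2) can be sketched as a counter that hands out the next name whenever an item is added. RunNamer is a hypothetical class for illustration only. It covers the store_subdir=False layout shown above and makes no claim about how mruns handles removals or subdirectories.

```python
from typing import List


class RunNamer:
    """Hands out 'stem-0', 'stem-1', ... in the order items are added."""

    def __init__(self, stem: str, suffix: str):
        self.stem = stem
        self.suffix = suffix
        self.targets: List[str] = []

    def add(self) -> str:
        # The index of the new item is the number of items already named.
        target = f"{self.stem}-{len(self.targets)}{self.suffix}"
        self.targets.append(target)
        return target


namer = RunNamer("run", ".npy")
for _ in range(3):
    namer.add()
print(namer.targets)
```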
Model Dictionary
Similarly to model lists, dman also provides the model dictionary mdict
and its storable counterpart smdict.
On a basic level, file names for storables are generated automatically.
dct = {'a': a, 'b': b}
dman.save('dct', dct, base=base)
dman.tui.walk_directory(dman.mount('dct', base=base), show_content=True)
📁 /tmp/tmpj2qbgsf6/cache/example3_models/dct
┣━━ 📄 39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy (152 bytes)
┣━━ 📄 3e2d977a-474f-4b79-9ba9-c456f618b180.npy (152 bytes)
┗━━ 📄 dct.json (601 bytes)
──────────────────────────────────────────────────────────────────────────
{
"_ser__type": "_ser__mdict",
"_ser__content": {
"store": {
"a": {
"_ser__type": "_ser__record",
"_ser__content": {
"target": "3e2d977a-474f-4b79-9ba9-c456f618b180.npy",
"sto_type": "_num__barray"
}
},
"b": {
"_ser__type": "_ser__record",
"_ser__content": {
"target": "39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy",
"sto_type": "_num__barray"
}
}
}
}
}
──────────────────────────────────────────────────────────────────────────
Standard dictionaries are converted to model dictionaries automatically whenever they contain storables.
dct: dman.mdict = dman.load('dct', base=base)
print(type(dct))
<class 'dman.model.modelclasses.mdict'>
Similarly to model lists, you can access the internal records as follows:
for k, v in dct.store.items():
    print(k, v)
a Record(UL[_num__barray], target=3e2d977a-474f-4b79-9ba9-c456f618b180.npy)
b Record(UL[_num__barray], target=39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy)
You can also specify records directly:
dct.record('a', c, stem='c')
dman.save('dct', dct, base=base)
dman.tui.walk_directory(dman.mount('dct', base=base))
📁 /tmp/tmpj2qbgsf6/cache/example3_models/dct
┣━━ 📄 39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy (152 bytes)
┣━━ 📄 3e2d977a-474f-4b79-9ba9-c456f618b180.npy (152 bytes)
┣━━ 📄 c.npy (152 bytes)
┗━━ 📄 dct.json (566 bytes)
Model dictionaries come with some additional settings that can aid in automatically generating suitable stems.
dct = dman.mdict.from_dict({'a': a, 'b': b, 'c': c}, store_by_key=True, store_subdir=True)
dman.save('dct2', dct, base=base)
dman.tui.walk_directory(dman.mount('dct2', base=base))
📁 /tmp/tmpj2qbgsf6/cache/example3_models/dct2
┣━━ 📁 a
┃   ┗━━ 📄 a.npy (152 bytes)
┣━━ 📁 b
┃   ┗━━ 📄 b.npy (152 bytes)
┣━━ 📁 c
┃   ┗━━ 📄 c.npy (152 bytes)
┗━━ 📄 dct2.json (816 bytes)
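The two listings for dictionaries suggest a simple mapping from keys to file targets, sketched below. The function key_targets is hypothetical. The combination store_by_key=True with store_subdir=True matches the listing above (a/a.npy), and the default case matches the earlier UUID-named files. The remaining combinations are assumptions and are not confirmed by this document.

```python
import uuid
from typing import Dict, Iterable


def key_targets(
    keys: Iterable[str],
    suffix: str,
    store_by_key: bool = False,
    store_subdir: bool = False,
) -> Dict[str, str]:
    """Map each key to a relative file target."""
    targets: Dict[str, str] = {}
    for key in keys:
        # Either reuse the key as the stem or fall back to a random UUID.
        stem = key if store_by_key else str(uuid.uuid4())
        # Assumption: the per-item folder shares its name with the stem.
        folder = f"{stem}/" if store_subdir else ""
        targets[key] = f"{folder}{stem}{suffix}"
    return targets


by_key = key_targets(["a", "b", "c"], ".npy", store_by_key=True, store_subdir=True)
automatic = key_targets(["a", "b"], ".npy")
print(by_key)
print(automatic)
```

Key-based stems make the files easy to find by hand, while UUID stems avoid collisions without any configuration.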