Model Types

The objects you will use most frequently in dman are models.

Modelclass

We start with the most flexible model type, the modelclass. Like all models, it extends a classical Python class to handle storables. Internally it uses record instances to do so, so reading up on those in Using Records may be helpful. Defining a modelclass is similar to defining a dataclass. We will create one to store numpy arrays.

import dman
from dman.numeric import sarray, barray
import numpy as np

dman.log.default_config(level=dman.log.WARNING)


@dman.modelclass
class Container:
    label: str
    points: sarray[int]
    values: barray

We will be working in a temporary directory:

from tempfile import TemporaryDirectory

base = TemporaryDirectory().name

We can serialize the container like any other serializable type.

container = Container("experiment", np.arange(5), np.random.randn(4))
dman.save("container", container, base=base)
dman.tui.walk_directory(dman.mount("container", base=base), show_content=True)
📂 /tmp/tmpj2qbgsf6/cache/example3_models/container
┣━━ 📄 7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy (160 bytes)
┗━━ 📄 container.json (444 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "Container",
        "_ser__content": {
          "label": "experiment",
          "points": {
            "_ser__type": "_num__sarray",
            "_ser__content": "[0, 1, 2, 3, 4]"
          },
          "values": {
            "_ser__type": "_ser__record",
            "_ser__content": {
              "target": "7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy",
              "sto_type": "_num__barray"
            }
          }
        }
      }
     ──────────────────────────────────────────────────────────────────────────

Note that the contents of the container are serialized as if it were a dataclass. However, the barray has been replaced by a record pointing to a file.

container: Container = dman.load("container", base=base)
dman.tui.pprint(dman.record_fields(container))
{
│   'values': Record(UL[_num__barray], target=7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy)
}

This record is not preloaded, so the value of the barray will only be loaded when the field is accessed.

print(container.values)
dman.tui.pprint(dman.record_fields(container))
[0.66948844 0.55365967 0.03692028 0.99875648]
{
│   'values': Record(_num__barray, target=7cbb0c83-dc1a-4a63-b2e1-17107209fe1e.npy)
}
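The lazy loading shown above can be illustrated without dman. The following is a minimal standard-library sketch of the idea, not dman's implementation: LazyRecord is a hypothetical class, and the "UL" marker in its repr only imitates the output above, where it appears to flag a record that has not been loaded yet.

```python
import json
import tempfile
from pathlib import Path


class LazyRecord:
    """Hypothetical stand-in for a record: it remembers a target file
    and reads it only when the content is first requested."""

    def __init__(self, target: Path):
        self.target = target
        self._content = None
        self.loaded = False

    @property
    def content(self):
        # Read the file the first time the content is accessed.
        if not self.loaded:
            self._content = json.loads(self.target.read_text())
            self.loaded = True
        return self._content

    def __repr__(self):
        state = "" if self.loaded else "UL "
        return f"LazyRecord({state}target={self.target.name})"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "values.json"
    path.write_text(json.dumps([1.0, 2.0, 3.0]))

    rec = LazyRecord(path)
    before = repr(rec)    # nothing has been read from disk yet
    values = rec.content  # first access triggers the read
    after = repr(rec)     # the record now holds its content
```

The file is only touched on the first access to content; later accesses reuse the cached value.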

So we know that the modelclass has an internal notion of records. We can use this to specify the target of the barray. The most configurable option is to set the record manually:

container.values = dman.record(np.random.randn(5).view(barray), stem="barray")
dman.save("container", container, base=base)
dman.tui.walk_directory(dman.mount("container", base=base), show_content=True)
📂 /tmp/tmpj2qbgsf6/cache/example3_models/container
┣━━ 📄 barray.npy (168 bytes)
┗━━ 📄 container.json (414 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "Container",
        "_ser__content": {
          "label": "experiment",
          "points": {
            "_ser__type": "_num__sarray",
            "_ser__content": "[0, 1, 2, 3, 4]"
          },
          "values": {
            "_ser__type": "_ser__record",
            "_ser__content": {
              "target": "barray.npy",
              "sto_type": "_num__barray"
            }
          }
        }
      }
     ──────────────────────────────────────────────────────────────────────────

Note

The old file has been removed automatically, since the record tracking it no longer exists. This avoids cluttering your .dman directory with untracked files. We could turn this auto cleaning behavior off as follows:

dman.params.model.auto_clean = False
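Conceptually, auto cleaning amounts to comparing the targets that were tracked before an update with those tracked afterwards. The following is a minimal standard-library sketch of that idea, not dman's implementation; clean_untracked and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Set


def clean_untracked(folder: Path, previous: Set[str], current: Set[str]) -> Set[str]:
    """Delete files that were tracked before but are not tracked anymore.

    Returns the names of the removed files.
    """
    removed = previous - current
    for name in removed:
        (folder / name).unlink(missing_ok=True)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "old-uuid.npy").write_bytes(b"old")
    (folder / "barray.npy").write_bytes(b"new")

    # The record used to point at old-uuid.npy and now points at barray.npy.
    removed = clean_untracked(folder, {"old-uuid.npy"}, {"barray.npy"})
    remaining = sorted(p.name for p in folder.iterdir())
```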

When specifying the record we had to manually convert a numpy array to a barray. When a value is assigned to a barray field of a modelclass, this conversion happens automatically. You can use the dman.register_preset function to do the same for your own types.

dman.register_preset(tp: Type, pre: Callable[[Any], Any])

Register a preset method pre for a type tp.

When a field with that type is defined in a modelclass, pre is called before setting the field.

Example

>>> register_preset(barray, lambda arg:
...    arg.view(barray) if isinstance(arg, np.ndarray) else arg
... )
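To make the mechanism concrete, here is a minimal standard-library sketch of how a preset registry could work. It is an illustration under assumptions, not dman's implementation: the _presets dictionary, the register function, and the Celsius and Reading classes are all hypothetical, with Celsius playing the role of barray.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a field type to its converter.
_presets: Dict[type, Callable[[Any], Any]] = {}


def register(tp: type, pre: Callable[[Any], Any]) -> None:
    """Register a converter that runs before a field of type tp is set."""
    _presets[tp] = pre


class Celsius(float):
    """Toy type standing in for barray."""


class Reading:
    # Field name -> declared type, as class annotations would provide.
    fields = {"temperature": Celsius}

    def __setattr__(self, name: str, value: Any) -> None:
        pre = _presets.get(self.fields.get(name))
        if pre is not None:
            value = pre(value)  # convert before storing the value
        super().__setattr__(name, value)


register(Celsius, lambda v: Celsius(v) if isinstance(v, (int, float)) else v)

r = Reading()
r.temperature = 21.5  # a plain float is converted on assignment
```

The converter leaves values it does not recognize untouched, mirroring the isinstance check in the example above.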

It is often useful to configure records in other ways. After all, for most instances of the modelclass we likely want the same file names. This is where the recordfield comes in.

@dman.modelclass
class Container:
    label: str
    points: sarray[int]
    values: barray = dman.recordfield(stem="barray")

We can see that the stem has been adjusted.

container = Container("experiment", np.arange(5), np.random.randn(4))
dman.save("container", container, base=base)
dman.tui.walk_directory(dman.mount("container", base=base), show_content=True)
📂 /tmp/tmpj2qbgsf6/cache/example3_models/container
┣━━ 📄 barray.npy (160 bytes)
┗━━ 📄 container.json (414 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "Container",
        "_ser__content": {
          "label": "experiment",
          "points": {
            "_ser__type": "_num__sarray",
            "_ser__content": "[0, 1, 2, 3, 4]"
          },
          "values": {
            "_ser__type": "_ser__record",
            "_ser__content": {
              "target": "barray.npy",
              "sto_type": "_num__barray"
            }
          }
        }
      }
     ──────────────────────────────────────────────────────────────────────────

Specifying stems like this comes with a risk, however. If we save two instances of Container to the same folder, both will target the same barray.npy file.

c1 = Container("experiment", np.arange(5), np.random.randn(4))
c2 = Container("experiment", np.arange(5), np.random.randn(4))
_ = dman.save("list", [c1, c2], base=base)
[01/04/23 10:12:23] WARNING  [@list.Container.Record | fs]:          path.py:399
                             Overwritten previously stored object at
                             target "barray.npy".

By default dman gives a warning and then overwrites the file. This implies that you should change your file hierarchy. Later we will show how to do so correctly. You can also configure dman to resolve this issue in other ways.

One option is to automatically add an index to the file whenever this happens.

dman.params.store.on_retouch = "auto"
c1 = Container("experiment", np.arange(5), np.random.randn(4))
c2 = Container("experiment", np.arange(5), np.random.randn(4))
_ = dman.save("list", [c1, c2], base=base)
dman.tui.walk_directory(dman.mount("list", base=base))
📂 /tmp/tmpj2qbgsf6/cache/example3_models/list
┣━━ 📄 barray.npy (160 bytes)
┣━━ 📄 barray0.npy (160 bytes)
┗━━ 📄 list.json (971 bytes)

Other options are:
  • 'quit': The serialization process is cancelled.

  • 'prompt': Prompt the user for a file name.
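The 'auto' strategy can be pictured as follows. This is a minimal standard-library sketch that reproduces the barray.npy and barray0.npy names seen in the output above; resolve_target is a hypothetical helper, and the name chosen for a third collision (barray1.npy) is an assumption rather than documented dman behavior.

```python
from typing import Set


def resolve_target(stem: str, suffix: str, taken: Set[str]) -> str:
    """Return a file name not yet in `taken`, appending an index if needed."""
    candidate = f"{stem}{suffix}"
    index = 0
    while candidate in taken:
        # Assumption: indices start at 0 and are placed right after the stem.
        candidate = f"{stem}{index}{suffix}"
        index += 1
    taken.add(candidate)
    return candidate


taken: Set[str] = set()
first = resolve_target("barray", ".npy", taken)
second = resolve_target("barray", ".npy", taken)
third = resolve_target("barray", ".npy", taken)
```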

The recordfield has all the options of field and record combined. Feel free to experiment with them. We can also configure stems globally.

@dman.modelclass(store_by_field=True)
class Container:
    label: str
    points: sarray[int]
    values: barray


container = Container("experiment", np.arange(5), np.random.randn(4))
dman.save("fields", container, base=base)
dman.tui.walk_directory(
    dman.mount("fields", base=base),
)
📂 /tmp/tmpj2qbgsf6/cache/example3_models/fields
┣━━ 📄 fields.json (414 bytes)
┗━━ 📄 values.npy (160 bytes)

The modelclass decorator has all the options that dataclass has and some additional ones.

dman.modelclass(cls=None, /, *, name: Optional[str] = None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, storable: bool = False, compact: bool = False, store_by_field: bool = False, cluster: bool = False, subdir: str = '', template: Optional[Any] = None, **kwargs) -> Callable[[Type[_T]], Type[_T]]

Convert a class to a modelclass.

Returns the same class as was passed in, with dunder methods added based on the fields defined in the class. The class is automatically made serializable by adding __serialize__ and __deserialize__.

If an attribute __no_serialize__ is added, then the field names listed as string within will not be included in the serialization.

The arguments of the dataclass decorator are provided and some additional arguments are also available.

Parameters:
  • cls – Class to convert.

  • name (str, optional) – Name of the serializable (and storable if required). Defaults to class name.

  • init (bool, optional) – Add an __init__ method. Defaults to True.

  • repr (bool, optional) – Add a __repr__ method. Defaults to True.

  • eq (bool, optional) – Add a __eq__ method. Defaults to True.

  • order (bool, optional) – Add rich comparison methods. Defaults to False.

  • unsafe_hash (bool, optional) – Add a __hash__ method. Defaults to False.

  • frozen (bool, optional) – Fields may not be assigned after instance creation. Defaults to False.

  • storable (bool, optional) – Make the class storable with a __write__ and __read__. Defaults to False.

  • compact (bool, optional) – Do not include serializable types during serialization, making the result more compact. Defaults to False.

  • store_by_field (bool, optional) – The stem of files is determined by the field name. Defaults to False.

  • cluster (bool, optional) – Each file is stored in a subfolder determined by the field name. Defaults to False.

  • subdir (str, optional) – Store the files in a common subfolder. Defaults to "".

  • template (Any, optional) – Template for serialization. Defaults to None.

We provide examples of some of the more advanced features below.

1. subdirectories: We showcase how subdirectories are determined in a modelclass.

@dman.modelclass(cluster=True, subdir='data', store_by_field=True)
class Container:
    root: barray = dman.recordfield(default_factory=lambda: np.ones(3))
    inner: barray = dman.recordfield(default_factory=lambda: np.ones(3), subdir='override')

dman.save('subdirectories', Container(), base=base)
dman.tui.walk_directory(dman.mount('subdirectories', base=base))
📂 /tmp/tmpj2qbgsf6/cache/example3_models/subdirectories
┣━━ 📂 data
┃   ┗━━ 📂 root
┃       ┗━━ 📄 root.npy (152 bytes)
┣━━ 📂 override
┃   ┗━━ 📄 inner.npy (152 bytes)
┗━━ 📄 subdirectories.json (477 bytes)
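The layout above suggests a simple rule: a subdir given on the field replaces the folder derived from the class options. The following standard-library sketch encodes that reading. It is inferred from this one output only, and field_target is a hypothetical helper, not part of dman.

```python
from pathlib import PurePosixPath
from typing import Optional


def field_target(
    name: str,
    suffix: str,
    class_subdir: str = "",
    cluster: bool = False,
    field_subdir: Optional[str] = None,
) -> str:
    """Compute the relative path of a field's file, stem taken from the field name."""
    if field_subdir is not None:
        # Assumption: a field-level subdir overrides the class-level folder entirely.
        folder = PurePosixPath(field_subdir)
    else:
        folder = PurePosixPath(class_subdir)
        if cluster:
            folder = folder / name  # one subfolder per field
    return str(folder / f"{name}{suffix}")


root = field_target("root", ".npy", class_subdir="data", cluster=True)
inner = field_target(
    "inner", ".npy", class_subdir="data", cluster=True, field_subdir="override"
)
flat = field_target("values", ".npy")
```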

2. compact: We showcase how compact works. Note how no types are mentioned.

@dman.modelclass(compact=True)
class Person:
    name: str = 'Cave Johnson'
    age: int = 43
    location: sarray = dman.field(default_factory=lambda: np.array([3.0, 5.0, -100.0]))

dman.tui.print_serializable(Person())
{
  "_ser__type": "Person",
  "_ser__content": {
    "name": "Cave Johnson",
    "age": 43,
    "location": "[3.0, 5.0, -100.0]"
  }
}

3. skipping serialization: One can designate certain fields to not be serialized.

@dman.modelclass
class Adder:
    __no_serialize__ = ['ans']
    x: int
    y: int
    ans: int = None

    def eval(self):
        self.ans = self.x + self.y

add = Adder(3.0, 5.0)
add.eval()
dman.tui.print_serializable(add)
{
  "_ser__type": "Adder",
  "_ser__content": {
    "x": 3.0,
    "y": 5.0
  }
}
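The effect of __no_serialize__ can be mimicked with a plain dataclass. The following is a minimal sketch of the idea using only the standard library; to_dict is a hypothetical helper and does not reflect how dman serializes objects internally.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Adder:
    # Not annotated, so the dataclass machinery does not treat it as a field.
    __no_serialize__ = ["ans"]
    x: int
    y: int
    ans: Optional[int] = None

    def eval(self) -> None:
        self.ans = self.x + self.y


def to_dict(obj) -> dict:
    """Collect all dataclass fields except those listed in __no_serialize__."""
    skip = set(getattr(type(obj), "__no_serialize__", []))
    return {f.name: getattr(obj, f.name) for f in fields(obj) if f.name not in skip}


add = Adder(3, 5)
add.eval()
payload = to_dict(add)
```

The computed ans stays available on the instance but is left out of the serialized payload, so it has to be recomputed after loading.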

4. deciding between storing and serializing: Some objects can be both serialized and stored. This is how you can choose which option to use. We also showcase some other advanced features, like storable modelclasses and presets.

# This class is a storable and a serializable
@dman.modelclass(storable=True)
class Fragment:
    value: str

# Presets can be used to automatically convert strings to fragments.
dman.register_preset(
    Fragment, lambda obj: Fragment(obj) if isinstance(obj, str) else obj
)

# Specify fragment fields in a variety of ways.
@dman.modelclass(compact=True, store_by_field=True)
class Fragmenter:
    frag0: Fragment = dman.recordfield()
    frag1: Fragment = dman.field()
    frag3: Fragment
    frag4: Fragment = dman.serializefield()


dman.save('fragmenter', Fragmenter('stored', 'also stored', 'stored too', 'serialized'), base=base)
dman.tui.walk_directory(dman.mount('fragmenter', base=base), show_content=True)
📂 /tmp/tmpj2qbgsf6/cache/example3_models/fragmenter
┣━━ 📄 frag0.json (25 bytes)
┃    ──────────────────────────────────────────────────────────────────────────
┃     {
┃       "value": "stored"
┃     }
┃    ──────────────────────────────────────────────────────────────────────────
┣━━ 📄 frag1.json (30 bytes)
┃    ──────────────────────────────────────────────────────────────────────────
┃     {
┃       "value": "also stored"
┃     }
┃    ──────────────────────────────────────────────────────────────────────────
┣━━ 📄 frag3.json (29 bytes)
┃    ──────────────────────────────────────────────────────────────────────────
┃     {
┃       "value": "stored too"
┃     }
┃    ──────────────────────────────────────────────────────────────────────────
┗━━ 📄 fragmenter.json (430 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "Fragmenter",
        "_ser__content": {
          "frag0": {
            "target": "frag0.json",
            "sto_type": "Fragment"
          },
          "frag1": {
            "target": "frag1.json",
            "sto_type": "Fragment"
          },
          "frag3": {
            "target": "frag3.json",
            "sto_type": "Fragment"
          },
          "frag4": {
            "value": "serialized"
          }
        }
      }
     ──────────────────────────────────────────────────────────────────────────

Model List

After modelclasses we have model type equivalents of some basic Python types. The first of these is the model list, or mlist. Like modelclasses, model lists can contain storables by using records. They are used automatically by dman.

a = np.ones(3).view(dman.barray)
b = np.zeros(3).view(dman.barray)
c = np.arange(3).view(dman.barray)

lst = [a, b]
dman.save('lst', lst, base=base)
dman.tui.walk_directory(dman.mount('lst', base=base), show_content=True)
📂 /tmp/tmpj2qbgsf6/cache/example3_models/lst
┣━━ 📄 943f15de-c92d-4ef3-b898-71b84268438a.npy (152 bytes)
┣━━ 📄 b05c7c45-dada-497c-891a-8cda62b64098.npy (152 bytes)
┗━━ 📄 lst.json (591 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "_ser__mlist",
        "_ser__content": {
          "store": [
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "943f15de-c92d-4ef3-b898-71b84268438a.npy",
                "sto_type": "_num__barray"
              }
            },
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "b05c7c45-dada-497c-891a-8cda62b64098.npy",
                "sto_type": "_num__barray"
              }
            }
          ]
        }
      }
     ──────────────────────────────────────────────────────────────────────────

If we load the list from disk we can see that its type has changed.

lst: dman.mlist = dman.load('lst', base=base)
print(type(lst))
<class 'dman.model.modelclasses.mlist'>

The internal records can be accessed as follows:

for v in lst.store:
    print(v)
Record(UL[_num__barray], target=943f15de-c92d-4ef3-b898-71b84268438a.npy)
Record(UL[_num__barray], target=b05c7c45-dada-497c-891a-8cda62b64098.npy)

You can directly configure a record using the record method.

lst.record(c, stem='c')
dman.save('lst', lst, base=base)
dman.tui.walk_directory(dman.mount('lst', base=base))
📂 /tmp/tmpj2qbgsf6/cache/example3_models/lst
┣━━ 📄 943f15de-c92d-4ef3-b898-71b84268438a.npy (152 bytes)
┣━━ 📄 b05c7c45-dada-497c-891a-8cda62b64098.npy (152 bytes)
┣━━ 📄 c.npy (152 bytes)
┗━━ 📄 lst.json (805 bytes)

If you want a storable version of an mlist you can use smlist. Beyond being storable, it behaves identically to mlist.

Often you want to specify file names for the internal records incrementally. Calling the record method each time is inconvenient, however. Hence mruns (and smruns) are provided, which do so automatically.

runs = dman.mruns([a, b, c], stem='run', store_subdir=False)
dman.save('runs', runs, base=base)
dman.tui.walk_directory(dman.mount('runs', base=base))
📂 /tmp/tmpj2qbgsf6/cache/example3_models/runs
┣━━ 📄 run-0.npy (152 bytes)
┣━━ 📄 run-1.npy (152 bytes)
┣━━ 📄 run-2.npy (152 bytes)
┗━━ 📄 runs.json (825 bytes)

Here store_subdir=False specifies that storables should be stored in the root directory of the mruns object. Usually, if your storables create additional files, it is better to set store_subdir=True instead. Then each storable is stored in its own directory.
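The incremental naming seen in the run-0.npy, run-1.npy, run-2.npy output can be sketched with a simple counter. This is a standard-library illustration of the flat layout (store_subdir=False) only; RunNamer is a hypothetical class, and the subdirectory layout is deliberately not modeled because its exact form is not shown here.

```python
class RunNamer:
    """Hypothetical helper that hands out incrementally numbered file names."""

    def __init__(self, stem: str, suffix: str):
        self.stem = stem
        self.suffix = suffix
        self.count = 0

    def next_target(self) -> str:
        # The counter persists across calls, so appended items keep counting up.
        target = f"{self.stem}-{self.count}{self.suffix}"
        self.count += 1
        return target


namer = RunNamer("run", ".npy")
targets = [namer.next_target() for _ in range(3)]
```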

Model Dictionary

Similarly to model lists, dman also provides the model dictionary mdict (and smdict). By default, file names for storables are generated automatically.

dct = {'a': a, 'b': b}
dman.save('dct', dct, base=base)
dman.tui.walk_directory(dman.mount('dct', base=base), show_content=True)
📂 /tmp/tmpj2qbgsf6/cache/example3_models/dct
┣━━ 📄 39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy (152 bytes)
┣━━ 📄 3e2d977a-474f-4b79-9ba9-c456f618b180.npy (152 bytes)
┗━━ 📄 dct.json (601 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "_ser__mdict",
        "_ser__content": {
          "store": {
            "a": {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "3e2d977a-474f-4b79-9ba9-c456f618b180.npy",
                "sto_type": "_num__barray"
              }
            },
            "b": {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy",
                "sto_type": "_num__barray"
              }
            }
          }
        }
      }
     ──────────────────────────────────────────────────────────────────────────

Standard dictionaries are converted to model dictionaries automatically whenever they contain storables.

dct: dman.mdict = dman.load('dct', base=base)
print(type(dct))
<class 'dman.model.modelclasses.mdict'>

As with model lists, you can access the internal records as follows:

for k, v in dct.store.items():
    print(k, v)
a Record(UL[_num__barray], target=3e2d977a-474f-4b79-9ba9-c456f618b180.npy)
b Record(UL[_num__barray], target=39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy)

You can also specify records directly:

dct.record('a', c, stem='c')
dman.save('dct', dct, base=base)
dman.tui.walk_directory(dman.mount('dct', base=base))
📂 /tmp/tmpj2qbgsf6/cache/example3_models/dct
┣━━ 📄 39606a62-fd78-4e9a-9df2-c90cbc3f6d3c.npy (152 bytes)
┣━━ 📄 3e2d977a-474f-4b79-9ba9-c456f618b180.npy (152 bytes)
┣━━ 📄 c.npy (152 bytes)
┗━━ 📄 dct.json (566 bytes)

Model dictionaries come with some additional settings that can aid in automatically generating suitable stems.

dct = dman.mdict.from_dict({'a': a, 'b': b, 'c': c}, store_by_key=True, store_subdir=True)
dman.save('dct2', dct, base=base)
dman.tui.walk_directory(dman.mount('dct2', base=base))
📂 /tmp/tmpj2qbgsf6/cache/example3_models/dct2
┣━━ 📂 a
┃   ┗━━ 📄 a.npy (152 bytes)
┣━━ 📂 b
┃   ┗━━ 📄 b.npy (152 bytes)
┣━━ 📂 c
┃   ┗━━ 📄 c.npy (152 bytes)
┗━━ 📄 dct2.json (816 bytes)
