Getting Started
This example illustrates basic usage of dman.
Overview
The dman package allows for convenient storing and loading of data. The focus is on human readability, hence dman is mostly a file hierarchy manager. It can store some types to files by default, but for others you have to define yourself how they are read from and written to disk. More details are provided in the other examples. For a full overview of how to introduce new types into dman, see Defining Storables and Defining Serializables. In all likelihood, however, you will not need to do so often, especially if your data is represented in terms of numpy arrays. We will be using those in this example.
This example starts with some basic python types and how to store them to disk. We then show how dman extends these types, allowing you to specify file paths. Finally we show how modelclasses, the dman extension of a dataclass, can be used.
If you want some code that can run out-of-the-box instead of a detailed introduction to the important components, you can take a look at the other examples listed under Design Patterns.
To run this example you will need numpy and rich.
Basic types
You can store most basic python types, specifically those that can be handled by json.
# clear any old data
import dman, shutil, os
if os.path.isdir(dman.mount()):
    shutil.rmtree(dman.mount())
config = {'mode': 'automatic', 'size': 5}
data = [i**2 for i in range(config['size'])]
dman.save('result', {'config': config, 'data': data})
{'config': {'mode': 'automatic', 'size': 5}, 'data': [0, 1, 4, 9, 16]}
If you receive an error about .dman not existing, you have to create it by executing dman init in your terminal or by creating a .dman folder manually in your project root.
Files will, by default, be stored in this folder:
dman.tui.walk_directory(dman.mount(), show_content=True)
📂 .dman/cache/examples:misc:example0_common
┗━━ 📂 result
    ┗━━ 📄 result.json (148 bytes)
        ──────────────────────────────────────────────────────────────────────
        {
            "config": {
                "mode": "automatic",
                "size": 5
            },
            "data": [
                0,
                1,
                4,
                9,
                16
            ]
        }
        ──────────────────────────────────────────────────────────────────────
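As a rough guide to what "can be handled by json" means, the sketch below uses only the standard-library json module, not dman itself. It shows a round trip that preserves the data, one that changes a type, and one that fails. How dman treats tuples or sets is not claimed here.

```python
import json

# Values built only from dict, list, str, int, float, bool and None
# survive a JSON round trip unchanged.
record = {"config": {"mode": "automatic", "size": 5}, "data": [0, 1, 4, 9, 16]}
assert json.loads(json.dumps(record)) == record

# Tuples are written as JSON arrays, so they come back as lists.
restored = json.loads(json.dumps({"shape": (2, 3)}))
print(restored)  # {'shape': [2, 3]}

# Types that json does not know, such as set, raise a TypeError.
failed = False
try:
    json.dumps({"tags": {"a", "b"}})
except TypeError as err:
    failed = True
    print(f"not JSON serializable: {err}")
```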
By default dman can also handle numpy arrays.
import numpy as np
dman.save('result', {'config': config, 'data': np.arange(config['size'])**2})
dman.tui.walk_directory(dman.mount(), show_content=True)
📂 .dman/cache/examples:misc:example0_common
┗━━ 📂 result
    ┗━━ 📄 result.json (176 bytes)
        ──────────────────────────────────────────────────────────────────────
        {
            "config": {
                "mode": "automatic",
                "size": 5
            },
            "data": {
                "_ser__type": "_num__ndarray",
                "_ser__content": "[0, 1, 4, 9, 16]"
            }
        }
        ──────────────────────────────────────────────────────────────────────
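The output above stores the array as a small dictionary with a type tag and a content string. The sketch below illustrates that general pattern with the standard-library json hooks. It is not dman's implementation: array.array stands in for a numpy array, the tag "_demo__array" is made up, and only the two key names are copied from the output.

```python
import json
from array import array

TYPE_KEY = "_ser__type"        # key names copied from the output above
CONTENT_KEY = "_ser__content"
ARRAY_TAG = "_demo__array"     # hypothetical tag, not one used by dman


def encode(obj):
    """json.dumps hook: wrap an array in a tagged dictionary."""
    if isinstance(obj, array):
        return {TYPE_KEY: ARRAY_TAG, CONTENT_KEY: json.dumps(obj.tolist())}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def decode(dct):
    """json.loads hook: turn a tagged dictionary back into an array."""
    if dct.get(TYPE_KEY) == ARRAY_TAG:
        return array("q", json.loads(dct[CONTENT_KEY]))
    return dct


squares = array("q", [i**2 for i in range(5)])
text = json.dumps({"data": squares}, default=encode)
restored = json.loads(text, object_hook=decode)

print(text)
print(restored["data"] == squares)
```

The tag is what lets the loader decide which type to rebuild, which is why the file stays readable while still round-tripping to the original type.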
We mentioned that dman is a file hierarchy manager. Some convenience is already provided, since we didn't need to specify a path for our data. This path has been determined automatically by dman.mount() internally, based on the name of the script. Of course, if you want to read the file from a different script, this can be inconvenient, so you can specify the generator yourself.
dman.save('result', 'content', generator='example_common')
dman.tui.walk_directory(dman.mount(generator='example_common'), show_content=True)
📂 .dman/example_common
┗━━ 📂 result
    ┗━━ 📄 result.json (9 bytes)
        ──────────────────────────────────────────────────────────────────────
        "content"
        ──────────────────────────────────────────────────────────────────────
The signature of dman.mount() is similar to that of dman.save(), dman.load() and dman.track(). Hence, if you want to know where your files go, you can always use it.
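To make the folder layout concrete, here is a hypothetical helper, sketch_mount, that merely reproduces the paths printed in this example. It is inferred from those printed paths only and is not how dman.mount() is implemented. The default script path and the ".dman" root are assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional


def sketch_mount(
    key: Optional[str] = None,
    *,
    generator: Optional[str] = None,
    script: str = "examples/misc/example0_common.py",  # assumed script location
    root: str = ".dman",
) -> PurePosixPath:
    """Mimic the folder names shown in this example (illustration only)."""
    if generator is None:
        # Default: a cache folder named after the script, with ':' as separator.
        stem = PurePosixPath(script).with_suffix("")
        generator = "cache/" + ":".join(stem.parts)
    path = PurePosixPath(root) / generator
    return path if key is None else path / key


print(sketch_mount())
print(sketch_mount("updated"))
print(sketch_mount(generator="example_common"))
```

Passing an explicit generator removes the dependence on the script name, which is what makes the folder reachable from another script.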
We can also load files of course:
print(dman.load('result', generator='example_common'))
content
If the generator is not specified, then loading only works when executed in the same script as the one where save was called with the default generator.
print(dman.load('result'))
{'config': {'mode': 'automatic', 'size': 5}, 'data': array([ 0, 1, 4, 9, 16])}
Finally, we can load and save in one step using dman.track():
dman.save('updated', {'original': 0})
with dman.track('updated', default_factory=dict) as data:
    data['value'] = 42
dman.tui.walk_directory(dman.mount('updated'), show_content=True)
📂 .dman/cache/examples:misc:example0_common/updated
┗━━ 📄 updated.json (38 bytes)
    ──────────────────────────────────────────────────────────────────────
    {
        "original": 0,
        "value": 42
    }
    ──────────────────────────────────────────────────────────────────────
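The load-modify-save pattern can be sketched with a plain context manager. The helper track_json below is hypothetical and works on a single JSON file. It only illustrates the idea and says nothing about how dman.track() behaves internally, for instance when the body raises an exception.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def track_json(path, default_factory=dict):
    """Load the object at `path` (or create a default) and save it on clean exit."""
    path = Path(path)
    obj = json.loads(path.read_text()) if path.exists() else default_factory()
    yield obj
    # Reached only if the with-body finished without raising.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(obj, indent=4))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "updated.json"
    target.write_text(json.dumps({"original": 0}))
    with track_json(target) as data:
        data["value"] = 42
    result = json.loads(target.read_text())

print(result)  # {'original': 0, 'value': 42}
```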
Creating a File Hierarchy
Beyond automatically determining a mount point to save files to, dman also allows creating a file hierarchy within this folder. To do so we need to use model types. Let's start with the model versions of dictionaries and lists. We will also use barray, which is the first storable type: it can be written to and read from disk. The second storable is smdict, which is simply a dictionary that will be stored to a separate file.
from dman.numeric import barray
config = dman.smdict.from_dict(config)
data = (np.arange(config['size'])**2).view(barray)
files = dman.mdict(store_by_key=True)
files.update(config=config, data=data)
dman.save('files', files)
dman.tui.walk_directory(dman.mount('files'), show_content=True)
📂 .dman/cache/examples:misc:example0_common/files
┣━━ 📄 config.json (71 bytes)
┃   ──────────────────────────────────────────────────────────────────────
┃   {
┃       "store": {
┃           "mode": "automatic",
┃           "size": 5
┃       }
┃   }
┃   ──────────────────────────────────────────────────────────────────────
┣━━ 📄 data.npy (168 bytes)
┗━━ 📄 files.json (578 bytes)
    ──────────────────────────────────────────────────────────────────────
    {
        "_ser__type": "_ser__mdict",
        "_ser__content": {
            "store": {
                "config": {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "config.json",
                        "sto_type": "_sto__smdict"
                    }
                },
                "data": {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "data.npy",
                        "sto_type": "_num__barray"
                    }
                }
            },
            "store_by_key": true
        }
    }
    ──────────────────────────────────────────────────────────────────────
Now three files have been created:
- files.json contains meta-data, describing the content of the other files.
- config.json is what became of our config object.
- data.npy stores the contents of data.
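The layout of one meta-data file pointing at separate content files can be illustrated without dman. The helpers save_by_key and load_by_key below are hypothetical and store everything as JSON. Only the "store" and "target" key names are borrowed from the output above.

```python
import json
import tempfile
from pathlib import Path


def save_by_key(folder, name, items):
    """Write each item to '<key>.json' plus a manifest '<name>.json' listing the targets."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for key, value in items.items():
        target = f"{key}.json"
        (folder / target).write_text(json.dumps(value, indent=4))
        manifest[key] = {"target": target}
    (folder / f"{name}.json").write_text(json.dumps({"store": manifest}, indent=4))


def load_by_key(folder, name):
    """Read the manifest, then follow every target to rebuild the dictionary."""
    folder = Path(folder)
    manifest = json.loads((folder / f"{name}.json").read_text())["store"]
    return {
        key: json.loads((folder / record["target"]).read_text())
        for key, record in manifest.items()
    }


with tempfile.TemporaryDirectory() as tmp:
    items = {"config": {"mode": "automatic", "size": 5}, "data": [0, 1, 4, 9, 16]}
    save_by_key(tmp, "files", items)
    created = sorted(p.name for p in Path(tmp).iterdir())
    loaded = load_by_key(tmp, "files")

print(created)          # ['config.json', 'data.json', 'files.json']
print(loaded == items)  # True
```

Because the manifest holds only file names, each content file can be opened and inspected on its own.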
Let us consider a more interesting example, using dman.mruns. This object acts like a list, but creates file names for storables automatically.
with dman.track('runs', default_factory=dman.mruns_factory(store_subdir=False)) as runs:
    runs: dman.mruns = runs  # for type hinting
    runs.clear()  # remove all previous runs
    for i in range(3):
        runs.append(np.random.uniform(size=i).view(barray))
dman.tui.walk_directory(dman.mount('runs'), show_content=True)
📂 .dman/cache/examples:misc:example0_common/runs
┣━━ 📄 run-0.npy (128 bytes)
┣━━ 📄 run-1.npy (136 bytes)
┣━━ 📄 run-2.npy (144 bytes)
┗━━ 📄 runs.json (825 bytes)
    ──────────────────────────────────────────────────────────────────────
    {
        "_ser__type": "_ser__mruns",
        "_ser__content": {
            "stem": "run",
            "run_count": 3,
            "store_subdir": false,
            "store": [
                {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "run-0.npy",
                        "sto_type": "_num__barray"
                    }
                },
                {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "run-1.npy",
                        "sto_type": "_num__barray"
                    }
                },
                {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "run-2.npy",
                        "sto_type": "_num__barray"
                    }
                }
            ]
        }
    }
    ──────────────────────────────────────────────────────────────────────
If you don't care about file names, dman can generate them automatically:
with dman.track('auto', default_factory=list) as lst:
    lst.clear()
    lst.extend([np.random.uniform(size=i).view(barray) for i in range(3)])
dman.tui.walk_directory(dman.mount('auto'), show_content=True)
📂 .dman/cache/examples:misc:example0_common/auto
┣━━ 📄 0cdfb3bf-799d-4ef8-998c-f6f7955ddb42.npy (144 bytes)
┣━━ 📄 307e91fd-c680-4718-85fa-8efdcd043dea.npy (136 bytes)
┣━━ 📄 6c28b730-efe5-4247-8e13-fd2d1e429967.npy (128 bytes)
┗━━ 📄 auto.json (840 bytes)
    ──────────────────────────────────────────────────────────────────────
    {
        "_ser__type": "_ser__mlist",
        "_ser__content": {
            "store": [
                {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "6c28b730-efe5-4247-8e13-fd2d1e429967.npy",
                        "sto_type": "_num__barray"
                    }
                },
                {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "307e91fd-c680-4718-85fa-8efdcd043dea.npy",
                        "sto_type": "_num__barray"
                    }
                },
                {
                    "_ser__type": "_ser__record",
                    "_ser__content": {
                        "target": "0cdfb3bf-799d-4ef8-998c-f6f7955ddb42.npy",
                        "sto_type": "_num__barray"
                    }
                }
            ]
        }
    }
    ──────────────────────────────────────────────────────────────────────
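The two naming schemes seen in the outputs above can be compared in a few lines. The helpers below are hypothetical and only imitate the visible file names: counter-based names such as run-0.npy, and random names that look like version 4 UUIDs. How dman generates its names internally is not claimed.

```python
import uuid


def counter_names(stem, count, suffix=".npy"):
    """Predictable names: the same call always yields the same file names."""
    return [f"{stem}-{i}{suffix}" for i in range(count)]


def random_name(suffix=".npy"):
    """Unpredictable names: every call yields a fresh file name."""
    return f"{uuid.uuid4()}{suffix}"


runs = counter_names("run", 3)
print(runs)  # ['run-0.npy', 'run-1.npy', 'run-2.npy']

names = [random_name() for _ in range(3)]
print(names)
```

Predictable names are easy to read but a rerun targets the same files again, whereas random names never clash but pile up unless old files are cleaned.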
Warning
Both specifying file names and having them generated automatically have advantages and disadvantages. When specifying file names you risk overwriting existing data, although dman will give a warning by default. See Mounting and Targets for more info. Importantly, if you want dman to prompt you for a new filename whenever it risks overwriting an existing file, use:
dman.params.store.on_retouch = 'prompt'
When not specifying file names, old files will likely not be removed. Instead dman keeps creating new files (unless you use track correctly, as illustrated above).
Modelclasses
We finally briefly illustrate the usage of modelclass.
from dman.numeric import sarray

@dman.modelclass(compact=True, storable=True)
class Config:
    description: str
    size: int

@dman.modelclass(storable=True, store_by_field=True)
class Data:
    values: sarray[int]
    output: barray = None

cfg = Config('Experiment generating numbers', 25)
data = Data(np.logspace(0, 3, cfg.size))
data.output = np.random.uniform(size=cfg.size)
The modelclass automatically converts the numpy arrays to the specified types:
print(
    f'{type(data.values)=}',
    f'{type(data.values[0])=}',
    f'{type(data.output)=}',
    sep='\n'
)
type(data.values)=<class 'dman.numeric.sarray[int]'>
type(data.values[0])=<class 'numpy.int64'>
type(data.output)=<class 'dman.numeric.barray'>
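Converting values on assignment can be imitated with a plain dataclass. The class CoercedData below is a hypothetical stand-in that uses lists and tuples instead of numpy arrays. It illustrates the idea of coercing fields, both in the constructor and on later assignment, and is not how modelclass is implemented.

```python
from dataclasses import dataclass
from typing import Optional


def to_int_list(seq):
    """Truncate every entry to int, similar to a cast to an integer dtype."""
    return [int(x) for x in seq]


@dataclass
class CoercedData:
    values: list
    output: Optional[tuple] = None

    def __setattr__(self, name, value):
        # Runs for every assignment, including the ones made by __init__.
        if name == "values":
            value = to_int_list(value)
        elif name == "output" and value is not None:
            value = tuple(value)
        super().__setattr__(name, value)


data = CoercedData([1.0, 1.33, 2.37, 3.16])
data.output = [0.25, 0.5]

print(data.values)                 # [1, 1, 2, 3]
print(type(data.output).__name__)  # tuple
```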
We can now save the result:
dman.save(
    'model',
    dman.mdict.from_dict({'cfg': cfg, 'data': data}, store_by_key=True)
)
dman.tui.walk_directory(dman.mount('model'), show_content=True)
📂 .dman/cache/examples:misc:example0_common/model
┣━━ 📄 cfg.json (70 bytes)
┃   ──────────────────────────────────────────────────────────────────────
┃   {
┃       "description": "Experiment generating numbers",
┃       "size": 25
┃   }
┃   ──────────────────────────────────────────────────────────────────────
┣━━ 📄 data.json (366 bytes)
┃   ──────────────────────────────────────────────────────────────────────
┃   {
┃       "values": {
┃           "_ser__type": "_num__sarray",
┃           "_ser__content": "[1, 1, 1, 2, 3, 4, 5, 7, 10, 13, 17, 23, 31, 42, 5
┃       },
┃       "output": {
┃           "_ser__type": "_ser__record",
┃           "_ser__content": {
┃               "target": "output.npy",
┃               "sto_type": "_num__barray"
┃           }
┃       }
┃   }
┃   ──────────────────────────────────────────────────────────────────────
┣━━ 📄 model.json (559 bytes)
┃   ──────────────────────────────────────────────────────────────────────
┃   {
┃       "_ser__type": "_ser__mdict",
┃       "_ser__content": {
┃           "store": {
┃               "cfg": {
┃                   "_ser__type": "_ser__record",
┃                   "_ser__content": {
┃                       "target": "cfg.json",
┃                       "sto_type": "Config"
┃                   }
┃               },
┃               "data": {
┃                   "_ser__type": "_ser__record",
┃                   "_ser__content": {
┃                       "target": "data.json",
┃                       "sto_type": "Data"
┃                   }
┃               }
┃           },
┃           "store_by_key": true
┃       }
┃   }
┃   ──────────────────────────────────────────────────────────────────────
┗━━ 📄 output.npy (328 bytes)
More information on how to use modelclass can be found in Model Types.