
Getting Started

This example illustrates basic usage of dman.

Overview

The dman package allows for convenient data storing and loading. The focus is on human readability, hence dman is mostly a file hierarchy manager. It can store some types to files by default, but for others you have to define yourself how files are read from and written to disk. More details are provided in other examples. For a full overview of how to introduce new types into dman, see Defining Storables and Defining Serializables. In all likelihood, however, you will not need to do so often, especially if your data is represented in terms of numpy arrays, which we will be using in this example.

This example starts with some basic Python types and how to store them to disk. We then show how dman extends these types, allowing you to specify file paths. Finally, we show how modelclasses, the dman extension of dataclasses, can be used.

If you want some code that can run out-of-the-box instead of a detailed introduction to the important components, you can take a look at the other examples listed under Design Patterns.

To run this example you will require numpy and rich.

Basic types

You can store most basic Python types, specifically those that can be handled by json.
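As a rough rule of thumb (our assumption, not a dman API), a value falls into this category if the standard json module can encode it without custom hooks. The helper is_json_serializable below is hypothetical and only uses the standard library:

```python
import json

def is_json_serializable(value) -> bool:
    """Hypothetical helper: True if json.dumps accepts value as-is."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):  # unsupported type or circular reference
        return False
    return True

config = {'mode': 'automatic', 'size': 5}
print(is_json_serializable({'config': config, 'data': [0, 1, 4, 9, 16]}))  # True
print(is_json_serializable({1, 2, 3}))  # False: sets are not json types
```

Note that json also accepts tuples, but they are encoded as arrays and therefore load back as lists.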

# clear any old data
import dman, shutil, os
if os.path.isdir(dman.mount()):
    shutil.rmtree(dman.mount())

config = {'mode': 'automatic', 'size': 5}
data = [i**2 for i in range(config['size'])]

dman.save('result', {'config': config, 'data': data})

{'config': {'mode': 'automatic', 'size': 5}, 'data': [0, 1, 4, 9, 16]}

If you receive an error about .dman not existing, you have to create one, either by executing dman init in your terminal or by creating a .dman folder manually in your project root. By default, files will be stored in this folder:

dman.tui.walk_directory(dman.mount(), show_content=True)

📂 .dman/cache/examples:misc:example0_common
┗━━ 📂 result
    ┗━━ 📄 result.json (148 bytes)
         ──────────────────────────────────────────────────────────────────────
          {
            "config": {
              "mode": "automatic",
              "size": 5
            },
            "data": [
              0,
              1,
              4,
              9,
              16
            ]
          }
         ──────────────────────────────────────────────────────────────────────

By default dman can also handle numpy arrays.

import numpy as np
dman.save('result', {'config': config, 'data': np.arange(config['size'])**2})
dman.tui.walk_directory(dman.mount(), show_content=True)

📂 .dman/cache/examples:misc:example0_common
┗━━ 📂 result
    ┗━━ 📄 result.json (176 bytes)
         ──────────────────────────────────────────────────────────────────────
          {
            "config": {
              "mode": "automatic",
              "size": 5
            },
            "data": {
              "_ser__type": "_num__ndarray",
              "_ser__content": "[0, 1, 4, 9, 16]"
            }
          }
         ──────────────────────────────────────────────────────────────────────
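The "_ser__type"/"_ser__content" pair above is a tagged envelope: a json object that records which type to rebuild and the content needed to do so. The sketch below imitates that shape for complex numbers using only the json hooks default and object_hook. The tag names are copied from the output, but the encode and decode functions are hypothetical and not dman's implementation (see Defining Serializables for how dman actually registers types):

```python
import json

# Tag names borrowed from the output above; the logic is a hypothetical sketch.
TYPE_KEY, CONTENT_KEY = '_ser__type', '_ser__content'

def encode(obj):
    """json.dumps fallback: wrap complex numbers in a tagged envelope."""
    if isinstance(obj, complex):
        return {TYPE_KEY: 'complex', CONTENT_KEY: [obj.real, obj.imag]}
    raise TypeError(f'Object of type {type(obj).__name__} is not serializable')

def decode(dct):
    """json.loads hook: rebuild objects from their tagged envelope."""
    if dct.get(TYPE_KEY) == 'complex':
        real, imag = dct[CONTENT_KEY]
        return complex(real, imag)
    return dct

text = json.dumps({'data': [1 + 2j, 3j]}, default=encode)
restored = json.loads(text, object_hook=decode)
print(restored)  # {'data': [(1+2j), 3j]}
```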

We mentioned that dman is a file hierarchy manager. Some convenience is already provided, since we didn't need to specify a path for our data. This path has been determined automatically by dman.mount() internally, based on the name of the script. Of course, this can be inconvenient if you want to read the file from a different script. In that case, you can specify the generator yourself.

dman.save('result', 'content', generator='example_common')
dman.tui.walk_directory(dman.mount(generator='example_common'), show_content=True)

📂 .dman/example_common
┗━━ 📂 result
    ┗━━ 📄 result.json (9 bytes)
         ──────────────────────────────────────────────────────────────────────
          "content"
         ──────────────────────────────────────────────────────────────────────

The signature of dman.mount() is similar to that of dman.save(), dman.load() and dman.track(). Hence, if you want to know where your files go, you can always call it with the same arguments.
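Judging from the listings in this example, the default generator looks like "cache/" followed by the script path with ":" as separator, while an explicit generator is used as-is. The function sketch_mount below is a hypothetical reconstruction of that layout inferred from the output, not dman's code; use dman.mount() itself to get the real path:

```python
from pathlib import PurePosixPath

def sketch_mount(key, script, root, generator=None):
    """Hypothetical reconstruction of the directory layout seen in this example.

    Assumes the default generator is 'cache/' plus the script path relative to
    the project root, joined with ':' and without the '.py' suffix.
    """
    if generator is None:
        parts = script.relative_to(root).with_suffix('').parts
        generator = 'cache/' + ':'.join(parts)
    return root / '.dman' / generator / key

root = PurePosixPath('/project')
script = root / 'examples' / 'misc' / 'example0_common.py'
print(sketch_mount('updated', script, root))
# /project/.dman/cache/examples:misc:example0_common/updated
print(sketch_mount('result', script, root, generator='example_common'))
# /project/.dman/example_common/result
```

In the listings, the saved file itself then appears inside this folder as <key>.json, for example result/result.json.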

We can also load files of course:

print(dman.load('result', generator='example_common'))

content

If the generator is not specified, loading only works when executed from the same script that called save with the default generator.

print(dman.load('result'))

{'config': {'mode': 'automatic', 'size': 5}, 'data': array([ 0,  1,  4,  9, 16])}

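Finally, we can do both at the same time using dman.track: it loads the existing content (falling back to default_factory when there is none yet), lets you modify it inside the with block and saves the result when the block exits.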

dman.save('updated', {'original': 0})
with dman.track('updated', default_factory=dict) as data:
    data['value'] = 42
dman.tui.walk_directory(dman.mount('updated'), show_content=True)

📂 .dman/cache/examples:misc:example0_common/updated
┗━━ 📄 updated.json (38 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "original": 0,
        "value": 42
      }
     ──────────────────────────────────────────────────────────────────────────
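The load, modify, save cycle of track can be mimicked for a single json file with a generator-based context manager. The sketch below is hypothetical (track_json is not part of dman) and makes one design choice of its own: it only writes the file back when the with block finishes without raising, so a failed update leaves the file untouched.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def track_json(path, default_factory):
    """Hypothetical stand-in for dman.track that works on one json file."""
    path = Path(path)
    # Load existing content, or fall back to default_factory for a new file.
    content = json.loads(path.read_text()) if path.exists() else default_factory()
    yield content
    # Only reached when the with block exits without raising.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(content, indent=2))

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'updated' / 'updated.json'
    with track_json(target, default_factory=dict) as data:
        data['original'] = 0
    with track_json(target, default_factory=dict) as data:
        data['value'] = 42
    result = json.loads(target.read_text())

print(result)  # {'original': 0, 'value': 42}
```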

Creating a File Hierarchy

Beyond just automatically determining a mount point to save files to, dman also allows creating a file hierarchy within this folder.

To do so we need to use model types. Let's start with the model versions of dictionaries and lists. We will also use barray, the first storable type we encounter: it can be written to and read from disk. The second storable is smdict, which is simply a dictionary that will be stored to a separate file.

from dman.numeric import barray
config = dman.smdict.from_dict(config)
data = (np.arange(config['size'])**2).view(barray)

files = dman.mdict(store_by_key=True)
files.update(config=config, data=data)

dman.save('files', files)
dman.tui.walk_directory(dman.mount('files'), show_content=True)

📂 .dman/cache/examples:misc:example0_common/files
┣━━ 📄 config.json (71 bytes)
┃    ──────────────────────────────────────────────────────────────────────────
┃     {
┃       "store": {
┃         "mode": "automatic",
┃         "size": 5
┃       }
┃     }
┃    ──────────────────────────────────────────────────────────────────────────
┣━━ 📄 data.npy (168 bytes)
┗━━ 📄 files.json (578 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "_ser__mdict",
        "_ser__content": {
          "store": {
            "config": {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "config.json",
                "sto_type": "_sto__smdict"
              }
            },
            "data": {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "data.npy",
                "sto_type": "_num__barray"
              }
            }
          },
          "store_by_key": true
        }
      }
     ──────────────────────────────────────────────────────────────────────────

Now three files have been created:

files.json contains meta-data, describing the content of the other files.
config.json is what became of our config object.
data.npy stores the contents of data.
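The idea behind files.json can be illustrated with plain json files: each value goes to its own file named after its key, and the index only keeps a record pointing at that file, similar in spirit to the "_ser__record"/"target" entries above. The functions save_by_key and load_by_key are hypothetical, and unlike dman they store everything as json (dman writes the barray to data.npy):

```python
import json
import tempfile
from pathlib import Path

def save_by_key(directory, name, items):
    """Hypothetical sketch of store_by_key: one file per key plus an index."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    index = {}
    for key, value in items.items():
        target = f'{key}.json'
        (directory / target).write_text(json.dumps(value))
        index[key] = {'target': target}  # record pointing at the separate file
    (directory / f'{name}.json').write_text(json.dumps(index, indent=2))
    return index

def load_by_key(directory, name):
    """Follow the records in the index and load each referenced file."""
    directory = Path(directory)
    index = json.loads((directory / f'{name}.json').read_text())
    return {key: json.loads((directory / rec['target']).read_text())
            for key, rec in index.items()}

with tempfile.TemporaryDirectory() as tmp:
    save_by_key(tmp, 'files', {'config': {'mode': 'automatic', 'size': 5},
                               'data': [0, 1, 4, 9, 16]})
    names = sorted(p.name for p in Path(tmp).iterdir())
    loaded = load_by_key(tmp, 'files')

print(names)  # ['config.json', 'data.json', 'files.json']
```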

Let us consider a more interesting example, using dman.mruns. This object acts like a list, but creates file names for storables automatically.

with dman.track('runs', default_factory=dman.mruns_factory(store_subdir=False)) as runs:
    runs: dman.mruns = runs     # for type hinting
    runs.clear()                # remove all previous runs
    for i in range(3):
        runs.append(np.random.uniform(size=i).view(barray))

dman.tui.walk_directory(dman.mount('runs'), show_content=True)

📂 .dman/cache/examples:misc:example0_common/runs
┣━━ 📄 run-0.npy (128 bytes)
┣━━ 📄 run-1.npy (136 bytes)
┣━━ 📄 run-2.npy (144 bytes)
┗━━ 📄 runs.json (825 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "_ser__mruns",
        "_ser__content": {
          "stem": "run",
          "run_count": 3,
          "store_subdir": false,
          "store": [
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "run-0.npy",
                "sto_type": "_num__barray"
              }
            },
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "run-1.npy",
                "sto_type": "_num__barray"
              }
            },
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "run-2.npy",
                "sto_type": "_num__barray"
              }
            }
          ]
        }
      }
     ──────────────────────────────────────────────────────────────────────────
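The "stem" and "run_count" fields in runs.json suggest how the names run-0.npy, run-1.npy, ... come about. The class RunNamer below is a hypothetical list-like sketch of that naming scheme, not dman's mruns. In this sketch the counter is kept separately from the stored entries, so a name is never handed out twice even if entries are later removed; whether mruns behaves exactly like this is not shown in this example.

```python
from dataclasses import dataclass, field

@dataclass
class RunNamer:
    """Hypothetical list-like container that names entries '<stem>-<i><suffix>'."""
    stem: str = 'run'
    suffix: str = '.npy'
    run_count: int = 0
    store: list = field(default_factory=list)

    def append(self, value):
        # Derive the file name from the counter, then advance it.
        target = f'{self.stem}-{self.run_count}{self.suffix}'
        self.run_count += 1
        self.store.append({'target': target, 'value': value})
        return target

runs = RunNamer()
targets = [runs.append(values) for values in ([], [0.5], [0.1, 0.9])]
print(targets)  # ['run-0.npy', 'run-1.npy', 'run-2.npy']
```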

If you don’t care about file names, dman can generate them automatically:

with dman.track('auto', default_factory=list) as lst:
    lst.clear()
    lst.extend([np.random.uniform(size=i).view(barray) for i in range(3)])
dman.tui.walk_directory(dman.mount('auto'), show_content=True)

📂 .dman/cache/examples:misc:example0_common/auto
┣━━ 📄 0cdfb3bf-799d-4ef8-998c-f6f7955ddb42.npy (144 bytes)
┣━━ 📄 307e91fd-c680-4718-85fa-8efdcd043dea.npy (136 bytes)
┣━━ 📄 6c28b730-efe5-4247-8e13-fd2d1e429967.npy (128 bytes)
┗━━ 📄 auto.json (840 bytes)
     ──────────────────────────────────────────────────────────────────────────
      {
        "_ser__type": "_ser__mlist",
        "_ser__content": {
          "store": [
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "6c28b730-efe5-4247-8e13-fd2d1e429967.npy",
                "sto_type": "_num__barray"
              }
            },
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "307e91fd-c680-4718-85fa-8efdcd043dea.npy",
                "sto_type": "_num__barray"
              }
            },
            {
              "_ser__type": "_ser__record",
              "_ser__content": {
                "target": "0cdfb3bf-799d-4ef8-998c-f6f7955ddb42.npy",
                "sto_type": "_num__barray"
              }
            }
          ]
        }
      }
     ──────────────────────────────────────────────────────────────────────────
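The generated names above have the form of random (version 4) UUIDs. Assuming that is how they are produced, which this example does not confirm, a hypothetical helper auto_name shows why collisions are not a practical concern: every call draws a fresh random identifier.

```python
import uuid

def auto_name(suffix='.npy'):
    """Hypothetical helper: a file name built from a random UUID4."""
    return f'{uuid.uuid4()}{suffix}'

names = [auto_name() for _ in range(3)]
print(names)  # three different random names ending in '.npy'
```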

Warning

Both specifying file names and having them generated automatically have advantages and disadvantages. When specifying file names you risk overwriting existing data, although dman will give a warning by default. See Mounting and Targets for more info. Importantly, if you want dman to prompt you for a new file name whenever it risks overwriting an existing file, use:

dman.params.store.on_retouch = 'prompt'

When not specifying file names, files will likely not be removed. Instead, dman keeps creating new files (unless you use track correctly, as illustrated above).
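The difference between warning and prompting can be sketched as a small policy function. resolve_target and its ask parameter are hypothetical and only loosely modelled on on_retouch; they do not reproduce dman's internals. With 'warn' the existing target is kept (and will be overwritten), while 'prompt' keeps asking for a new name until it finds one that does not exist yet.

```python
import tempfile
import warnings
from pathlib import Path

def resolve_target(path, on_retouch='warn', ask=input):
    """Hypothetical overwrite policy, loosely modelled on on_retouch."""
    path = Path(path)
    if not path.exists():
        return path
    if on_retouch == 'warn':
        warnings.warn(f'overwriting existing file {path}')
        return path
    if on_retouch == 'prompt':
        # Ask for a new file name until it does not clash with an existing one.
        while path.exists():
            path = path.with_name(ask(f'{path.name} exists, new name: '))
        return path
    raise ValueError(f'unknown policy {on_retouch!r}')

with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / 'result.json'
    existing.write_text('{}')
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        kept = resolve_target(existing, 'warn')
    renamed = resolve_target(existing, 'prompt', ask=lambda msg: 'result-2.json')

print(kept.name, renamed.name, len(caught))  # result.json result-2.json 1
```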

Modelclasses

Finally, we briefly illustrate the usage of modelclass.

from dman.numeric import sarray

@dman.modelclass(compact=True, storable=True)
class Config:
    description: str
    size: int

@dman.modelclass(storable=True, store_by_field=True)
class Data:
    values: sarray[int]
    output: barray = None

cfg = Config('Experiment generating numbers', 25)
data = Data(np.logspace(0, 3, cfg.size))
data.output = np.random.uniform(size=cfg.size)

The modelclass automatically converts the numpy arrays to the specified types:

print(
    f'{type(data.values)=}',
    f'{type(data.values[0])=}',
    f'{type(data.output)=}',
    sep='\n'
)

type(data.values)=<class 'dman.numeric.sarray[int]'>
type(data.values[0])=<class 'numpy.int64'>
type(data.output)=<class 'dman.numeric.barray'>
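A comparable conversion can be obtained with a plain dataclass whose __post_init__ casts each field to its annotated type. The decorator coerce_fields below is a hypothetical stdlib sketch of that idea, not how modelclass is implemented, and it only handles annotations that are plain classes such as int or tuple:

```python
from dataclasses import dataclass, fields

def coerce_fields(cls):
    """Hypothetical decorator: a dataclass that casts fields to their annotated type."""
    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            # Only plain classes (int, str, tuple, ...) are supported here.
            if value is not None and not isinstance(value, f.type):
                setattr(self, f.name, f.type(value))
    # __post_init__ must exist before dataclass() generates __init__.
    cls.__post_init__ = __post_init__
    return dataclass(cls)

@coerce_fields
class ConfigSketch:
    description: str
    size: int
    values: tuple = ()

cfg = ConfigSketch('Experiment generating numbers', '25', [1, 2, 3])
print(type(cfg.size), type(cfg.values))  # <class 'int'> <class 'tuple'>
```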

We can now save the result:

dman.save(
    'model',
    dman.mdict.from_dict({'cfg': cfg, 'data': data}, store_by_key=True)
)
dman.tui.walk_directory(dman.mount('model'), show_content=True)

📂 .dman/cache/examples:misc:example0_common/model
┣━━ 📄 cfg.json (70 bytes)
┃    ──────────────────────────────────────────────────────────────────────────
┃     {
┃       "description": "Experiment generating numbers",
┃       "size": 25
┃     }
┃    ──────────────────────────────────────────────────────────────────────────
┣━━ 📄 data.json (366 bytes)
┃    ──────────────────────────────────────────────────────────────────────────
┃     {
┃       "values": {
┃         "_ser__type": "_num__sarray",
┃         "_ser__content": "[1, 1, 1, 2, 3, 4, 5, 7, 10, 13, 17, 23, 31, 42, 5
┃       },
┃       "output": {
┃         "_ser__type": "_ser__record",
┃         "_ser__content": {
┃           "target": "output.npy",
┃           "sto_type": "_num__barray"
┃         }
┃       }
┃     }
┃    ──────────────────────────────────────────────────────────────────────────
┣━━ 📄 model.json (559 bytes)
┃    ──────────────────────────────────────────────────────────────────────────
┃     {
┃       "_ser__type": "_ser__mdict",
┃       "_ser__content": {
┃         "store": {
┃           "cfg": {
┃             "_ser__type": "_ser__record",
┃             "_ser__content": {
┃               "target": "cfg.json",
┃               "sto_type": "Config"
┃             }
┃           },
┃           "data": {
┃             "_ser__type": "_ser__record",
┃             "_ser__content": {
┃               "target": "data.json",
┃               "sto_type": "Data"
┃             }
┃           }
┃         },
┃         "store_by_key": true
┃       }
┃     }
┃    ──────────────────────────────────────────────────────────────────────────
┗━━ 📄 output.npy (328 bytes)

More information on how to use modelclass can be found in Model Types.
