eliegoudout/paramclasses



ParamClass

# Install from PyPI
pip install paramclasses

Table of Contents
  1. 👩‍🏫 Rationale
  2. 🧐 Overview
  3. 👩‍💻 Subclassing API
  4. 🤓 Advanced
  5. 🚀 Contributing
  6. ⚖️ License

1. Rationale 👩‍🏫

Parameter-holding classes vs. inheritance...

For a parameter-holding class, like dataclasses, it would be nice to include some inherited functionality -- e.g. a params property to access current (param, value) pairs, missing_params for unassigned parameter keys, etc. Such inheritance would make it possible to factor out specialized functionality for context-dependent methods -- e.g. fit, reset, plot, etc. However, such subclassing comes with a risk of attribute conflicts, especially for libraries or exposed APIs, where users do not necessarily know every "read-only" (or "protected") attribute of the base classes.

Our solution 😌

To solve this problem, we propose a base ParamClass and an @protected decorator, which robustly protects any target attribute -- not only parameters -- from being accidentally overridden when subclassing, at runtime. If a subclass tries to override an attribute protected by one of its parents, a detailed ProtectedError is raised and the class definition fails.

Why not use @dataclass(frozen=True) or typing.final?

First of all, the @dataclass(frozen=True) decorator only applies protection to instances. Besides, it targets all attributes indiscriminately. Moreover, it does not protect against deletion or direct vars(instance) manipulation. Finally, protection is not inherited, so subclasses need to use the decorator again, while being cautious not to silently override previously protected attributes.
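Two of these limitations can be checked with the standard library alone. The snippet below is a minimal sketch; the names `Config`, `SubConfig` and `describe` are hypothetical and only serve the illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Config:
    rate: float = 0.1

    def describe(self) -> str:
        return f"rate={self.rate}"


config = Config()

# Regular assignment on an instance is blocked, as advertised.
try:
    config.rate = 0.5
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# Writing through `vars(instance)` bypasses the frozen `__setattr__`.
vars(config)["rate"] = 0.5


# A plain subclass can silently replace an inherited attribute.
class SubConfig(Config):
    describe = None


print(assignment_blocked)    # True
print(config.rate)           # 0.5
print(SubConfig().describe)  # None
```

Nothing warns the author of `SubConfig` that `describe` was meant to stay untouched, which is the situation @protected is designed to catch.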

The typing alternatives @final and Final are designed for type checkers only, which we do not want to rely on. From Python 3.11 onwards, final does add a __final__ flag when possible, but it will not affect immutable objects.
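As a quick illustration that `typing.final` has no runtime effect on subclassing, consider the sketch below. The class names `Sealed` and `Unsealed` are hypothetical, and the `__final__` flag is read with `getattr` because it is only set on Python 3.11 and later.

```python
from typing import final


@final
class Sealed:
    """Marked final, which only static type checkers enforce."""


# No runtime error: the interpreter accepts this subclass.
class Unsealed(Sealed):
    pass


# `None` on interpreters that do not set the flag.
flag = getattr(Sealed, "__final__", None)

print(issubclass(Unsealed, Sealed))  # True
```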

We also mention this recent PEP draft considering attribute-level protection, again for type checkers and without considering subclassing protection.

Disclaimer

Note that the protection provided by paramclasses is very robust for practical use, but it should not be considered a security feature.

Back to Table of Contents 👆

2. Overview 🧐

Defining a paramclass

A paramclass is simply defined by subclassing ParamClass directly or another paramclass. Similarly to dataclasses, parameters are identified as any annotated attribute and instantiation logic is automatically built-in -- though it can be extended. In our context, "default" means the current class value, which may change after the instantiation of an object.

from paramclasses import ParamClass

class A(ParamClass):
    parameter_with_a__default_value: ... = "default value"
    parameter_with_no_default_value: ...
    not_a_parameter = "not a parameter"
    def an_actual_method(self): ...
    def a_method_turned_into_a_parameter(self): ...
    a_method_turned_into_a_parameter: ...

Instances have natural __str__ and __repr__ methods -- which can be overridden in subclasses -- with the former displaying only nondefault or missing parameter values.

>>> print(A(parameter_with_a__default_value="nondefault value"))  # Calls `__str__`
A(parameter_with_a__default_value='nondefault value', parameter_with_no_default_value=?)

The current parameters dict and the missing parameters of an instance are accessible through the properties params and missing_params respectively.

>>> from pprint import pprint
>>> pprint(A().params)
{'a_method_turned_into_a_parameter': <function A.a_method_turned_into_a_parameter at 0x11067b9a0>,
 'parameter_with_a__default_value': 'default value',
 'parameter_with_no_default_value': ?}
>>> A().missing_params
('parameter_with_no_default_value',)

Note that A().a_method_turned_into_a_parameter is not a bound method -- see Descriptor parameters.

Back to Table of Contents 👆

Protecting attributes with @protected

Say we define the following BaseEstimator class.

from paramclasses import ParamClass, protected

class BaseEstimator(ParamClass):
    @protected
    def fit(self, data): ...  # Some fitting logic

Then, we are guaranteed that no subclass can redefine fit.

>>> class Estimator(BaseEstimator):
...     fit = True  # This should FAIL
... 
<traceback>
ProtectedError: 'fit' is protected by 'BaseEstimator'

This runtime protection can be applied to methods, properties, plain attributes -- with protected(value) -- and so on, during class definition but not afterwards. It is "robust" in the sense that breaking the designed behaviour, though possible, requires -- to our knowledge -- obscure patterns.
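To make "failing at class definition" concrete, here is a toy sketch built on a plain metaclass. It is not how paramclasses is implemented: `GuardMeta`, `toy_protected`, `ToyProtectedError` and `_protected_owners` are hypothetical names, and the sketch only guards class bodies, not later assignments.

```python
class ToyProtectedError(AttributeError):
    """Raised when a class body redefines a protected name (toy version)."""


class _Protected:
    """Wrapper marking a class-body value as protected."""

    def __init__(self, value):
        self.value = value


def toy_protected(value):
    return _Protected(value)


class GuardMeta(type):
    """Toy metaclass refusing class bodies that redefine protected names."""

    def __new__(mcs, name, bases, namespace):
        # Gather (attribute, owner) pairs already protected by the bases.
        inherited = {}
        for base in bases:
            inherited.update(getattr(base, "_protected_owners", {}))

        owners = dict(inherited)
        for attr, value in list(namespace.items()):
            if attr in inherited:
                msg = f"'{attr}' is protected by '{inherited[attr]}'"
                raise ToyProtectedError(msg)
            if isinstance(value, _Protected):
                namespace[attr] = value.value  # Unwrap the real value
                owners[attr] = name

        namespace["_protected_owners"] = owners
        return super().__new__(mcs, name, bases, namespace)


class BaseEstimator(metaclass=GuardMeta):
    @toy_protected
    def fit(self, data):
        return "fitted"


try:
    class Estimator(BaseEstimator):
        fit = True  # Redefines a protected name

    outcome = "class created"
except ToyProtectedError as error:
    outcome = str(error)

print(outcome)  # 'fit' is protected by 'BaseEstimator'
```

The check runs while the `class` statement executes, so the faulty subclass never comes into existence.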

Back to Table of Contents 👆

Seamless attributes interactions

Parameters can be assigned values like any other attribute -- unless specifically protected -- with instance.attr = value. It is also possible to set multiple parameters at once with keyword arguments during instantiation, or after with set_params.

class A(ParamClass):
    x: ...      # Parameter without default value
    y: ... = 0  # Parameter with default value `0`
    z: ... = 0  # Parameter with default value `0`
    t = 0       # Non-parameter attribute

>>> a = A(y=1); a.t = 1; a    # Instantiation assignments
A(x=?, y=1, z=0)              # Shows every parameter with "?" for missing values
>>> A().set_params(x=2, y=2)  # `set_params` assignments
>>> A().y = 1                 # Usual assignment
>>> del A(x=0).x              # Usual deletion
>>> A.y = 1                   # Class-level assignment/deletion works...
>>> print(a)
A(x=?)                        # ... and default value gets updated -- otherwise would show `A(x=?, y=1)`
>>> a.set_params(t=0)         # Should FAIL: Non-parameters cannot be assigned with `set_params`
<traceback>
AttributeError: Invalid parameters: {'t'}. Operation cancelled

Back to Table of Contents 👆

Expected getattr, setattr and delattr behaviour

Table of Expected Behaviour

| Operation on class or instance | Parameters: Protected | Parameters: Unprotected | Non-Parameters: Protected | Non-Parameters: Unprotected |
| --- | --- | --- | --- | --- |
| getattr | Bypass Descriptors* | Bypass Descriptors | Vanilla* | Vanilla |
| setattr | ProtectedError | Bypass Descriptors | ProtectedError | Vanilla |
| delattr | ProtectedError | Bypass Descriptors | ProtectedError | Vanilla |

*On instance, getattr should ignore and remove any vars(instance) entry.

Vanilla means that there should be no discernible difference compared to standard classes.

Back to Table of Contents 👆

Additional functionalities

Callback on parameters updates

Whenever an instance is assigned a value -- instantiation, set_params, dotted assignment -- the callback

def _on_param_will_be_set(self, attr: str, future_val: object) -> None

is triggered. For example, it can be used to unfit an estimator on specific modifications. As suggested by the name and signature, the callback operates just before the future_val assignment. There is currently no counterpart for parameter deletion. This could be added upon motivated interest.

Back to Table of Contents 👆

Instantiation logic with __post_init__

Similarly to dataclasses, a __post_init__ method can be defined to complete instantiation after the initial setting of parameter values. It must have signature

def __post_init__(self, *args: object, **kwargs: object) -> None

and is called as follows by __init__.

# Close equivalent to actual implementation
@protected
def __init__(self, args: list[object] = [], kwargs: dict[str, object] = {}, /, **param_values: object) -> None:
    self.set_params(**param_values)
    self.__post_init__(*args, **kwargs)

Since parameter values are set before __post_init__ is called, they are accessible when it executes. Note that even if a paramclass does not define __post_init__, its bases might, in which case it is used.

Additionally, both the @staticmethod and @classmethod decorators are supported for __post_init__ declaration. In other cases, the __signature__ property may fail.

Back to Table of Contents 👆

Abstract methods

The base ParamClass already inherits ABC functionalities, so @abstractmethod can be used.

from abc import abstractmethod

class A(ParamClass):
    @abstractmethod
    def next(self): ...

>>> A()
<traceback>
TypeError: Can't instantiate abstract class A with abstract method next

Back to Table of Contents 👆

3. Subclassing API 👩‍💻

As seen in Additional functionalities, the following methods may be overridden or introduced by subclasses.

# ===================== Subclasses may override these ======================
def _on_param_will_be_set(self, attr: str, future_val: object) -> None:
    """Call before parameter assignment."""

def __repr__(self) -> str:
    """Show all params, e.g. `A(x=1, z=?)`."""

def __str__(self) -> str:
    """Show all nondefault or missing, e.g. `A(z=?)`."""

# ===================== Subclasses may introduce these =====================
def __post_init__(self, *args: object, **kwargs: object) -> None:
    """Init logic, after parameters assignment."""

Furthermore, as a last resort, developers may occasionally wish to use the following module attributes.

  • IMPL: Current value is "__paramclass_impl_". Use getattr(paramclass or instance, IMPL) to get a NamedTuple instance with annotations and protected fields. Both are mapping proxies of, respectively, (param, annotation) and (protected attribute, owner) pairs. Note that the annotations are fixed at class creation and never updated. The string IMPL acts as a special protected key in paramclasses' namespaces, which leaves the names annotations and protected available to users. We purposefully chose a would-be-mangled name to further decrease the odds of natural conflict.
  • MISSING: The object representing the "missing value", used for string representations and for checking parameter values.

# Recommended way of using `IMPL`
from paramclasses import IMPL, ParamClass

getattr(ParamClass, IMPL).annotations  # mappingproxy({})
getattr(ParamClass, IMPL).protected    # mappingproxy({'__paramclass_impl_': None, '__dict__': None, '__init__': <class 'paramclasses.paramclasses.RawParamClass'>, '__getattribute__': <class 'paramclasses.paramclasses.RawParamClass'>, '__setattr__': <class 'paramclasses.paramclasses.RawParamClass'>, '__delattr__': <class 'paramclasses.paramclasses.RawParamClass'>, 'set_params': <class 'paramclasses.paramclasses.ParamClass'>, 'params': <class 'paramclasses.paramclasses.ParamClass'>, 'missing_params': <class 'paramclasses.paramclasses.ParamClass'>})
# Works on subclasses and instances too

When subclassing an external UnknownClass, one can check whether it is a paramclass with isparamclass.

from paramclasses import isparamclass

isparamclass(UnknownClass)  # Returns a boolean

Finally, it is possible to subclass RawParamClass directly -- it is the unique parent class of ParamClass -- when set_params, params and missing_params are not necessary. In this case, use the signature isparamclass(UnknownClass, raw=True).

Back to Table of Contents 👆

4. Advanced 🤓

Post-creation protection

Protecting an attribute after class creation is not allowed: the protection is ignored with a warning, while the assignment itself still goes through.

class A(ParamClass):
    x: int = 1

>>> A.x = protected(2)  # Assignment should WORK, protection should FAIL
<stdin>:1: UserWarning: Cannot protect attribute 'x' after class creation. Ignored
>>> a = A(); a
A(x=2)                  # Assignment did work
>>> a.x = protected(3)  # Assignment should WORK, protection should FAIL
<stdin>:1: UserWarning: Cannot protect attribute 'x' on instance assignment. Ignored
>>> a.x
3                       # First protection did fail, new assignment did work
>>> del a.x; a
A(x=2)                  # Second protection did fail

Back to Table of Contents 👆

Descriptor parameters

TLDR: using descriptors for parameter values is fine if you know what to expect.

import numpy as np

class Operator(ParamClass):
    op: ... = np.cumsum

Operator().op([0, 1, 2])  # array([0, 1, 3])

This behaviour is similar to dataclasses' but is not trivial:

class NonParamOperator:
    op: ... = np.cumsum

>>> NonParamOperator().op([0, 1, 2])  # Should FAIL
<traceback>
TypeError: 'list' object cannot be interpreted as an integer
>>> NonParamOperator().op
<bound method cumsum of <__main__.NonParamOperator object at 0x13a10e7a0>>

Note how NonParamOperator().op is a bound method. What happened here is that np.cumsum is a descriptor -- like all function, property or member_descriptor objects, for example -- so it got bound to the instance on attribute access. As a result, the function np.cumsum(a, axis=None, dtype=None, out=None) interpreted NonParamOperator() to be the array a, and [0, 1, 2] to be the axis.
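The same binding mechanism can be reproduced without NumPy. The following is a small standard-library sketch in which `double` and `Holder` are hypothetical names: a function stored on a class defines `__get__` (but not `__set__`), so instance access turns it into a bound method, whereas the same function stored in the instance dictionary is returned as-is.

```python
def double(values):
    return [2 * v for v in values]


class Holder:
    op = double  # Stored on the class: attribute access goes through `__get__`


holder = Holder()

# Functions implement `__get__` but not `__set__`.
print(hasattr(double, "__get__"), hasattr(double, "__set__"))  # True False

# Access through the instance binds the function to that instance.
print(holder.op.__self__ is holder)  # True

# The instance is passed as first argument, so the call gets one argument too many.
try:
    holder.op([0, 1, 2])
    call_failed = False
except TypeError:
    call_failed = True

# Stored in the instance dictionary, the function is not bound.
vars(holder)["op"] = double
print(holder.op([0, 1, 2]))  # [0, 2, 4]
```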

To avoid this kind of surprise we chose, for parameters only, to bypass the get/set/delete descriptor-specific behaviours, and treat them as usual attributes. Contrary to dataclasses, by also bypassing descriptors for set/delete operations, we allow property-valued parameters, for example.

from dataclasses import dataclass

class A(ParamClass):
    x: property = property(lambda _: ...)  # Should WORK

@dataclass
class B:
    x: property = property(lambda _: ...)  # Should FAIL

>>> A()  # paramclass
A()
>>> B()  # dataclass
<traceback>
AttributeError: can't set attribute 'x'

This should not be a very common use case anyway.

Back to Table of Contents 👆

Multiple inheritance

With paramclass bases

Multiple inheritance is not a problem. Default values will be retrieved as expected following the MRO, but there is one caveat: protected attributes should be consistent between the bases. For example, if A.x is not protected while B.x is, one cannot take (A, B) for bases.

class A(ParamClass):
    x: int = 0

class B(ParamClass):
    x: int = protected(1)

>>> class C(B, A): ...  # Should WORK
... 
>>> class D(A, B): ...  # Should FAIL
... 
<traceback>
ProtectedError: 'x' protection conflict: 'A', 'B'

Inheriting from non-paramclasses

It is possible to inherit from a mix of paramclasses and non-paramclasses, with the three following limitations.

  1. Because type(ParamClass) only inherits from ABCMeta, non-paramclass bases must be either vanilla classes or abstract classes.
  2. Behaviour is not guaranteed for non-paramclass bases with an IMPL-named attribute -- see Subclassing API.
  3. The MRO of classes created with multiple inheritance should always have all of its paramclasses in front of non-paramclasses (see #28). This is enforced: failing to do so raises a TypeError.
>>> class A(int, ParamClass): ...
...
<traceback>
TypeError: Invalid method resolution order (MRO) for bases int, ParamClass: nonparamclass 'int' would come before paramclass 'ParamClass'

Back to Table of Contents 👆

@protected vs. super()

It is not recommended to use super() inside a @protected method definition, when the protection aims at "locking down" its behaviour. Indeed, one can never assume the MRO of future subclasses will resemble that of the method-defining class.

For example, picture the following inheritance schemes.

class A(RawParamClass): ...
class B(RawParamClass): ...
class C(B, A): ...

In this situation, the MRO of C would be C -> B -> A -> RawParamClass -> object. As such, if B were to redefine __repr__ using super() and @protected, repr(C()) would call A.__repr__, which can behave arbitrarily despite B.__repr__ being @protected. Instead, it is recommended to call RawParamClass.__repr__ directly.
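The effect does not depend on paramclasses and can be reproduced with plain classes. In the sketch below, `Root` stands in for RawParamClass and `describe` for `__repr__`; `SafeB` and `SafeC` are hypothetical names showing the recommended direct call.

```python
class Root:
    def describe(self):
        return "Root"


class A(Root):
    def describe(self):
        return "A"  # Arbitrary behaviour from a sibling class


class B(Root):
    def describe(self):
        # `super()` follows the MRO of the instance's class, not of `B`
        return "B+" + super().describe()


class SafeB(Root):
    def describe(self):
        # Direct call: independent of the MRO of future subclasses
        return "SafeB+" + Root.describe(self)


class C(B, A): ...


class SafeC(SafeB, A): ...


print([cls.__name__ for cls in C.__mro__])  # ['C', 'B', 'A', 'Root', 'object']
print(B().describe())      # B+Root
print(C().describe())      # B+A
print(SafeC().describe())  # SafeB+Root
```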

Back to Table of Contents 👆

Using __slots__

Before using __slots__ with ParamClass, please note the following.

  1. Since ParamClass uses __dict__, any paramclass will too.
  2. You cannot slot a previously protected attribute -- since it would require updating its class value.
  3. Since parameters' get/set/delete interactions bypass descriptors, using __slots__ on them will not yield the usual behaviour.
  4. The overhead from ParamClass functionality would nullify any __slots__ optimization in most cases anyway.

Back to Table of Contents 👆

Breaking ParamClass protection scheme

There is no such thing as "perfect attribute protection" in Python. As such, ParamClass only provides protection against natural behaviour -- and even unnatural to a large extent. Below are some known anti-patterns to break it, representing discouraged behaviour. If you find other elementary ways, please report them in an issue.

  1. Using type.__setattr__/type.__delattr__ directly on paramclasses.
  2. Modifying @protected -- huh?
  3. Modifying or subclassing type(ParamClass) -- requires evil dedication.
  4. Messing with mappingproxy, which is not really immutable.
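Regarding the last item, the sketch below only illustrates the general point that a mappingproxy is a read-only view rather than a frozen copy. The `backing` dict and its content are hypothetical, and the snippet says nothing about how the dictionaries behind a paramclass could be reached.

```python
from types import MappingProxyType

backing = {"fit": "BaseEstimator"}
view = MappingProxyType(backing)

# Writing through the proxy is refused...
try:
    view["fit"] = "SomeoneElse"
    write_blocked = False
except TypeError:
    write_blocked = True

# ... but any change to the underlying dict shows through the proxy.
backing["fit"] = "SomeoneElse"

print(write_blocked)  # True
print(view["fit"])    # SomeoneElse
```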

Back to Table of Contents 👆

Type checkers

There are currently some known issues regarding static type checking. The implementation of a mypy plugin may solve these in the not-too-distant future. In the meantime, it is advised to consult the known issues to understand the false positives and negatives that may occur.

Any contribution regarding this fix is very welcome!

Back to Table of Contents 👆

5. Contributing 🚀

Questions, issues, discussions and pull requests are welcome! Please do not hesitate to contact me.

Developing with uv

The project is developed with uv which simplifies soooo many things!

# Installing `uv` on Linux and macOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Using `uv` command may require restarting the bash session

After having installed uv, you can independently use all of the following without ever worrying about installing python or dependencies, or creating virtual environments.

uvx ruff check                        # Check linting
uvx ruff format --diff                # Check formatting
uv run mypy                           # Run mypy
uv pip install -e . && uv run pytest  # Run pytest
uv run python                         # Interactive session in virtual environment

Back to Table of Contents 👆

6. License ⚖️

This package is distributed under the MIT License.

Back to Table of Contents 👆
