Stop Writing `__init__` Methods

[Source](https://blog.glyph.im/2025/04/stop-writing-init-methods.html) ## The History Before dataclasses were added to Python in version 3.7 — in June of 2018 — the `__init__` special method had an important use. If you had a class representing a data structure — for example a `2DCoordinate`, with `x` and `y` attributes — you would want to be able to construct it as `2DCoordinate(x=1, y=2)`, which would require you to add an `__init__` method with `x` and `y` parameters. The other options available at the time all had pretty bad problems: 1. You could remove `2DCoordinate` from your public API and instead expose a `make_2d_coordinate` function and make it non-importable, but then how would you document your return or parameter types? 2. You could document the `x` and `y` attributes and make the user assign each one themselves, but then `2DCoordinate()` would return an invalid object. 3. You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all `2DCoordinate` objects to be not just mutable, but mutated at every call site. 4. You could fix the problems with option 1 by adding a new *abstract* class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, `typing.Protocol` didn’t even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable. Also, an `__init__` method *that does nothing but assign a few attributes* doesn’t have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious *default* choice, in most cases. However, by accepting “define a custom `__init__` ” as the *default* way to allow users to create your objects, we make a habit of beginning *every* class with a pile of *arbitrary code* that gets executed every time it is instantiated. Wherever there is arbitrary code, there are arbitrary problems. ## The Problems Let’s consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a `FileReader`. Of course Python has [its own open-file object abstraction](https://docs.python.org/3.13/library/io.html#io.FileIO), but I will be ignoring that for the purposes of the example. Let’s assume a world where we have the following functions, in an imaginary `fileio` module: - `open(path: str) -> int` - `read(fileno: int, length: int)` - `close(fileno: int)` Our hypothetical `fileio.open` returns an integer representing a file descriptor [^1], `fileio.read` allows us to read `length` bytes from an open file descriptor, and `fileio.close` closes that file descriptor, invalidating it for future use. With the habit that we have built from writing thousands of `__init__` methods, we might want to write our `FileReader` class like this: ``` class FileReader: def __init__(self, path: str) -> None: self._fd = fileio.open(path) def read(self, length: int) -> bytes: return fileio.read(self._fd, length) def close(self) -> None: fileio.close(self._fd) ``` For our initial use-case, this is fine. Client code creates a `FileReader` by doing something like `FileReader("./config.json")`, which always creates a `FileReader` that maintains its file descriptor `int` internally as private state. This is as it should be; we don’t want user code to see or mess with `_fd`, as that might violate `FileReader` ’s invariants. All the necessary work to construct a valid `FileReader` — i.e. the call to `open` — is always taken care of for you by `FileReader.__init__`. However, additional requirements will creep in, and as they do,`FileReader.__init__` becomes increasingly awkward. Initially we only care about `fileio.open`, but later, we may have to deal with a library that has its own reasons for managing the call to `fileio.open` by itself, and wants to give us an `int` that we use as our `_fd`, we now have to resort to weird workarounds like: ``` def reader_from_fd(fd: int) -> FileReader: fr = object.__new__(FileReader) fr._fd = fd return fr ``` Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. `reader_from_fd` ’s type signature, which takes a plain `int`, has no way of even suggesting to client code how to ensure that it has passed in the right *kind* of `int`. Testing is much more of a hassle, because we have to patch in our own copy of `fileio.open` any time we want an instance of a `FileReader` in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many `FileReader` s for testing purposes. All of this also assumes a `fileio.open` that is *synchronous*. Although for literal file I/O this is more of a [hypothetical](https://stackoverflow.com/questions/87892/what-is-the-status-of-posix-asynchronous-i-o-aio) concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you’ve ever found yourself wanting to type `async def __init__(self): ...` then you have seen this limitation in practice. Comprehensively describing *all* the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the *cause* of all these problems is the same: we are inextricably linking the act of *creating a data structure* with *whatever side-effects are* *most often* *associated* with that data structure. If they are “often” associated with it, then by definition they are not “always” associated with it, and all the cases where they *aren’t* associated become unweildy and potentially broken. Defining an `__init__` is an anti-pattern, and we need a replacement for it. ## The Solutions I believe this tripartite assemblage of design techniques will address the problems raised above: - using `dataclass` to define attributes, - replacing behavior that previously would have previously been in `__init__` with a new classmethod that does the same thing, and - using precise types to describe what a valid instance looks like. To begin, let’s refactor `FileReader` into a `dataclass`. This does get us an `__init__` method, but it *won’t* be one an arbitrary one we define ourselves; it will get the useful constraint enforced on it that it will just assign attributes. ``` @dataclass class FileReader: _fd: int def read(self, length: int) -> bytes: return fileio.read(self._fd, length) def close(self) -> None: fileio.close(self._fd) ``` Except... oops. In fixing the problems that we created with our custom `__init__` that calls `fileio.open`, we have re-introduced several problems that it solved: 1. We have removed all the convenience of `FileReader("path")`. Now the user needs to import the low-level `fileio.open` again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a `FileReader` in a practical scenario, we will have to add something in our documentation to point at a separate module entirely. 2. There’s no enforcement of the validity of `_fd` as a file descriptor; it’s just some integer, which the user could easily pass an incorrect instance of, with no error. In isolation, `dataclass` by itself can’t solve all our problems, so let’s add in the second technique. ### Using classmethod factories to create objects We don’t want to require any additional imports, or require users to go looking at any other modules — or indeed anything other than `FileReader` itself — to figure out how to create a `FileReader` for its intended usage. Luckily we have a tool that can easily address all of these concerns at once:`@classmethod`. Let’s define a `FileReader.open` class method: ``` from typing import Self @dataclass class FileReader: _fd: int @classmethod def open(cls, path: str) -> Self: return cls(fileio.open(path)) ``` Now, your callers can replace `FileReader("path")` with `FileReader.open("path")`, and get all the same benefits. Additionally, if we needed to `await fileio.open(...)`, and thus we needed its signature to be `@classmethod async def open`, we are freed from the constraint of `__init__` as a special method. There is nothing that would prevent a `@classmethod` from being `async`, or indeed, from having any other modification to its return value, such as returning a `tuple` of related values rather than just the object being constructed. ### Using NewType to address object validity Next, let’s address the slightly trickier issue of enforcing object validity. Our type signature calls this thing an `int`, and indeed, that is unfortunately what the lower-level `fileio.open` gives us, and that’s beyond our control. But for our *own* purposes, we can be more precise in our definitions, using [`NewType`](https://docs.python.org/3.13/library/typing.html#newtype): ``` from typing import NewType FileDescriptor = NewType("FileDescriptor", int) ``` There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with *zero* run-time overhead, let’s just *insist* to Mypy that we have versions of `fileio.open`,`fileio.read`, and `fileio.write` which actually already take `FileDescriptor` integers rather than regular ones. ``` from typing import Callable _open: Callable[[str], FileDescriptor] = fileio.open # type:ignore[assignment] _read: Callable[[FileDescriptor, int], bytes] = fileio.read _close: Callable[[FileDescriptor], None] = fileio.close ``` We do of course have to slightly adjust `FileReader`, too, but the changes are very small. Putting it all together, we get: ``` from typing import Self @dataclass class FileReader: _fd: FileDescriptor @classmethod def open(cls, path: str) -> Self: return cls(_open(path)) def read(self, length: int) -> bytes: return _read(self._fd, length) def close(self) -> None: _close(self._fd) ``` Note that the main technique here is not *necessarily* using `NewType` specifically, but rather aligning an instance’s property of “has all attributes set” as closely as possible with an instance’s property of “fully valid instance of its class”; `NewType` is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like `int`,`str` or `bytes`. ## In Summary - The New Best Practice From now on, when you’re defining a new Python class: - Make it a dataclass [^2]. - Use its default `__init__` method [^3]. - Add `@classmethod` s to provide your users convenient and discoverable ways to build your objects. - Require that *all* dependencies be satisfied by attributes, so you always start with a valid object. - Use `typing.NewType` to enforce any constraints on primitive data types (like `int` and `str`) which might have magical external attributes, like needing to come from a particular library, needing to be random, and so on. If you define all your classes this way, you will get all the benefits of a custom `__init__` method: - All consumers of your data structures will receive valid objects, because an object with all its attributes populated correctly is inherently valid. - Users of your library will be presented with convenient ways to create your objects that do as much work as is necessary to make them easy to use, and they can discover these just by looking at the methods on your class itself. Along with some nice new benefits: - You will be future-proofed against new requirements for different ways that users may need to construct your object. - If there are already multiple ways to instantiate your class, you can now give each of them a meaningful name; no need to have monstrosities like `def __init__(self, maybe_a_filename: int | str | None = None):` - Your test suite can always construct an object by satisfying all its dependencies; no need to monkey-patch anything when you can always call the type and never do any I/O or generate any side effects. Before dataclasses, it was always a bit weird that such a basic feature of the Python language — giving data to a data structure to make it valid — required overriding a method with 4 underscores in its name. `__init__` stuck out like a sore thumb. Other such methods like `__add__` or even `__repr__` were inherently customizing esoteric attributes of classes. For many years now, that historical language wart has been resolved. `@dataclass`, `@classmethod`, and `NewType` give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust. --- ## Acknowledgments Thank you to [my patrons](https://blog.glyph.im/pages/patrons.html) who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my [various open-source endeavors](https://github.com/glyph/), you can [support my work as a sponsor](https://blog.glyph.im/pages/patrons.html)! I am also [available for consulting work](https://blog.glyph.im/2025/04/) if you think your organization could benefit from expertise on topics like “but what *is* a ‘class’, really?”. --- [^1]: If you aren’t already familiar, a “file descriptor” is an integer which has meaning only within your program; you tell the operating system to open a file, it says “I have opened file 7 for you”, and then whenever you refer to “7” it is that file, until you `close(7)`. [^2]: Or an [attrs class](https://blog.glyph.im/2016/08/attrs.html), if you’re nasty. [^3]: Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency *between* two different fields, such as a “range” object where `start` must always be less than `end`. There are always exceptions to these types of rules. Still, it’s pretty much *never* a good idea to do any I/O in `__init__`, and nearly all of the remaining stuff that may *sometimes* be a good idea in edge-cases can be achieved with a [`__post_init__`](https://docs.python.org/3.13/library/dataclasses.html#dataclasses.__post_init__) rather than writing a literal `__init__`.