
Python dataclass Explained


What is @dataclass?

@dataclass is a decorator that automatically generates boilerplate methods for a class — primarily __init__, but also __repr__ and __eq__.

It is syntactic sugar. The two versions below are functionally identical:

# With @dataclass
from dataclasses import dataclass, field
import threading

@dataclass
class ToolUseContext:
    messages: list
    cwd: str
    abort_signal: threading.Event = field(default_factory=threading.Event)
    options: dict = field(default_factory=dict)
# Plain class — equivalent behaviour
import threading

class ToolUseContext:
    def __init__(self, messages, cwd, abort_signal=None, options=None):
        self.messages = messages
        self.cwd = cwd
        self.abort_signal = abort_signal if abort_signal is not None else threading.Event()
        self.options = options if options is not None else {}

Both let you write:

ctx = ToolUseContext(messages=[], cwd="/home/user/project")

@dataclass just saves you from writing the __init__ by hand.

What does it buy you?

Feature      What it generates
__init__     Constructor with one parameter per annotated field
__repr__     Readable string such as ToolUseContext(messages=[], cwd='.', ...)
__eq__       Equality comparison by field values (generated by default; disable with eq=False)

For most use cases, __init__ and __repr__ are the only ones that matter.
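A quick interactive check makes the generated methods visible. This sketch uses a deliberately tiny, hypothetical Point class (not from the article's example) so that __eq__ compares cleanly:

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
q = Point(1, 2)

print(repr(p))   # generated __repr__: Point(x=1, y=2)
print(p == q)    # generated __eq__ compares field values: True
print(p is q)    # they are still distinct objects: False
```

Note that value equality and identity diverge: p == q is True while p is q is False, which is exactly what a field-by-field __eq__ gives you.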

The field() function

When a field needs a default value that is mutable (a list, dict, or any object), you cannot write:

# WRONG — one Event shared across ALL instances
abort_signal: threading.Event = threading.Event()

Python evaluates the default once at class definition time, so every instance would share the same object. Instead, field(default_factory=...) tells @dataclass to call the factory fresh for each new instance:

abort_signal: threading.Event = field(default_factory=threading.Event)

This generates:

self.abort_signal = threading.Event()  # new Event per instance
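The difference is easy to observe with two instances. This minimal sketch uses a hypothetical Job class with a mutable list field:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    # default_factory calls list() fresh for each new instance
    tags: list = field(default_factory=list)

a = Job()
b = Job()
a.tags.append("x")

print(a.tags)  # ['x']
print(b.tags)  # [] — b has its own list, untouched by a
```

Had the two instances shared one list, mutating a.tags would have leaked into b.tags. (For the common cases list, dict, and set, @dataclass even raises a ValueError at class definition time if you try a bare mutable default.)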

A real example: ToolUseContext

In an AI agent, each user turn needs a shared execution environment — the current directory, the conversation history, a cancellation signal. Rather than passing these as separate arguments to every function:

# Without ToolUseContext — signature bloat
def call(self, args, messages, cwd, abort_signal, options):
    ...

They are bundled into one object:

# With ToolUseContext — stable signature
def call(self, args: dict, ctx: ToolUseContext) -> ToolResult:
    ...

When you later need to add a new field to the environment, you add it to ToolUseContext once. Every tool that needs it reads ctx.whatever. Tools that don’t need it ignore it. The call() signature never changes.

@dataclass is the right tool here because ToolUseContext is a pure data holder — it has no behaviour of its own, just fields. That is exactly the case @dataclass was designed for.
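To make the bundling concrete, here is a minimal sketch of a tool reading only the context fields it needs. The ReadFileTool name and the single-field ToolResult shape are illustrative assumptions, not part of any particular framework:

```python
from dataclasses import dataclass, field
import os
import threading

@dataclass
class ToolUseContext:
    messages: list
    cwd: str
    abort_signal: threading.Event = field(default_factory=threading.Event)
    options: dict = field(default_factory=dict)

@dataclass
class ToolResult:
    output: str

class ReadFileTool:
    def call(self, args: dict, ctx: ToolUseContext) -> ToolResult:
        # The tool reads only cwd and abort_signal; it ignores the rest
        if ctx.abort_signal.is_set():
            return ToolResult(output="cancelled")
        path = os.path.join(ctx.cwd, args["path"])
        return ToolResult(output=f"would read {path}")

ctx = ToolUseContext(messages=[], cwd="/tmp")
print(ReadFileTool().call({"path": "notes.txt"}, ctx).output)
```

Adding a new field to ToolUseContext later changes nothing here: the call() signature stays (args, ctx), and this tool keeps reading only the fields it cares about.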

When to use @dataclass

✅ Use it when the class is a data holder — its purpose is to group related fields together, and it has little or no behaviour.

✅ Use it when you want clean __repr__ output for debugging without writing it yourself.

❌ Don’t use it when the class has complex initialisation logic that a generated __init__ cannot express.

❌ Don’t use it just because it looks shorter — a plain class is equally valid and sometimes clearer.

Summary

@dataclass reads the class-level type annotations and generates an __init__ (plus __repr__ and __eq__) for you. It is syntactic sugar: no new runtime behaviour, no magic. The plain-class equivalent is always available if you need more control. Use field(default_factory=...) any time a default value is mutable.

