Class tasks

Stateful GPU services: load once, serve many. Lifecycle hooks, probes, and concurrency.

Define a class task

Decorate a class with @app.cls. Mark methods you want to call remotely with @gw.method(). The class is instantiated once per container; state persists across calls to the same instance.

python
import gworker_client as gw

app = gw.App("inference-server")

@app.cls(gpu=gw.Gpu("A100"), min_workers=1)
class Embedder:
    @gw.on_enter()
    def setup(self):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    @gw.method()
    def embed(self, texts: list[str]) -> list[list[float]]:
        return self.model.encode(texts).tolist()

Lifecycle hooks

@gw.on_enter() runs once at container startup before the first request. @gw.on_exit() runs on graceful shutdown. Use on_enter to load model weights so every request hits a warm model.

python
@app.cls(gpu=gw.Gpu("H100"))
class LLMServer:
    @gw.on_enter()
    async def load(self):
        # load_model is a placeholder for your own weight-loading code.
        self.model = load_model("/models/llama-3")
        self.model.cuda()

    @gw.on_exit()
    async def unload(self):
        # Import here: a name imported inside load() is not visible in this method.
        import torch
        del self.model
        torch.cuda.empty_cache()

    @gw.method()
    async def generate(self, prompt: str) -> str:
        return self.model.generate(prompt)
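To make the hook ordering concrete, here is a minimal plain-Python sketch of what a container runtime might do with a class task: instantiate once, enter once, serve many calls, exit once. It does not use gworker_client. `ToyModel`, `run_container`, and the event log are hypothetical names invented for illustration, and the real platform's behaviour may differ.

```python
class ToyModel:
    """Stand-in for a class task; records the order in which its hooks run."""

    def __init__(self):
        self.events = []
        self.weights = None

    def on_enter(self):
        # One-time setup (for example, loading weights) happens here.
        self.weights = {"scale": 2}
        self.events.append("enter")

    def predict(self, x):
        # Every call reuses the state created in on_enter.
        self.events.append("call")
        return x * self.weights["scale"]

    def on_exit(self):
        self.weights = None
        self.events.append("exit")


def run_container(cls, inputs):
    """Hypothetical runner: one instance, one enter, many calls, one exit."""
    instance = cls()
    instance.on_enter()
    try:
        outputs = [instance.predict(x) for x in inputs]
    finally:
        # In this sketch the exit hook runs even if a call raised.
        instance.on_exit()
    return outputs, instance.events


outputs, events = run_container(ToyModel, [1, 2, 3])
print(outputs)  # [2, 4, 6]
print(events)   # ['enter', 'call', 'call', 'call', 'exit']
```

The setup cost is paid once, however many inputs the instance serves.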

Concurrent inputs

Let one container handle multiple inputs simultaneously. Good for I/O-bound tasks or inference servers where GPU utilisation is otherwise low.

python
@app.cls(gpu=gw.Gpu("A100"))
@gw.concurrent(max_inputs=16)
class EmbedServer:
    @gw.on_enter()
    def load(self):
        self.model = load_embedding_model()

    @gw.method()
    async def embed(self, text: str) -> list[float]:
        return await self.model.aencode(text)
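As a rough mental model of a `max_inputs` cap, the sketch below uses a plain `asyncio.Semaphore` to bound how many inputs are in flight at once. This is an illustration only, not how gworker_client is implemented. `serve` and `handle` are hypothetical names, and the sleep stands in for I/O or an awaitable model call.

```python
import asyncio


async def serve(inputs, max_inputs):
    """Run handle() for every input with at most max_inputs in flight."""
    slots = asyncio.Semaphore(max_inputs)
    in_flight = 0
    peak = 0

    async def handle(x):
        nonlocal in_flight, peak
        async with slots:  # waits here while max_inputs calls are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for I/O or an async model call
            in_flight -= 1
            return x * 2

    # gather returns results in input order, whatever order the calls finish in.
    results = await asyncio.gather(*(handle(x) for x in inputs))
    return results, peak


results, peak = asyncio.run(serve(range(10), max_inputs=4))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(peak)     # 4
```

Concurrency of this kind only helps while a call is waiting. A method that blocks the event loop for its full duration gains nothing from a higher cap.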

Batching

Coalesce concurrent inputs into a single GPU call. max_batch_size caps the batch; wait_ms is the coalescing window. The method receives a list and must return a list of the same length.

python
@app.cls(gpu=gw.Gpu("A100"))
@gw.concurrent(max_inputs=128)
class BatchEmbedder:
    # self.model is assumed to be loaded in an on_enter hook, omitted here.
    @gw.method()
    @gw.batched(max_batch_size=64, wait_ms=20)
    def embed(self, texts: list[str]) -> list[list[float]]:
        return self.model.encode(texts).tolist()
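The coalescing described above can be sketched as a small asyncio micro-batcher. This is a generic illustration of the technique under stated assumptions, not gworker_client's implementation. `MicroBatcher`, `submit`, and `run` are hypothetical names, and the sketch assumes the wait window starts when the first input of a batch arrives.

```python
import asyncio


class MicroBatcher:
    """Coalesces single inputs into list-in, list-out calls to batch_fn."""

    def __init__(self, batch_fn, max_batch_size, wait_ms):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.wait_s = wait_ms / 1000
        self.queue = asyncio.Queue()
        self.batch_sizes = []  # recorded for inspection only

    async def submit(self, item):
        # Each caller parks on its own future until its batch has been processed.
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first input arrives
            deadline = loop.time() + self.wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break  # window closed; process what we have
            outputs = self.batch_fn([item for item, _ in batch])
            if len(outputs) != len(batch):
                raise ValueError("batch_fn must return one output per input")
            self.batch_sizes.append(len(batch))
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


async def main():
    batcher = MicroBatcher(lambda texts: [len(t) for t in texts],
                           max_batch_size=4, wait_ms=50)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit("x" * i) for i in range(10)))
    worker.cancel()
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(main())
print(results)      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(batch_sizes)  # [4, 4, 2]
```

Ten simultaneous inputs with a cap of 4 yield two full batches, then a partial batch that is flushed when the window expires. A larger `wait_ms` fills batches better at the cost of added latency for the first input in each batch.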

Health probes

Attach a liveness or readiness probe to a method. The platform calls it every period_s seconds; once the number of failed checks reaches failures, it restarts the container (liveness) or drains traffic from it (readiness).

python
@app.cls(
    gpu=gw.Gpu("A100"),
    probes=[gw.Probe.on("health", kind="liveness", period_s=30, failures=3)],
)
class Model:
    @gw.method()
    def health(self) -> dict:
        # getattr avoids an AttributeError if the probe fires before the model is set.
        return {"ok": getattr(self, "model", None) is not None}

    @gw.method()
    def predict(self, x: list[float]) -> float: ...
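One plausible reading of the `failures` threshold is sketched below in plain Python. It assumes that failures are counted consecutively and that the counter resets after the platform acts. Neither assumption is confirmed by the text above, and `evaluate_probe` is a hypothetical helper, not part of gworker_client.

```python
def evaluate_probe(results, failures, kind):
    """Return the action taken after each probe result.

    Assumption: only consecutive failures count toward the threshold.
    """
    actions = []
    consecutive = 0
    for ok in results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failures:
            actions.append("restart" if kind == "liveness" else "drain")
            consecutive = 0  # assumption: the counter resets once the platform acts
        else:
            actions.append("none")
    return actions


history = [True, False, False, True, False, False, False]
print(evaluate_probe(history, failures=3, kind="liveness"))
# ['none', 'none', 'none', 'none', 'none', 'none', 'restart']
```

Under this reading, two failures followed by a success do not trigger anything; only the later run of three does.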

Runtime parameters

Declare class attributes with gw.parameter() to inject their values at call time without redeploying. Pass defaults for local development.

python
@app.cls(gpu=gw.Gpu("A100"))
class Classifier:
    threshold: float = gw.parameter(default=0.5)

    @gw.method()
    def classify(self, text: str) -> str:
        score = self.model.predict(text)
        return "positive" if score > self.threshold else "negative"

# Call with a custom threshold
result = Classifier(threshold=0.7).classify.remote("great product!")
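The effect of a default versus a call-time override can be illustrated with a plain dataclass. This is an analogy only: `ToyClassifier` is a hypothetical stand-in that takes a precomputed score, and it says nothing about how gworker_client transports parameters to the container.

```python
from dataclasses import dataclass


@dataclass
class ToyClassifier:
    threshold: float = 0.5  # default used when the caller passes nothing

    def classify(self, score: float) -> str:
        return "positive" if score > self.threshold else "negative"


# The same code gives different answers depending on the parameter supplied.
print(ToyClassifier().classify(0.6))               # positive
print(ToyClassifier(threshold=0.7).classify(0.6))  # negative
```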