iOS 26.4, now in Release Candidate, introduces improved context window management for Apple's Foundation Models, helping developers work within the 4096-token context window limit. The new APIs encourage treating the context window as a constrained resource, to be actively managed much like memory in a low-resource system.
As with most large language models, the context window is a critical resource used to hold system instructions, user prompts, and model responses. Because Apple's Foundation Models run on-device, they offer a relatively small context window that can fill up quickly, especially in chat-like sessions where user prompts and model responses continuously accumulate.
In such cases, the framework throws an .exceededContextWindowSize error, and the model can no longer respond within the same session. To recover from this error, developers need to start a new session and reinitialize its state so the existing workflow can carry on without impairing the user experience.
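The recovery path might look like the following sketch, built around the error case the framework throws. The condensing policy shown here, keeping only the first transcript entry (the instructions) plus the most recent one, is an illustrative assumption, not a prescribed strategy:

```swift
import FoundationModels

// Sketch: retry a prompt after the context window fills up, by
// seeding a fresh session with a condensed transcript.
func respond(to prompt: String,
             session: inout LanguageModelSession) async throws -> String {
    do {
        return try await session.respond(to: prompt).content
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // Keep the instructions and the latest entry only
        // (an assumed, minimal condensing policy).
        let entries = Array(session.transcript)
        var condensed = entries.first.map { [$0] } ?? []
        if entries.count > 1, let last = entries.last {
            condensed.append(last)
        }
        session = LanguageModelSession(transcript: Transcript(entries: condensed))
        return try await session.respond(to: prompt).content
    }
}
```

Because the new session only sees the condensed transcript, any context dropped during condensing is lost to subsequent turns, which is exactly the trade-off developers must weigh when choosing what to retain.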
In a previous technical note, Apple outlined practical strategies for developers to proactively deal with the context window limitation, such as splitting large tasks into multiple LLM sessions, asking the model to generate shorter answers, trimming prompts by summarizing them or retaining only the most relevant turns, and using tool calling efficiently.
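The "retain only the most relevant turns" strategy above can be sketched in plain Swift. The token-counting closure stands in for a real tokenizer, and all names are illustrative; the function keeps the first turn (instructions) and then admits the most recent turns that still fit the budget:

```swift
// Sketch: trim a conversation to a token budget, preserving the
// first turn and as many recent turns as fit.
func trimmedTurns(_ turns: [String],
                  budget: Int,
                  count: (String) -> Int) -> [String] {
    guard let first = turns.first else { return [] }
    var kept: [String] = []
    var used = count(first)
    // Walk backwards from the newest turn, stopping when the budget is hit.
    for turn in turns.dropFirst().reversed() {
        let cost = count(turn)
        if used + cost > budget { break }
        used += cost
        kept.insert(turn, at: 0)
    }
    return [first] + kept
}
```

A crude whitespace-based counter is enough to exercise the logic; a production app would plug in an actual token count instead.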
To help developers track how the context window is being used, iOS 26.4 introduced a new contextSize property on SystemLanguageModel, which returns the available context capacity, along with a tokenCount(for:) method to measure how many tokens a given input consumes. While the current maximum is 4096 tokens, contextSize removes the need to hardcode that limit, and tokenCount(for:) provides the foundation for token bookkeeping, allowing apps to adapt dynamically.
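Together, the two APIs enable simple token bookkeeping before a prompt is sent. The sketch below keeps the counting function injectable so the logic stays framework-independent; in a real app the closure would wrap tokenCount(for:) and the capacity would come from contextSize. Names here are illustrative assumptions:

```swift
// Sketch: a small token budget tracker that admits text only while
// it fits within the remaining context capacity.
struct TokenBudget {
    let capacity: Int              // e.g. the model's context size
    private(set) var used = 0
    let count: (String) -> Int     // e.g. a wrapper around tokenCount(for:)

    init(capacity: Int, count: @escaping (String) -> Int) {
        self.capacity = capacity
        self.count = count
    }

    /// Records the text's token cost and returns true if it fits.
    mutating func admit(_ text: String) -> Bool {
        let tokens = count(text)
        guard used + tokens <= capacity else { return false }
        used += tokens
        return true
    }

    var remaining: Int { capacity - used }
}
```

When admit returns false, the app knows ahead of time that sending the text would risk exceeding the context window, and can summarize or trim before the framework ever throws.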
Knowing the context window size and being able to calculate token consumption are essential, but they don't solve all the problems for developers, since managing token consumption is not a trivial task. In a practical article, Artem Novichkov demonstrates an effective approach.
Artem points out that you must account for all components contributing to the context: not only the system prompt and user instructions, but also tool usage, whose impact on the context window size can be surprising:
> When you use tools, their definitions (name, description, and argument schema) are serialized and sent alongside your instructions. This increases the token count significantly.
Note that Artem refers to the tokenUsage(for:) method in his article, which appears to have been renamed to tokenCount(for:) in the latest RC release. He also highlights that these new additions to the Foundation Models framework are marked with [@backDeployed](https://www.hackingwithswift.com/swift/5.8/function-back-deployment)(before: iOS 26.4, macOS 26.4, visionOS 26.4), making them available on all iOS versions that support the framework.