GraphQL Syntax Used for a Novel Approach to Schema Validation and Code Generation

Nav Inc. has created an open-source schema definition and code generator that uses GraphQL syntax to define events and message formats. GraphQL was chosen for its expressiveness and familiarity among developers, but it is only used for its syntax; the Nav Schema Architecture (NSA) does not use the GraphQL runtime.

Using GraphQL allows a contract developer to describe both the data model and message format at the same time, rather than needing two sets of semantics. This is useful when an attribute may be optional on the underlying data model, but required when that model is used in a specific message.
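The optional-versus-required distinction above can be illustrated with a minimal Python sketch. This is not NSA's generated output; the `Business` model, the `BUSINESS_CONTACT_UPDATED_REQUIRED` contract, and the `missing_fields` helper are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Business:
    """Hypothetical shared data model: phone is optional at this level."""
    id: str
    name: str
    phone: Optional[str] = None


# Hypothetical message contract built on the same model.
# For this particular message, phone must be present.
BUSINESS_CONTACT_UPDATED_REQUIRED: Tuple[str, ...] = ("id", "phone")


def missing_fields(model: Business, required: Tuple[str, ...]) -> List[str]:
    """Return the names of required fields that are unset on the model."""
    return [name for name in required if getattr(model, name) is None]


record = Business(id="b-1", name="Acme")  # valid as a data model
print(missing_fields(record, BUSINESS_CONTACT_UPDATED_REQUIRED))  # ['phone']
```

The same record is acceptable as stored data but is rejected for this message, which is the situation a single combined definition is meant to express.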

The primary purpose of NSA is to generate code and schemas in multiple languages, all based on the root definition using GraphQL. The outputs can be other schema languages, such as protobuf or JSON Schema, or code, with Go, Ruby, and Python currently supported.

The benefit of a common data model comes from the ability to easily disseminate its implementation across multiple teams and services. A build pipeline will watch for schema changes on a feature branch, then launch a secondary pipeline to generate the output for all target languages. That output is then committed back to the feature branch, where a developer can review the changes before merging to the main branch. All relevant, language-specific output packages are rebuilt, versioned and tagged.

InfoQ met with some of the Nav developers working on the project to better understand the problems they were trying to solve and the benefits they've seen from this approach.

InfoQ: Contract-first development is not a new idea, but we'll more often see OpenAPI and JSON Schema used to define the contract. What drove your decision to use GraphQL syntax as the primary source of truth for the contract, and then derive contracts from it?

Nav development team: There are a few reasons why we decided to use GraphQL. The difference between GraphQL and other systems like OpenAPI and JSON Schema is that GraphQL provides the means to define both a common data model and a message schema, two hemispheres of the same problem. An effective system must offer an easy way to define both. GraphQL is a payload description language that solves the problem of defining payloads with validation rules and message schemas in a single Domain Specific Language. The language includes a GraphQL-based type system, just like any Interface Definition Language. This type system supports scalars, objects, enumerations, and basic validation for values of those types. We use this type system to define payloads and custom validation rules (e.g. data formats, ranges of allowable values, regex matching, and required attributes). A message contract is just a message schema definition that is based on a payload type. When defining a message contract, one can choose which fields from the payload type to include.
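As a rough illustration of the kinds of rules mentioned in this answer (regex matching, allowable ranges, required attributes, and choosing a subset of payload fields for a message), here is a small Python sketch. It does not reflect how NSA expresses or generates these rules; the `RULES` table, the field names, the score range, and the `validate` function are all assumptions made for the example.

```python
import re
from typing import Any, Callable, Dict, List

Rule = Callable[[Any], bool]

# Hypothetical value rules attached to fields of a payload type.
RULES: Dict[str, List[Rule]] = {
    # Regex matching: a deliberately loose email shape check.
    "email": [
        lambda v: isinstance(v, str)
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None
    ],
    # Range of allowable values (bounds are illustrative).
    "credit_score": [lambda v: isinstance(v, int) and 300 <= v <= 850],
}


def validate(payload: Dict[str, Any], contract_fields: List[str]) -> List[str]:
    """Check only the fields the message contract selects from the payload type."""
    errors: List[str] = []
    for name in contract_fields:
        value = payload.get(name)
        if value is None:
            errors.append(f"{name}: required")
        elif not all(rule(value) for rule in RULES.get(name, [])):
            errors.append(f"{name}: invalid value")
    return errors


payload = {"email": "owner@example.com", "credit_score": 900}
print(validate(payload, ["email", "credit_score"]))  # ['credit_score: invalid value']
```

A different message contract over the same payload type would simply pass a different `contract_fields` list.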

Another reason is that GraphQL syntax is human readable and much simpler to work with compared to JSON Schema. This facilitates the communication between teams.

We use NSA to generate language-specific message structures as well as JSON and Protobuf schemas, all from a single GraphQL Common Information Model. So in addition to generating code, NSA is used to transform GraphQL into JSON and Protobuf schemas.

InfoQ: Is your system architecture primarily using async messaging, or is it request-response? Would NSA be applicable to either approach?

Nav development team: Our system architecture currently uses NSA asynchronously, publishing events to AWS EventBridge and consuming from AWS Simple Queue Service (SQS). The same NSA output code is used to validate messages in producers before serializing them to EventBridge messages, and to validate deserialized messages from SQS in consumers.

However, NSA could just as easily be used within request/response systems. As with AWS EventBridge and SQS, NSA output structures can be serialized to and from JSON or any other structured data format. In fact, one output target of NSA is Google’s Protocol Buffers.
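The producer and consumer flow described here, where one validation routine runs before serialization and again after deserialization, might look roughly like the following Python sketch. It makes no AWS calls and is not NSA's generated code; `REQUIRED`, `validate`, `encode_for_publish`, and `decode_from_queue` are hypothetical names, and the transport is left entirely to the caller.

```python
import json
from typing import Any, Dict

# Hypothetical required attributes for one message contract.
REQUIRED = ("event_id", "account_id")


def validate(message: Dict[str, Any]) -> None:
    """Shared check used by both producers and consumers."""
    missing = [key for key in REQUIRED if message.get(key) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")


def encode_for_publish(message: Dict[str, Any]) -> str:
    """Producer side: validate first, then serialize to JSON."""
    validate(message)
    return json.dumps(message)


def decode_from_queue(body: str) -> Dict[str, Any]:
    """Consumer side: deserialize, then validate with the same routine."""
    message = json.loads(body)
    validate(message)
    return message


body = encode_for_publish({"event_id": "e-1", "account_id": "a-42"})
print(decode_from_queue(body))
```

Because the functions only produce and accept strings, the same code could sit behind an event bus, a queue, or an HTTP handler.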

NSA places the emphasis on validation, decoupled from endpoint management. There are no references to endpoints, subscribers, or publishers in NSA. Output code from NSA can be used by any adapter which itself manages the method of transmission.

InfoQ: What other designs had you considered, and how did you decide this was the best approach? Specifically, did you look at using OpenAPI/AsyncAPI or protobuf as the syntax for code generation?

Nav development team: In our current architecture, there is no need to utilize redundant asynchronous tooling such as AsyncAPI.

AsyncAPI can have any message payloads, so the NSA generated output can be used as an AsyncAPI message schema. We indirectly use Protobuf message definitions as an output target of NSA.

AsyncAPI attempts to handle the transport, which is unnecessary in conjunction with AWS EventBridge. Furthermore, coupling validation with transport logic would complicate our system; keeping a separation of concerns makes development easier.

InfoQ: Are the GraphQL schemas stored in separate repos, or are they with one of the producers or consumers?

Nav development team: The GraphQL schemas are currently stored in the same repository as the processor and the subsequently generated code. Because the generated code concerns itself only with message validation, it is used as a dependency by many libraries and applications within Nav (be they producer or consumer or a simple documentation tool).

While our project lives as a monorepo, this need not be the case. One could separate the project into multiple repos by responsibility. One or more repos could contain the GraphQL and its type extensions, which are eventually merged into a single schema as parser input. Another repo could house the parser itself, which could connect with one or many code generation repos as submodules. A fourth layer of repos could contain the generated code, one repo per language, with all of the necessary validation, testing, and packaging logic. Finally, these packages, which contain no logic around transmission mechanisms, could be consumed by client libraries.

Developers from Nav who participated in this discussion included Daniel Zemichael, Michal Scienski, Jovon McCloud, and Jeff Warner.

Editor's note: An earlier version of this article contained partial responses to the questions. The responses were updated on 2022-05-07. InfoQ regrets this error.
