Redpanda Schema Registry

In Redpanda, the messages exchanged between producers and consumers contain raw bytes. Schemas enable producers and consumers to share the information needed to serialize and deserialize those messages. Producers and consumers register schemas in the Schema Registry and retrieve them from it, so that both sides can verify that data matches the agreed format.

Schemas are versioned, and the registry supports configurable compatibility modes between schema versions. When a producer or a consumer requests to register a schema change, the registry checks for schema compatibility and returns an error for an incompatible change. Compatibility modes help ensure that data flowing through a system stays well-structured as schemas evolve.

The Schema Registry is built directly into the Redpanda binary. It runs out of the box with Redpanda’s default configuration, and it requires no new binaries to install and no new services to deploy or maintain. You can use it with the Schema Registry API or Redpanda Console.

Schema terminology

Schema: A schema is an external mechanism to describe the structure of data and its encoding. Producer clients and consumer clients use a schema as an agreed-upon format for sending and receiving messages. Schemas enable a loosely coupled, data-centric architecture that minimizes dependencies in code, between teams, and between producers and consumers.

Subject: A subject is a logical grouping for schemas. When data formats are updated, a new version of the schema can be registered under the same subject, allowing for backward and forward compatibility. A subject may have more than one schema version assigned to it, with each schema having a different numeric ID.

Serialization format: A serialization format defines how data is converted into bytes that are transmitted and stored. Serialization, by producers, converts an event into bytes. Redpanda then stores these bytes in topics. Deserialization, by consumers, converts the arrays of bytes back into the desired data format. Redpanda’s Schema Registry supports Avro, Protobuf, and JSON serialization formats.
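To make the serialize and deserialize round trip concrete, here is a minimal sketch that uses JSON as the payload encoding. It also prepends the 5-byte header (a zero magic byte followed by a 4-byte big-endian schema ID) that Schema Registry-aware serializers conventionally use so that consumers can look up the schema. The function names and the schema ID are illustrative assumptions, and Protobuf serializers add message-index bytes after the header that this sketch leaves out.

```python
import json
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of the conventional Schema Registry framing


def serialize(event: dict, schema_id: int) -> bytes:
    """Encode an event as JSON and prefix it with the schema ID header."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    # ">bI" = big-endian, 1-byte magic value, then 4-byte unsigned schema ID
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def deserialize(data: bytes) -> Tuple[int, dict]:
    """Split the header from the payload and decode the JSON body."""
    magic, schema_id = struct.unpack(">bI", data[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte")
    return schema_id, json.loads(data[5:].decode("utf-8"))
```

A consumer would typically use the returned schema ID to fetch the matching schema from the registry before interpreting the payload.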

Redpanda design overview

Every broker allows mutating REST calls, so there’s no need to configure leadership or failover strategies. Schemas are stored in a compacted topic, and the registry uses optimistic concurrency control at the topic level to detect and avoid collisions.
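The idea behind optimistic concurrency control can be illustrated with a toy, in-memory model. This is not Redpanda's implementation: the CompactedLog class, the append_if method, and the ID assignment rule are all hypothetical names invented for this sketch. A writer reads the current end of the log, prepares its record, and the append succeeds only if no other writer got there first; otherwise it retries.

```python
class CompactedLog:
    """Toy append-only log standing in for a topic (illustration only)."""

    def __init__(self):
        self.records = []

    def next_offset(self) -> int:
        return len(self.records)

    def append_if(self, expected_offset: int, record: dict) -> bool:
        # Accept the write only if nobody else appended since the caller read.
        if expected_offset != len(self.records):
            return False
        self.records.append(record)
        return True


def register(log: CompactedLog, schema: str, max_retries: int = 5) -> int:
    """Assign the next schema ID, retrying when another writer wins the race."""
    for _ in range(max_retries):
        offset = log.next_offset()  # read the current state
        schema_id = offset + 1      # derive the new ID from that state
        if log.append_if(offset, {"id": schema_id, "schema": schema}):
            return schema_id
    raise RuntimeError("too many concurrent writers")
```

Because conflicts are detected at write time rather than prevented with locks, any node can accept a write, which is the property the paragraph above describes.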

The Schema Registry publishes record metadata to an internal topic, _schemas, as its backend store. By default, _schemas is protected from deletion and configuration changes by Kafka clients. See the kafka_nodelete_topics cluster property.

Redpanda Schema Registry uses the default port 8081.
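As a quick way to check that the registry is reachable on that port, the following sketch builds a GET /subjects request with Python's standard library. The host name localhost is an assumption, and the helper names are hypothetical; substitute the address of one of your brokers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # assumption: a broker reachable on localhost


def list_subjects_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET /subjects request."""
    return urllib.request.Request(
        f"{base_url}/subjects",
        headers={"Accept": "application/vnd.schemaregistry.v1+json"},
        method="GET",
    )


def send(request: urllib.request.Request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling send(list_subjects_request()) against a running cluster should return a JSON list of subject names, which is empty on a fresh cluster.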

Schema examples

To experiment with schemas from applications, see the clients in redpanda-labs.

For a basic end-to-end example, the following Protobuf schema contains information about products: a unique ID, name, price, and category. It is registered with schema ID 1 and uses the Topic name strategy for the topic Orders. (The Topic name strategy is suitable when you want to group schemas by the topics with which they are associated.)

syntax = "proto3";

message Product {
  int32 ProductID = 1;
  string ProductName = 2;
  double Price = 3;
  string Category = 4;
}
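Before producing, the schema has to be registered. A minimal sketch of that step, assuming a registry at localhost:8081 and assuming the schema describes the record value (so that the Topic name strategy yields the subject Orders-value), builds the POST /subjects/{subject}/versions request as follows. The function name register_request is hypothetical.

```python
import json
import urllib.request

PRODUCT_PROTO = '''syntax = "proto3";

message Product {
  int32 ProductID = 1;
  string ProductName = 2;
  double Price = 3;
  string Category = 4;
}
'''


def register_request(topic, schema, base_url="http://localhost:8081"):
    """Build a POST /subjects/{subject}/versions request for a value schema."""
    # Topic name strategy: the subject is <topic>-key or <topic>-value.
    subject = f"{topic}-value"
    body = json.dumps({"schemaType": "PROTOBUF", "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(register_request("Orders", PRODUCT_PROTO)) should return a JSON body containing the ID assigned to the schema.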

The producer then does something like this:

from kafka import KafkaProducer
from productpy import Product  # Class generated from the Protobuf schema (module name is a placeholder)

# Create a Kafka producer
producer = KafkaProducer(bootstrap_servers='your_kafka_brokers')

# Create a Product message
product_message = Product(
    ProductID=123,
    ProductName="Example Product",
    Price=45.99,
    Category="Electronics"
)

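# Produce the Product message to the "Orders" topic.
# Without a key_serializer, kafka-python requires the key to be bytes.
producer.send('Orders', key=b'product_key', value=product_message.SerializeToString())

# Block until buffered messages have been sent
producer.flush()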

To add an additional field for product variants, like size or color, the new schema (version 2, ID 2) would look like this:

syntax = "proto3";

message Product {
  int32 ProductID = 1;
  string ProductName = 2;
  double Price = 3;
  string Category = 4;
  repeated string Variants = 5;
}

You would want the compatibility setting to accommodate adding new fields without breakage. Adding an optional new field to a schema is inherently backward-compatible: consumers using the new schema can still read events written with the old one, and older consumers can ignore the new field.
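You can ask the registry whether a proposed schema is compatible before registering it. The sketch below builds a POST /compatibility/subjects/{subject}/versions/{version} request and reads the is_compatible flag from the response body. The subject name Orders-value, the localhost address, and both function names are assumptions made for illustration.

```python
import json
import urllib.request


def compatibility_request(subject, schema, schema_type="PROTOBUF",
                          version="latest", base_url="http://localhost:8081"):
    """Build a request that tests a candidate schema against a registered version."""
    body = json.dumps({"schemaType": schema_type, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/compatibility/subjects/{subject}/versions/{version}",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def is_compatible(response_body: bytes) -> bool:
    """Read the is_compatible flag from the registry's JSON response."""
    return bool(json.loads(response_body).get("is_compatible", False))
```

A deployment pipeline could run this check against the version 2 schema and refuse to roll out a producer when the result is False.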

JSON Schema

All CRUD operations are supported for JSON Schema (json-schema), and Redpanda supports all published JSON Schema specifications, which include:

  • draft-04

  • draft-06

  • draft-07

  • 2019-09

  • 2020-12

Limitations

Schemas are held in subjects. Subjects have a compatibility configuration associated with them, either directly specified by a user or inherited from the default. See PUT /config and PUT /config/{subject} in the Schema Registry API.
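As a rough sketch of how those two endpoints differ, the helper below builds a PUT /config request for the global default when no subject is given, and a PUT /config/{subject} request otherwise. The helper name, the localhost address, and the client-side validation of level names are assumptions for illustration.

```python
import json
import urllib.request
from typing import Optional

COMPATIBILITY_LEVELS = {
    "NONE", "BACKWARD", "BACKWARD_TRANSITIVE",
    "FORWARD", "FORWARD_TRANSITIVE", "FULL", "FULL_TRANSITIVE",
}


def config_request(level: str, subject: Optional[str] = None,
                   base_url: str = "http://localhost:8081") -> urllib.request.Request:
    """Build a PUT request that sets the global or per-subject compatibility."""
    if level not in COMPATIBILITY_LEVELS:
        raise ValueError(f"unknown compatibility level: {level}")
    # No subject means the global default; a subject overrides it.
    path = "/config" if subject is None else f"/config/{subject}"
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}{path}",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="PUT",
    )
```

For example, config_request("BACKWARD") targets the default, while config_request("NONE", subject="Orders-value") targets a single subject.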

If you have inserted a second schema into a subject where the compatibility level is anything but NONE, then any JSON Schema containing the following items is rejected:

  • $ref

  • $defs (definitions prior to draft 2019-09)

  • dependentSchemas / dependentRequired (dependencies prior to draft 2019-09)

  • prefixItems

Consequently, you cannot structure a complex schema using these features.

Additionally, you cannot have schema ID validation with JSON schemas if the subject name strategy is not TopicNameStrategy.
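To catch the rejected keywords before submitting a schema, a client-side pre-check along the following lines may help. It is a simplified sketch, not the registry's own validation: the function name is hypothetical, and it flags any object key that matches a listed keyword, so a property that happens to be named like one of them (for example, a field called definitions) is reported as well.

```python
REJECTED_KEYWORDS = {
    "$ref", "$defs", "definitions",
    "dependentSchemas", "dependentRequired", "dependencies",
    "prefixItems",
}


def find_rejected_keywords(schema, path="#"):
    """Return (path, keyword) pairs for every listed keyword found in the schema."""
    found = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}/{key}"
            if key in REJECTED_KEYWORDS:
                found.append((child, key))
            # Keep descending so nested occurrences are reported too.
            found.extend(find_rejected_keywords(value, child))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            found.extend(find_rejected_keywords(item, f"{path}/{index}"))
    return found
```

An empty result means none of the listed keywords appear anywhere in the parsed schema.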